Fairness in machine learning systems

The term “fairness” is used frequently in artificial intelligence (AI) and machine learning (ML). “Fairness” is a critical component of most responsible and ethical AI principles. But what does that imply in practice, and what constitutes a “fair” machine learning system? This brief examines “fairness” in general before delving into the default fairness method in machine learning and its accompanying issues. It concludes with tools and considerations for designing, managing, and using machine learning systems.

Understanding the concept of “fairness”

Fairness is a complex idea to grasp. It is generally defined as the quality or state of being fair, especially fair or impartial treatment. However, fairness can mean different things to different people in different situations, and different disciplines define it differently.

In law, fairness entails protecting individuals and groups from discrimination and mistreatment, with a focus on prohibiting behaviors, biases, and decisions based on particular protected factors or social group categories. Social science often evaluates fairness in light of social interactions, power dynamics, institutions, and markets, recognizing that members of certain groups (or identities) are more likely to benefit.

In the quantitative sciences (such as math, computer science, statistics, and economics), fairness questions are treated as mathematical problems. For a given task or problem, fairness usually corresponds to some criterion, such as equal or equitable allocation, representation, or error rates.

According to philosophy, fairness principles “depend on a sense that what is fair is also what is ethically decent.”

In political theory, fairness is linked to conceptions of justice and equity. Definitions can differ even within a single discipline, so it’s no surprise that fairness in machine learning algorithms has been a source of consternation.

The concerns with machine learning’s default fairness approach

The primary lens for fairness used by ML researchers and practitioners is a quantitative one. They concentrate on developing an optimal machine learning model that is bound by fairness requirements (a “constrained optimization problem”). Perspectives from law, social science, and philosophy can inform the model’s constraints.
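
One common way to write such a constrained optimization problem is sketched below. This is a generic illustration, not a formulation taken from any particular system: the loss \ell, the model f_\theta, the disparity measure D, and the tolerance \epsilon are all notation assumed here.

```latex
\min_{\theta} \; \mathbb{E}\big[\ell\big(f_\theta(X),\, Y\big)\big]
\quad \text{subject to} \quad D(f_\theta) \le \epsilon
```

Here D(f_\theta) stands for whichever fairness criterion is chosen (for example, a gap in outcome rates between groups), and \epsilon \ge 0 sets how much disparity is tolerated. The choice of D and \epsilon is where non-quantitative perspectives enter the problem.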

Constraints are frequently placed on sensitive, legally protected attributes. ML researchers and practitioners want the model to perform as well as possible while still treating people “fairly” with respect to these sensitive attributes.

Fairness can be defined at the individual level (for example, ensuring that similar individuals are treated similarly) or at the group level. In the latter case, people are placed into categories and the goal is to ensure that the groups are treated fairly. While there are several ways to define fairness for a group, the most basic is to pursue demographic parity across subgroups (i.e., each subgroup receives the favorable outcome at the same rate). Under demographic parity, belonging to a protected class should have no bearing on the outcome.
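
A minimal sketch of how demographic parity could be checked is shown below. The data, the group labels "a" and "b", and the function names are all hypothetical and chosen only for illustration; a prediction of 1 stands for the favorable outcome.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of favorable (1) predictions for each group."""
    favorable = defaultdict(int)
    total = defaultdict(int)
    for pred, group in zip(predictions, groups):
        total[group] += 1
        favorable[group] += pred
    return {g: favorable[g] / total[g] for g in total}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs for eight people in two made-up groups.
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)
gap = demographic_parity_difference(predictions, groups)
print(rates, gap)
```

In this toy data, group "a" receives the favorable outcome 75% of the time and group "b" 25% of the time, so the gap is 0.5 and demographic parity is not met.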

This quantitative approach can be problematic. Narrowly specified approaches do not always capture the complexities and varied perceptions of fairness. While pursuing demographic parity may appear to be a reasonable answer, it is a basic approach to fairness that can conflict with other conceptions of fairness, such as justice. Also, even if demographic parity is met with respect to gender, it may still fail for intersecting subgroups, for example when race is considered together with gender. It’s also crucial to examine fairness in terms of both how ML systems allocate resources and how they withhold them.
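
The point about adding race on top of gender can be made concrete with a small, entirely made-up example. In the sketch below, the records, the race labels "x" and "y", and the helper name rate_by are hypothetical; the data is constructed so that parity holds for gender and for race separately but not for their combinations.

```python
from collections import defaultdict


def rate_by(records, key):
    """Favorable-outcome rate for each subgroup, where key(record) names the subgroup."""
    favorable = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        k = key(rec)
        total[k] += 1
        favorable[k] += rec["pred"]
    return {k: favorable[k] / total[k] for k in total}


# Hypothetical predictions (1 = favorable outcome) for eight people.
records = [
    {"gender": "woman", "race": "x", "pred": 1},
    {"gender": "woman", "race": "x", "pred": 1},
    {"gender": "woman", "race": "y", "pred": 0},
    {"gender": "woman", "race": "y", "pred": 0},
    {"gender": "man", "race": "x", "pred": 0},
    {"gender": "man", "race": "x", "pred": 0},
    {"gender": "man", "race": "y", "pred": 1},
    {"gender": "man", "race": "y", "pred": 1},
]

by_gender = rate_by(records, lambda r: r["gender"])
by_race = rate_by(records, lambda r: r["race"])
by_subgroup = rate_by(records, lambda r: (r["gender"], r["race"]))

print(by_gender)
print(by_race)
print(by_subgroup)
```

Both single-attribute checks report a rate of 0.5 for every group, yet two of the four gender-and-race subgroups never receive the favorable outcome. A check on one attribute at a time would miss this.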

The COMPAS algorithm controversy

Judges frequently use the COMPAS algorithm, created by Equivant (formerly Northpointe), to predict which defendants are most likely to re-offend. It takes a mathematical approach to fairness, attempting to forecast recidivism as accurately as possible across all individuals.

ProPublica found that the algorithm correctly predicted recidivism for Black and white defendants at roughly the same rate, but that it was wrong in different ways for the two groups. For instance, Black defendants who were not rearrested within two years had been scored as high risk at roughly twice the rate of white defendants who were not rearrested.

It also assigned low risk scores to white defendants who went on to re-offend more often than it did to Black defendants who re-offended. The algorithm maintains the status quo by ignoring how and why the policing system has been and continues to be racist toward Black people.
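
The pattern described above, similar overall accuracy but different kinds of errors, can be illustrated with a small sketch. The numbers below are invented for illustration and are not ProPublica's data; the group names "g1" and "g2" and the function error_profile are hypothetical. A label of 1 means the person re-offended, and a prediction of 1 means "high risk".

```python
def error_profile(y_true, y_pred):
    """Accuracy, false positive rate, and false negative rate for one group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn),  # share of non-re-offenders labeled high risk
        "fnr": fn / (fn + tp),  # share of re-offenders labeled low risk
    }


# Made-up outcomes and predictions for two groups of ten people each.
g1_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
g1_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
g2_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
g2_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

g1 = error_profile(g1_true, g1_pred)
g2 = error_profile(g2_true, g2_pred)
print(g1)
print(g2)
```

Both groups come out at 80% accuracy, but the first group's false positive rate (0.4) is twice the second's (0.2), while only the second group has any false negatives. Accuracy alone hides who bears which kind of error.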

According to Northpointe, the COMPAS algorithm was fair because a given score indicated the same chance of recidivism across all groups. According to ProPublica, treating like cases alike in this way is not fair when the errors fall unevenly, especially as race is a protected social group category.
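
One way to read the two positions is as two different metrics computed on the same predictions. The sketch below assumes, for illustration, that "the same chance of recidivism" is measured by precision among people labeled high risk and that ProPublica's concern is measured by the false positive rate. The data, the group names, and the function names are hypothetical, and the two groups are deliberately given different underlying re-offense rates.

```python
from fractions import Fraction


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def ppv_and_fpr(y_true, y_pred):
    """Precision among 'high risk' labels, and false positive rate, as exact fractions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return Fraction(tp, tp + fp), Fraction(fp, fp + tn)


# Made-up data: 6 of 10 people re-offend in g1, 3 of 10 in g2.
g1_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
g1_pred = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
g2_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
g2_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

g1_ppv, g1_fpr = ppv_and_fpr(g1_true, g1_pred)
g2_ppv, g2_fpr = ppv_and_fpr(g2_true, g2_pred)
print(g1_ppv, g1_fpr)
print(g2_ppv, g2_fpr)
```

In this toy data a "high risk" label is right two times out of three in both groups, yet the false positive rate is 1/2 in one group and 1/7 in the other. Both claims can hold at once, which is why the choice of metric matters.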

This kind of reasoning mirrors a common rationale in government decision-making: treat all citizens equally.

Yet when the algorithm was wrong, its errors fell unequally across groups, so it failed to meet another quantitative definition of fairness. This method also violates other notions of fairness, notably from social science and political philosophy. However, there is no obvious “wrong” or “right” answer. Questions arise: Is this the appropriate framing of fairness in this context? Is it suitable for a private-sector entity to decide what is fair in a public-sector matter?

Challenges

We put our faith in machine learning algorithms, expecting them to be “fair” and nondiscriminatory, particularly with respect to legally protected characteristics such as race and gender. But it isn’t that easy. Different definitions and ideas of fairness are rarely explored at the outset of the development of an ML system. Even if alternative definitions or methodologies are considered, there isn’t always a “correct” result for a specific AI system. Furthermore, the different actors involved in the ML process, from dataset production to algorithm development to AI system deployment, may have diverse perceptions and interpretations of fairness.

The law provides some guidance in the case of the COMPAS algorithm, but it still leaves room for interpretation of what fairness entails. Using AI systems to maintain the status quo, as the COMPAS algorithm does, can obscure, perpetuate, and magnify disparities. Importantly, AI systems can mirror society over time and may accentuate existing imbalances. Again, there is no “correct” solution, and views on what is “fair” in that situation differ. Many activists stress the significance of focusing on justice.

Selecting a fairness definition or approach involves trade-offs, which must be documented so that others can understand what an AI system is supposed to do and why, and can debate it. The choice can be problematic, especially when it is made without clearly explaining why a specific approach was selected. It’s easy, even tempting, to check the box and declare the model “fair” according to the chosen definition or approach.

There are also technical issues: adding fairness criteria places constraints on an algorithm, which often reduces its accuracy.
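
A toy illustration of this trade-off follows. It is a sketch under assumed data, not a general result: the scores, labels, group names, the 0.5 threshold, and the choice to enforce demographic parity by selecting the top two scorers in each group are all hypothetical.

```python
def accuracy(labels, preds):
    """Share of predictions that match the labels."""
    return sum(1 for l, p in zip(labels, preds) if l == p) / len(labels)


def threshold_predict(scores, threshold):
    """Predict 1 for every score at or above the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def top_k_predict(scores, k):
    """Predict 1 for the k highest-scoring items, whatever their scores are."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


# Made-up scores and true labels; the groups have different base rates.
data = {
    "a": {"scores": [0.9, 0.8, 0.7, 0.2], "labels": [1, 1, 1, 0]},
    "b": {"scores": [0.9, 0.3, 0.2, 0.1], "labels": [1, 0, 0, 0]},
}


def evaluate(preds_by_group):
    """Return (overall accuracy, gap in selection rates between groups)."""
    labels = [l for g in data for l in data[g]["labels"]]
    preds = [p for g in data for p in preds_by_group[g]]
    rates = [sum(preds_by_group[g]) / len(preds_by_group[g]) for g in data]
    return accuracy(labels, preds), max(rates) - min(rates)


# Unconstrained: one shared threshold for everyone.
unconstrained = {g: threshold_predict(d["scores"], 0.5) for g, d in data.items()}
# Constrained: same number selected per group, which forces equal selection rates.
constrained = {g: top_k_predict(d["scores"], 2) for g, d in data.items()}

unconstrained_result = evaluate(unconstrained)
constrained_result = evaluate(constrained)
print(unconstrained_result)
print(constrained_result)
```

With the shared threshold, accuracy is 1.0 but the selection-rate gap is 0.5. Forcing equal selection rates closes the gap and drops accuracy to 0.75. How large the cost is in practice depends on the data and on the criterion chosen.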

Fairness concerns not only how an ML system is built but also how it is used. A system may be considered unjust if users cannot observe, understand, or dispute its decisions. Furthermore, because many ML models are opaque, ensuring “fairness” can be difficult.

Trends to promote justice

In every facet of society, justice is based on fairness. Society is a product of its history, and opportunities and resources are shaped and assigned along socially constructed group boundaries (e.g., race, gender, class, sexual orientation, and ability).

A justice approach recognizes how groups have been oppressed or marginalized and attempts to remedy this to increase freedom and opportunity for all. For AI systems, a justice approach evaluates how some groups are oppressed or marginalized in the given context and investigates how the AI system can advance equity rather than perpetuate a status quo that oppresses or marginalizes specific groups.

Tools

Tools can assist practitioners in navigating the choppy waters of fairness. They can offer direction, formalize processes, and empower individual employees. They also serve as a means of documenting choices so that teams can explain their positions and engage in debate.

Qualitative research methods

Qualitative tools aid in exploring the complexities of fairness while also prompting critical conversation and reflection. They can help teams envision the AI system and its role in society, investigate potential fairness-related harms and trade-offs, map out how bias might arise, and develop ways to avoid it. They can also assist in tracking and monitoring any fairness-related harms.

Two qualitative tools are highlighted:

Fairness Analytic: Created by Mulligan and colleagues, this tool helps teams discuss fairness during a project’s early stages. It helps them think about what fairness could and should entail for a particular AI system by combining concepts of fairness from multiple disciplines.
Co-designed AI fairness checklist: A group of Microsoft researchers and academic researchers co-designed an AI fairness checklist with 48 people from 12 different technology companies. The checklist comprises items to consider at various stages of AI system development and deployment (e.g., envision, define, prototype, build, launch, and evolve). The checklist is intended to be customized. It assists teams in clarifying terminology, promoting debate, and developing a shared understanding.

Like technical tools, qualitative tools have limitations. Checklists, for example, can be “gamed,” especially if an organization tends to focus on technical solutions (explicitly or not).

While there are various tools available, users must understand the tools they’re using and which gaps those tools fill and which they don’t. A combination of mechanisms, both technical and non-technical, is often helpful and necessary.

Quantitative/technical tools

Several AI fairness tools are available to assist engineers and data scientists in examining, reporting, and mitigating discrimination and bias in machine learning models. Examples include:

IBM’s AI Fairness 360 Toolkit: a Python toolkit that focuses on technical solutions via fairness measures and algorithms to assist users in examining, reporting, and mitigating discrimination and bias in machine learning models.
Facebook’s Fairness Flow: an internal tool that Facebook is developing to detect bias in machine learning algorithms.
Google’s What-If Tool: a tool for analyzing a model’s performance on a dataset, including looking at numerous pre-defined fairness criteria, e.g., equality of opportunity. This tool is helpful since it allows users to experiment with various definitions of fairness.
Microsoft’s Fairlearn: a Python package that implements several techniques to reduce “unfairness” in supervised machine learning.

These tools tend to employ a technical lens and focus on technical solutions, regardless of whether they address the data or the broader AI system lifecycle. Technical solutions are necessary but sometimes overlook essential aspects of fairness. A tool that relied solely on technical solutions would not have caught the complexities behind the COMPAS algorithm’s discrimination. A purely technical approach is insufficient for understanding and minimizing biases, and it can foster the misleading impression that machine learning systems can achieve “fairness” or “objectivity.”

Considerations

Rather than attempting to make an ML system perfectly fair or “de-biased,” the goal could be to detect and reduce fairness-related harms to the greatest extent possible. The following questions should always be asked: Fair to whom? In what context?
Fairness does not end with the development of an AI system. Ensure that users and stakeholders can view, understand, and appeal AI system decisions.
Because there aren’t always clear-cut answers, it’s a good idea to document processes and considerations, including priorities and trade-offs.
Identify fairness considerations and techniques upfront, and engage and empower appropriate voices in the discussion, i.e., experts in the relevant domain and across disciplines.
To help facilitate these processes, use both quantitative and qualitative methods and tools. Tools do not guarantee fairness! They are a good practice within a broader, holistic approach to bias mitigation.

Conclusion

Machine learning’s original sin is bias. It’s in the very nature of machine learning: the system learns from data and is prone to picking up the human biases reflected in that data. For example, a machine learning hiring system trained on current American employment data is likely to “learn” that being a woman is negatively associated with being a CEO.

Cleaning the data so that the machine detects no hidden, harmful associations can be very difficult. Even with extreme caution, an ML system may uncover bias patterns that are so subtle and complicated that they escape even the most well-intentioned human scrutiny. As a result, computer scientists, policymakers, and anyone concerned about social justice are focusing on ways to keep bias out of AI.
