We live in a world increasingly dependent on artificial intelligence and machine learning. From fraud detection to medical diagnostics, we trust computers with a growing number of tasks once handled by people.
But in doing so, can we be confident that the decisions and recommendations are made free of bias and discrimination?
At first glance, it is safe to say that computers can’t be biased or show favor. They aren’t free thinkers (not yet anyway) and are merely tools created and used by society. As such, they are no more biased than a camera or a paintbrush. However, the reality is a bit more complicated.
Machine learning involves analyzing large collections of data to derive an equation, or model, that accurately represents that data. We can then feed new inputs into that same equation to categorize them and make predictions.
In its simplest form, we have a single input variable x for which we attempt to determine a corresponding dependent variable y. When we plot known data on a graph, we can create a simple binary predictor by drawing a straight line that separates the two different outcomes.
Those of you recently thrust into homeschooling may recall that the mathematical formula for a line is y = mx + b. Given a value for x, we can then calculate the corresponding y value if we know two other factors. The first is m, representing the slope of the line. The second is b, which represents the y-intercept, or the bias.
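To make the slope and the bias concrete, here is a minimal sketch (not from the article; it assumes NumPy is available) that fits y = mx + b to a handful of points with a least-squares fit and reads off m and b:

```python
import numpy as np

# Points that lie roughly along the line y = 2x + 5, with a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 6.9, 9.2, 10.8, 13.1])

# Least-squares fit of y = mx + b; polyfit returns [m, b] for degree 1
m, b = np.polyfit(x, y, 1)
print(f"slope m = {m:.2f}, bias (y-intercept) b = {b:.2f}")

# Predict y for a new input using the learned line
x_new = 5.0
print(f"predicted y at x = {x_new}: {m * x_new + b:.2f}")
```

Here the "learning" is nothing more than choosing the m and b that best fit the observed points; every prediction the line makes is determined entirely by the data it was fit to.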
Real-world machine learning models are much more complicated than this, involving numerous inputs and resulting in intricate graphs and equations. Even so, the concepts are similar.
The remarkable growth of machine learning has much more to do with Moore’s Law and our ability to capture, store, and analyze vast amounts of data than any specific scientific breakthrough. It is the data that determines the algorithm, and the more data we have, the more accurate our models can be.
The bias within a line equation is merely a label and isn't representative of the bias we struggle with on a day-to-day basis. However, it is critical to understand that there can be (and is) bias within data, and we must account for it.
We experience this internally with clients as we analyze user behavior through data-driven design. If the segment of users we observe over-represents or under-represents a specific type of user, it skews our findings and results in a less-than-ideal design that doesn't work for everyone. Understanding the diversity of the audience and ensuring fair representation is essential: it produces a balanced system that works well for everyone.
Humans are susceptible to these types of biases every day. When we consume information that over-represents or under-represents a specific group of people, our internal predictive patterns can become skewed. The difference is that we, unlike computers, are self-aware and (ideally) able to observe and identify these biases.
We can make adjustments, like choosing to read an alternative news source or starting a conversation with a colleague who has a different perspective or background. Humans don't always do this, but we have the ability to.
Machines do not have awareness or choice, and therein lies the danger.
In many cases, the sheer volume of data makes it difficult to identify and address problematic records, especially in large historical datasets gathered over years of systemic racism, bigotry, and the policies that accompanied them. This data is hazardous in models that attempt to predict future behavior.
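As a toy illustration of how historical data carries its bias forward (the groups, records, and approval numbers below are entirely invented), a naive model that simply learns each group's historical majority outcome will reproduce whatever disparity exists in its training data:

```python
# Hypothetical historical loan decisions (all fields and values invented for
# illustration): group B was approved far less often for similar applicants.
history = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": True},
]

def approval_rate(records, group):
    """Fraction of historical records in this group that were approved."""
    outcomes = [r["approved"] for r in records if r["group"] == group]
    return sum(outcomes) / len(outcomes)

def naive_predict(group):
    """Predict the historical majority outcome for the group -- and, in
    doing so, faithfully replicate the bias baked into the data."""
    return approval_rate(history, group) >= 0.5

print(approval_rate(history, "A"))  # roughly 0.67, vs roughly 0.33 for B
print(naive_predict("A"), naive_predict("B"))  # True False
```

The model contains no prejudice of its own; it has simply distilled the prejudice already present in the historical record.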
Awareness is Key
While computers are unable to be biased on their own, the data that drives them most certainly can be. It is doubtful that we will be able to remove bias from the system altogether, but we can take important steps to address and minimize these imbalances.
- Raise awareness: The first step is acknowledging we have a problem. Do not blindly trust the data or the results under the false pretense that these systems have no bias.
- Ensure fair representation: Question the source and means by which data was collected. Investigate the circumstances under which any group is under- or over-represented. When looking for bias within the data, consider all of your data points. While it's important to fairly represent groups historically affected by bias (culture, race, gender, etc.), you should address unfair representation in other areas as well (education, wages, etc.).
- Identify biased outcomes: Observe the results to look for categorizations or recommendations that are biased towards a specific group. Question those results to determine whether the data set is fair, and whether there are other factors at play that may skew the results.
- Adjust your models: There is a reason we have numerous different machine learning models. One size doesn’t fit all, and it’s important to make the adjustments needed to ensure the end results are fair, balanced, and accurate.
- Allow for oversight: Whenever possible, allow for peer review of your data collection techniques, your data, and your outcomes to help look for shortcomings. This is especially important in public and government-funded systems that affect our fellow citizens.
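The representation and outcome checks above can be sketched in a few lines. This is a hypothetical audit (the groups, field names, and predictions are invented for illustration), not a substitute for a full fairness review:

```python
from collections import Counter

# Hypothetical model outputs; every field and value is invented
# to illustrate the audit steps above.
predictions = [
    {"group": "A", "positive": True},
    {"group": "A", "positive": True},
    {"group": "A", "positive": False},
    {"group": "A", "positive": True},
    {"group": "B", "positive": False},
    {"group": "B", "positive": True},
]

# Check representation: is any group under- or over-represented?
counts = Counter(p["group"] for p in predictions)
total = sum(counts.values())
for group, n in sorted(counts.items()):
    print(f"group {group}: {n}/{total} of the data ({n / total:.0%})")

# Check outcomes: do positive rates differ sharply across groups?
for group in sorted(counts):
    outcomes = [p["positive"] for p in predictions if p["group"] == group]
    print(f"group {group}: positive rate {sum(outcomes) / len(outcomes):.0%}")
```

Simple tallies like these won't catch every problem, but they make skewed representation and divergent outcome rates visible early, before a model's recommendations reach real people.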
As with most tools, it is up to us as a society to ensure we take these steps to utilize machine learning with a healthy respect for its potential dangers, pitfalls, and shortcomings.