
The Fault in Our Data

Sound Wave
Marina Dehnik / Shutterstock.com

“Siri, call Bonnie Sun.”

I was recently trying to get my voice assistant to call my friend. Siri responded, “Sorry, I didn’t get that.”

I tried again. No luck. I grew frustrated. And then I realized that my friend’s last name is pronounced like sun instead of tsun (with a slight T sound at the beginning and short U sound in the middle), as I’d say in my native Chinese. Finally, in a flurry of frustration, I switched the phone’s settings from English to Mandarin—confident that Siri would at last understand what I was saying. So I tried again … and she still didn’t understand me.

I probably wouldn’t have thought twice about this if I hadn’t noticed how accurately Siri understands my boyfriend, a native English speaker. The episode made me wonder: Why do AI systems treat languages differently? What’s missing during development that leaves these systems unable to recognize people speaking fluently in different languages?

The answer, I’ve learned, lies largely in uneven data collection.

TIME magazine recently published a story, based on findings from a UNESCO report, about how voice assistants often bolster gender bias. The story underscores that “the female voices and personalities projected onto AI technology reinforces the impression that women typically hold assistant jobs and that they should be docile and servile.” And as my own aforementioned experience taught me, this bias can also be racial in nature.

Indeed, many researchers have demonstrated that marginalized groups are often disadvantaged by the ballooning presence of AI systems. More specifically, the development of this particular technology has tended to benefit already-advantaged individuals, bringing more convenience to their daily lives while leaving many others without access to some technical features.

The question, then, is: Where do these AI problems come from in the first place?

Many of AI’s biases are essentially invisible—difficult to detect because they’re embedded in data and code. As algorithms become more sophisticated and “smarter,” sometimes even their programmers can’t explain how the system works. And with high-level, low-explainability applications such as natural-language processing, people have an even harder time recognizing the biases in an AI system.

What this means, in turn, is that even as larger volumes and wider varieties of data become available, along with new techniques for analyzing them, defects persist in how data is collected, mined, and run through algorithms. In particular, there are three primary ways that data analytics can discriminate: oversampling and overrepresenting specific demographic groups, “inheriting” prejudice from past data patterns, and using proxies when selecting individuals.

In simplest terms, oversampling and overrepresentation mean that a disproportionate share of the data is collected from certain demographic groups. One example is Google Photos’ image-recognition feature, which labeled black American women as gorillas because black American women’s faces were scarce in the training data. (The system was trained largely on white men’s faces.) Think of it this way: If the training data is defective, the system can be defective. As they say in some professions, “garbage in, garbage out.”
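
To make “garbage in, garbage out” concrete, here is a deliberately tiny sketch in Python; every sound pattern, word, and count in it is invented, and it stands in for no real vendor’s system. It is a toy recognizer that simply memorizes the most common word label for each simplified sound pattern, and because nearly all of the made-up training recordings come from one group of speakers, the other group’s pronunciation of the same word gets outvoted.

from collections import Counter, defaultdict
# Invented training data: (sound pattern, intended word, speaker group).
# Nearly all recordings come from group A; group B pronounces "Sun" as "tsun."
training = ([("s-uh-n", "Sun", "A")] * 95 + [("t-s-oo-n", "Sun", "B")] * 3
            + [("t-oo-n", "Tune", "A")] * 90 + [("t-s-oo-n", "Tune", "A")] * 7)
# "Train" the toy model: for each pattern, keep the word it was most often labeled with.
votes = defaultdict(Counter)
for pattern, word, _group in training:
    votes[pattern][word] += 1
model = {pattern: counts.most_common(1)[0][0] for pattern, counts in votes.items()}
print(model["s-uh-n"])    # "Sun": the over-represented group is understood
print(model["t-s-oo-n"])  # "Tune": the under-represented group's "Sun" is outvoted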

Another way training data can be defective is when prejudice from previous, already-biased data sets and algorithmic patterns is transferred to future ones. Imagine that an automated hiring system has learned that men are, on the whole, more qualified for a certain position, and then a man is selected. The system takes this selection as further evidence and concludes: Yes, men are more suitable for this role. But that might not be true; women may be just as suitable. The point is that algorithms can produce a vicious cycle when past defective data is baked into the new data-analytics process.
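
The compounding is easier to see with numbers. Below is a minimal, entirely hypothetical Python sketch; the candidate pools, scores, and update rule are all invented for illustration. Both groups are equally qualified by construction, but the screening model inherits a small unearned bonus for one group from skewed historical data, and each retraining round folds its own lopsided hires back into that bonus.

def hires_per_group(bonus, slots=50, per_group=100):
    # Hire the top `slots` candidates overall. Both groups have identical,
    # evenly spread true qualifications; one group's score gets an unearned bonus.
    men = [q / per_group + bonus for q in range(per_group)]
    women = [q / per_group for q in range(per_group)]
    ranked = sorted([(s, "men") for s in men] + [(s, "women") for s in women], reverse=True)
    hired = [group for _score, group in ranked[:slots]]
    return hired.count("men"), hired.count("women")

bonus = 0.10  # inherited from a historically skewed data set
for round_no in range(1, 7):
    men, women = hires_per_group(bonus)
    # Retraining step: this round's gap is folded back into the model,
    # so the unearned advantage compounds instead of washing out.
    bonus += 0.5 * (men - women) / (men + women)
    print(f"round {round_no}: hired {men} men, {women} women (bonus now {bonus:.2f})")

In this toy run, the favored group’s share of hires grows round after round until the other group is shut out entirely, even though actual qualifications never change.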

Algorithmic use of proxies is a source of blatant unfairness, too. For instance, zip codes and neighborhoods are widely used as proxies for race; the reputation of the school someone graduated from is often used as a proxy for job qualifications. Proxies represent a straightforward and cheap way to predict future outcomes, but they’re frequently predicated on present, troublesome realities—like high-interest loans for people living in certain areas, or higher risk-assessment scores for people who don’t have a college degree—and, consequently, they tend to replicate a variety of inequalities in the future.
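
To see how a proxy can discriminate without a model ever “knowing” anyone’s race, consider this small, invented Python sketch; the zip codes, default rates, and approval rule are all hypothetical. The model looks only at a neighborhood’s recorded default rate, yet two identically qualified applicants get different answers, because those records encode past lending practices rather than individual merit.

# The model never sees race, only a zip code, yet it reproduces a racial gap
# because zip code and race are correlated in this invented "historical" data:
# zip_a stands for a majority-white area; zip_b for a majority-black area whose
# recorded default rate was inflated by the high-interest terms offered there.
history = {
    "zip_a": {"loans": 1000, "defaults": 50},   # 5% recorded default rate
    "zip_b": {"loans": 1000, "defaults": 150},  # 15% recorded default rate
}

def approve(zip_code, max_default_rate=0.10):
    # The "model": approve a loan if the applicant's zip code has a low enough
    # historical default rate. Race appears nowhere among the inputs.
    area = history[zip_code]
    return area["defaults"] / area["loans"] <= max_default_rate

# Two applicants with identical income, savings, and credit history.
for name, zip_code in [("applicant_a", "zip_a"), ("applicant_b", "zip_b")]:
    print(name, "approved" if approve(zip_code) else "denied")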

Because humans create algorithms, the biases of the algorithms largely reflect the flaws of their creators and the environments they live in. People speaking other languages are systematically underrepresented and disadvantaged in home-assistance systems largely because the decisions made by Siri programmers favor native English speakers. According to the Washington Post, while AI is taught to recognize different accents, “too many of the people training, testing and working with the systems all sound the same.” In the end, the more common “broadcast English,” which is the “predominantly white, nonimmigrant, non-regional dialect of TV newscasters,” is more likely to be understood. Siri, in short, has essentially excluded many non-native English speakers from using their native language for voice commands. (Also excluded from the voice-assistant boom are people with speech disabilities.)

Use of AI is becoming increasingly common in contexts beyond home assistants—think of its prevalence in criminal justice, credit scoring, cybersecurity, automated vehicles, and financial services. Yet even these applications leave out or harm some marginalized groups.

In Florida, risk-assessment scores calculated by an AI system—developed by the private firm Northpointe—are used in court. The score predicts the likelihood that a person will commit a crime in the future (sound familiar?) and is used to inform decisions about bail and sentencing. Yet a ProPublica investigation showed that the software systematically gives higher risk scores to black Americans than to similarly situated white Americans. This goes against the original ideal of using AI to improve human welfare, and to make life easier and more efficient.

So, how to chart a more equitable path forward?

The solution isn’t to “turn off” certain features or stymie technological development simply to avoid being called biased, as Google did in an effort to fix its racist algorithm. (It completely removed the gorilla category from Google Photos, without actually fixing the deeper issue.)

Rather, to make meaningful moves toward preventing AI systems from reproducing inequality, programmers, policymakers, and users more broadly ought to be aware of the possible biases of the system, the causes of these biases, and the attendant risks for certain groups.

The solution, in other words, lies in building awareness of the data fed into AI systems—of the subtle ways datasets can entrench bias in algorithms.

More About the Authors

Tong "Echo" Wu