The Public Interest Potential of Natural Language Processing

Oct. 13, 2020

This story is part of PIT UNiverse, a monthly newsletter from PIT-UN that shares news and events from around the Network.

One of the greatest challenges facing internet companies and civil society is the rampant spread of disinformation and hate speech on social media platforms worldwide. Part of the problem is a lack of information—it’s hard to chart the viral spread of specific racist ideas and hateful content amid the sea of noise on social media platforms with millions, if not billions, of users.

Understanding the scope of online hate speech and disinformation is one promising application of Natural Language Processing, or NLP, the field of artificial intelligence that enables computers to understand and generate human language.

Yulia Tsvetkov is one of the researchers leveraging this technology for change. Tsvetkov, a 2019 PIT-UN Network Challenge grantee, is an assistant professor in the Language Technologies Institute at Carnegie Mellon University’s School of Computer Science. Earlier this year, Tsvetkov’s lab partnered with the Washington Post Fact Checker to track the spread of anti-Black hate speech on Chinese social media directed at Africans living in Guangzhou, China. The work responded to events from late March to early April of this year, when unfounded fears that Africans were at high risk of spreading COVID-19 prompted a rash of anti-Black discrimination by Guangzhou authorities and businesses.

Tsvetkov calls NLP a “perfect use case of public interest technology” because of its exciting potential to serve the public, but she cautions that it also carries significant ethical and privacy risks if used carelessly.

“NLP develops algorithms that process human language, and humans are inherently biased,” she says. “Machine learning algorithms are very good at picking up on those patterns in language, and they learn to absorb and reinforce human biases. As a consequence, naively built language technologies that do not explicitly address those risks often exhibit undesirable behaviors, potentially with catastrophic consequences. My lab and students in our course, along with many other researchers, are focusing on detecting and mitigating those risks.”

In terms of what these technologies can accomplish, Tsvetkov adds, “I think the biggest promise of language technologies is that they can serve internet users all over the world in their daily tasks involving language and communication. They can provide interfaces for users for accessing education and knowledge on the web, for finding social connections, employment, and friendships. They can have a huge impact, since we all communicate using language.”

The collaboration between Tsvetkov’s lab and the Washington Post focused on identifying discriminatory sentiments directed against the African population in Guangzhou. The team collected more than 200,000 posts from the Chinese social network Weibo and used NLP to analyze, at scale, the sentiments those posts expressed. As a result, they were able to track the rise of xenophobic and discriminatory language throughout April 2020.
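For readers curious what sentiment analysis at this scale can look like in practice, the sketch below shows one common pattern: run each post through an off-the-shelf classifier in batches, then tally the labels per day to watch a trend emerge. This is a hypothetical illustration only, not the lab’s actual pipeline; the model (the Hugging Face `pipeline` default, which is English-only), the post format, and the aggregation are all assumptions, and real Weibo data would require a Chinese-capable classifier.

```python
# A minimal, hypothetical sketch of sentiment analysis at scale.
# Assumes the `transformers` library is installed; the default
# English sentiment model here is a placeholder, not what
# Tsvetkov's lab used.
from collections import Counter, defaultdict

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # placeholder model

def sentiment_by_day(posts, batch_size=32):
    """Return {date: Counter({label: count})} across all posts."""
    counts = defaultdict(Counter)
    for i in range(0, len(posts), batch_size):
        batch = posts[i : i + batch_size]
        results = classifier([p["text"] for p in batch])
        for post, result in zip(batch, results):
            counts[post["date"]][result["label"]] += 1
    return counts

# Toy usage with made-up posts in an assumed {date, text} format:
posts = [
    {"date": "2020-04-05", "text": "What a wonderful day."},
    {"date": "2020-04-05", "text": "This is awful and unfair."},
]
print(sentiment_by_day(posts))
```

Plotting the per-day label counts over a month of data is what makes a rise in negative or xenophobic language visible as a trend rather than anecdote.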

While sentiment analysis is a ubiquitous application of NLP, Tsvetkov says it is typically used by for-profit companies to better monetize their products. Turning this tool towards identifying hateful and discriminatory speech is a strong example of NLP’s potential for public interest use.

“While sentiment analysis alone cannot address the problems of hate speech, misinformation, and disinformation,” Tsvetkov stresses, “its underlying algorithms can be adapted and used to surface problematic, uncivil, and potentially dangerous interactions on social media. These tools can be used to alleviate the mental load of human moderators on social media platforms, or to automatically detect and analyze problematic patterns of communication.”
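As a toy illustration of that triage idea (not Tsvetkov’s system), one might score every post and surface only the highest-scoring items for human review. The word-list scorer below is a crude placeholder for a trained classifier’s probability output:

```python
# Toy illustration of classifier-assisted moderation triage:
# rank posts by a "problematic content" score and hand only the
# top few to a human reviewer. The lexicon scorer is a crude
# stand-in for a trained model.
import heapq

FLAGGED_TERMS = {"hateterm", "slur"}  # hypothetical placeholder list

def risk_score(post: str) -> int:
    """Count flagged terms (stand-in for a model's risk probability)."""
    return sum(word in FLAGGED_TERMS for word in post.lower().split())

def review_queue(posts, k=5):
    """Return the k posts a human moderator should look at first."""
    return heapq.nlargest(k, posts, key=risk_score)

print(review_queue(["nice weather today", "a post with a slur", "hello"]))
```

The design point is the division of labor: the model does the cheap, exhaustive first pass, while humans spend their limited attention only on the posts most likely to need judgment.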

In addition to her research, Tsvetkov co-teaches CMU’s Computational Ethics for NLP course with Prof. Alan Black. The course trains students to avoid ethical pitfalls, to mitigate the risks and social biases in AI tools, and to build language technologies for social good. Expanding the course was the focus of her 2019 PIT-UN Network Challenge grant.

“The key goals of the course are to equip future technologists with theoretical and practical tools to combat social biases in language technologies, and to develop new techniques—informed by ethics, social science, and law—that are civic-minded, that serve diverse populations equitably, and that promote the public good,” Tsvetkov says. She notes that the course received very positive feedback and attracted many students from underrepresented backgrounds, and that engagement has continued after the class ended: at least eight research papers resulted from students’ work in the course.

Looking ahead, Tsvetkov is wary of the myriad ethical and privacy risks posed by artificial intelligence, including NLP, and wants to see the field move towards addressing them.

“We can learn a lot about a user through language analysis algorithms,” she says, “especially when we analyze people’s communications across time and across their social networks. Personalization algorithms use this property of language to improve services such as search or targeted advertising. But the same algorithms can be used to track users, and to manipulate public opinion through targeted analysis of users’ feeds and through content personalization.”

“There’s so much focus on ‘fake news’ today, but I think much more danger comes from such subtle manipulation strategies,” she adds. “I hope more discussion in our field will focus on developing NLP algorithms that identify and prevent subtle manipulation strategies like agenda setting and polarization.”