Table of Contents
- Introduction
- Legal Frameworks that Govern Online Expression
- How Automated Tools are Used in the Content Moderation Process
- The Limitations of Automated Tools in Content Moderation
- Case Study: Facebook
- Case Study: Reddit
- Case Study: Tumblr
- Promoting Fairness, Accountability, and Transparency Around Automated Content Moderation Practices
The Limitations of Automated Tools in Content Moderation
As highlighted in the previous section, automated tools used for content moderation are limited in a number of ways. Given that these tools are increasingly being adopted by internet platforms, it is important to understand how they shape the content we engage with and see online, as well as user expression more broadly. This section provides a more detailed discussion of the primary limitations of these automated tools.
Accuracy and Reliability
The accuracy of a given tool in detecting and removing content online depends heavily on the type of content it is trained to tackle. Developers have been able to train and operate tools focused on certain types of content, such as CSAM and copyright-infringing content, with error rates low enough for wide adoption by both small and large platforms. This is because these categories of content have clear parameters around what they include and large corpora with which tools can be trained. By contrast, content such as extremist content and hate speech features a range of nuanced variations in speech across different groups and regions, the definition of what falls within these categories is far less clear, and context can be critical in understanding whether a given piece of content should be removed.1 As a result, developing comprehensive datasets for these categories is challenging, and developing and operationalizing a tool that can be reliably applied across different groups, regions, and sub-types of speech is extremely difficult. Although smaller platforms may rely on off-the-shelf automated tools, the reliability of these tools to identify content across a range of platforms is limited. Proprietary tools developed by larger platforms are often comparatively more accurate, as they are trained on datasets reflective of the types of content and speech they are meant to evaluate.2
“Although smaller platforms may rely on off-the-shelf automated tools, the reliability of these tools to identify content across a range of platforms is limited.”
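The gap between these two kinds of tasks can be made concrete. Detecting known CSAM or copyrighted works is largely a matching problem: an upload is compared against a database of fingerprints of previously identified content. The sketch below is a minimal illustration of that approach, using a cryptographic hash and a hypothetical database for simplicity; production systems such as PhotoDNA instead use perceptual hashes that tolerate resizing and re-encoding.

```python
import hashlib

# Hypothetical database of fingerprints of previously identified content.
# A cryptographic hash is used here only to keep the sketch self-contained;
# real matching systems use perceptual hashes robust to minor edits.
KNOWN_PROHIBITED_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def matches_known_content(upload: bytes) -> bool:
    """Return True if the upload exactly matches a known prohibited item."""
    digest = hashlib.sha256(upload).hexdigest()
    return digest in KNOWN_PROHIBITED_HASHES

# Matching against known items is a closed problem: the "decision" is a
# database lookup. Classifying novel hate speech or extremist content, by
# contrast, requires a model to generalize from training data, which is
# where the accuracy problems described above arise.
print(matches_known_content(b"test"))  # True: sha256("test") is in the set
```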
Additionally, the definition of what constitutes accuracy varies based on the objectives of a researcher and a given model. In most NLP studies, accuracy is defined as the degree to which a model makes the same decisions as a human being. However, because human beings come with their own set of biases and opinions that influence how they categorize speech, this definition of accuracy is not the most reliable metric for evaluating automated tools in the content moderation space. Other factors, such as the number and ratio of false positives and false negatives, should also be considered. Researchers and developers should recognize, however, that these statistics represent more than just quantitative metrics: they reflect real impacts on user expression, and should be weighed accordingly.3
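To make these metrics concrete, the sketch below computes accuracy, precision, recall, and false positive rate from a classifier's confusion matrix. The numbers are fabricated for illustration: they show how a tool can report high overall accuracy while still wrongly removing a substantial amount of legitimate speech.

```python
def moderation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute error metrics for a content classifier from its confusion matrix.

    tp: violating posts correctly removed
    fp: legitimate posts wrongly removed (over-removal of user speech)
    fn: violating posts missed (harmful content left up)
    tn: legitimate posts correctly left up
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,          # share of all decisions that were correct
        "precision": tp / (tp + fp),            # of removed posts, share that actually violated rules
        "recall": tp / (tp + fn),               # of violating posts, share that were caught
        "false_positive_rate": fp / (fp + tn),  # share of legitimate speech wrongly removed
    }

# Fabricated numbers: 96% accuracy, yet one in four removals hits a
# legitimate post (precision 0.75), silencing 300 users in this sample.
print(moderation_metrics(tp=900, fp=300, fn=100, tn=8700))
```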
Contextual Understanding of Human Speech
In theory, automated content moderation tools should be easy to create and implement, as they are far more rule-bound than human beings. In practice, because human speech is not objective and the process of content moderation is inherently subjective, these tools are unable to comprehend the nuances and contextual variations present in human speech.4 As discussed above, they struggle to parse variances in language and behavior that stem from demographic and regional differences. For example, excessively liking someone’s pictures or using certain slang words may be construed as harassment on one platform or in one region of the world, yet take on an entirely different meaning on another platform or in another community.5 Automated tools are similarly limited in their ability to derive contextual insights from content. An image recognition tool may identify an instance of nudity, such as a breast, in a piece of content, but it is unlikely to be able to determine whether the post depicts pornography or breastfeeding, which is permitted on many platforms.6

Automated content moderation tools can also become outdated rapidly. On Twitter, members of the LGBTQ+ community found a significant lack of search results for hashtags such as #gay and #bisexual, raising concerns of censorship. The company stated that this was due to an outdated algorithm that mistakenly identified posts with these hashtags as potentially offensive. This demonstrates the need to continuously update algorithmic tools, as well as the need for decision-making processes that incorporate context when judging whether posts with such hashtags are objectionable.7

These tools also need to be updated as language and meaning evolve. In an attempt to avoid moderation, some hateful groups have adopted new slang and coded terms for indicating hate; white supremacists, for example, have used the names of companies such as “Google” and “Yahoo” to replace ethnic slurs. To keep up, automated tools would have to adapt quickly and be trained across a wide range of domains, and users could simply develop new forms of coded speech in response, limiting the ability of these tools to act with significant speed and scale.8 On some platforms, human moderators are able to combat the rapidly changing nature of speech by viewing additional information on a case, such as information about the user accused of violating the platform’s rules. However, incorporating such assumptions and processes into an automated tool risks amplifying biases around particular groups of individuals and could result in skewed or even discriminatory enforcement of content policies.9
To date, AI researchers have been unable to construct datasets comprehensive enough to account for the vast fluidity and variance in human language and expression. As a result, these automated tools cannot be reliably deployed across different cultures and contexts, as they are unable to effectively account for the various political, cultural, economic, social, and power dynamics that shape how individuals express themselves and engage with one another.
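A minimal keyword filter, of the kind these paragraphs describe, makes the failure modes easy to see. The lexicon and example posts below are hypothetical; the point is that a tool matching surface tokens flags benign and abusive uses of a word identically, while coded substitutions evade it entirely.

```python
# Hypothetical lexicon-based filter: flags a post if it contains any term
# from a fixed offensive-word list, with no sense of context or intent.
OFFENSIVE_TERMS = {"trash"}

def keyword_filter(post: str) -> bool:
    """Flag a post if any token matches the lexicon, ignoring context."""
    tokens = {word.strip(".,!?").lower() for word in post.split()}
    return bool(tokens & OFFENSIVE_TERMS)

posts = [
    "You people are trash.",                # plausibly harassment
    "Remember to take out the trash!",      # clearly benign
    "That movie was trash, don't bother.",  # benign opinion
]
for post in posts:
    print(keyword_filter(post), "-", post)  # all three are flagged identically

# Coded substitutions of the kind described above slip through entirely,
# because the surface token is an ordinary word the lexicon cannot ban.
print(keyword_filter("Keep the googles out of this neighborhood"))  # False
```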
Creator and Dataset Bias
One of the key concerns around algorithmic decision-making across a range of industries is the presence of bias in automated tools. Decisions based on automated tools, including in the content moderation space, run the risk of further marginalizing and censoring groups that already face disproportionate prejudice and discrimination online and offline.10 As outlined in a report by the Center for Democracy & Technology, many types of bias can be amplified through the use of these tools. NLP tools, for example, are typically built to parse English text. Tools that are less accurate when parsing non-English text can produce harmful outcomes for non-English speakers, especially for languages that are not prominent on the internet, since the corpora available for training models in those languages are far less comprehensive. Given that a large share of the users of major internet platforms reside outside English-speaking countries, this is highly concerning, and the use of such automated tools should therefore be limited when making globally relevant content moderation decisions.11 These tools are also unable to effectively process differences in dialect and language use that stem from demographic differences.12
In addition, the personal and cultural biases of researchers are likely to find their way into training datasets. When a corpus is being created, the personal judgments of the individuals annotating each document shape what gets labeled as hate speech, as well as which specific types of speech, demographic groups, and so on are prioritized in the training data. This bias can be mitigated to some extent by testing for intercoder reliability, but such testing is unlikely to counteract the majority view on what falls into a particular category.13
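Intercoder reliability is typically quantified with a statistic such as Cohen's kappa, which measures how often two annotators agree beyond what chance alone would produce. The sketch below, using hypothetical labels, shows the computation. Note its limit: a high kappa only indicates that annotators agree with each other, not that their shared judgments are free of the majority-view bias described above.

```python
from collections import Counter

def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: share of items both annotators labeled the same way.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: what chance alone predicts, given each
    # annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of ten posts as hate speech (1) or not (0).
annotator_a = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(round(cohens_kappa(annotator_a, annotator_b), 2))  # 0.58: moderate agreement
```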
Transparency and Accountability
One of the primary concerns around the deployment of automated solutions in the content moderation space is the fundamental lack of transparency that exists around algorithmic decision-making as a whole. These algorithms are often referred to as “black boxes,” because there is little insight into how they are coded, what datasets they are trained on, how they identify correlations and make decisions, and how reliable and accurate they are. Indeed, with black-box machine learning systems, researchers are unable to determine how an algorithm arrives at the correlations it identifies. Currently, some internet platforms provide limited disclosures around the extent to which automated tools are used to detect and remove content on their platforms. In its Community Guidelines enforcement report, for example, YouTube discloses how many of the videos and comments it removed were originally detected by automated flagging tools, as well as what percentage of those videos were removed before or after they were viewed.14 Although many companies have been pushed to provide more transparency around their own proprietary automated tools, they have refrained from doing so, claiming that the tools are protected as trade secrets, both to maintain their competitive edge in the market and to prevent bad actors from learning enough to game their systems.15

In addition, some researchers have suggested that transparency does not necessarily generate accountability in this context. In the broader content moderation space, it is gradually becoming a best practice for technology companies to issue transparency reports that highlight the scope and volume of content moderation requests they received, as well as the amount of content they proactively removed through their own efforts. Transparency around these practices can generate accountability around how platforms manage user expression.
In the case of algorithmic decision-making, however, researchers such as Maayan Perel and Niva Elkin-Koren have suggested that looking “under the hood” of these black boxes would yield a large volume of raw inputs and outputs that would require significant analysis to extract insights. Although processing this data is not impossible, it would generally provide little transparency into how the actual decision-making occurred, or into whether a company is ensuring its tools are being used fairly. In addition, unlike humans, algorithms lack “critical reflection.”16 As a result, other ways for companies to provide transparency in a manner that generates accountability are being explored.17 One such mechanism is providing greater transparency into training data, which can help researchers understand, to a certain extent, the decisions being made by black-box algorithmic models.
“Unlike humans, algorithms lack critical reflection.”
Two mechanisms for providing accountability around content takedown decisions that are gradually being adopted are notice and appeals. Internet platforms have begun providing notices to users whose content has been removed or whose accounts have been suspended or deleted for violating content guidelines. In addition, some platforms have introduced appeals processes so that users can seek review of content- or account-related decisions. These mechanisms remain imperfect, however. Although users may be notified that their content has been removed or their account suspended or deleted, these notices often lack meaningful explanations of which specific content guidelines the user violated. On some platforms, appeals processes do not enable users to provide more context or an explanation around the content or account in question, and appeals are often unavailable for certain categories of removed content. Furthermore, the appeals process can be lengthy, leaving a user without access to their account for a significant period of time. Despite these shortcomings, notice and appeals are gradually being adopted as accountability mechanisms by a range of internet platforms.18
Citations
1. Filippo Raso, Hannah Hilligoss, Vivek Krishnamurthy, Christopher Bavitz, and Levin Yerin Kim, Artificial Intelligence & Human Rights: Opportunities & Risks.
2. Duarte, Llansó, and Loup, Mixed Messages?
3. Duarte, Llansó, and Loup, Mixed Messages?
4. Grimmelmann, "The Virtues of Moderation."
5. Robyn Caplan, Content or Context Moderation: Artisanal, Community-Reliant, and Industrial Approaches, November 14, 2018, source.
6. James Vincent, "AI Won't Relieve the Misery of Facebook's Human Moderators," The Verge, February 27, 2019, source.
7. Hillary K. Grigonis, "Social (Net)Work: What Can A.I. Catch — and Where Does It Fail Miserably?," Digital Trends, February 3, 2018, source.
8. Duarte, Llansó, and Loup, Mixed Messages?
9. Duarte, Llansó, and Loup, Mixed Messages?
10. Duarte, Llansó, and Loup, Mixed Messages?
11. Duarte, Llansó, and Loup, Mixed Messages?
12. Duarte, Llansó, and Loup, Mixed Messages?
13. Duarte, Llansó, and Loup, Mixed Messages?
14. YouTube, YouTube Community Guidelines Enforcement Report, 2019, source.
15. Langvardt, "Regulating Online Content Moderation."
16. Maayan Perel and Niva Elkin-Koren, "Black Box Tinkering: Beyond Disclosure in Algorithmic Enforcement," Florida Law Review 69 (2017): 181, source.
17. Perel and Elkin-Koren, "Black Box Tinkering: Beyond Disclosure in Algorithmic Enforcement."
18. Gennie Gebhart, Who Has Your Back? Censorship Edition 2019, June 12, 2019, source.