Promoting Fairness, Accountability, and Transparency Around Automated Content Moderation Practices

As outlined in this report, internet platforms of all sizes have developed and adopted automated tools to aid their content moderation efforts. However, these tools have demonstrated a range of weaknesses, and they are often created and operated in a nontransparent manner. In part, this lack of transparency stems from the black-box nature of such tools, which prevents even their creators from gaining comprehensive insight into their decision-making processes. In addition, internet platforms have not been sufficiently transparent about how these tools are created, trained, applied, and refined. This opacity poses significant threats to user expression. Going forward, developers, policymakers, and researchers should consider the following recommendations in order to promote greater fairness, accountability, and transparency around algorithmic decision-making in this space.

1. Policymakers need to educate themselves on the limitations of automated content moderation tools. Lawmakers around the world are pressuring, and in some cases legally requiring, companies to take a more proactive approach to removing harmful content. This encourages platforms to prioritize speed over accuracy, and therefore to deploy inaccurate and nontransparent automated tools in order to meet these expectations and requirements. It also encourages companies to err on the side of removing online speech in order to avoid liability, which poses a serious threat to the free expression rights of users, particularly those from marginalized and vulnerable groups. Policymakers should recognize the limitations of automated tools for content moderation and should encourage companies to establish responsible safeguards and practices around how they deploy these tools, rather than pressuring platforms to remove content rapidly in ways that generate negative consequences.

2. Companies need to take a more proactive role in promoting fairness, accountability, and transparency around algorithmic decision-making related to content moderation. This is a vital process that can and should take many forms:

  • Companies should disclose more information to policymakers, researchers, and their users about their algorithmic models. This should include, but not be limited to, what kinds of information their datasets contain (e.g. how regionally, linguistically, and demographically diverse the data are), what kinds of outputs their models generate, and how they are working to ensure the tools are not misused or abused. It should also include data on accuracy rates for human and automated detection and removal, including false positive, true positive, and false negative rates, as well as precision and recall metrics.1 (A brief illustrative sketch of these metrics follows this list.) Such disclosures will help policymakers understand the limitations of these tools, and will enable researchers and the public to better understand how these tools affect user expression and the content users engage with online.
  • Companies should use transparency reports as a mechanism for providing additional public information about their automated content moderation practices. At a minimum, companies should break down the number of accounts and pieces of content flagged and removed by how they were detected (e.g. through automated tools, through user flags, etc.). They should also report how much of the content flagged and/or removed by automated tools was flagged or removed in error, as well as how much of that content was subsequently restored, whether proactively by the platform or through user appeals. Currently, very few companies disclose data on how their automated tools affect user speech, and none does so in a comprehensive and meaningful manner. Resources such as the Open Technology Institute’s Transparency Reporting Toolkit,2 Ranking Digital Rights’ Corporate Accountability Index,3 and the Santa Clara Principles4 can help companies navigate this disclosure process and understand what meaningful transparency and accountability look like in this regard.
  • In order to foster more fairness and accountability, companies should notify users whose content has been removed, especially when the removal results from automated tools, and should offer users a robust appeals process that is timely and easy to navigate so that erroneous takedowns can be rectified. Guiding standards related to notice and appeals can be found in the Santa Clara Principles,5 the Corporate Accountability Index,6 and the Electronic Frontier Foundation’s Who Has Your Back: Censorship report.7
  • In order to foster a better understanding of the quality of automated tools, how they work, and what their limitations are, companies should engage further with the research community and provide researchers with access to their models for evaluation and assessment. Although companies have voiced concerns about protecting their trade secrets and preventing their systems from being gamed, there are avenues through which responsible and secure research can take place. For example, companies could establish safeguards such as a robust registration and security authentication process for researchers.
  • Internet platforms are investing more in hiring human content moderators with specific regional or linguistic expertise in order to localize their moderation efforts and capture the nuances and contextual intricacies of human speech. The same level of effort needs to be invested in developing algorithmic models that are diverse and that can account for variations in speech and online behavior across regions, communities, and so on. The majority of developers creating these models are Western and English-speaking, and a large proportion of the training data is similarly skewed. As a result, these models reflect the biases of their data and creators, and they fail to provide meaningful and effective outputs for the millions of users on these platforms who are non-Western and non-English speakers.
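To make the accuracy metrics referenced in the first bullet above concrete, the following is a minimal sketch of how they can be computed from a confusion matrix of moderation decisions. The function name and all counts here are hypothetical, chosen only to illustrate the definitions; real disclosures would aggregate such counts per policy category and per detection method.

```python
# Illustrative sketch: computing the moderation-accuracy metrics named above
# from confusion-matrix counts. All names and numbers are hypothetical.

def moderation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard detection metrics from confusion-matrix counts.

    tp: violating content correctly flagged (true positives)
    fp: permissible content wrongly flagged (false positives)
    fn: violating content the tool missed (false negatives)
    tn: permissible content correctly left up (true negatives)
    """
    return {
        "true_positive_rate": tp / (tp + fn),   # share of violating content caught
        "false_positive_rate": fp / (fp + tn),  # share of permissible content wrongly flagged
        "false_negative_rate": fn / (tp + fn),  # share of violating content missed
        "precision": tp / (tp + fp),            # share of flags that were correct
        "recall": tp / (tp + fn),               # synonym for true positive rate
    }

# A hypothetical quarter of automated flagging in one content category.
print(moderation_metrics(tp=9_000, fp=1_500, fn=600, tn=488_900))
```

Reporting precision and recall together matters because either one alone can look strong while the other is poor: a tool that flags everything achieves perfect recall with dismal precision, and vice versa.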

3. Research on algorithmic decision-making in the content moderation space needs to be more robust, and it should seek to test and compare how effective automated content moderation tools are across a range of factors, including platforms, domains, and demographic attributes. With cooperation from companies, researchers should also seek to provide further insight into how datasets and classifiers are constructed and how accurate these tools are. The establishment of meaningful metrics in this space would also strongly guide future work and policy development. These insights will be valuable both to policymakers working to legislate and advocate in this space and to companies seeking to improve and refine their content moderation policies and practices. In addition, researchers should broaden the scope of their research to include diverse types of speech, particularly non-Western and non-English speech, in order to pave the way for automated content moderation tools to become more accurate for users worldwide.

4. Automated tools should supplement, not supplant, human roles. Conversations around the adoption of automated tools in any industry typically intersect with the notion that these tools will soon replace human labor because they are more efficient and cost-effective. In the context of content moderation, however, these tools have proven to be of limited effectiveness at identifying and removing content across categories, formats, and platforms. In order to safeguard freedom of expression and to foster and maintain fairness and accountability in the content moderation process, internet platforms should ensure that human moderators remain in the loop, particularly when moderating categories of content that are vaguely defined and that require additional context to understand. By adopting and further streamlining hybrid approaches to content moderation, platforms should use automated tools to augment human intelligence and enable human moderators to work more effectively at scale, rather than to replace humans in this process entirely.
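One common form such a hybrid approach can take is confidence-based routing, in which the automated classifier acts on only the clearest cases and defers everything else to human reviewers. The sketch below is illustrative only: the thresholds, category names, and the `route` function are assumptions invented for this example, not a description of any particular platform’s system.

```python
# Minimal sketch of human-in-the-loop routing, assuming a classifier that
# returns a violation probability. Thresholds and categories are hypothetical.

from dataclasses import dataclass
from enum import Enum, auto

class Decision(Enum):
    AUTO_REMOVE = auto()   # high-confidence violations in clearly defined categories
    HUMAN_REVIEW = auto()  # ambiguous or context-dependent content
    LEAVE_UP = auto()      # high-confidence non-violations

@dataclass
class Item:
    content_id: str
    violation_score: float  # classifier output in [0, 1]
    category: str           # e.g. "spam", "hate_speech"

# Vaguely defined, context-dependent categories always go to a human
# reviewer, regardless of how confident the model is.
CONTEXT_SENSITIVE = {"hate_speech", "harassment"}

def route(item: Item, remove_at: float = 0.98, clear_at: float = 0.05) -> Decision:
    if item.category in CONTEXT_SENSITIVE:
        return Decision.HUMAN_REVIEW
    if item.violation_score >= remove_at:
        return Decision.AUTO_REMOVE
    if item.violation_score <= clear_at:
        return Decision.LEAVE_UP
    return Decision.HUMAN_REVIEW  # the uncertain middle goes to humans

print(route(Item("abc123", violation_score=0.91, category="spam")))
# -> Decision.HUMAN_REVIEW: confident, but below the removal threshold
```

Under this design the automated tool narrows the queue rather than making final judgments, which is the sense in which it augments human moderators at scale instead of replacing them.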

Citations
  1. Bradford et al., Report of the Facebook Data Transparency Advisory Group.
  2. Spandana Singh and Kevin Bankston, The Transparency Reporting Toolkit: Content Takedown Reporting, October 25, 2018, source.
  3. Ranking Digital Rights, “2019 Ranking Digital Rights Corporate Accountability Index,” last modified May 15, 2019, source.
  4. “The Santa Clara Principles on Transparency and Accountability in Content Moderation.”
  5. “The Santa Clara Principles.”
  6. “2019 Ranking Digital Rights Corporate Accountability Index.”
  7. Gebhart, Who Has Your Back? Censorship Edition 2019.