Introduction

The proliferation of digital platforms that host user-generated content and enable users to create and share it has significantly altered how we communicate with one another. In the twentieth century, individual communication designed to reach a broad audience was largely expressed through formal media channels, such as newspapers. Content was produced and curated by professional journalists and editors, and dissemination relied on transporting physical artifacts like books or newsprint. As a result, communication during this period was expensive, slow, and, with some notable exceptions, easily attributed to an individual speaker. In the twenty-first century, however, thanks to the expansion of the internet and social media, mass communication has become cheaper, faster, and sometimes difficult to trace.1

The widespread adoption and penetration of platforms such as YouTube, Facebook, and Twitter around the globe have significantly lowered the costs and barriers to communicating, thus democratizing speech online. Over the past decade, platforms have thrived on users creating and exchanging their own content—whether it be family photographs, blog posts, or pieces of artwork—at speed and scale. However, in enabling user content production and dissemination, platforms also opened themselves up to unwanted forms of content, including hate speech, terror propaganda, harassment, and graphic violence. In this way, user-generated content has served as both a key driver of growth for these platforms and one of their greatest liabilities.2

In response to the growing prevalence of objectionable content on their platforms, technology companies have had to create and implement content policies and content moderation processes that aim to remove these forms of content, as well as the accounts responsible for sharing it, from their products and services. Companies do this both to comply with legal frameworks that prohibit certain forms of content online and to promote greater safety and positive user experiences on their services. In the United States, the First Amendment also limits the extent to which the government can set the rules for what type of speech is permissible, leaving much of that rule-setting to the platforms themselves. Over the last few years, both large and small platforms that host user-generated content have come under increased pressure from governments and the public to remove objectionable content. In response, many companies have developed or adopted automated tools, often fueled by artificial intelligence and machine learning, to enhance their content moderation practices. In addition to enabling the moderation of various types of content at scale, these automated tools aim to reduce reliance on time-consuming human moderation.

However, the development and deployment of these automated tools has revealed a range of concerning weaknesses, including dataset and creator bias, inaccuracy, an inability to interpret context and understand the nuances of human speech, and a significant lack of transparency and accountability mechanisms around how these algorithmic decision-making procedures impact user expression. As a result, automated tools have the potential to affect human rights on a global scale, and effective safeguards are needed to ensure those rights are protected.

This report is the first in a series of four reports that will explore how automated tools are being used by major technology companies to shape the content we see and engage with online, and how internet platforms, policymakers, and researchers can promote greater fairness, accountability, and transparency around these algorithmic decision-making practices. This report focuses on automated content moderation policies and practices, and it uses case studies on three platforms—Facebook, Reddit, and Tumblr—to highlight the different ways automated tools can be deployed by technology companies to moderate content and the challenges associated with each of them.

Defining Content Moderation

Content moderation can be defined as the “governance mechanisms that structure participation in a community to facilitate cooperation and prevent abuse.”3 Currently, companies employ a range of approaches to content moderation, and they use a varied set of tools to enforce content policies and remove objectionable content and accounts. There are three primary approaches to content moderation:4

  1. Manual content moderation: This approach, which typically relies on the hiring, training, and deployment of human moderators to review and make decisions on content cases, can take many forms. Large platforms tend to rely primarily on outsourced contract employees to complete this work. Small- to medium-size platforms tend to employ full-time, in-house moderators or rely on user moderators who volunteer to review content.
  2. Automated content moderation: This approach involves the use of automated detection, filtering, and moderation tools to flag, separate, and remove particular pieces of content or accounts. Fully automated content detection and moderation practices are not widely used across all categories of objectionable content, as they have been found to lack accuracy and effectiveness for certain types of user speech. However, these tools are widely used for some types of objectionable content, such as child sexual abuse material (CSAM). In the case of CSAM, there is a clear international consensus that the content is illegal, there are clear parameters for what should be flagged and removed based on the law, and models have been trained on enough data to yield high levels of accuracy.
  3. Hybrid content moderation: This approach incorporates elements of both the manual and automated approaches. Typically, automated tools flag specific content cases and prioritize them for human reviewers, who then make the final judgment call on each case (a minimal sketch of such a pipeline follows this list). This approach is being more widely adopted by both smaller and larger platforms, as it helps reduce the initial workload of human reviewers. Additionally, by letting a human make the final decision on a case, it comparatively limits the negative externalities that come from using automated tools for content moderation (e.g., accidental removal of content due to inaccurate tools or tools that cannot understand the nuances or context of human speech).

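To make the hybrid workflow concrete, the sketch below models an automated classifier that flags and ranks content while leaving every final decision to a human reviewer. It is a minimal illustration in Python; the class names, threshold, and classifier scores are hypothetical and are not drawn from any particular platform’s systems.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class ReviewCase:
    priority: float                          # negated classifier score, so higher scores sort first
    post_id: str = field(compare=False)
    category: str = field(compare=False)


class HybridModerationQueue:
    """Illustrative hybrid pipeline: an automated classifier flags and ranks
    content, but a human moderator makes the final call on every queued case."""

    def __init__(self, flag_threshold: float = 0.7):
        self.flag_threshold = flag_threshold
        self._queue: list = []

    def ingest(self, post_id: str, category: str, score: float) -> None:
        # Automated step: only posts the classifier flags are queued for human review.
        if score >= self.flag_threshold:
            heapq.heappush(self._queue, ReviewCase(-score, post_id, category))

    def next_case(self) -> Optional[ReviewCase]:
        # Manual step: a human reviewer pulls the highest-scoring case next.
        return heapq.heappop(self._queue) if self._queue else None


# Example usage with made-up classifier scores.
queue = HybridModerationQueue(flag_threshold=0.7)
queue.ingest("post-001", "hate_speech", score=0.92)
queue.ingest("post-002", "harassment", score=0.40)   # below threshold, never queued
case = queue.next_case()
if case is not None:
    print(f"Needs human review: {case.post_id} ({case.category})")
```
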
In addition, there are two different models of content moderation that are deployed by platforms, often depending on their size and capacity to engage in substantial content moderation practices.5

  1. Centralized content moderation: This approach often involves a company establishing a broad set of content policies that it applies globally, with exceptions carved out to ensure compliance with laws in different jurisdictions. These content policies are enforced by a large group of moderators who are trained, managed, and directed in a centralized manner. The most common examples of companies that utilize this model are large internet platforms like Facebook and YouTube.
  2. Decentralized content moderation: This approach often tasks users themselves with enforcing content policies. This can take different forms. In most cases, users are given an overarching set of global policies by a platform, which serve as a guiding framework. These companies also typically employ a small number of full-time content moderation staff to oversee general enforcement. The majority of the moderation, however, occurs in a decentralized manner. For example, on Reddit, user moderators are responsible for removing and regulating content in the same way that a moderator in a centralized model does. In addition, moderators on Reddit also have the power to create additional content guidelines for the particular communities (subreddits) they oversee.

Both models offer benefits to platforms. For example, centralized models help platforms promote consistency in how they enforce their content policies, and they provide a clear starting point for creating and enforcing new policies. Decentralized models, on the other hand, enable more localized, culture-specific, and context-specific moderation to take place, fostering a diversity of viewpoints on a platform. Centralized models also create a robust checkpoint for evaluating content; however, if content evades that checkpoint, these platforms have few ways of subsequently catching and removing it. In comparison, decentralized platforms offer multiple levels of content evaluation and review.6

Finally, content moderation can take place at three different stages of the content lifecycle. Each stage involves competing pressures to promote safety and security on platforms while also safeguarding free expression.7

  1. Ex-Ante Content Moderation: Typically, when a user attempts to upload a photograph or video to a website, it is screened before it is published. This moderation is mostly carried out through algorithmic screening and does not involve active human decision-makers. It is most commonly used to screen for CSAM or copyright-infringing material using tools such as PhotoDNA and ContentID (a minimal sketch of this kind of pre-publication screen follows this list). In these cases, there is typically no competing pressure between promoting safety and security and safeguarding free expression, as these clearly illegal forms of content do not have recognized free expression rights.
  2. Ex-Post Proactive Content Moderation: As platforms have come under increased pressure to identify and remove objectionable forms of content such as terror propaganda, they have begun using automated tools to proactively search for and remove content and accounts in these domains.
  3. Ex-Post Reactive Content Moderation: This form of content moderation takes place after a post has been published on a platform and subsequently flagged or reported for review by a user or entity such as an Internet Referral Unit or Trusted Flagger.8 On most platforms, content that has been flagged is typically processed and triaged by an automated system that then relays relevant content to human moderators for review.
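
As a rough illustration of the ex-ante flow described in the first item above, the sketch below checks an upload against a set of hashes of known prohibited material before anything is published. It is a simplified Python example: the hash set, function names, and values are hypothetical, and a plain SHA-256 comparison stands in for the perceptual hashing that production tools such as PhotoDNA use to tolerate resizing and re-encoding.

```python
import hashlib

# Hypothetical set of hashes of known prohibited images (placeholder values only).
# Real screening systems match perceptual fingerprints rather than exact digests;
# an exact SHA-256 match is used here purely to show where the ex-ante check sits.
KNOWN_PROHIBITED_HASHES = {
    "0" * 64,  # placeholder digest, not a real hash of any content
}


def is_publishable(image_bytes: bytes) -> bool:
    """Return True if the upload passes the pre-publication screen."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return digest not in KNOWN_PROHIBITED_HASHES


def handle_upload(image_bytes: bytes) -> str:
    # Ex-ante moderation: the check runs before the content ever goes live,
    # with no human decision-maker in the loop.
    return "published" if is_publishable(image_bytes) else "blocked"


print(handle_upload(b"example image bytes"))  # -> "published"
```
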
Citations

  1. Kyle Langvardt, "Regulating Online Content Moderation," The Georgetown Law Journal 106 (2018): 1353.
  2. Sarah T. Roberts, "Digital Detritus: 'Error' and the Logic of Opacity in Social Media Content Moderation," First Monday 23, no. 3 (March 5, 2018).
  3. James Grimmelmann, "The Virtues of Moderation," Yale Journal of Law and Technology 17, no. 1 (2015).
  4. Grimmelmann, "The Virtues of Moderation."
  5. Grimmelmann, "The Virtues of Moderation."
  6. Grimmelmann, "The Virtues of Moderation."
  7. Kate Klonick, "The New Governors: The People, Rules, and Processes Governing Online Speech," Harvard Law Review 131 (2018): 1598. This report incorporates Klonick’s framework for the different stages of content moderation. However, the framework has been adapted to emphasize the role of algorithmic tools and manual content moderation processes during the ex-post proactive and ex-post reactive content moderation stages.
  8. Internet Referral Units are government-established entities responsible for flagging content that violates a platform’s Terms of Service to the platform. Trusted Flaggers are individuals, NGOs, government agencies, and other entities that have demonstrated accuracy and reliability in flagging content that violates a platform’s Terms of Service. As a result, they often receive special flagging tools, such as the ability to bulk flag content.
