News Feed Ranking
Search engines are not the only internet platforms that have adopted algorithmic curation and ranking practices. Many small and large online platforms, including social networking and review-based services, have also introduced these tools to curate and present content that is relevant to users’ interests and needs. The creation of these news feeds has opened up new avenues for users, businesses, brands, and content creators to create and disseminate information at great scale. They have therefore also helped to promote free expression online and boost the revenue of internet platforms. However, these methods of algorithmic curation and ranking also raise a number of concerns, particularly regarding algorithmic awareness and transparency, the creation of filter bubbles, and algorithmic bias. Like search engines, news feeds can shape user perspectives by prioritizing certain forms of content and deprioritizing others.
This section of the report focuses on how internet platforms deploy algorithmic curation and ranking practices in order to shape and operate their news feeds. It will use three platforms —Facebook, Twitter, and Reddit—as case studies of how such practices can vary and what concerns they surface.
Case Study: Facebook
Since its creation in 2004, social media company Facebook has expanded its services to include features such as messaging and “smart displays.” As of June 2019, Facebook has approximately 1.59 billion daily active users1 and ranks third in global internet engagement on Alexa rankings.2
Today, Facebook offers one of the clearest examples of algorithmic content curation and ranking in a news feed. The Facebook News Feed is composed of stories produced by a user’s friends, Pages a user follows, Groups a user is part of, and suggested content such as stories and advertisements. Facebook launched the first iteration of its News Feed in 2006.3 The launch of the News Feed drew significant backlash and controversy from users, even sparking calls to boycott the platform. In particular, users were concerned that the News Feed was eroding their privacy by making more information about them available to their friends and others on the platform, which Facebook asserts was not the case.4
Despite controversy around this major change, the News Feed has emerged as one of the largest and most significant billboards and content hubs for users, brands, publishers, and influencers.5 It has also been a major driver of advertising, generating a significant amount of revenue for the platform.6 The News Feed was launched to drive further engagement—and thus revenue—on the platform; Facebook asserts it was also designed to present users with content that is relevant and meaningful to them. Prior to the launch of the News Feed, users on Facebook and comparable platforms like MySpace and Friendster had to seek out content posted by their friends or Pages by visiting individual pages. The News Feed brought this information together and curated posts based on predictions of what content users were interested in and would engage with. It ranked these posts so that users could view the content that was deemed the most relevant to them first. According to Facebook CEO Mark Zuckerberg, the average Facebook user has approximately 1,500 posts that could appear on their News Feed every day. However, users spend a limited amount of time on their News Feed, and as a result they are likely to only read and potentially engage with 100 of these posts. Facebook states that the News Feed curation process seeks to ensure that these 100 posts are the most relevant and meaningful to a user.7
However, in its efforts to provide users with more meaningful interactions, Facebook is also aiming to increase the time users spend on the platform. This in turn drives increased revenue through avenues such as advertising, furthering the company’s bottom line. Additionally, as Facebook has come under increased scrutiny since the 2016 U.S. presidential elections and the Cambridge Analytica scandal, it has invested significant resources towards convincing regulators that it should be able to continue to self-regulate, rather than address the obvious threats its business model poses to user privacy and, more fundamentally, to democracy. Facebook’s assertion that the platform now prioritizes meaningful interactions is one such method of convincing regulators that Facebook is a safe and well-meaning platform.
An earlier iteration of the Facebook News Feed deployed an algorithm known as EdgeRank. EdgeRank was used to determine which stories should appear in a user’s News Feed based on three signals: affinity score, edge weight, and time decay.8 Posts that had the highest EdgeRank score would appear at the top of a user’s News Feed. Because each user had a different affinity score, each user also had a different EdgeRank score. These scores were not public. Around 2011, as the News Feed algorithm evolved, EdgeRank was retired and the signals it relied on were incorporated into newer versions of the News Feed algorithmic system.9
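EdgeRank was publicly described as summing, over each of a story’s “edges” (interactions such as likes and comments), the product of the three signals above: affinity × edge weight × time decay. The following is a minimal illustrative sketch; the actual affinity values, edge weights, and decay curve were never disclosed, so the numbers and decay function here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One interaction ("edge") attached to a story: a like, comment, tag, etc."""
    affinity: float    # strength of the viewer's relationship to the edge's creator
    weight: float      # value of the edge type (e.g. a comment outweighs a like)
    age_hours: float   # how old the interaction is, feeding the time-decay term


def edgerank(edges):
    """Sum affinity * weight * time-decay over a story's edges.

    The decay curve below is a placeholder; Facebook never published
    the real decay function or signal values.
    """
    score = 0.0
    for e in edges:
        decay = 1.0 / (1.0 + e.age_hours)   # hypothetical decay curve
        score += e.affinity * e.weight * decay
    return score
```

Because affinity is computed per viewer, the same story yields a different score for each user, which is why no two News Feeds ranked identically.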
During this time, the Facebook News Feed garnered a reputation for looking and operating like a tabloid, since it heavily prioritized popular posts and advertisements. Additionally, the platform came under heavy criticism for promoting addictive practices that sought to maximize the amount of time users spent on the platform. At the time, the amount of time users spent on a platform was treated as a key indicator of its popularity and success.10
In January 2018, however, Zuckerberg announced that the News Feed was going to be altered so that it prioritized “more meaningful social interactions” with friends and family over content produced by businesses and brands.11 Facebook stated that this shift would place less emphasis on posts that are popular and rather place more of an emphasis on “authentic” posts that encourage and receive significant engagement from a user and their network.12 The new News Feed is reportedly based on three core principles. The first is that Facebook users value meaningful and informative stories, the second is that they value accurate and authentic content, and the third is that they value principles that guide safe and respectful behavior.13
With this new News Feed, Facebook claimed that it hoped users would spend quality time on the platform, rather than more time.14 However, it could be inferred that if a user is engaging with more meaningful and relevant content on the platform, they would also spend more time on the platform.15 One year after the new News Feed algorithm was launched, a report by social media engagement tracking firm NewsWhip found that the platform had seen increased levels of engagement as well as greater amounts of content being posted and engaged with by friends and family.16
The Facebook News Feed algorithm goes through four stages in order to identify stories, rank them, and produce a tailored News Feed experience for each of its users.17
- Inventory: The algorithm takes an inventory of what stories have been posted by a user’s friends and Pages a user follows. This is important to assess, as each News Feed is largely composed of content shared by a user’s connections.
- Signals that inform ranking: The algorithm then evaluates each story using hundreds of thousands of signals. These signals include who posted a story and when it was posted, as well as more granular factors such as the time of day, and how fast a user’s internet connection is. This is particularly important for users with slower connections who can’t properly load certain forms of content.
- Predictions: The News Feed algorithm utilizes machine learning in order to extract insights from a user’s past activity. These insights are used to predict how likely a user is to engage with a post, which is a metric for whether the user finds a post meaningful.18 Some of the predictions the algorithm seeks to make include how likely a user is to comment on a story, how likely they are to spend time reading the story, and whether they would watch an entire video. It also makes some qualitative predictions such as how likely a user is to say that they found a story informative. While such predictions may be able to deliver relevant content to users in the short term, this approach presumes that a user’s behavior and interests will remain constant over time. It can therefore also result in the creation of a filter bubble and prevent users from engaging with new content that matches their potentially expanding interests.
- Relevancy score: The News Feed algorithm then uses all of the signals and insights at its disposal to calculate a relevancy score. These signals are used to calculate a range of probabilities, including the likelihood a user clicks on a story, the likelihood a user will spend time on a story, the likelihood a user engages with a story through likes, comments, and shares, the likelihood a user will find a story informative, the likelihood a story is click-bait (posts that aggressively seek out likes and engagement), and the likelihood that a story links to a low-quality website. These predictions are compiled into a relevancy score, which is an overall prediction of how meaningful a given story is for a user. Facebook calculates a relevancy score for every story from all of a user’s connections every time a user opens their News Feed.
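The final two stages above can be sketched as a weighted combination of per-story predictions, followed by a sort. Everything in this sketch is illustrative: the prediction names, the weights, and the combination rule are assumptions, since Facebook does not disclose how its predicted probabilities are weighted or combined.

```python
# Hypothetical weights; Facebook discloses neither the real weights nor
# the full set of predictions that feed the relevancy score.
WEIGHTS = {
    "p_click": 1.0,
    "p_read": 1.0,
    "p_comment": 4.0,          # active interactions outweigh passive ones
    "p_share": 5.0,
    "p_informative": 2.0,
    "p_clickbait": -3.0,       # negative predictions push a story down
    "p_low_quality_link": -3.0,
}


def relevancy_score(predictions):
    """Compile per-story probability predictions into a single score."""
    return sum(WEIGHTS[name] * predictions.get(name, 0.0) for name in WEIGHTS)


def rank_inventory(stories):
    """Order the inventory by descending relevancy score (the final stage)."""
    return sorted(stories, key=lambda s: relevancy_score(s["predictions"]),
                  reverse=True)
```

Under this kind of scheme, a story predicted to draw comments and shares outranks one predicted only to draw clicks, which is consistent with Facebook’s stated preference for “meaningful interactions” over passive consumption.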
Facebook’s News Feed algorithm is influenced by hundreds of thousands of signals that work to identify and rank content for each user. According to Facebook, these signals seek to prioritize content that reflects the three News Feed pillars, and they each represent a distinct data point that Facebook’s News Feed algorithm considers and processes when ranking content. These signals can be explicit, such as likes, or implicit, such as the time a user spends on a page before returning to the News Feed.19 A study on Facebook’s News Feed Algorithm, based on information reflected in Facebook’s “News Feed FYI” blog, sorted some publicly disclosed signals into six categories:20
- Content signals: Content signals are factors that demonstrate how stories differ from one another. They include the format of a story (such as a link, video, or photo); the number of likes, comments, or reactions a story receives; and which friend or Page posted the story.
- Source signals: Source signals are characteristics a user or Page demonstrates when they publish a post. These include the history of the Page and how often a Page has posted stories with click-bait headlines.
- Audience signals: Audience signals are characteristics a user demonstrates when they consume a post, and they often reflect patterns in content consumption. These include how often a user uses the “hide” feature to remove content from their News Feed and how often a user watches videos, in part or in their entirety, rather than scrolling past them.
- Action signals: Action signals represent a user’s behavior when it comes to a specific story. These signals include whether a user likes, clicks on, or engages with a specific story, and the amount of time a user spends reading a story or watching a video. The News Feed algorithm prioritizes active interactions, such as commenting and sharing a post, rather than passive interactions such as liking posts and click-throughs. This is based on the notion that active interactions require more effort and are therefore indicative of meaningful interactions.21 The algorithm also tends to favor posts with comments and replies to comments, as they indicate meaningful interactions via conversations.22
- Relationship signals: Relationship signals are data points collected about the relationship between two users or Pages on the platform. These include how often two users engage with one another, and whether a user decides to unfollow a friend, Page, or Group.
- Likelihood signals: Likelihood signals are the probabilities the Facebook News Feed algorithm calculates around how a user will interact with a post. They include the probability a user will like or comment on a story. These likelihood signals are compiled to determine how posts are ranked in the News Feed.
Facebook also provides temporary ranking boosts to content known as “timely posts”: popular posts or news stories that are currently being discussed.23
According to Facebook, the platform also sought to prioritize “meaningful interactions” so that it could promote high-quality content in News Feeds and curb the spread of low-quality content such as spam, posts that are unverified, click-bait posts, and posts that seek to spread misinformation.24 However, as previously mentioned, such efforts also aim to increase the amount of time that users spend on the platform, and thus drive revenue through avenues such as advertising.
Most recently, in July 2019, Facebook announced that it would downrank and reduce the spread of posts that make sensationalized health claims, as well as posts from Pages aiming to sell products or services based on misrepresented or false health-related claims.25 Additionally, in April 2019, Facebook deployed a new tactic, called Click-Gap, in order to reduce the amount of low-quality content that users see in their News Feed. A number of low-quality websites receive significant traffic from the Facebook platform. To tackle this, Facebook’s systems crawl and index the internet in order to identify such websites, and then downrank posts that link to them. This is based on the notion that such sites rely on platforms like Facebook to drive views; because visibility drives impact, posts that are viewed less have less of an impact, and downranking them stifles these sites’ efforts. The approach is similar to the one Google’s PageRank algorithm used to rank results when it first launched: PageRank determined how high to rank a search result based on the number and quality of the websites that linked to a given web page.26
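PageRank’s core idea, that a page inherits importance from the pages linking to it, can be sketched as a simplified power iteration. This is an illustrative reduction, not Google’s production algorithm; refinements such as dangling-node handling are omitted.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank power iteration.

    links maps each page to the list of pages it links to. Each round,
    every page splits a damped share of its current rank among the pages
    it links to, so pages with many (and highly ranked) inbound links
    accumulate the most rank.
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if not outs:
                continue
            share = damping * rank[page] / len(outs)  # split rank among outlinks
            for target in outs:
                nxt[target] += share
        rank = nxt
    return rank
```

Click-Gap inverts this logic: a site that gets heavy Facebook traffic but has few inbound links elsewhere on the web exhibits a suspicious “gap,” and posts linking to it are downranked.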
Facebook also uses a range of user-driven metrics to assess the quality of a post, including whether users hide certain posts and whether they report posts as spam. According to Facebook, in this system, posts that are shared and engaged with organically will rank higher on users’ News Feeds.27 However, the same March 2019 NewsWhip report that found that the algorithmic changes to the News Feed had increased engagement also found that the changes had not succeeded in adequately tackling the spread of misinformation and low-quality content.28
Facebook has asserted that its current algorithmic curation and ranking model prioritizes meaningful interactions on the platform. However, this algorithmic system is also responsible for managing information flows and online speech, and presenting users with a certain experience based on the platform’s understanding of a user’s interests. This creates an opportunity to promote certain voices above others, and silence certain voices entirely. This may mean that the voices of dominant social and political groups will be amplified and the voices of disproportionately targeted and already marginalized groups will be silenced. These algorithmic practices are also directly responsible for influencing how users perceive their network and the world around them. As a result, greater transparency and accountability around Facebook’s News Feed algorithmic curation and ranking practices are needed.
In March 2019, Facebook introduced a “Why am I seeing this post?” feature in the News Feed. This feature, found in the right-hand corner of each News Feed post, explains to users why they are seeing a certain post (e.g. if it was posted by a friend or a Page the user follows, whether it is highly popular,29 etc.), how their past activity (such as whether they regularly watch videos or click on shared links)30 has informed the ranking of the posts in their News Feed, and what other factors typically influence the ranking of posts in the News Feed. This feature also provides users with access to controls that let them edit their News Feed and privacy preferences.31 The News Feed preference controls enable users to select whose posts they see first, unfollow users or groups in order to hide their posts, reconnect with users or groups in order to see their posts again, manage snooze settings on certain users or groups, and hide apps from the News Feed. In addition, through another News Feed control tab on the left-hand side of the News Feed page, users can opt to view posts in reverse-chronological order, rather than through the lens of the algorithmic curation and ranking filters. The default setting that the News Feed will always revert to, however, is the algorithmically curated and ranked mode, known as “Top Stories.” In this way, Facebook enables users to control their News Feed experience to a degree. However, it does not give them the option to opt out of the algorithmically personalized experience entirely, and the algorithmically personalized News Feed is the default option.
Although Facebook has shared some information about the signals it uses to rank content in a user’s News Feed, it has not provided a comprehensive overview of the range of signals used, and how these signals collectively work together in the News Feed algorithm to determine the ranking of posts. It also does not explain how different signals are weighted.32 Without greater transparency and accountability around how the platform is deploying these signals in its algorithmic curation and ranking practices, and around how these signals work together, users and publishers are unable to properly understand how their experience is being curated and how this can impact their worldview. They are also unable to understand exactly which characteristics of a post, user, or their network are prioritized during this curation and ranking process. This raises concerns that this algorithmic system can establish filter bubbles on the platform.
In addition, individual voices or communities that are suppressed by the algorithm are often left unable to understand why. This raises a number of concerns regarding algorithmic bias, and the extent to which algorithms reflect and exacerbate the judgments, priorities, and preferences of their creators and society at large. Given the limited set of user controls over the News Feed, impacted users are unable to effectively mitigate this situation. However, it is difficult to imagine a set of user controls that could effectively mitigate the issue of algorithmic bias. Given the black box nature of much algorithmic decision-making, developers and users may not be aware of any systematic biases in an algorithm.
Facebook does not currently offer an appeals process or channel for its News Feed curation and ranking efforts. Such a process could help remedy individual cases, even though it would not help to remedy systemic instances of bias.
Case Study: Twitter
Twitter is a microblogging and social network platform founded in 2006. It enables users to post messages known as “tweets” and has gained a reputation as a destination for live updates regarding news, politics, sports, and other current events. As of February 2019, the platform had 126 million daily users33 and ranked 20th for global internet engagement on Alexa rankings.34
Originally, Twitter’s news feed—known as a “timeline”—was not algorithmically curated. Rather, content on a user’s timeline was presented in reverse-chronological order. In 2015, Twitter introduced a feature known as “While you were away”35 (later rebranded as “In case you missed it”),36 which aimed to curate notable recent tweets that a user may have missed while they were not using the platform.37 In 2016, Twitter introduced algorithmic curation into its timeline. This was an extension of the “While you were away” feature,38 as it used the same algorithms. According to Twitter, it was designed to present users with the most relevant and useful tweets, rather than the most recent ones.39 Twitter has asserted that both of these features were based on the notion that a tweet from a few hours ago may be more relevant and meaningful to a user than one posted five minutes ago. Under the reverse-chronological curation format, a user would miss out on such content.40 Under the new timeline feature, algorithmically curated tweets appeared at the top of a user’s timeline. However, these algorithmically curated tweets were a small subset of the tweets posted since a user last visited the platform.41 As a result, if a user continued to scroll through the timeline, they would eventually begin seeing tweets in reverse-chronological order.42
Twitter’s decision to roll out an algorithmically curated timeline was met with some backlash. Many users were concerned that by introducing this feature, the platform was stifling its public square character, as what content was relevant and meaningful would now be determined by an algorithm. Some critics of an algorithmically curated timeline have advocated for a reverse-chronological feed, as this is perceived as a neutral presentation of content. In a reverse-chronological timeline, hashtags would play a strong role in promoting and highlighting conversations, and virality would be organic.43 In this sense, critics contend, Twitter could offer a democratic public square on its platform.44 Although the timeline algorithm could be used to surface content that is broadly considered relevant (such as headline news), it cannot reliably surface unexpected and diverse content that is also relevant, like an organically run public square platform could, as it makes judgments based on past user behavior.45 The introduction of algorithmic curation in the timeline also faced backlash because it made users feel as if they had less control over their experience on the platform, and it raised concerns over the creation of filter bubbles.46
These concerns sparked the hashtag #RIPTwitter in early 2016.47 When the new timeline feature rolled out, users had the option to opt out in a limited sense. They could choose to not see Top Tweets at the top of their timeline, but they would still receive curated tweets in other sections of their timeline.48 In response to the outcry, in September 2018, Twitter enabled users to toggle between algorithmically-curated Top Tweets and non-curated, reverse-chronologically ordered Latest Tweets.49 Despite the public backlash, fewer than 2 percent of users opted out of algorithmic curation on the platform, which became the default option.50 However, the fact that only 2 percent of users opted out does not necessarily mean that users were not opposed to algorithmic curation. Rather, because algorithmic curation became the default, opting out required an extra step, and many users may not have wanted to engage in a more time-consuming process or may not have known how to do so.
Despite the controversy, Twitter has insisted that its research indicates individuals have a more positive experience on the platform when engaging with Top Tweets first.51 In a test on the beta version of the algorithmically curated timeline, which was performed on over 100 users and brands, the platform found that individuals tweeted and retweeted more often than they did when using a non-algorithmically curated timeline.52 This, however, assumes that greater engagement is synonymous with a positive user experience. This is not necessarily true. However, greater engagement does drive greater revenue for the platform, which may be a reason it advocated strongly for the algorithmically curated version of the timeline.
The Twitter timeline can consist of numerous sections.53
- Top Tweets: The Top Tweets on a user’s timeline are algorithmically curated and ranked using a range of signals. This section often also includes tweets from accounts that a user does not follow but may be interested in.
- Latest Tweets: If a user opts out of algorithmic curation, they will view a reverse-chronological feed of the latest tweets.
- In case you missed it: If a user visits the Twitter app less frequently, they will see this algorithmically-curated selection of Top Tweets. A user typically only sees this section in their timeline if they have not visited the platform for a number of hours or days. The tweets in this section are less recent and do not appear in reverse-chronological order. Rather, they are organized based on their ranking scores. As a result, the tweet at the top of this section is the tweet with the highest ranking score out of all possible tweets from every account a user follows since the last time they logged in.54
- Happening now: This section occasionally appears at the top of a user’s timeline and highlights specific events or subjects of interest. It was originally introduced to focus on sports events and was later expanded to include breaking and personalized news.55
- Trends for you: This algorithmically-curated section highlights popular trends and hashtags based on a user’s interests (as explained below). Users can also choose to have this content curated based on their location.
When a user opens Twitter, the platform collects and assesses every recent tweet from every account that a user follows and assigns each one a relevance score. This score aims to predict what content a user will find interesting.56 It is based on a range of factors, including the number of favorites and retweets a tweet has received, and how often the user has engaged with a particular account recently. Simultaneously, Twitter’s algorithm considers a range of other signals, such as how long a user has been away from the platform, how many accounts a user follows, and how a user behaves and uses Twitter, in order to determine how the relevance scores will impact the content on the user’s timeline.57 Content is then ranked based on a series of signals which assess how popular a tweet is and how accounts in a user’s network are engaging with it.58 These signals include:59
- Recency: How recently a tweet was posted.
- Overall engagement: How many retweets, clicks, favorites, and impressions a tweet has garnered. This signal also considers how much time users have spent reading the tweet.60
- Engagement relative to other tweets from the same user: How often users engage with the posting user through active engagements and impressions.
- Rich media: The type of media a tweet includes, such as images, videos, GIFs, and polls.
- The types of media users typically engage with: If a user typically engages with a specific type of content, such as photos or videos, then they are more likely to see tweets that contain these media formats.
- Account engagement and interactions: How often a user engages with a particular author or account, the strength of the user’s connection to this account, the origin of this relationship,61 and how much time a user spends reading tweets posted by this author, even if they do not engage with them.62
Signals such as account interactions, engagement, user interest, network activity,63 how long a user has been away from the site, how many followers an account has, and the account’s location relative to users also play a role in how content is curated and ranked on the Twitter timeline.64 Today, deep learning is the central modeling component in timeline ranking.65
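The interaction between these signal families can be sketched as follows. This is a hedged illustration only: the factor shapes, constants, and field names are assumptions, since Twitter has not published how its deep-learning ranker actually weighs these signals.

```python
import math


def tweet_score(tweet, viewer, now_hours):
    """Illustrative combination of the signal families described above."""
    # Recency: newer tweets decay less (hypothetical time constant)
    recency = math.exp(-(now_hours - tweet["posted_at_hours"]) / 6.0)

    # Overall engagement, log-damped so viral tweets don't dominate entirely
    engagement = math.log1p(2.0 * tweet["retweets"] + tweet["favorites"]
                            + 0.5 * tweet["clicks"])

    # Rich media weighted by the viewer's typical media preferences
    media_affinity = viewer["media_prefs"].get(tweet.get("media", "text"), 1.0)

    # Account engagement: how strongly this viewer interacts with the author
    author_affinity = viewer["author_affinity"].get(tweet["author"], 0.1)

    return recency * (1.0 + engagement) * media_affinity * author_affinity
```

Even in this toy version, the multiplicative author-affinity term means that a modestly engaging tweet from a close contact can outrank an identical tweet from a stranger, mirroring the personalization the timeline is designed to achieve.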
Like Facebook, Twitter asserts that it has updated its ranking algorithm in order to improve the “health” of conversations on its platform. In 2018, this was done in order to combat instances of trolling, harassment, and abuse. The new algorithmic system uses behavioral signals in order to assess whether a Twitter account is adding to or detracting from conversations, based on how other accounts react to content. For example, if a user sends the same message to multiple users and they all block or mute the sender, this suggests that the sender is detracting from conversations. If the recipients reply to or “heart” the messages, however, this suggests that the sender is contributing positively to conversations. The algorithm also considers signals such as whether an account has a confirmed email address and whether an account appears to be leading a coordinated attack. Tweets identified as detracting from conversations are deprioritized in the timeline and therefore appear lower in search results and replies.66
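The behavioral-signal idea can be illustrated with a simple ratio of positive to negative recipient reactions. The reaction names and the scoring rule below are hypothetical, not Twitter’s actual implementation.

```python
def health_signal(reactions):
    """Score an account by how recipients react to its messages.

    Returns a value between -1.0 (detracting: mostly blocked/muted)
    and +1.0 (contributing: mostly replied to or hearted).
    """
    negative = sum(1 for r in reactions if r in ("block", "mute", "report"))
    positive = sum(1 for r in reactions if r in ("reply", "heart"))
    total = negative + positive
    if total == 0:
        return 0.0                      # no evidence either way
    return (positive - negative) / total


def deprioritized(accounts):
    """Accounts whose tweets would be pushed lower in timelines and replies."""
    return [name for name, reactions in accounts.items()
            if health_signal(reactions) < 0]
```

A key property of this approach is that it needs no judgment about message content: the assessment comes entirely from how other accounts respond.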
As demonstrated by the backlash against Twitter’s decision to implement an algorithmically curated and ranked timeline, users and publishers have expressed concerns over how such a system manages online expression and creates and reinforces certain perspectives based on what it thinks a user is interested in. Like Facebook, Twitter also fails to provide significant transparency and accountability around its algorithmic curation and ranking practices.
Twitter provides its users with a range of limited controls over their timeline experience. These include the ability to unfollow, mute, and block certain accounts. Users can also select the “show less often” feature, which provides feedback on certain tweets to Twitter so that it can better tailor the timeline experience in the future. In addition, users can opt in to or out of permitting Twitter to personalize their experience based on their “inferred identity” and places that a user has been. As previously mentioned, users also have the option of toggling between an algorithmically curated timeline and a reverse-chronological timeline.67
There is some public information around which signals Twitter uses to curate and rank content on the timeline. However, the company has not released a comprehensive overview of these signals, how they work together to curate and rank posts, and how they are weighted. Without greater transparency and accountability around how this algorithmic curation is taking place, users and publishers are unable to fully understand and control how their worldview is being shaped and what specific characteristics the Twitter timeline algorithm is designed to prioritize. This once again raises concerns regarding the creation and reinforcement of filter bubbles.
Furthermore, given that Twitter is often viewed as a digital public square, the platform’s algorithmic curation and ranking practices raise concerns regarding which voices the algorithm determines are important and worth amplifying, and whether these determinations reflect the same values and judgments that humans would place when assessing public discourse. A lack of transparency around the signals the algorithm uses makes evaluating how the timeline algorithm impacts public discourse even more difficult. Furthermore, a lack of transparency around how the timeline algorithm operates and is constructed also raises concerns around hidden biases in the algorithm and its signals. These biases prioritize certain types of interactions and types of content over others, and can reflect the unintentional biases of their creators or the training data with which they were created. Given the limited set of user controls over the timeline, and the fact that the platform does not offer an appeals process or channel related to its timeline curation and ranking practices, users who feel as if they have been silenced have no method for recourse.
Case Study: Reddit
Reddit is a social news aggregation and discussion website that was founded in 2005. The platform has approximately 330 million monthly active users worldwide68 and is ranked 16th for global internet engagement.69 Reddit enables users, who operate under pseudonyms, to create subpages, called subreddits, devoted to specific interests or topics. In this way, the platform has become popular among particular interest- or activity-focused communities, such as gamers and sports fans. This represents a significant difference between Reddit on one hand, and Facebook and Twitter on the other. Unlike its counterparts, which emphasize bilateral or multilateral relationships in which users interact on a broad range of topics, Reddit emphasizes users’ participation in thematic forums, or subreddits, with tangible implications for its use of content-shaping algorithms.
Reddit deploys a series of algorithms to rank posts and comments on its home page feeds as well as on each individual subreddit. The code for these algorithms is open source and publicly available online.70 This ranking system is largely driven by the platform's user voting system.71 Any logged-in Reddit user can vote on links and comments to indicate their meaningfulness and usefulness. In this system, an upvote indicates that a user finds content interesting and relevant, while a downvote suggests that the user finds it uninteresting, off-topic, or otherwise not meaningful.72 Links and comments that receive a significant number of upvotes appear higher on the website's front page or on the front page of a given subreddit. Each link and comment on the platform is assigned a number of points, known as a score, which loosely corresponds to the difference between the number of upvotes and the number of downvotes it has received. The exact calculation of this figure is kept hidden, however, to prevent spammers and other bad actors from gaming the system.73 Comments on the platform are sorted by default using the "best" filter. As a result, comments with the highest number of upvotes are likely to be viewed more often.74 In addition, posts with a significant number of comments are typically ranked higher than others. This suggests an element of democracy on the platform, as Reddit seeks to rank highest the content that users engage with, and therefore value, the most.75
The score that a post or comment receives translates into "karma" points for the posting user. Karma is an informal ranking on Reddit that measures how much users value a particular account's contributions to the community.76 A user who frequently contributes high-ranking posts or comments will build a high karma score denoting their total net-positive impact on the site. This system is not foolproof, however. A user can easily gain karma points by reposting popular content across multiple subreddits and by posting content that aligns with the general mentality and values of a certain subreddit or of the platform as a whole.77 Additionally, as a user gains more karma points, or as a post or comment accumulates upvotes or downvotes, it can spark a bandwagon response in which other users vote in line with the general trend. To counter this, some subreddits hide vote scores for a set period. However, this is not a complete solution.78
Reddit deploys different algorithmic approaches when ranking posts and comments. When a user logs in, they can choose to view content on their home page feed using a range of algorithmically curated options. These options sort content into categories: best, hot, new, controversial, top, and rising.79 According to a 2015 blog post by Amir Salihefendic, the CEO of Doist, who has conducted significant research on Reddit's algorithmic ranking practices, posts under the "hot" category are ranked using what is known as the "hot ranking" algorithm. This algorithm is influenced by signals including:80
- Submission time: The time at which a post was submitted is a major factor influencing how a post ranks on Reddit. The hot ranking algorithm ranks new stories higher than older stories.
- The logarithmic scale: The hot ranking algorithm uses a logarithmic function to weight a post's earliest votes more heavily than later ones. Generally, the first 10 upvotes a post receives carry the same weight as the next 100, which in turn carry the same weight as the next 1,000, and so on. As a result, a post with 10 very recent upvotes and a post with 50 older upvotes could rank similarly on the platform.
- Downvotes: Reddit is one of the few platforms on the internet that deploys a downvote feature. Posts that receive a large number of downvotes, as well as posts that receive large numbers of both upvotes and downvotes, therefore rank lower on news feeds. This particularly impacts controversial content.
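The signals above can be sketched in a simplified version of the published open-source ranking code described in Salihefendic's write-up. The epoch constant and the 45,000-second divisor are taken from that public code; the version Reddit runs in production may differ.

```python
from datetime import datetime
from math import log10

REDDIT_EPOCH = 1134028003  # arbitrary epoch (seconds) used in the public code


def hot(ups: int, downs: int, posted: datetime) -> float:
    """Simplified sketch of Reddit's open-source "hot" ranking.

    The net score enters on a log10 scale, so the first 10 votes count
    as much as the next 100; submission time adds a steadily growing
    bonus, so newer posts outrank older posts with the same score.
    """
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else (-1 if score < 0 else 0)
    seconds = posted.timestamp() - REDDIT_EPOCH
    # 45,000 seconds is 12.5 hours: each 12.5 hours of age is worth one
    # order of magnitude (10x) of net score.
    return round(sign * order + seconds / 45000, 7)
```

Because age enters linearly while score enters logarithmically, every 12.5 hours of age offsets a tenfold difference in net score, which is why a fresh post with a handful of upvotes can outrank a day-old post with many more.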
Reddit’s comment ranking algorithm was theorized by Randall Munroe, an American cartoonist, engineer, and scientific theorist. He argued that the hot ranking algorithm would not be suitably applicable for ranking comments on the platform, as it would preference comments that were posted more recently, rather than the comments that were considered the most meaningful. The solution to this was to deploy Wilson’s Score Interval, which uses a confidence sort to treat a vote count on a comment as a statistical sample of a hypothetical full vote by opinion, similar to an opinion poll. This system provides each comment with a provisional ranking, that it is 85 percent sure the comment will reach. The more votes that a comment receives, the closer its score gets to this 85 percent confidence estimate. This system helps ensure that if a comment has only one upvote and zero downvotes, it will retain a 100 percent upvote rate. However, because the system does not have enough data on this comment, it will be ranked lower. If a comment received ten upvotes and only one downvote, on the other hand, the system could accrue enough confidence to place this comment above something with 40 upvotes and 20 downvotes, as it ascertains that by the time this first post gets 40 upvotes, it would have fewer than 20 downvotes. If the system is wrong, which it is 15 percent of the time, then it will work to get more data so that comments with less data are ranked lower. The confidence sort in this system is not impacted by submission time, but rather it is impacted by how many upvotes a comment receives compared to the total number of votes and the sample size. The more votes a comment gets, the more accurate its confidence score is.81 However, when subreddits have a large number of posts, it is likely that most people simply read the comments in the “best” section and vote on them. 
This prevents other comments from gaining traction82 and can create a preference toward already-visible pieces of content.83 Users can also be swayed into voting for already popular content by a herd mentality.84 Additionally, users can create multiple accounts in an attempt to rig the voting system.85
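The confidence sort described above can be sketched as the lower bound of the Wilson score interval. The z value here, roughly matching the 85 percent confidence level the report describes, is an assumption; Reddit's published code hard-codes a similar constant.

```python
from math import sqrt


def confidence(ups: int, downs: int, z: float = 1.44) -> float:
    """Lower bound of the Wilson score interval for a comment's upvote ratio.

    z = 1.44 roughly corresponds to an 85 percent two-sided confidence
    level (an assumption; the exact constant in production may differ).
    """
    n = ups + downs
    if n == 0:
        return 0.0  # no data at all: rank at the bottom
    phat = ups / n  # observed upvote ratio
    # Wilson lower bound: shrinks toward 0 when the sample is small,
    # approaches phat as votes accumulate.
    return (phat + z * z / (2 * n)
            - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
```

With z fixed, the function depends only on the ratio of upvotes to total votes and the sample size, which is why, as the text notes, submission time plays no role in the comment sort.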
Reddit’s algorithms also come into play when curating content for /r/all, which is the home page that non-logged-in users see. When a user first creates an account on Reddit, they are subscribed to a list of default subreddits that aim to highlight the range of communities, interests, and genres of content available on the platform.86 Once a user has an account, they can curate their own home page feeds by subscribing to subreddits of interest to them, and unsubscribing from default subreddits if they prefer. However, users who prefer to not create an account are unable to pick and choose which specific subreddits they engage with. For these users, the Reddit homepage displays the /r/all page that contains algorithmically curated content from a range of subreddits on the platform in order to demonstrate the breadth of popular content available on the service.87 The algorithm used to sort this page tends to highlight material across subreddits that is new and has been upvoted a lot.
However, for a subreddit's content to make it to the /r/all page, the subreddit often must already have a subscriber base large enough to drive a high score. The default subreddits that users are automatically subscribed to are examples of this. By privileging these default subreddits, the system highlights other, organically created and operated subreddits less frequently. This can silence certain voices and render certain communities and interest groups invisible to non-logged-in users of the service. Subreddit moderators can also voluntarily opt out of this curation, often when they feel their content is controversial or not fit for public consumption (also known as "not safe for work," or NSFW).88 In this way, the opt-out feature can act as a privacy mechanism.
Reddit’s algorithmic curation and ranking systems seek to prioritize and deliver relevant and meaningful content to its users. Like on Facebook and Twitter, this raises a host of concerns regarding which voices are prioritized and how these voices are identified. When it comes to providing transparency and accountability around its algorithmic curation and ranking practices, Reddit offers some novel approaches, but also fails to adopt some existing practices that are gradually becoming more common across the industry.
Reddit’s approach to transparency is novel, in that it publishes the code for its ranking algorithms publicly online in an open-source format. This enables users, researchers, and publishers to better understand how content is tailored and ranked on the platform, what characteristics Reddit’s algorithms preference, and how this may impact a user’s worldview. This is one way of potentially revealing the existence of a filter bubble, as well. However, in order to effectively use this resource and extract valuable insights from it, an individual would have to have a relatively extensive technical background. Therefore, although the platform provides some valuable information on its curation and ranking practices, there are barriers to accessing and understanding it. Reddit also does not have a company-issued page explaining to its users how the algorithmic curation and ranking system works. As a result, most public information about Reddit’s ranking system and the signals and processes it uses are based on research or speculation, rather than company-verified information. Therefore, Reddit should publish information in language that is accessible to individuals who lack a technical background, as well as the general public. This will provide greater transparency and accountability around how Reddit curates and ranks content across its home page and subreddits.
Because content on Reddit is curated and ranked primarily on the basis of user votes, users have fewer additional controls over their news feed experiences on the platform. Aside from voting on content, Reddit users can hide posts on the front page news feed and in subreddits. They can also sort content using a range of filters, including the "new" category, which orders content reverse-chronologically. Beyond this, users have no significant further controls; the assumption is that user votes represent what users find interesting and meaningful, and Reddit's algorithms curate and rank on that basis. Like Facebook and Twitter, Reddit does not offer an appeals process or channel for its news feed curation and ranking efforts.
Citations
- Dan Noyes, "The Top 20 Valuable Facebook Statistics – Updated July 2019," Zephoria Digital Marketing, source
- "Facebook.com Competitive Analysis, Marketing Mix and Traffic," Alexa Internet, source
- Victor Luckerson, "Here's How Facebook's News Feed Actually Works," TIME, July 9, 2015, source
- Michael Arrington, "Facebook Users Revolt, Facebook Replies," TechCrunch, September 6, 2006, source
- Luckerson, "Here's How Facebook's News Feed Actually Works".
- Amy Gesenhues, "Facebook Ad Revenue Tops $16.6 Billion, Driven by Instagram, Stories," Martech, last modified January 31, 2019, source
- Josh Constine, "Zuckerberg Answers Big Questions About Facebook, Forced Downloads Of Messenger, And Page Reach," TechCrunch, November 6, 2014, source
- Affinity score can be determined based on how friendly one user is with another user. This is based on the time a user spends interacting with or looking at another user’s profile. For example, the more time User A spends interacting with User B, the more likely Facebook is to show User A profile updates from User B. Edge weight can be understood as relative weight of the content a user sees. For example, relationship status updates are weighted highly, as this is considered an update that a user’s friends and network would generally be very interested in. The time decay signal provides greater weight to new content compared to old content.
- Sarah Shirazyan, e-mail message to author, August 21, 2019.
- Fred Vogelstein, "Facebook Tweaks Newsfeed to Favor Content from Friends, Family," WIRED, January 11, 2018, source
- Shannon Connellan, "Facebook Will Give You More Info About Why Certain Posts Show Up In Your News Feed," Mashable, March 31, 2019, source
- Vogelstein, "Facebook Tweaks Newsfeed to Favor Content from Friends, Family"
- Facebook, "How News Feed Works," Publish News Feed, source
- Vogelstein, "Facebook Tweaks Newsfeed to Favor Content from Friends, Family"
- Abhinav Sharma, "Your Social Media News Feed And The Algorithms That Drive It," Forbes, May 15, 2017, source
- Connellan, "Facebook Will Give You More Info About Why Certain Posts Show Up In Your News Feed."
- Facebook, "How News," Publish News Feed.
- Sharma, "Your Social Media News Feed And The Algorithms That Drive It".
- Sharma, "Your Social Media News Feed And The Algorithms That Drive It".
- Explaining the News Feed Algorithm: An Analysis of the “News Feed FYI” Blog
- Shannon Tien, "How the Facebook Algorithm Works and How to Make it Work For You," Hootsuite, last modified April 25, 2018, source
- Tien, "How the Facebook," Hootsuite.
- Sean Si, "Facebook Is Updating Their News Feed Ranking Algorithm," SEO Hacker, source
- Si, "Facebook Is Updating," SEO Hacker.
- Sarah Perez, "Facebook News Feed Changes Downrank Misleading Health Info and Dangerous 'Cures,'" TechCrunch, July 2, 2019, source
- Salvador Rodriguez, "Facebook Is Taking A Page Out Of Google's Playbook To Stop Fake News From Going Viral," CNBC, April 10, 2019, source
- Si, "Facebook Is Updating," SEO Hacker.
- Connellan, "Facebook Will Give You More Info About Why Certain Posts Show Up In Your News Feed."
- Connellan, "Facebook Will Give You More Info About Why Certain Posts Show Up In Your News Feed."
- Connellan, "Facebook Will Give You More Info About Why Certain Posts Show Up In Your News Feed."
- Ramya Sethuraman, "Why Am I Seeing This? We Have an Answer for You," Facebook Newsroom, last modified March 31, 2019, source
- Explaining the News Feed Algorithm: An Analysis of the “News Feed FYI” Blog
- Jacob Kastrenake, "Twitter Keeps Losing Monthly Users, So It's Going To Stop Sharing How Many," The Verge, February 7, 2019, source
- "Twitter.com Competitive Analysis, Marketing Mix and Traffic," Alexa Internet, source
- Marty Swant, "Twitter Starts Using an Algorithm to Curate Users' Timelines," AdWeek, February 10, 2016, source
- Swant, "Twitter Starts Using an Algorithm to Curate Users’ Timelines".
- Katie Sehl, "How the Twitter Algorithm Works in 2019 and How to Make it Work for You," Hootsuite Blog, entry posted February 20, 2019, source
- Swant, "Twitter Starts Using an Algorithm to Curate Users’ Timelines".
- Sehl, "How the Twitter," Hootsuite Blog.
- Evan Niu, "Snap Is About to Embrace Algorithmic Curation," The Motley Fool, last modified November 9, 2017, source
- Will Oremus, "Twitter's New Order," Slate, March 5, 2017, source
- Niu, "Snap Is About to Embrace Algorithmic Curation," The Motley Fool.
- Aja Romano, "At Long Last, Twitter Brought Back Chronological Timelines. Here's Why They're So Beloved.," Vox, September 20, 2018, source
- Zeynep Tufekci, "Why Twitter Should Not Algorithmically Curate the Timeline," The Message, last modified September 4, 2014, source
- Tufekci, "Why Twitter Should Not Algorithmically Curate the Timeline," The Message.
- Oremus, "Twitter's New Order".
- Oremus, "Twitter's New Order".
- Sehl, "How the Twitter," Hootsuite Blog.
- Swant, "Twitter Starts Using an Algorithm to Curate Users’ Timelines".
- Oremus, "Twitter's New Order".
- Romano, "At Long Last, Twitter Brought Back Chronological Timelines. Here's Why They're So Beloved."
- Swant, "Twitter Starts Using an Algorithm to Curate Users’ Timelines".
- Sehl, "How the Twitter," Hootsuite Blog.
- Oremus, "Twitter's New Order".
- Keith Coleman, "See What's Happening!," Twitter Blog, last modified June 13, 2018, source
- Swant, "Twitter Starts Using an Algorithm to Curate Users’ Timelines".
- Oremus, "Twitter's New Order".
- Twitter, "About Your Twitter Timeline," Twitter Help Center, source
- Sehl, "How the Twitter," Hootsuite Blog.
- Oremus, "Twitter's New Order".
- Nicolas Koumchatzky and Anton Andryeyev, "Using Deep Learning at Scale in Twitter's Timelines," Twitter Blog, last modified May 9, 2017, source
- Oremus, "Twitter's New Order".
- Swant, "Twitter Starts Using an Algorithm to Curate Users’ Timelines".
- Sehl, "How the Twitter," Hootsuite Blog.
- Koumchatzky and Andryeyev, "Using Deep," Twitter Blog.
- Julia Carrie Wong, "Twitter Announces Global Change to Algorithm in Effort to Tackle Harassment," Guardian, May 15, 2018, source
- Twitter Support, "Never miss important Tweets from people you know," Twitter, September 17, 2018, 7:58 pm, source
- Lauren Feiner, "Reddit Users Are The Least Valuable Of Any Social Network," CNBC, February 11, 2019, source
- Alexa, "reddit.com Competitive Analysis, Marketing Mix and Traffic," Alexa, source
- Emily van der Nagel, "'Networks That Work Too Well': Intervening in Algorithmic Connections," Media International Australia, Incorporating Culture & Policy 168, no. 1 (August 2018).
- James Grimmelmann, "The Virtues of Moderation," Yale Journal of Law and Technology 17, no. 1 (2015): source
- Adrienne Massanari, "#Gamergate and The Fappening: How Reddit's Algorithm, Governance, and Culture Support Toxic Technocultures," New Media & Society 19, no. 3 (October 2015): source
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Amir Salihefendic, "How Reddit Ranking Algorithms Work," Hacking and Gonzo, last modified December 8, 2015, source
- Salihefendic, "How Reddit," Hacking and Gonzo.
- Maria Glenski and Tim Weninger, Predicting User-Interactions on Reddit, July 1, 2017, source
- Glenski and Weninger, Predicting User-Interactions on Reddit.
- Bozdag, "Bias in Algorithmic Filtering and Personalization”.
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".
- Massanari, "#Gamergate and The Fappening".