Algorithmic Tools Used in Criminal Justice, Education, and Employment Are Powered by Personal Data

Machine learning and other algorithmic tools are increasingly used to replace or augment human decision-making. A 2019 survey conducted by NewVantage Partners showed that 91.6 percent of the Fortune 1000 executives surveyed were increasing their big data and AI investments over the previous year.1 Algorithms powered by vast quantities of personal data are now commonly used to make decisions about numerous aspects of people’s lives. OTI’s event and this report focus on criminal justice, education, and employment, but the widespread use of automated tools also affects many other sectors, including credit and housing. Both the data used to train algorithms and the data collected and used to create outputs present privacy and equity issues. The outputs of machine learning algorithms are shaped by the underlying data, the coded instructions, and the patterns the algorithms learn. Even when their creators did not intend to discriminate against certain groups, algorithmic tools can reinforce historical discrimination if the training data used to build the AI reflects biases.2 After the AI is trained, the tools can collect and store sensitive data—such as gender, race, and medical conditions—that can increase the scope of personal information used to make decisions for or about an individual.3
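To make that mechanism concrete, the short sketch below uses entirely synthetic data and a standard logistic regression, not any deployed system, to show how a model trained on historically biased decisions reproduces that bias for equally qualified people.

```python
# Minimal sketch (hypothetical data): a model trained on biased historical
# decisions reproduces that bias, even without any explicit intent to discriminate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)          # 0 / 1: a protected attribute
skill = rng.normal(0, 1, n)            # true qualification, same distribution for both groups

# Historical decisions were biased: group 1 needed a higher skill level to be approved.
past_decision = (skill - 0.8 * group + rng.normal(0, 0.3, n)) > 0

X = np.column_stack([group, skill])
model = LogisticRegression().fit(X, past_decision)

# The trained model "learns" the penalty and applies it to equally skilled people.
test = np.array([[0, 0.5], [1, 0.5]])  # same skill, different group
print(model.predict_proba(test)[:, 1]) # group 1 receives a lower predicted score
```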

Algorithmic systems can harm individual privacy, both in the data collection required to build the system and after the system is built. AI systems based on statistical models require large data sets to function, and there are privacy risks inherent in collecting this volume of data on individuals. Algorithms reliant on machine learning constantly require more data to train the system and enable it to draw inferences. When systems demand data about people at this scale, the risk grows that the data will be obtained in privacy-intrusive ways.4 During the panel, Gillmor noted that large data sets that an entity stores but does not properly manage can become an “attractive nuisance” for law enforcement, immigration services, foreign hackers, or identity thieves to target.

Algorithmic Tools Used in Criminal Justice

Automated tools are widely used in the criminal justice system, and their use has generally led to inequitable outcomes.5 Two of the most commonly used tools are risk assessment algorithms and facial recognition tools. Risk assessment algorithms are used to estimate the likelihood of certain outcomes, such as a defendant’s chance of recidivism or failure to appear before a judge. As Albert III noted during the panel, risk assessment “can be used at really any juncture of the criminal legal process where critical decisions are made of freedom.” They are commonly used at the pre-trial stage6 to replace or supplement cash bail systems,7 during incarceration to make determinations about early release, and post-release to make decisions about probation and parole. These tools use information such as criminal history, socioeconomic status, neighborhood crime rates, and other factors to predict an individual’s potential risks, and can be used without an individual’s consent. When used to predict future criminal behavior, these tools raise grave civil liberties concerns.
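As a purely illustrative sketch, a risk assessment of this general kind can be thought of as a weighted combination of case attributes mapped to a risk tier. The features, weights, and cut points below are hypothetical and are not drawn from PATTERN or any other deployed tool.

```python
# Toy risk score: a weighted sum of case features binned into risk tiers.
# All weights, features, and thresholds are hypothetical, for illustration only.
HYPOTHETICAL_WEIGHTS = {
    "prior_arrests": 2.0,
    "age_at_first_arrest_under_21": 1.5,
    "failed_to_appear_before": 3.0,
    "employed": -1.0,
}

def risk_tier(record: dict) -> str:
    score = sum(HYPOTHETICAL_WEIGHTS[k] * record.get(k, 0) for k in HYPOTHETICAL_WEIGHTS)
    # Cut points are also hypothetical; real tools bin scores into named risk levels.
    if score >= 5:
        return "high"
    if score >= 2:
        return "moderate"
    return "low"

print(risk_tier({"prior_arrests": 2, "failed_to_appear_before": 1, "employed": 1}))
```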

OTI joined a coalition of over 100 civil rights, digital justice, and community-based organizations in a statement opposing these types of tools in the criminal justice system and condemning the use of algorithmic assessments for pretrial detention, because risk assessment tools can produce racially biased outcomes that reflect patterns of historical discrimination.8 The statement also called for safeguards and testing requirements in cases where such tools are already in use. Black people are one-third more likely to be stopped by the police and three times more likely to be searched.9 Algorithms that rely on police records and criminal history to predict the likelihood of recidivism will perpetuate the discriminatory patterns found in that training data and disproportionately harm Black people.10 Yet decisions from these automated decision-making systems often do not face the same scrutiny as their human counterparts, due to an assumption that technical solutions are inherently more accurate and objective.

The First Step Act, signed into law in 2018, requires the U.S. Attorney General to develop a “risk and needs assessment system” for the Federal Bureau of Prisons to assess each prisoner’s risk of recidivism and determine what type of recidivism reduction programming is appropriate for them.11 To implement the law, U.S. Attorney General William P. Barr and the Department of Justice (DOJ) created the Prisoner Assessment Tool Targeting Estimated Risk and Need (PATTERN). In January 2020, the DOJ announced that incarcerated individuals would be assigned to recidivism reduction programs and other activities based on their PATTERN assessment scores.12 Individuals who participate in or complete these programs can be placed in pre-release custody or receive sentence reductions, underscoring how consequential risk assessment scores can be for prisoners. Another example occurred earlier this year, when Barr ordered that some federal prisoners be released to reduce overcrowding during the COVID-19 pandemic and that prisoners with a minimum PATTERN score be prioritized. This requirement can favor white prisoners over Black prisoners, who are more likely to have higher risk assessment scores due to historical patterns of disparate policing practices.13 As Albert III noted, these risk assessment scores can exacerbate racial inequalities and should not be used to “essentially prioritize who lives and who dies” during a pandemic.

Algorithmic Tools Used in Education

Educational institutions at all levels use algorithms to make critical decisions for students, such as which curriculum they study or what resources they should receive. The different ways algorithms are used can either perpetuate discrimination or help address it. One example of this is through ability grouping, where schools divide students into different groups based on their academic ability. This practice has been used for decades, but recent changes in technology allow educational data-mining (EDM) technologies to sort through vast amounts of student data to form these groupings.14 It is particularly important to understand the repercussions of using these types of algorithms in education. These systems can determine what skills students develop, the curriculum they are taught, who their peers are, and what expectations teachers have of them.
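For illustration only, an educational data-mining pipeline of this kind might group students by clustering behavioral features along the following lines; the features and the number of groups are assumptions for the sketch, not any vendor's actual method.

```python
# Minimal sketch of ability grouping via clustering on hypothetical student data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical per-student features: quiz average, minutes logged in per week,
# share of assignments submitted on time.
students = np.column_stack([
    rng.normal(75, 10, 300),
    rng.normal(120, 40, 300),
    rng.normal(0.8, 0.15, 300),
])

X = StandardScaler().fit_transform(students)
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(groups))  # how many students land in each "ability group"
```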

EDM technologies can, however, have a positive influence on the education system as well. As Palmer explained, when these systems are well-designed, they can help higher education systems “reach out to the students that are vulnerable, who need their help, and who would not have otherwise come through (an advisor’s) door.” An experiment at Georgia State University found that students who received assistance through an AI outreach tool that provided guidance on the application process were more likely to enroll than their control group counterparts.15 This type of assistance can have a particularly positive effect on low-income and first-generation students who may not otherwise have outside guidance on the college application process.16

However, similar to the issues caused by algorithmic systems in criminal justice, education algorithms learn to make predictions from training datasets that include historical data and can therefore reinforce patterns of racial, gender, and socioeconomic discrimination. Schools use predictive analytics tools that have been trained on the data of past students to select which attributes best predict a student’s success and to forecast whether a student will perform well. As Palmer noted, this can contribute to the underrepresentation of minority students in science, technology, engineering, and mathematics (STEM) fields. When algorithms predict that students will not be successful in a certain major, students could be discouraged from pursuing that field of study. When schools use predictive analytics tools, they rely heavily on the data available, which reflect inequities in the education system and therefore may not accurately assess current students’ abilities. Students from privileged backgrounds are more likely to have access to technology and to be technologically proficient than other students. This means that algorithms that rely on, for instance, the amount of time students spend logged into an educational resource may not accurately capture the academic ability or learning habits of a student from a low-income background who lacks internet access or otherwise has inadequate access to the resource.17
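One way to see the problem is a simple proxy check on synthetic data: a feature such as minutes logged in can track home internet access far more closely than it tracks academic ability, as the hypothetical sketch below illustrates.

```python
# Sketch of a proxy check on made-up data: does "minutes logged in" measure
# ability, or does it mostly measure access to the internet at home?
import numpy as np

rng = np.random.default_rng(2)
ability = rng.normal(0, 1, 1000)
has_home_internet = rng.random(1000) < 0.7

# Logged-in time depends heavily on access and only weakly on ability.
minutes_logged = 30 + 80 * has_home_internet + 5 * ability + rng.normal(0, 10, 1000)

print(np.corrcoef(minutes_logged, ability)[0, 1])            # weak correlation
print(np.corrcoef(minutes_logged, has_home_internet)[0, 1])  # strong correlation
```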

The use of algorithms in the education system can also pose privacy and security risks. Schools and universities collect extensive personal data on students. This can include what courses a student has taken and the clubs they are involved in, as well as a student’s home address or medical history.18 Although there is a federal law protecting student privacy, the Family Educational Rights and Privacy Act (FERPA), it has a limited scope and has not kept pace with the dramatic changes in educational technology and student data collection.19 Schools often partner with third-party vendors that help them run their online education programs, supply predictive analytics services, and provide other technology and data support. But under FERPA, only schools are responsible for what those vendors do with student data.20 Both schools and vendors often lack sufficient protocols to protect student data, raising serious security and privacy concerns. Sophisticated cyber attacks are not necessarily the most likely cause of a breach; at universities in particular, Palmer explained, most breaches result from systems lacking basic security practices. Since 2005, K-12 school districts, colleges, and universities in the United States have experienced over 1,300 data breaches affecting more than 24.5 million student records.21

Algorithmic Tools Used in Employment

A growing number of organizations are also using machine learning algorithms to make employment decisions, such as whom to interview or hire. This is another important area where the use of algorithms, without proper testing and assessment, can perpetuate historical biases and negatively affect certain minority groups. For example, if a recruiting system makes decisions about candidates based on predicted tenure with an employer, the system may be more likely to have a disparate impact on certain protected classes. As Gillmor explained during the panel, if a system is built to simply compare candidates to people who have already done well in an organization “and the reason some people are not doing well at a company is an internally discriminatory regime, a system will pick up on that” and assess potential new candidates based on this regime, thereby reinforcing discrimination.
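One common, if coarse, way to screen hiring outcomes for this kind of disparity is the EEOC's "four-fifths" rule of thumb, which compares selection rates across groups. The group labels and counts in the sketch below are hypothetical.

```python
# Minimal disparate-impact check on hypothetical hiring outcomes using the
# four-fifths rule of thumb; numbers are made up for illustration.
def selection_rate(selected: int, applicants: int) -> float:
    return selected / applicants

rate_group_a = selection_rate(selected=60, applicants=100)
rate_group_b = selection_rate(selected=30, applicants=100)

impact_ratio = min(rate_group_a, rate_group_b) / max(rate_group_a, rate_group_b)
print(f"impact ratio: {impact_ratio:.2f}")
if impact_ratio < 0.8:
    print("below the four-fifths threshold: potential disparate impact")
```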

When employers pay for online job advertisements, internet platforms use machine learning algorithms to both target and distribute the ads.22 Facebook, like other advertising platforms, allows advertisers to select and target audiences based on demographic factors. Until 2019, the company allowed advertisers to include or exclude users from being shown advertisements based on protected characteristics, such as gender and race,23 a practice it ended in response to lawsuits pertaining to housing, employment, and credit ads that ran on its platform. Although Facebook has since removed the option to target or exclude protected classes from certain advertisements, studies have shown that its advertising algorithm may still produce racial or gender biases in the delivery of ads. Because training data for these algorithms reflect historical employment discrimination, employers who do not intend to target only certain demographics may still have their ads delivered primarily to people of a certain race or gender. For example, researchers who ran five ads for jobs in the lumber industry, aiming to reach a large and inclusive audience, found that the delivery algorithm showed the ads to an audience that was, in aggregate, over 90 percent male and over 70 percent white.24 Researchers also found that the delivery of housing and employment ads on Facebook was skewed based only on the ad’s content and link.25 Researchers have also found that Google’s advertising algorithm perpetuated biases in employment advertising. In one study, researchers posed as male and female users and found that Google served ads for high-paying executive positions at a higher rate to the male profiles.26 Though it is hard to confirm exactly why these issues occur without further insight into the proprietary algorithmic tools, the results of these studies show that the advertising systems are perpetuating patterns of discrimination found in the training data.
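As a simple illustration of how such skew is quantified, aggregate delivery shares can be computed from per-ad delivery counts. The counts below are hypothetical and are not figures from the cited studies.

```python
# Hypothetical per-ad delivery counts for five job ads; the cited studies
# measured real deliveries, while these numbers are made up for illustration.
deliveries = [
    {"male": 950, "female": 50},
    {"male": 900, "female": 100},
    {"male": 930, "female": 70},
    {"male": 910, "female": 90},
    {"male": 940, "female": 60},
]

total_male = sum(d["male"] for d in deliveries)
total = sum(d["male"] + d["female"] for d in deliveries)
print(f"aggregate share of impressions shown to men: {total_male / total:.1%}")
```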

Citations
  1. "Big Data and AI Executive Survey 2019," NewVantage Partners LLC (2019), source
  2. See e.g., Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, Cass R Sunstein, “Discrimination in the Age of Algorithms,” Journal of Legal Analysis, 10, (April 2019), source
  3. Cameron F. Kerry, “Protecting Privacy in an AI-driven World,” Brookings, February 10, 2020, source
  4. “Royal Free – Google DeepMind Trial Failed to Comply with Data Protection Law,” Information Commissioner's Office, July 3, 2017, source
  5. See e.g. “Algorithms in the Criminal Justice System: Risk Assessment Tools," EPIC, accessed August 10, 2020, source
  6. “Liberty At Risk: Pre-Trial Risk Assessment Tools in the U.S.,” Electronic Privacy Information Center, July 2020, source
  7. Colin Doyle, Chiraag Bains, and Brook Hopkins, “Bail Reform: A Guide for State and Local Policymakers,” Criminal Justice Policy Program, Harvard Law School, February 2019, source
  8. See e.g. “New America’s Open Technology Institute Joins Coalition Condemning Use of Algorithmic Risk Assessments for Pretrial Detention,” New America, July 30, 2018, source
  9. Christine Kumar, “The Automated Tipster: How Implicit Bias Turns Suspicion Algorithms into BBQ Beckys”, 72 Fed. Comm. L.J. 97, (June 2020), source
  10. See e.g. Greg Satell, Josh Sutton, “We Need AI That Is Explainable, Auditable, and Transparent,” Harvard Business Review, October 28, 2019, source
  11. First Step Act of 2018, S. 756 (115th Cong.), source
  12. “Department of Justice Announces Enhancements to the Risk Assessment System and Updates on First Step Act Implementation,” Department of Justice, January 15, 2020, source
  13. See e.g. Nathan James, "Risk and Needs Assessment in the Criminal Justice System," Congressional Research Service, (October 2015), source
  14. Yoni Har Carmel, Tammy Harel Ben-Shahar, "Reshaping Ability Grouping Through Big Data," Vanderbilt Journal of Entertainment & Technology Law, (May 2017), source
  15. Lindsey C. Page, Hunter Gehlbach, “How an Artificially Intelligent Virtual Assistant Helps Students Navigate the Road to College,” AERA Open, December 12, 2017, source
  16. See e.g. Laura Falcon, “Breaking Down Barriers: First-Generation College Students and College Success,” League for Innovation in the Community College, June, 2015, source
  17. Closing the Home Learning and Homework Gap: Innovative School and Community Wi-Fi Initiatives, New America’s Open Technology Institute, June 25, 2020, source.
  18. Jonah Newman, “Do you know what your college is doing with your data?,” Marketplace, September 25, 2014, source
  19. “Legislative History of Major FERPA Provisions,” U.S. Department of Education, accessed August 14, 2020, source
  20. Tina Nazerian, “The Unintentional Ways Schools Might Be Violating FERPA, and How They Can Stay Vigilant,” EdSurge, September 12, 2018, source
  21. Sam Cook, “US Schools Leaked 24.5 Million Records in 1,327 Data Breaches since 2005,” Comparitech, July 1, 2020, source
  22. Spandana Singh, Special Delivery: How Internet Platforms Use Artificial Intelligence to Target and Deliver Ads, New America’s Open Technology Institute, February 18, 2020, source
  23. Colin Lecher, “Facebook drops targeting options for housing, job, and credit ads after controversy,” The Verge, March 19, 2019, source
  24. Muhammad Ali et al., "Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes," arXiv, September 12, 2019, source
  25. Muhammad Ali et al., "Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes," arXiv, September 12, 2019, source
  26. Amit Datta, Michael Carl Tschantz, and Anupam Datta, “Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination,” Proceedings on Privacy Enhancing Technologies, April 18, 2015, source