The History of U.S. Housing Segregation Points to the Devastating Consequences of Algorithmic Bias

Blog Post
Dec. 12, 2018

This is part of The Ethical Machine: Big ideas for designing fairer AI and algorithms, an ongoing series about AI and ethics, curated by Dipayan Ghosh, a former Public Interest Technology fellow. You can see the full series on the Harvard Shorenstein Center website.

LAUREN GREENAWALT
PUBLIC INTEREST TECHNOLOGY FELLOW, NEW AMERICA

Algorithms, and algorithmic discrimination, are often presented as recent phenomena. But while big data and computational power have enabled more advanced algorithms and reduced the need for human calculation, simpler algorithms have long been used to allocate private and public goods—and not without prejudice. Indeed, as early as the 1930s, algorithms like those used by the Federal Housing Administration to grant or deny federally insured mortgages relied on significantly biased variables. There’s a strong argument to be made that this system increased residential segregation, hastened the decline of urban neighborhoods, and magnified racial inequality that persists today.

Before we look backward, though, it’s worth noting two major benefits of examining historical algorithmic discrimination. First, surfacing cases of algorithms and algorithmic discrimination from decades in the past can help demystify these concepts. Second, the long-term consequences of that discrimination offer current policymakers lessons that may not be evident from analyzing more recent algorithms, whose effects may yet go unrecognized.

The FHA’s Use of Algorithms

After the banking crisis of the 1930s, the National Housing Act of 1934 established the Federal Housing Administration (FHA) and tasked it with, among other assignments, insuring privately issued mortgages [1]. FHA-secured loans had more favorable terms than loans available before the creation of the administration: the FHA required that all federally insured loans be fully self-amortizing over a repayment period of at least 25 years, whereas prior loans had shorter repayment periods and typically left the borrower with an outstanding balance on the house. The security afforded by federally insured loans also allowed lenders to lower their required down payments and interest rates, making for far more affordable mortgages. This was true both for the individual mortgages that the FHA insured and for loans made to proprietors of multi-family developments. For those who received FHA-backed loans, “It often became cheaper to buy than to rent” [2].
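To see what “fully self-amortizing” means in practice, here is a minimal sketch of the standard amortization arithmetic. The loan amount, interest rate, and terms are invented for illustration and are not drawn from FHA records; the point is only the contrast between a schedule that retires the balance by the end of the term and a shorter, pre-FHA-style schedule that leaves a large balance outstanding.

```python
# Illustrative amortization arithmetic; the figures are hypothetical,
# not historical FHA loan terms.

def monthly_payment(principal, annual_rate, years):
    """Payment that fully amortizes a loan over the given term."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)

def remaining_balance(principal, annual_rate, payment, months_paid):
    """Balance left after making `months_paid` equal monthly payments."""
    balance = principal
    for _ in range(months_paid):
        balance = balance * (1 + annual_rate / 12) - payment
    return balance

principal = 5_000  # hypothetical house price

# Fully self-amortizing 25-year loan: the balance reaches zero at the end.
payment = monthly_payment(principal, 0.05, 25)
print(f"Monthly payment: ${payment:.2f}")
print(f"Balance after 25 years: "
      f"${remaining_balance(principal, 0.05, payment, 300):.2f}")  # ~$0, fully paid off

# A loan with the same payments but due after only 5 years leaves the
# borrower owing a large balloon balance when the term ends.
print(f"Balance still owed after 5 years: "
      f"${remaining_balance(principal, 0.05, payment, 60):.2f}")
```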

The Housing Act, however, required that the FHA only insure “economically sound” mortgages [3]. Given that the FHA insured loans throughout the country, the administration needed to provide clear standards for local underwriting staff and contractors to determine if a mortgage was economically sound. To standardize these decisions, the FHA outlined a risk-rating approach in an Underwriting Manual that was distributed to all FHA staff and contractors. The Manual explained the following:

In order to secure uniformity and consistency in decisions, the risk-rating system prescribes that the elements of risk shall be treated by inter-related groups and then integrated into a final result according to a specified procedure. Adherence to the procedure is mandatory. [4]

In other words, the FHA created algorithms to rate the risk of mortgages and required that staff use these algorithms to determine which mortgages would be insured by the federal government.

The FHA likely did not think of its mortgage-insuring system as algorithm-based. However, a detailed look at the risk-rating system shows clear parallels to algorithms used today. Like modern algorithms, the FHA algorithms gathered and weighted a variety of inputs, and returned a score based on those inputs. Similar to current algorithms, the score generated by the FHA had a direct impact on who gained access to a hugely valuable good.

The FHA outlined 28 “features” that it considered “the most important ratable elements of risk in the making of a mortgage loan on a dwelling property.” These features were divided into four categories: the property, the location, the borrower, and the mortgage pattern [5].

Local FHA staff and contractors were required to fill out grids (like the one below) to rate the property, the location, and the borrower. Staff rated each “feature,” or variable, on a scale of one to five according to instructions provided in the Underwriting Manual. Each variable was assigned a designated weight; the small numbers in the top left of the boxes represented the weighted score. FHA staff carried the weighted score into the “rating” column and summed the rows to produce a total score, or rating, for the category [6].

[Image: a category rating grid from the FHA Underwriting Manual]

The mortgage pattern rating grid yielded a total mortgage score based on information from the three categories: property, borrower, and location. The Underwriting Manual instructed staff members to transcribe the scores of the three other categories onto this card, as well as specified mortgage information. As in the other grids, the table assigned a weight to each feature score, which staff members summed to produce a total rating [7].

[Image: the mortgage pattern rating grid from the FHA Underwriting Manual]

This prescribed system, or algorithm, provided three opportunities for loans to be automatically rejected. First, loans were rejected if staff members determined that any of the 28 features deserved a score of less than one. To take an example from the borrower category, FHA examiners were instructed to mark an “X” under the “reject” column if a “borrower’s reputation is . . . so questionable that undue risk would be involved in insuring a mortgage loan.” Second, loans were rejected if any of the three category scoring grids yielded a score of less than 50 percent [8]. Finally, loans were rejected if the total rating, as calculated on the mortgage pattern grid, fell below 50 percent [9]. With each opportunity for rejection, the harms of biased variables compounded.
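To make the mechanics concrete, here is a minimal sketch of the procedure in Python. It is a simplified reconstruction, not a transcription of the Manual: the feature names, weights, and ratings below are invented placeholders, and the mortgage-specific features of the pattern grid are omitted. The logic, however, follows the description above: each feature is rated on a one-to-five scale, weighted, and summed into a category score; the category scores are combined into a total rating; and an application is automatically rejected if any feature is marked “reject,” if any category falls below 50 percent, or if the total rating falls below 50 percent.

```python
# Simplified reconstruction of the FHA risk-rating procedure described above.
# Feature names, weights, and ratings are illustrative placeholders, not the
# Manual's actual values; the mortgage-specific pattern features are omitted.

REJECT = 0  # sentinel for a feature the examiner marks in the "reject" column

def category_score(ratings, weights):
    """Weighted category score as a percentage of the maximum possible score.

    `ratings` maps feature -> examiner rating (1-5, or REJECT);
    `weights` maps feature -> the weight assigned on the grid.
    """
    if any(r == REJECT for r in ratings.values()):
        return None  # rule 1: any "reject"-level feature sinks the application
    earned = sum(ratings[f] * weights[f] for f in weights)
    maximum = sum(5 * w for w in weights.values())
    return 100 * earned / maximum

def rate_mortgage(categories, pattern_weights):
    """Apply the three automatic-rejection rules and return a decision."""
    scores = {}
    for name, (ratings, weights) in categories.items():
        score = category_score(ratings, weights)
        if score is None or score < 50:       # rules 1 and 2
            return f"REJECT ({name})"
        scores[name] = score
    total = sum(scores[name] * w for name, w in pattern_weights.items())
    total /= sum(pattern_weights.values())
    if total < 50:                            # rule 3
        return "REJECT (total rating)"
    return f"ACCEPT (total rating {total:.0f})"

# Hypothetical application: two invented features per category.
categories = {
    "property": ({"livability": 4, "natural_light": 3},
                 {"livability": 3, "natural_light": 2}),
    "location": ({"economic_stability": 2, "adverse_influences": 1},
                 {"economic_stability": 5, "adverse_influences": 4}),
    "borrower": ({"reputation": 4, "stability_of_income": 3},
                 {"reputation": 2, "stability_of_income": 3}),
}
print(rate_mortgage(categories, {"property": 3, "location": 4, "borrower": 3}))
```

With these invented numbers, the low location ratings alone pull that category below 50 percent, so the application is rejected before the total rating is ever computed.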

FHA Algorithms’ Anti-Urban, Anti-Integration Biases

The described algorithms and corresponding guidelines for automatic rejection appear objective and innocuous. However, certain variables considered in the rating of the location and the rating of the property created systematic disadvantages for mortgages in urban and heterogeneous neighborhoods.

The image below shows the rating of property grid that FHA staff and contractors filled out when considering a loan for federal insurance.

[Image: the rating of property grid from the FHA Underwriting Manual]

Whether intended or not, the features, or variables, under the “function” section of the grid disadvantaged many urban properties and increased the chance that a mortgage would be rejected. For example, the Manual instructed staff to lower ratings for “Livability and Functional Plan” for any properties with a “dark or poorly ventilated room” [10]. In rating “Natural Light and Ventilation,” staff members were instructed to consider “proximity to adjoining buildings” [11]. Unfortunately, urban properties often lacked these elements of supposed function. As Kenneth T. Jackson wrote in the seminal book Crabgrass Frontier, “While such requirements did provide light and air for new structures, they effectively eliminated whole categories of dwellings, such as the traditional 16-foot-wide row houses of Baltimore, from loan guarantees” [12].

Per Jackson’s point, mortgages for some typical urban housing structures were outright rejected based on a single variable. Even when a feature did not disqualify an application right away, it lowered the variable’s score, which decreased both the category and overall mortgage ratings and thus increased the chance that the mortgage would ultimately be denied federal insurance.

The location-rating algorithms also created barriers to federally insuring mortgages in urban neighborhoods [13]. The “Relative Economic Stability” and “Protection from Adverse Influences” variables comprised more than half of the potential score. The Manual instructed FHA staff to rate “Relative Economic Stability” of the location based on the occupations of the people in the neighborhood.

[Image: “Relative Economic Stability” rating guidance from the FHA Underwriting Manual]

It noted that “laborers” were the lowest class of workers; neighborhoods where many residents were “laborers” were to be scored lower than neighborhoods with residents who had higher-status jobs. The Underwriting Manual itself acknowledged that this would make it more challenging for people in urban neighborhoods to secure federal insurance for mortgages. A paragraph in the instruction section noted that “a large percentage of the employed population of a city is found working in the capacity of laborers” [14]. By including a variable that would necessarily lower the score of urban neighborhoods, the rating of location algorithm, and with it the dependent mortgage pattern algorithm, again increased the chance that mortgages in urban neighborhoods would be rejected.

The “Protection from Adverse Influences” variable likewise biased the algorithms against urban neighborhoods. The Manual listed a variety of “adverse influences” that would lower the score for this variable. These “influences,” such as nearby businesses, nearby schools, or “offensive noises and odors,” were common in urban neighborhoods [15]. As Jackson explains, “Prospective buyers could avoid many of these so-called undesirable features by locating in suburban sections” [16].

More perniciously, the rating of location algorithm, and therefore the total mortgage score algorithm, also disadvantaged diverse or integrated neighborhoods. The Manual explicitly labeled the presence of “inharmonious racial groups” as an “adverse influence” [17]. It meanwhile instructed staff to award higher scores to locations with natural or artificial barriers that prevented such “inharmonious” groups from moving in. The Manual even encouraged higher scores when racially restrictive covenants or deed restrictions were incorporated into the mortgage. Potential mortgages lacking these features were penalized, thus increasing the chance that the FHA would reject the insurance application for a mortgage in a diverse, or potentially diverse, neighborhood [18].

The FHA algorithms are no doubt a historical example of algorithmic discrimination. The input variables included in the rating grids strongly biased the system toward rejecting applications for federally insured loans in urban or heterogeneous neighborhoods.

FHA Algorithms’ Legacy: Urban Decline, Residential Segregation, and Racial Inequality

The FHA algorithms’ anti-urban and anti-integration biases had deep and enduring effects, producing mortgage-insuring patterns that drove urban decline, residential segregation, and racial inequality. The United States is still grappling with their devastating impacts today.

The algorithms, as they were designed, produced a predictable result—the FHA insured far more loans in the suburbs than in urban neighborhoods. In the first 20 years of the FHA, for example, the suburbs of St. Louis County received five times as much FHA investment as the city, whether measured by number of loans or per capita dollars insured [19]. Some cities fared even worse when compared to their suburbs: the FHA insured zero mortgages in Newark and Paterson from its inception to 1966 [20].

Previous research shows that the relative availability of FHA-insured mortgages in the suburbs led many city residents to move away from urban centers. Of the FHA-backed loans in St. Louis County, for example, over half were held by people who had most recently lived in the city [21]. This out-migration hurt the neighborhoods and neighbors left behind. As Jackson writes, “This withdrawal of financing often resulted in an inability to sell houses in a neighborhood, so that vacant units often stood empty for months, producing a steep decline in value” [22].

But the suburbs were not equally accessible to people of all races. Therefore, the urban/suburban disparities in FHA-insured loans intensified racial segregation. As noted earlier, FHA algorithms were more likely to approve mortgages for federal insurance if the neighborhood or the mortgage presented a barrier to integration. Contractors, who sought FHA backing for full developments, added deed language to keep developments segregated and thereby increase the chance that their mortgages would be federally insured [23]. Black people, in large part, were left out of the growing suburbs [24]. A pattern of white suburbs and black urban centers began to emerge, as black/white segregation in metro areas rose 40 percent between 1910 and 1940 and continued to grow—though at a slower pace—until 1970 [25].

The FHA eventually removed language instructing lower scores for mortgages in areas with “inharmonious racial groups,” and in 1948 the Supreme Court ruled that the racially restrictive covenants incentivized under the algorithms could not be enforced. However, these adjustments did not remedy the damage caused by the bias in the original algorithms. Richard Rothstein explains that houses in the suburbs appreciated in value and rapidly became unaffordable to those who had not been able to secure mortgages during the early days of FHA insurance. As he writes, “By the time the federal government decided finally to allow African-Americans into the suburbs, the window of opportunity for an integrated nation had mostly closed” [26].

The residential segregation partially spurred by FHA algorithms has been stubborn and resistant to intervention. The Brookings Institution’s analysis of recent census data shows that in order for urban neighborhoods to be fully integrated—that is, to have a proportionate number of black and white residents—more than half of the black population would need to move to a different neighborhood [27]. Residential segregation has, predictably, bled into other aspects of life. K-12 schools are more segregated today than they were 40 years ago [28], and racial segregation has been identified as a driver of health inequality [29], educational inequality [30], and the racial wealth gap [31].
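The “more than half would need to move” figure reflects a standard segregation measure, the black-white index of dissimilarity, which can be read as the share of either group that would have to change neighborhoods for every neighborhood to mirror the citywide mix. The sketch below computes it for a hypothetical city of four neighborhoods; the population counts are invented for illustration.

```python
# Black-white index of dissimilarity for a hypothetical city.
# Neighborhood population counts are invented for illustration only.
neighborhoods = [
    # (black residents, white residents)
    (900, 100),
    (700, 300),
    (150, 850),
    (50, 950),
]

total_black = sum(b for b, _ in neighborhoods)
total_white = sum(w for _, w in neighborhoods)

# D = 1/2 * sum over neighborhoods of |share of the city's black residents
# living there - share of the city's white residents living there|.
D = 0.5 * sum(abs(b / total_black - w / total_white) for b, w in neighborhoods)

# D is read as the fraction of either group that would have to move for
# every neighborhood to reflect the citywide racial composition.
print(f"Index of dissimilarity: {D:.2f}")  # about 0.71 for these invented counts
```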

A Call to Monitor Modern Algorithms

Today, the government deploys algorithms for a wide range of uses. They are employed to determine eligibility for a variety of public benefits, to decide whether a defendant should be released before trial or detained, and to allocate firefighters or police officers to neighborhoods. The historical case of FHA mortgage rating shows the potential for government-deployed algorithms to systematize bias with devastating results. In the mortgage risk-rating grids, both explicitly discriminatory variables and seemingly objective ones biased the FHA algorithms against urban, heterogeneous neighborhoods. Though it may have been difficult to predict how influential these algorithms would be in contributing to urban decline, residential segregation, and racial inequality—consequences that appear indefensible by current standards—today’s goal must be constant vigilance and forethought. First, these insights should compel current policymakers to evaluate variables in government-deployed algorithms for potential bias.

Policymakers should engage the public and subject-matter experts to evaluate the variables used in algorithms that are currently deployed or under consideration, in order to determine how likely an algorithm is to systematize bias.

In any such analysis, policymakers should consider the following questions:

  • Do any variables in the algorithm explicitly preference or disadvantage a particular identity, race, gender, class, or geography?
  • What is the correlation between each variable in the algorithm and particular groups of interest, such as race, gender, class, or geographic location? Are any of the variables so closely correlated with a group that they serve as a proxy for that group? (A rough version of this check is sketched after this list.)
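As a rough illustration of the proxy question above, the sketch below measures how strongly a candidate input variable is associated with membership in a group of interest. The data, the “neighborhood score” variable, and the 0.7 threshold are all invented for illustration; a real review would use the algorithm’s actual inputs and domain judgment about what counts as too correlated.

```python
# Rough proxy check: how strongly is a candidate input variable associated
# with membership in a group of interest? All data here are invented.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical applicants: 1 = member of the group of interest, 0 = not.
group_member = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
# Hypothetical input variable the algorithm would consume for each applicant.
neighborhood_score = [2, 1, 2, 3, 5, 4, 5, 4, 3, 1]

r = pearson(group_member, neighborhood_score)
print(f"Correlation with group membership: {r:+.2f}")
if abs(r) > 0.7:  # illustrative threshold, not a standard
    print("This variable may act as a proxy for the group and deserves scrutiny.")
```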

While analyzing algorithms’ variables is important, policymakers may not be able to fully gauge the disparate impacts or negative consequences of an algorithm until it’s deployed. Therefore, policymakers must also track any disparities and unintended consequences after deployment. At a minimum, they should monitor deployed algorithms closely enough to answer the following questions:

  • Once implemented, does the algorithm yield systematically different results for different groups? In other words, does the algorithm result in preferences or disadvantages for a particular identity, race, gender, class, or geographic group? (A minimal version of this check is sketched after this list.)
  • What are the consequences of the disparity created by the algorithm? What ripple effects does the disparity create?
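A minimal version of this monitoring, sketched below with an invented decision log and hypothetical “urban” and “suburban” group labels, simply compares outcome rates across groups and flags large gaps for investigation. In the FHA case, such a log would have shown approval rates diverging sharply between urban and suburban applicants.

```python
# Post-deployment monitoring sketch: compare outcome rates across groups.
# The decision log below is invented for illustration.
from collections import defaultdict

# Each record: (group label, whether the algorithm approved the application).
decision_log = [
    ("urban", False), ("urban", False), ("urban", True), ("urban", False),
    ("suburban", True), ("suburban", True), ("suburban", True),
    ("suburban", False), ("suburban", True),
]

approved = defaultdict(int)
total = defaultdict(int)
for group, outcome in decision_log:
    total[group] += 1
    approved[group] += outcome

rates = {group: approved[group] / total[group] for group in total}
for group, rate in rates.items():
    print(f"{group:9s} approval rate: {rate:.0%}")

# A large gap between group rates is a signal to investigate which variables
# drive the disparity and what its downstream consequences are.
gap = max(rates.values()) - min(rates.values())
print(f"Approval-rate gap: {gap:.0%}")
```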

The suggested analysis of algorithms’ variables and ongoing monitoring of algorithmic consequences could help surface potential cases of bias and discrimination soon enough to prevent devastating results. However, it’s impossible to prescribe how policymakers should respond to the myriad potential results of each analysis. While a review of the FHA risk-rating system certainly shows that the algorithms required reform, analysis of other algorithms may not yield an equally clear direction. In certain cases, analysis of variables may find one that is both critical to the functioning of an algorithm and highly correlated with a particular group. In other cases, the disparate impact of an algorithm may need to be weighed against other values.

In certain cases, analysis of variables may find one that is both critical to the functioning of an algorithm and highly correlated with a particular group.

The complexities of analyzing algorithms point to the importance of consulting subject-matter experts of all kinds, as well as the public, in the review process and in decisions based on these analyses. For instance, policy-area experts can help evaluate particular variables and surface unintended consequences of algorithms as they are deployed, while algorithmic decision-making experts can provide frameworks and tools to measure potential algorithmic bias [32], to evaluate algorithm risks [33], or to weigh the potential costs to fairness against other goals [34]. The public, especially, must be able to shape algorithms that allocate public services or goods and should thus be consulted as the ultimate experts.

The FHA algorithms didn’t simply reject mortgages—they denied people mortgages. Algorithms deployed by government today also affect human lives. Only careful analysis of these algorithms can prevent the type of harm caused by those used in federal housing policy decades ago.

References

  1. “Creation of Federal Housing Administration,” 1246 § 847 (1934), Sections 1–2, http://www.legisworks.org/congress/73/publaw-479.pdf.
  2. Kenneth T. Jackson, Crabgrass Frontier: The Suburbanization of the United States (New York: Oxford University Press, 1985), 205.
  3. Federal Housing Administration, Underwriting Manual: Underwriting and Valuation Procedure Under Title II of the National Housing Act, 1938, Section 203I, available at https://babel.hathitrust.org/cgi/pt?id=mdp.39015018409253;view=1up;seq=1.
  4. FHA, Underwriting Manual, Pt I, Paragraph 214.
  5. FHA, Underwriting Manual, Pt I, Paragraphs 221–222.
  6. FHA, Underwriting Manual, Pt I, Paragraphs 224–228. More detail on which staff filled out which section is available in Paragraph 224.
  7. FHA, Underwriting Manual, Pt I, Paragraphs 301–310.
  8. FHA, Underwriting Manual, Pt I, Paragraphs 227–230.
  9. FHA, Underwriting Manual, Pt I, Paragraph 206.
  10. FHA, Underwriting Manual, Pt II, Paragraph 133.
  11. FHA, Underwriting Manual, Pt II, Paragraph 149.
  12. Jackson, Crabgrass Frontier, 208.
  13. Economic stability comprised 40 percent of the category score. Weighted scores were left off of the grid, as FHA Insuring Offices determined a maximum score that a neighborhood could receive given the large metropolitan district to which it belonged (FHA, Underwriting Manual, Pt II, Paragraphs 203–217).
  14. FHA, Underwriting Manual, Pt II, Paragraph 219.
  15. FHA, Underwriting Manual, Pt II, Paragraph 232.
  16. Jackson, Crabgrass Frontier, 208.
  17. FHA, Underwriting Manual, Pt II, Paragraph 229.
  18. FHA, Underwriting Manual, Pt II, Paragraphs 226–229.
  19. Jackson, Crabgrass Frontier, 210.
  20. Jackson, Crabgrass Frontier, 213.
  21. Jackson, Crabgrass Frontier, 209.
  22. Jackson, Crabgrass Frontier, 213.
  23. Richard Rothstein, The Color of Law: A Forgotten History of How Our Government Segregated America (New York: Liveright Publishing Corporation, 2017), 77.
  24. Rothstein, The Color of Law, 67.
  25. Douglas S. Massey, “Residential Segregation and Neighborhood Conditions in U.S. Metropolitan Areas,” in America Becoming: Racial Trends and Their Consequences, vol. 1 (2001), https://www.nap.edu/read/9599/chapter/14.
  26. Rothstein, The Color of Law, 129.
  27. William H. Frey, “Census Shows Modest Declines in Black-White Segregation,” Brookings Institution, 2015, https://www.brookings.edu/blog/the-avenue/2015/12/08/census-shows-modest-declines-in-black-white-segregation/.
  28. Rothstein, The Color of Law, 179.
  29. David R. Williams and Chiquita Collins, “Racial Residential Segregation: A Fundamental Cause of Racial Disparities in Health,” Public Health Reports, 2001, available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497358/.
  30. Sean Reardon, “School Segregation and the Racial Academic Achievement Gaps,” Center for Education Policy Analysis, Working paper, 2015, 15–22, https://cepa.stanford.edu/sites/default/files/wp15-12v201510.pdf.
  31. Thomas Shapiro, Tatjana Meschede, and Sam Osoro, “The Roots of the Widening Racial Wealth Gap: Explaining the Black-White Economic Divide,” Institute on Assets and Social Policy, 2013, https://iasp.brandeis.edu/pdfs/Author/shapiro-thomas-m/racialwealthgapbrief.pdf.
  32. See Joshua A. Kroll et al., “Accountable Algorithms,” University of Pennsylvania Law Review 165 (2017), available at https://scholarship.law.upenn.edu/penn_law_review/vol165/iss3/3/.
  33. See Ethics & Algorithms Toolkit at http://ethicstoolkit.ai/.
  34. See Sam Corbett-Davies et al., “Algorithmic Decision Making and the Cost of Fairness,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, https://5harad.com/papers/fairness.pdf, for an example of such a framework.