Section 3. FEAT Methods
FEAT is a data transformation tool that runs through a series of data processing and analysis steps to produce useful insights on housing loss. The length of time that FEAT takes to run will depend on the volume of data being processed.
This section walks through each step that FEAT performs on housing loss data.
Map Input Data to Required Fields
FEAT validates that the housing loss data is stored in the proper format (.csv) and contains the required data fields (i.e., street_address_1, city, state, zip_code, date, and type). FEAT also checks that the data format meets the criteria laid out in Figure 1.
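The header check can be illustrated with a minimal sketch that uses Python's standard csv module. The function name missing_required_fields and the sample data are hypothetical and only show the idea; they are not FEAT's actual code, and the sketch does not cover the format criteria in Figure 1.

```python
import csv
import io

# Required fields named in this section.
REQUIRED_FIELDS = ["street_address_1", "city", "state", "zip_code", "date", "type"]


def missing_required_fields(csv_text):
    """Return the required fields that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [field for field in REQUIRED_FIELDS if field not in header]


# Hypothetical input that lacks the "type" column.
sample = (
    "street_address_1,city,state,zip_code,date\n"
    "1 Main St,Akron,OH,44301,2019-05-01\n"
)
missing = missing_required_fields(sample)
print(missing)  # ['type']
```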
Clean and Drop Duplicate or Missing Data
FEAT cleans the input data and drops data that it cannot parse. FEAT drops data for the following reasons:
- Duplicate records that are identical on every required data field;
- Records missing required information (i.e., a date, address, or GEOID); or
- Records that have a date that is before 2016.
FEAT writes the dropped records to the address_errors output file. See the Address Errors information in Section 4 for more detail.
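The three drop rules can be sketched as follows. This is an illustration under stated assumptions, not FEAT's implementation: dates are assumed to be already parsed into date objects, a record is assumed to need a date plus either a street address or a GEOID, and the names clean_records and geoid are hypothetical.

```python
from datetime import date

REQUIRED = ("street_address_1", "city", "state", "zip_code", "date", "type")
CUTOFF = date(2016, 1, 1)  # records dated before 2016 are dropped


def clean_records(records):
    """Split records into (kept, dropped) using the three rules listed above."""
    kept, dropped, seen = [], [], set()
    for rec in records:
        key = tuple(rec.get(field) for field in REQUIRED)
        # Assumption: a street address or a GEOID is enough to locate a record.
        has_location = bool(rec.get("street_address_1") or rec.get("geoid"))
        if key in seen:
            dropped.append(rec)  # identical on every required field
        elif rec.get("date") is None or not has_location:
            dropped.append(rec)  # missing required information
        elif rec["date"] < CUTOFF:
            dropped.append(rec)  # dated before 2016
        else:
            seen.add(key)
            kept.append(rec)
    return kept, dropped


base = {"street_address_1": "1 MAIN ST", "city": "AKRON", "state": "OH",
        "zip_code": "44301", "date": date(2019, 5, 1), "type": "eviction"}
records = [
    base,
    dict(base),                            # duplicate
    dict(base, date=date(2015, 12, 31)),   # before 2016
    dict(base, date=None),                 # missing date
]
kept, dropped = clean_records(records)
```

In this hypothetical input, one record is kept and three are dropped, one for each rule.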
Geocode Data Using Census Batch Geocoder API
FEAT geocodes data to the census tract. To translate addresses into their corresponding census tract IDs, or GEOIDs, FEAT standardizes the geographical identifier columns by converting them to uppercase, stripping any punctuation, and removing suffixes (e.g., apartment numbers or post office box numbers).
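The standardization described above might look like the following sketch. The suffix pattern is a hypothetical, simplified rule set chosen for illustration; the exact rules FEAT applies are not listed in this section.

```python
import re
import string

# Keep "#" while stripping other punctuation so unit markers such as "#12"
# can still be recognized by the suffix pattern.
PUNCTUATION = string.punctuation.replace("#", "")

# Hypothetical pattern for trailing unit or post office box suffixes.
UNIT_SUFFIX = re.compile(
    r"\s+(?:(?:APT|APARTMENT|UNIT|STE|SUITE|PO\s+BOX)\s+|#\s*)\w+$"
)


def standardize_address(address):
    """Uppercase, strip punctuation, and remove a trailing unit suffix."""
    text = address.upper()
    text = text.translate(str.maketrans("", "", PUNCTUATION))
    text = " ".join(text.split())      # collapse repeated whitespace
    return UNIT_SUFFIX.sub("", text)


print(standardize_address("123 Main St., Apt. 4B"))  # 123 MAIN ST
print(standardize_address("77 elm ave #12"))         # 77 ELM AVE
```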
Then, FEAT uses one or more methods to match the address of the eviction or foreclosure record to the appropriate census tract. The method FEAT uses depends on the geographic location data included in the input data:
- If Address Data Is Already Geocoded: If a census tract identifier (an 11-digit GEOID representing the combination of the 2-digit state, 3-digit county, and 6-digit tract code) is in the data, FEAT moves to the next step.
- If the Street Address Field Is Populated, but There Is No GEOID Data: FEAT submits addresses in batches to the census.gov geocoder, retrieves each GEOID, and merges the returned census tract information into the data. For all data input by the user, FEAT uses the 2020 census geocoder parameters: the benchmark parameter is set to Public_AR_Census2020 and the vintage parameter is set to Census2020_Census2020.
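To make the GEOID structure and the geocoder settings concrete, here is a small sketch. The benchmark and vintage values come from the text above; the endpoint URL, the batch size of 10,000, and the helper names build_geoid and batches are assumptions that should be checked against the Census geocoder documentation. The sketch does not send any request.

```python
# Assumed batch endpoint of the census.gov geocoder (verify before use).
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/geographies/addressbatch"

# Parameter values stated in this section for user-uploaded data.
GEOCODER_PARAMS = {
    "benchmark": "Public_AR_Census2020",
    "vintage": "Census2020_Census2020",
}


def build_geoid(state_fips, county_fips, tract_code):
    """Combine 2-digit state, 3-digit county, and 6-digit tract codes."""
    geoid = (
        str(state_fips).zfill(2)
        + str(county_fips).zfill(3)
        + str(tract_code).zfill(6)
    )
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"not a valid 11-digit GEOID: {geoid!r}")
    return geoid


def batches(rows, size=10_000):
    """Yield successive slices of rows; the default size is an assumption."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


print(build_geoid("6", "1", "400100"))  # 06001400100
```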
It is important to note that any census tract level FEAT analysis only includes data that was successfully geocoded (i.e., matched to a census tract). According to the U.S. Census Bureau, there are several reasons why the census geocoder may be unable to match data, including:
- Address is non-residential or commercial;
- Housing unit may have been recently constructed and is not in our database yet;
- Local Addressing Authority changed the address, and changes are not yet reflected in our database;
- Address may be in a location where we are missing address range information; or
- Housing unit may have been destroyed/demolished.
It is important that users assess which records FEAT could not geocode, because those records are dropped from some FEAT analysis. See Section 4 for more detail.
Summarize Data at Census Tract Level
Once FEAT has matched data to the relevant census tracts, FEAT aggregates the data by census tract and calculates housing loss summary statistics, including eviction or foreclosure totals, rates, and indices at the census tract level. FEAT conducts analysis individually on each type of housing loss, for each year of data uploaded and across the total years of data, and also provides analysis for combined housing loss (evictions and foreclosures), if a user uploads both types of data. See Section 4 for a more in-depth description of FEAT analysis.
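As a rough illustration of the aggregation step, the sketch below counts records per census tract, housing loss type, and year, plus a total across all years. The field names geoid, type, and year and the function name are hypothetical. Rates and indices are left out because their definitions are given in Section 4, not here.

```python
from collections import Counter


def count_by_tract(records):
    """Count records per (GEOID, loss type, year) and per (GEOID, loss type, 'all')."""
    totals = Counter()
    for rec in records:
        totals[(rec["geoid"], rec["type"], rec["year"])] += 1
        totals[(rec["geoid"], rec["type"], "all")] += 1
    return totals


# Hypothetical geocoded records.
records = [
    {"geoid": "39153500100", "type": "eviction", "year": 2019},
    {"geoid": "39153500100", "type": "eviction", "year": 2019},
    {"geoid": "39153500100", "type": "eviction", "year": 2020},
    {"geoid": "39153500200", "type": "foreclosure", "year": 2020},
]
totals = count_by_tract(records)
print(totals[("39153500100", "eviction", "all")])  # 3
```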
Create a Time Series
FEAT organizes the data by month and year and creates a spreadsheet and time series chart that showcases housing loss totals over the course of a year.
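The monthly grouping behind the time series could be sketched as follows, assuming record dates are already parsed into date objects. The function name monthly_totals is hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_totals(dates):
    """Return [((year, month), count), ...] sorted chronologically."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())


# Hypothetical filing dates.
series = monthly_totals([date(2020, 1, 5), date(2020, 1, 20), date(2020, 3, 1)])
print(series)  # [((2020, 1), 2), ((2020, 3), 1)]
```

Months with no records do not appear in this output; a chart would need them filled with zeros.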
Append Socio-Demographic and Housing Variables
In this step, FEAT uses the state and county FIPS codes to retrieve over 70 corresponding socio-demographic and housing variables at the census tract level from the five-year 2017–2021 American Community Survey (ACS) through the census.gov API, and appends them to the eviction or foreclosure data. This includes financial, housing, race and ethnicity, and other ACS variables from DP02 (selected social characteristics in the United States); DP03 (selected economic characteristics); DP04 (selected housing characteristics); and DP05 (demographic and housing estimates).
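A request for tract-level ACS data profile variables can be sketched as a URL built from the state and county FIPS codes. The base URL follows the public Census Data API layout for the 2017–2021 five-year data profiles; the variable ID DP04_0001E and the function name acs_profile_url are used only as examples, and the sketch builds the URL without sending it.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint for the 2017-2021 ACS five-year data profiles.
ACS_PROFILE_URL = "https://api.census.gov/data/2021/acs/acs5/profile"


def acs_profile_url(variables, state_fips, county_fips):
    """Build a query URL for all census tracts in one county."""
    params = {
        "get": ",".join(["NAME"] + list(variables)),
        "for": "tract:*",
        "in": f"state:{state_fips} county:{county_fips}",
    }
    # Keep ":", "*", and "," readable; encode the space as %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"{ACS_PROFILE_URL}?{query}"


url = acs_profile_url(["DP04_0001E"], "06", "001")
print(url)
```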
For a full list of ACS variables that FEAT appends, see our data dictionary here.
Calculate Correlation Analyses
FEAT calculates the Pearson correlation coefficient, or r value, between each ACS variable (using ACS 2017–2021 five-year estimates) and housing loss data. Specifically, FEAT calculates correlations on housing loss rates across all the geocoded census tracts in the data.
The Pearson correlation coefficient is calculated using the pearsonr function in the stats module of the SciPy Python package. The methodology used to calculate this statistic is maintained here.
After each correlation is calculated, FEAT compares its p value to 0.05. If the p value is less than or equal to 0.05, the correlation between the variable and that housing loss type rate is statistically significant at the 0.05 level. If the p value is higher than 0.05, the correlation is not statistically significant at the 0.05 level. In that case, the data do not provide enough evidence to conclude that the true correlation between these variables differs from 0, and the correlation coefficient observed in this set of data may be due to chance. For more details on interpretation of correlation analysis results, see Section 4.
FEAT assigns a correlation of 0 to variables with zero variance. Census tracts that return an error value from the ACS for a variable (e.g., -888888888 or -666666666 in the ‘housing_loss_summary’ files due to insufficient or unavailable estimates) are excluded from the correlation analysis. For a complete list of estimate and annotation values and explanations, please see these ACS notes.
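The exclusion and zero-variance rules, together with the 0.05 threshold, can be sketched as below. To keep the example runnable without third-party packages, it computes the coefficient with the standard library instead of SciPy, and it takes the p value as an input instead of computing it. The function names are hypothetical, the set of error values contains only the two examples named above, and returning 0 when fewer than two tracts remain is an assumption.

```python
import math

# The two ACS error values named in this section; other values may exist.
ACS_ERROR_VALUES = {-888888888, -666666666}


def pearson_r(acs_values, loss_rates):
    """Pearson r after dropping tracts whose ACS value is an error code."""
    pairs = [(x, y) for x, y in zip(acs_values, loss_rates)
             if x not in ACS_ERROR_VALUES]
    n = len(pairs)
    if n < 2:
        return 0.0  # assumption: too few tracts to correlate
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # zero variance is reported as a correlation of 0
    return sxy / math.sqrt(sxx * syy)


def is_significant(p_value, alpha=0.05):
    """True when the p value is less than or equal to the threshold."""
    return p_value <= alpha


r = pearson_r([1, 2, 3, 4, -888888888], [2, 4, 6, 8, 100])
```

In this hypothetical input, the fifth tract is excluded because of its error value, and the remaining four tracts are perfectly correlated.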
Get Geometry Data for Mapping
Lastly, FEAT sources census tract boundaries from the U.S. Census Bureau’s TIGERweb service and creates a GeoJSON file (.geojson) containing the geographical boundaries of the given county’s census tracts, along with a GeoPackage file (.gpkg) for use with GIS software.
Eviction Lab Methods
Eviction Lab’s Eviction Tracking System (ETS) provides updated eviction filing data for 10 states and 33 cities in the United States, beginning in January 2020. In these jurisdictions, FEAT allows users to use ETS data to produce FEAT outputs.
Because ETS data is aggregated at the census tract or zip code level, and FEAT requires data at the individual record level, ETS data is first converted into a format that FEAT can process. Below are the steps taken to process and analyze ETS eviction filing data.
- A program downloads the following two files from Eviction Lab’s website: Cities and States. These files contain all of the eviction filing data housed in the ETS for cities and states, by month.
- For each of the files, the program cleans and formats the data to meet the criteria for FEAT. This includes disaggregating the data by census tract level, such that each eviction filing record is documented in its own row of data. The program also creates and populates data fields that FEAT requires that are not in the ETS data (e.g., “county” and “type”). Lastly, the program creates individual input files for each city and state in the ETS data.
- Each state and city input file is run through FEAT at the beginning of each month to reflect new Eviction Lab data.
- FEAT analysis for each state and city can be downloaded or viewed in the ‘Use Eviction Lab Data’ page of FEAT.
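The disaggregation in the second step above can be illustrated with a short sketch that expands each aggregated row into one row per filing. The column names geoid, month, and filings are hypothetical stand-ins for the ETS fields, and the function name is not part of FEAT.

```python
def disaggregate(aggregated_rows, loss_type="eviction"):
    """Expand tract-level monthly counts into one record per filing."""
    records = []
    for row in aggregated_rows:
        for _ in range(int(row["filings"])):
            records.append({
                "geoid": row["geoid"],
                "date": row["month"],
                "type": loss_type,  # field FEAT requires but ETS data lacks
            })
    return records


# Hypothetical aggregated ETS rows.
ets_rows = [
    {"geoid": "39153500100", "month": "01/2020", "filings": 3},
    {"geoid": "39153500200", "month": "01/2020", "filings": 0},
]
expanded = disaggregate(ets_rows)
print(len(expanded))  # 3
```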
FEAT analysis of ETS data differs from analysis on housing loss data uploaded by a user in a few ways:
- 2010 Census Tract Boundaries: Eviction Lab uses 2010 census tract boundaries (as opposed to 2020 census tract boundaries).
- ACS Data: FEAT uses the 2015–2019 ACS five-year estimates, 2015–2019 ACS five-year data profiles, and the 2015–2019 five-year subject tables to run analysis on ETS data, to align with the 2010 census tract boundaries.
- TIGERweb: FEAT uses 2019 for the TIGERweb API call.
Users who access FEAT analysis of ETS data should use the following attribution: Peter Hepburn, Renee Louis, and Matthew Desmond, Eviction Tracking System: Version 1.0 (Princeton: Princeton University, 2020), www.evictionlab.org.