Esri Demographics

Compare Places Dashboard: Review the Proposed Census 2020 Disclosure Avoidance System

The US Census Bureau has been working diligently to refine the Disclosure Avoidance System (DAS) for the Census 2020 redistricting data release in September. The DAS is an essential component of this decade’s Census data releases because the Census Bureau is required by law to protect individual privacy under Title 13. The DAS represents a compromise between accurate data and privacy protection. In prior censuses the Census Bureau used various forms of disclosure avoidance. These techniques historically consisted of table suppression and data swapping. Given the risk of a “reconstruction attack,” Census 2020 will apply a more modern formal statistical approach of disclosure avoidance called differential privacy. Refer to this blog for more information on the move to differential privacy.

Demonstration Data

To better understand what differentially privatized data look like, the Census Bureau has released a series of demonstration products that compare published Census 2010 data with incremental versions the Census 2020 DAS applied to Census 2010 data. This allows for a comparison of the differences between the old method of data swapping and the new method of differential privacy. This comparison therefore allows users to test the Census 2020 DAS for fitness of use in various use cases. The latest version of the DAS has increased the Privacy-Loss Budget (PLB) to 12.2. This essentially means that there is less noise injected into the data than in prior versions of the DAS. The Census Bureau plans to apply this DAS to the upcoming redistricting data release. However, there is still time to provide the Census Bureau with valuable feedback on the data’s fitness for use.

To make the differentially privatized Census 2010 demonstration data more accessible and easier to use, IPUMS NHGIS have processed and tabulated the released microdata for 15 geographies. In conjunction with these tabulation releases, Esri has published five versions of the demonstration data as feature sets for 8 geographies in the ArcGIS Living Atlas of the World. The most recent version is V5 (April 2021).

Compare Places

Esri used the latest version of Census demonstration data published in the Living Atlas of the World to analyze the differences between the demonstration and released data sets at the place level. Places are easy to interact with as we are all familiar with our general local areas, but not many people know what tract or block group they live in. There are 29,261 places in the 2010 Census places inventory. This includes 9,721 Census Designated Places (CDPs) and 19,540 incorporated jurisdictions. Use this application to better understand places that you are familiar with.

Are the differences between the latest version of demonstration data (DP) and the released Census 2010 data (SF1) acceptable for your use case?

Click the image below to launch the Dashboard

Differences across datasets

There are many ways to explore the differences between the two datasets. As you navigate through the application it is helpful to have some context as to how the differences in your places of interest compare to the typical differences among all places. For this context refer to the descriptive statistics below.

Descriptive Statistics For All 2010 Places

Differences (DP – SF1) Average Min Median Max SD Min %* Median %* Max %* MALPD MAPD
Household Population -2.7 -1,995 0 4,798 65.7 -90.0% 0.0% 1050.0% 0.4% 1.6%
Population 18+ -1.3 -923 0 769 33.2 -90.0% 0.0% 300.0% 0.1% 1.8%
Households -0.1 -273 0 289 15.9 -96.0% 0.0% 1886.0% 0.1% 3.2%
White -1.1 -510 -1 331 18.6 -93.8% 0.0% 500.0% -0.5% 2.7%
Black or African American 0.3 -334 0 309 13.5 -94.1% 0.0% 1800.0% 3.7% 33.2%
Amer Indian and AK Native 0.0 -310 0 269 9.6 -95.5% 0.0% 2200.0% 7.2% 44.4%
Asian 0.3 -124 0 425 10.7 -94.1% 0.0% 2800.0% 5.1% 37.8%
Pacific Islander -0.1 -378 0 721 7.8 -93.9% 0.0% 1700.0% 0.1% 29.1%
Other 0.6 -227 0 350 12.7 -95.7% 0.0% 2300.0% 5.3% 35.9%
Two or More Races -2.4 -1,727 0 3,988 54.9 -96.3% 0.0% 2800.0% 27.5% 52.6%
Hispanic -0.7 -971 0 3,314 41.1 -93.8% 0.0% 1700.0% 14.4% 35.9%
Non-Hispanic -2.1 -1,305 0 1,987 39.9 -91.7% 0.0% 1100.0% 11.0% 2.2%
Persons Per Household 0.0 -5.0 0.0 48.0 0.6 N/A N/A N/A 3.2% 9.8%
Occupancy* -1.0% 83.3% 0.0% 78.0% 5.0% N/A N/A N/A -0.2% 2.2%

MALPD = Mean Algebraic Percent Difference, a measure of bias
MAPD = Mean Absolute Percent Difference, a measure of average percent difference
SD = Standard Deviation
* These calculations are limited to cases where DP values are > 0 and SF1 values are > 0

Looking at these descriptive statistics we can see some patterns in how the proposed DAS impacts the data:

Data Integrity

Many statistical tables from census data yield information on the interaction between people and housing (e.g., persons per household, household type, occupancy, etc.). However, differential privacy is applied separately to person-level counts and housing counts. This poses the additional challenge of data integrity across the person and housing universes. The DAS attempts to maintain integrity through various forms of post-processing, a second step of the DAS after the formal privacy protection has been applied. Looking at the latest version (V5, PLB 12.2) we see that data integrity has improved but still yields some peculiar results even at the place level. Faults in data integrity can be broken out into two types, impossible and improbable. For example, a place cannot have more households than household population because, by definition, a household is occupied by at least one person. Improbable results are possible but highly unlikely. For example, a place could include all population under 18 years of age if the place only contains a juvenile facility and the caretakers do not live at the facility full time. This scenario, although possible, is highly improbable, and these cases should be scrutinized.

Checks on Data Integrity

DP Data Integrity Problems – Impossible Place Count Percent
More households than household population 39 0.13%
Household population > 0 but households = 0 16 0.05%
Households > 0 but household population = 0 8 0.03%
DP Data Integrity Problems – Improbable Place Count Percent
All population under 18 years of age 2 0.01%
Persons per household greater than 10 20 0.07%
DP occupancy rate is 100% but SF1 occupancy rate is not 100% 433 1.48%
SF1 occupancy rate is 100% but DP occupancy rate is not 100% 56 0.19%
DP occupancy rate is 0% but SF1 occupancy rate is not 0% 17 0.06%
SF1 occupancy rate is 0% but DP occupancy rate is not 0% 6 0.02%
SF1 household population = 0 but DP household population > 0 8 0.03%
DP household population = 0 but SF1 household population > 0 11 0.04%

When applying the same tests to the released 2010 SF1 data only one place has persons per household greater than 10. All other counts equal zero for the above data integrity tests.

This is a limited analysis of a subset of the demonstration data. These differences represent the trade-off between privacy protection and accurate data. We encourage you to perform your own analysis of the demonstration data to test your use cases. This blog by Lauren Scott Griffin provides an excellent workflow for creating web maps to compare the differences. If you are comfortable with the proposed changes to census data, then you are well positioned for the upcoming 2020 census release. If you have concerns, now is the last opportunity to voice them. Please send any comments to the Census Bureau at 2020DAS@census.gov before the May 28th deadline.

About the author

Kyle R. Cassal, Chief Demographer at Esri, is the lead developer for Esri’s Data Development team. His team is responsible for producing independent demographic and socioeconomic updates and forecasts for the United States. These data are leveraged across the Esri platform through web maps, infographics, data enrichment and custom applications including Business Analyst and the Living Atlas of the World. In addition to processing US Census and ACS data, his team produces unique and innovative databases such as Tapestry Segmentation, Consumer Spending and Market Potential which are now industry benchmarks.

Connect:

Leave a Reply

Please Login to comment

Next Article

What's new in ArcGIS StoryMaps (September 2021)

Read this article