arcuser

Aggregation Improves Reliability Concerns for American Community Survey Data

Many GIS analysts work with American Community Survey (ACS) data from the US Census Bureau. Because this data is based on a sample of the population, it contains sampling error, like all survey data.

The Census Bureau publishes a margin of error (MOE) with every estimate, which is incredible. However, users who look at the MOE for a specific measure in a specific tract can erroneously dismiss the entire ACS dataset. Even more unfortunately, users may not know that the typical geographic analysis of ACS data noticeably increases the reliability of the data.

[To learn more about MOEs, read “The Importance of Margins of Error and Mapping” in the summer 2021 issue of ArcUser.]

The Census Bureau can publish estimates down to the census tract and even block group level by coarsening and aggregating data across years: pooling five years of data produces estimates at fine geographic levels. Data can be coarsened across space as well as across time.

While this may seem like heresy to many geographers, there are benefits to coarsening or aggregating data geographically. Many people need data disaggregated by race/ethnicity, gender, income, and other dimensions. Coarsening can make this data more reliable.

Some cities have gone through robust validation processes to create their own geographies that are coarser than census tracts. For example, New York City has its own Neighborhood Tabulation Areas, and Houston has its Super-Neighborhoods. You can also define and test your own groups of tracts. You can call them anything you like, but I’ve been calling them super tracts.

The ACS Summarization App

A lightweight app, ACS Summary Statistics with Margin of Error, is available at no charge to help you get a quick sense of how many tracts you’ll need to aggregate to get an estimate that meets your desired level of reliability. The left panel of the app lets you search and select layers and attributes. If you select a count, the tract centroids will be symbolized by size. If you select a percentage, the tract centroids will be symbolized by color.

Use the sketch tools on the top of the map to create your super tract. First, draw a shape around the neighborhood or district you know you want to analyze. Get the basic shape on the map to start.

The right panel of the app lets you see summary statistics for the chosen layers and attributes. The gauge shows the coefficient of variation (CV) for the selection, which is calculated on the fly. [CV measures the size of the sampling error relative to the estimate itself.] Watch the gauge change as you modify your sketch. Based on the CV, it characterizes the reliability of the estimate as high, medium, or low. The lower the CV, the higher the reliability. Conversely, the higher the CV, the lower the reliability.
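Here is a minimal sketch of the calculation behind a gauge like this one. It assumes the common ACS convention that a published MOE reflects a 90 percent confidence level, so the standard error is the MOE divided by 1.645, and that the CV is the standard error divided by the estimate. The function name and the sample tract are hypothetical, and the app's own implementation may differ.

```python
Z_90 = 1.645  # z-score for the 90 percent confidence level (assumed for ACS MOEs)


def coefficient_of_variation(estimate, moe):
    """Return the CV as a percentage, or None when the estimate is zero."""
    if estimate == 0:
        return None  # CV is undefined for a zero estimate
    standard_error = moe / Z_90
    return 100 * standard_error / estimate


# Hypothetical tract: an estimate of 200 people with an MOE of +/-90
cv = coefficient_of_variation(200, 90)
print(f"CV = {cv:.1f}%")
```

A smaller MOE relative to the estimate gives a smaller CV, which is why the gauge moves toward higher reliability as the selection grows.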

Reliability for Coefficient of Variation (CV) Ranges

Best Practices for Tract Summarization

When aggregating tracts to improve reliability, keep the following practices in mind.

Preserve Patterns in the Map

Try to follow the patterns in the map when combining tracts. For example, combine high values with other high values and low values with other low values. The symbology in this app can help, but your own local on-the-ground knowledge is invaluable here. I realize this is hard when aggregating just a few tracts while balancing other considerations. However, combining areas with wildly differing characteristics smooths out the numbers and makes the result less informative.

Be Mindful of Tracts with an Estimate of Zero

Tracts with an estimate of zero are symbolized by the transparent teal symbols (for counts). These tracts are generally in places such as airports, cemeteries, and open land, but they can be anywhere. Even zero estimates have MOEs, which means there may be a few individuals in your population of interest in these tracts. They will not add anything to your estimate, but because they have a nonzero MOE, they could add slightly to your error.
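To see the effect of a zero-estimate tract, here is a small sketch that uses the usual approximation for a sum of estimates: the estimates are added, and the MOEs are combined as the square root of the sum of their squares. All tract values are made up, and it is an assumption that the app applies exactly this formula.

```python
import math

Z_90 = 1.645  # z-score for 90 percent confidence (assumed for ACS MOEs)


def aggregate_counts(tracts):
    """Sum the estimates; combine MOEs as the root of the sum of squares."""
    estimate = sum(est for est, _ in tracts)
    moe = math.sqrt(sum(m ** 2 for _, m in tracts))
    return estimate, moe


def cv_percent(estimate, moe):
    """CV as a percentage: standard error divided by the estimate."""
    return 100 * (moe / Z_90) / estimate


# Hypothetical (estimate, MOE) pairs for three populated tracts
populated = [(120, 60), (95, 55), (150, 70)]
# The same selection plus one zero-estimate tract with a nonzero MOE
with_zero = populated + [(0, 12)]

for label, tracts in [("without zero tract", populated),
                      ("with zero tract", with_zero)]:
    est, moe = aggregate_counts(tracts)
    print(f"{label}: estimate={est}, MOE={moe:.1f}, "
          f"CV={cv_percent(est, moe):.1f}%")
```

The zero-estimate tract leaves the summed estimate unchanged but nudges the combined MOE, and therefore the CV, slightly upward.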

Use Official Estimates when Available

Keep in mind that this aggregation method gives only an approximation of both the estimate and the MOE, using the approximation formulas in Understanding and Using American Community Survey Data. The Census Bureau produces official estimates for school districts, incorporated places, county subdivisions, congressional districts, and many other areas. If you’re aggregating up to get values for a defined census geography, such as a city boundary, check data.census.gov for an official estimate you can use. Not only does the official estimate use the true boundary, it will also have a lower MOE than you’d get from aggregating tracts, since it’s not being approximated.
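For percentages, the approximation is applied to the aggregated numerator and denominator rather than to the individual tract percentages. The sketch below follows the general form of the Census Bureau's approximation for a derived proportion, including the fallback to the ratio form when the value under the square root is negative. The function names and all input values are hypothetical, so treat this as an illustration rather than the app's actual code.

```python
import math


def sum_with_moe(pairs):
    """Aggregate (estimate, MOE) pairs: add estimates, root-sum-square MOEs."""
    est = sum(e for e, _ in pairs)
    moe = math.sqrt(sum(m ** 2 for _, m in pairs))
    return est, moe


def proportion_with_moe(num, num_moe, den, den_moe):
    """Approximate a proportion and its MOE from an aggregated numerator
    and denominator, where the numerator is a subset of the denominator."""
    p = num / den
    radicand = num_moe ** 2 - (p ** 2) * (den_moe ** 2)
    if radicand < 0:
        # Fall back to the ratio form, which adds the second term instead
        radicand = num_moe ** 2 + (p ** 2) * (den_moe ** 2)
    return p, math.sqrt(radicand) / den


# Hypothetical tracts: people in the group of interest, and total population
numerators = [(120, 60), (95, 55)]
denominators = [(2400, 210), (1900, 180)]

num, num_moe = sum_with_moe(numerators)
den, den_moe = sum_with_moe(denominators)
p, p_moe = proportion_with_moe(num, num_moe, den, den_moe)
print(f"{100 * p:.1f}% +/- {100 * p_moe:.1f} percentage points")
```

Because every step here is an approximation, an official estimate for the same area, where one exists, remains the better choice.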

Reliability and the Nature of the Estimate

Adjust your reliability comfort level depending on the nature of the estimate. Ideally, you would like all estimates to have high reliability with a low coefficient of variation. However, this will be hard to achieve for very small populations, such as these:

In cases like these, ask yourself if you could live with a medium level of reliability if that meant obtaining finer geographic detail. Also, neighboring tracts can have different levels of reliability for the same attribute. Sometimes tracts do have reliable estimates, so there’s no need to aggregate unnecessarily and lose the geographic detail.


Start with the ACS Summarization App

The app is designed to give you a jumping-off point for creating aggregated tracts. It helps you iterate quickly and dynamically, so you can get a sense of how much you’ll have to aggregate geographically to meet your reliability comfort level. Note that the final super tract does not persist outside of the app.

Create your final super tract using geoprocessing tools such as Merge and Dissolve. The app uses centroids for faster performance, but for this step you might want to use the polygon versions of these layers, which contain the tract boundaries.

Estimates of medians are not included in the app. Aggregating medians requires the full distribution of values, not just the medians of the individual tracts. For the same reason, some layers are left out entirely, such as Median Age, because all the attributes in that layer are medians.
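A toy example shows why tract medians cannot simply be combined. The ages below are invented for two tracts of unequal size. Taking the median of the two tract medians gives a very different answer from the median of everyone pooled together.

```python
from statistics import median

# Hypothetical ages of residents in two tracts of unequal size
tract_a = [22, 24, 25, 27, 29, 30, 31]
tract_b = [60, 65, 70]

# Naive approach: combine only the published tract medians
median_of_medians = median([median(tract_a), median(tract_b)])

# Correct approach: needs every individual value (the full distribution)
pooled_median = median(tract_a + tract_b)

print("Median of tract medians:", median_of_medians)  # 46.0
print("Median of pooled values:", pooled_median)      # 29.5
```

Only the second calculation describes the combined area, and it needs the individual values, which the tract-level medians alone do not provide.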

The estimates update every year with new values because the app uses ArcGIS Living Atlas of the World layers that are updated annually. Remember that an aggregation that meets your reliability requirement now may not hold in future data releases.

Geography can help you work with error, instead of being scared by it.

About the author

Diana Lavery

Diana Lavery loves working with data! She has over a decade of experience as a practitioner of demography, sociology, economics, policy analysis, and GIS, making her a true social science quantoid. Diana holds a BA in quantitative economics and an MA in applied demography. She has been with Esri as a product engineer on Esri's Living Atlas and Policy Maps teams since 2017. Diana enjoys strong coffee and clean datasets, usually simultaneously.