Nearly Two Centuries of Data
Historical boundary and statistical datasets available
Seven years ago, the Minnesota Population Center embarked on an extraordinarily ambitious social science infrastructure project. The National Historical Geographic Information System (NHGIS) was designed to create and freely disseminate GIS-compatible boundary files and statistical information that includes virtually all aggregate census information for the United States since 1790. For the first time, a researcher can explore most of the census data available from 1790 to 2000 using GIS.
The project, funded by the National Science Foundation, has four main objectives:
- Create a comprehensive, high-precision spatial database of census tract boundaries for every decade since 1910 and county boundaries in every census year since 1790. These boundaries, which are fully compatible with Census Bureau TIGER/Line files, required the construction of more than 200,000 polygons.
- Collect and enrich a massive database of historical and contemporary U.S. summary data on population, housing, agriculture, manufacturing, business patterns, voting, and other georeferenced statistics that comprise a total of 750 gigabytes of data drawn from more than one million separate source files. This involved manual data entry of key datasets from printed or manuscript sources, correction and editing of existing datasets produced by other researchers, and reformatting census-produced data to maximize efficient storage and access.
- Develop machine-readable metadata for the entire collection, representing a total of more than five million lines of tagged and structured codebooks.
- Design and implement a data retrieval system based on 100 gigabytes of indexes with a Web-based interface that allows simple and free access to digital GIS boundary files, statistical data, and metadata.
Meeting these objectives required the dedication of 20 research and professional staff and thousands of hours of labor. The first phase of NHGIS was completed in March 2007. The system now has more than 3,000 registered data users. Since its initial release, NHGIS has secured additional support from both the National Science Foundation and the National Institutes of Health to implement a number of enhancements.
In April 2008, NHGIS released GIS boundary files for additional geographic levels. For 1990 and 2000, these levels include American Indian Areas, Congressional Districts, County Subdivisions, Places, Urban Areas, Voting Districts, ZIP Code Tabulation Areas, Census Block Groups, and Census Blocks. For 1980, Minor Civil Divisions, Places, and Block Numbering Areas were also released. With these additional GIS boundary files, the match between the statistical data and the GIS boundary files available in NHGIS is approximately 90 percent.
Developments in geographic standards, statistical infrastructure, and information technology require that NHGIS be regularly updated to capitalize fully on this investment in social science infrastructure. Three improvements will be phased in over the next four years: realigning boundary files, adding the most recent data, and integrating geography and data over time. Together, these enhancements will maximize the usability of NHGIS for the research community.
Realignment of NHGIS boundary files will ensure compatibility with new standards. In preparation for the 2010 Decennial Census of Population and Housing, the U.S. Census Bureau is carrying out the Master Address File/Topologically Integrated Geographic Encoding and Referencing Accuracy Improvement Project (MAF/TIGER AIP). Improving the accuracy of TIGER features has serious implications for the NHGIS database, because NHGIS used features in the 2000 TIGER/Line files to construct historical census tract and county boundaries. The historical NHGIS GIS files will be realigned to correspond to the newly updated and released Census 2010 boundary files.
The most recent data from the American Community Survey (ACS) will be added. For the past five decades, the principal source of georeferenced summary statistics for small areas has been the long-form samples of the decennial censuses. The Census Bureau is now replacing the long form with the ACS. The ACS offers many advantages over the census long forms, but it also poses new challenges. For example, data is collected continuously instead of on a specified census day, and the reference period for aggregate statistics varies according to the population size of the geographic unit.
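How the reference period depends on population size can be illustrated with a short sketch. The thresholds used here (1-year estimates for areas of 65,000 or more people, 3-year estimates for areas of 20,000 or more, and 5-year estimates for all areas) reflect the Census Bureau's ACS release plan as generally described; they are not stated in this article and should be treated as assumptions. The function name is hypothetical and is not part of NHGIS.

```python
from typing import List

# Assumed population thresholds for ACS estimate periods (not from this article).
ONE_YEAR_MIN_POPULATION = 65_000
THREE_YEAR_MIN_POPULATION = 20_000


def acs_estimate_periods(population: int) -> List[int]:
    """Return the estimate periods, in years, assumed to be published
    for a geographic unit with the given population."""
    periods = []
    if population >= ONE_YEAR_MIN_POPULATION:
        periods.append(1)
    if population >= THREE_YEAR_MIN_POPULATION:
        periods.append(3)
    # Under the assumed plan, 5-year estimates cover every area.
    periods.append(5)
    return periods


if __name__ == "__main__":
    for pop in (100_000, 30_000, 5_000):
        print(pop, acs_estimate_periods(pop))
```

Under these assumptions, a small census tract would have only 5-year estimates, while a large county would have estimates for all three periods, which is why comparing units of different sizes requires care about which period is used.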
Continued efforts will concentrate on providing data to researchers, educators, businesses, and policy makers. So that users can visualize the data without having to download and use a statistical package or GIS software, NHGIS has partnered with www.socialexplorer.com to provide ready access to simple visual representations of the NHGIS data.
For more information, contact
Minnesota Population Center
University of Minnesota
Pétra Noble, Spatial Core Director, Research Fellow