Providing Access to Past and Future Census Data Sets
U.S. Bureau of the Census Utilizes Esri's Internet Technology
How do you effectively distribute data on 250 million people, 100 million households, and 15 million businesses to the users of that data? For the U.S. Bureau of the Census, the answer lies in making census data available through the Internet.
In 1997, Esri was part of a team chosen by the U.S. Bureau of the Census to help build the Data Access and Dissemination System (DADS), the Internet site through which users of census data, from schoolchildren to state data centers to researchers of demographic data, will gain access to census data. The site has become known as American FactFinder and is now available on the Internet at factfinder.census.gov. Esri is partnering with IBM, the prime contractor for this multiyear project, which will make census data widely accessible to internal and external users. Another participant is Oracle Corporation, which is providing consulting and software development services for the deployment of the census data warehouse that will serve as the repository of all census data accessible through the site.
One of the key requirements for the project is to integrate geography with data. "The objective is to let users gain access to exactly the data they want for a specific geographic location via a simple interface," says Gerry Clancy, manager of Esri's Business Solutions Group. "This is no easy task since census data is both vast and often provided in a rather complex format. While individuals most often want census information on a specific location such as their hometown, many users, such as the state data centers, users responsible for redistricting, academia, and the press, need the ability to geographically visualize census data for larger geographic areas."
To address these diverse needs, Esri's primary role in the project is to apply its expertise in GIS and spatial database development to build a system in which users can effectively search for and visualize census data through maps. To that end, software developers and consulting staff from Esri's Business Solutions Team, which is part of Esri's Professional Services Division, have been developing the software and spatial database needed to support functions such as geographic searching as well as reference and thematic mapping. Esri's Spatial Database Engine (SDE) and Internet Map Server (IMS) software form the basis for the development.
Building an easy-to-use Internet site that gives users timely access to census data depends on innovative strategies that address technology as well as data. "We need a system that can address the needs of potentially thousands of users, novice and expert, from within and outside the bureau, to whom we want to grant efficient access to data without divulging confidential information," says Enrique Gomez, Census Bureau program manager for the project.
To ensure that all requirements were accounted for, the design phase for the system included an extensive requirements analysis. The project team evaluated candidate software components against the requirements, their track record as commercial products, and the existing developer knowledge base. Esri consultants worked with IBM and bureau staff to gather requirements and to design the geographic components of the system. The team used several object-oriented methods to support this process, including use case analysis and object interaction diagrams.
Ultimately, the project team chose a three-tier architecture with hardware and software components that have been proven in other high-volume Internet applications. IBM is deploying its RISC/6000 SP server technology as the hardware platform. Because of the various confidentiality requirements, two systems will be deployed: one to serve internal census staff and the other to support the public Internet site. The internal system is an eight-node RISC/6000 SP machine, configured for the high computational load but lower request volume anticipated from users inside the bureau. The 20-node external system was configured for a much higher request volume but the lower per-query computational load expected from external users.
On the data side, an Oracle data warehouse serves as the repository for all tabular census data, while SDE manages all spatial data. The back-end system provides high availability through full mirroring. In the middle tier, Oracle's Application Server manages the translation and dispatch of user requests. IMS, coupled with custom software, provides the mapping capabilities that users rely on to search for and visualize census data geographically. Most of the middle tier is written in Java and C++. To provide the broadest possible access to the site, the front-end user interface is based on HTML. In the future, the system will support some level of e-commerce activity.
For a data dissemination system such as American FactFinder, one of the most important requirements is accurate representation of the census data. Meeting it requires a methodology that lets the project team draw on the expertise of Census Bureau staff who know and understand the census data. To accomplish this, American FactFinder is being developed by an integrated project team of contractor personnel and bureau staff. Using an object-oriented development methodology, developers and consultants from IBM, Esri, Oracle, and the Census Bureau itself work together at Census Bureau offices on the development of the American FactFinder system.
"Using an integrated team benefits everyone involved," says John Somiak, IBM program manager. "We have immediate access to a large knowledge base composed of bureau staff who help ensure data is represented as intended, while the bureau's direct involvement in the development of the site will ensure a smooth transition once the site moves from development to operational mode."
The initial release of American FactFinder became available to the public in March 1999 at factfinder.census.gov. Currently, users can search for and view data from the 1990 Decennial Census, the American Community Survey, the Census 2000 Dress Rehearsal, and the 1997 Economic Census. The data currently available is a relatively modest 500 GB. As new surveys are added, the data volume is expected to grow to several terabytes by the time the 2000 Decennial and 2002 Economic censuses are completed. The Census Bureau estimates that several thousand users may access American FactFinder simultaneously at peak times.
For more information, contact Gerry Clancy, Esri Project Manager (E-mail: firstname.lastname@example.org).