GIS databases evolve constantly. From paper maps through the digital conversion process to data maintained in a database, GIS data are being constantly transformed. Maintaining the integrity and accuracy of these data through a well-designed quality assurance (QA) plan that integrates the data conversion and maintenance phases is key in implementing a successful GIS project.
Poor data negate the usefulness of the technology. Sophisticated software and advanced hardware cannot accomplish anything without specific, reliable, accurate geographic data. GIS technology requires clean data. To maximize the quality of GIS databases, a quality assurance plan must be integrated with all aspects of the GIS project.
The fundamentals of quality assurance never change. Completeness, validity, logical consistency, physical consistency, referential integrity, and positional accuracy are the cornerstones of the QA plan. All well-designed QA strategies must coexist within the processes that create and maintain the data and must incorporate key elements from the classic QA categories. If QA is not integrated within the GIS project, QA procedures can themselves become sources of error.
Categories of Quality Assurance
> Completeness means the data adhere to the database design. All data must conform to a known standard for topology, table structure, precision, projection, and other data model specific requirements.
> Validity measures the attribute accuracy of the database. Each attribute must have a defined domain and range. The domain is the set of all legal values for the attribute. The range is the set of values within which the data must fall.
> Logical consistency measures the interaction between the values of two or more functionally related attributes. If the value of one attribute changes, the values of functionally related attributes must also change. For example, in a database in which the attributes SLOPE and LANDUSE are related, if the LANDUSE value is "water," then SLOPE must be 0, as any other value for SLOPE would be illogical.
> Physical consistency measures the topological correctness and geographic extent of the database. For example, an electrical distribution GIS database might require that every electrical transformer have annotation denoting phasing placed within 15 feet of the transformer object. A rule of this kind is a physical consistency requirement.
> Referential integrity measures the associativity of related tables based upon their primary and foreign key relationships. Primary and foreign keys must exist and must associate sets of data in the tables according to predefined rules for each table.
> Positional accuracy measures how well each spatial object's position in the database matches reality. Positional error can be introduced in many ways. Incorrect cartographic interpretation, insufficient densification of vertices in line segments, and inadequate digital storage precision are just a few sources of positional inaccuracy. These errors can be random, systematic, and/or cumulative in nature. Positional accuracy must always be qualified because the map is a representation of reality.
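Three of the categories above (validity, logical consistency, and referential integrity) lend themselves to simple automated checks. The following is a minimal sketch, assuming records are held as Python dictionaries. The LANDUSE domain, the SLOPE range, and all function names are hypothetical choices made for illustration; only the water/slope rule comes from the example in the text.

```python
# Illustrative assumptions: the legal LANDUSE values and the SLOPE limits
# below are invented for this sketch, not taken from a real database design.
LANDUSE_DOMAIN = {"water", "forest", "urban", "agriculture"}
SLOPE_RANGE = (0.0, 90.0)  # assumed units: degrees


def check_validity(record):
    """Validity: each attribute must fall inside its domain or range."""
    errors = []
    if record["LANDUSE"] not in LANDUSE_DOMAIN:
        errors.append("LANDUSE outside domain")
    low, high = SLOPE_RANGE
    if not (low <= record["SLOPE"] <= high):
        errors.append("SLOPE outside range")
    return errors


def check_logical_consistency(record):
    """Logical consistency: related attributes must agree with each other."""
    if record["LANDUSE"] == "water" and record["SLOPE"] != 0:
        return ["SLOPE must be 0 when LANDUSE is water"]
    return []


def find_orphans(child_rows, parent_rows, foreign_key, primary_key):
    """Referential integrity: return child rows whose foreign key
    matches no primary key in the parent table."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_keys]
```

Each function returns a list of problems rather than raising an exception, so a QA run can collect every error in a batch and report them together.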
William C. Masters
The following section outlines general stages of GIS database creation from an existing paper map to a seamless, continually maintained database and how a QA plan is integrated at each stage.
Random error will always be a part of any data, regardless of form. Random error can be reduced by tight controls and automated procedures for data entry.
Systematic error, on the other hand, must be removed from the data conversion process. A systematic error usually stems from a procedural problem, and correcting the procedure generally eliminates the error.
Random and systematic error can be corrected by checking data automatically and visually at various stages in the conversion cycle. A short feedback loop between the quality assurance and conversion teams speeds the correction of these problems.
Visual inspection should occur during initial data capture, at feature attribution, and again at final data delivery. At initial data capture, review the data for missing or misplaced features and alignment problems that could point to a systematic error. Each error type needs to be evaluated along with the process that created the data in order to determine the appropriate root cause and solution.
Automated QA must occur in conjunction with visual inspection. Automated quality assurance allows quick inspection of large amounts of data. It will report inconsistencies in the database that may not appear during the visual inspection process. Both random and systematic errors are detected using automated QA procedures. Once again, the feedback loop has to be short in order to correct any flawed data conversion processes.
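As an illustration of an automated check that is tedious to perform visually, the transformer annotation rule described earlier (phasing annotation within 15 feet of the transformer) can be tested for every feature in one pass. This is a rough sketch, assuming coordinates are stored in feet in a projected coordinate system and that transformers and annotations share an identifier. The data layout and function name are hypothetical.

```python
import math

MAX_ANNOTATION_DISTANCE_FT = 15.0  # rule from the transformer example


def annotation_violations(transformers, annotations,
                          max_dist=MAX_ANNOTATION_DISTANCE_FT):
    """Return IDs of transformers whose phasing annotation is missing
    or lies farther than max_dist from the transformer.

    Both arguments map a transformer ID to an (x, y) tuple in feet.
    """
    violations = []
    for t_id, (tx, ty) in transformers.items():
        note = annotations.get(t_id)
        if note is None:
            violations.append(t_id)  # no annotation at all
            continue
        distance = math.hypot(note[0] - tx, note[1] - ty)
        if distance > max_dist:
            violations.append(t_id)  # annotation placed too far away
    return violations
```

A list of violating IDs can be passed back to the conversion team directly, which helps keep the feedback loop short.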
For a minor attribute, a 1 percent error rate may not be crucial, but what if the attribute in error is a primary key and the error is a nonunique value for that key? This seemingly minor error cascades through the database, jeopardizing relationships to one or more tables. Weighting attributes by importance solves this problem. Each attribute should be reviewed to determine if it is a critical attribute and then weighted accordingly.
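One possible way to weight attributes is to compute a weighted error rate in which errors in critical attributes count more heavily than errors in minor ones. The formula, weights, and attribute names below are assumptions chosen for illustration; the text does not prescribe a particular weighting scheme.

```python
def weighted_error_rate(error_counts, record_count, weights):
    """Weighted error rate across attributes.

    error_counts: attribute name -> number of records in error
    record_count: number of records inspected
    weights:      attribute name -> importance weight (higher = more critical)

    Computed as sum(weight * errors) / (sum(weights) * record_count).
    """
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    total_weight = sum(weights.values())
    weighted_errors = sum(
        weight * error_counts.get(attribute, 0)
        for attribute, weight in weights.items()
    )
    return weighted_errors / (total_weight * record_count)


# Hypothetical weights: the primary key counts ten times as much
# as a descriptive attribute.
weights = {"PARCEL_ID": 10, "OWNER_NAME": 1}
key_errors = weighted_error_rate({"PARCEL_ID": 10}, 1000, weights)
name_errors = weighted_error_rate({"OWNER_NAME": 10}, 1000, weights)
```

With these weights, ten bad primary keys in 1,000 records produce a weighted rate ten times higher than ten bad owner names, so the same raw error count can pass or fail an acceptance threshold depending on which attribute is affected.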
Additionally, the cartographic aspect of data acceptance should be considered. A feature's position, rotation, and scaling must also be taken into account when calculating the percentage of error, not just its existence or absence.
Rigid control provides the user with only one point of entry into the database, which improves the consistency and security of the database. Maintenance applications rely upon a set of business rules that define the features, relationships between features, and update methods.
Maintenance applications are very dependent upon the static database design and the degree to which the database conforms to the design. These applications are usually supported by a database management system composed of permanent and local or temporary storage systems.
Data are checked out from permanent storage, copied in local storage for update, and then posted back to the permanent storage to complete the update. In this environment, preposting QA checks are required to ensure database integrity. The data storage manager must maintain the database schema so that table structure and spatial data topologies are not destroyed. Automated validation of attribute values should also be a part of the prepost QA. Visual checkplots can be useful when large amounts of data are either added or removed.
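The check-out, update, and post cycle can be guarded by a preposting check that refuses to post rows that break the schema. The sketch below assumes rows are dictionaries and that permanent storage is a simple list. The schema, field names, and function names are hypothetical and do not represent any particular database management system.

```python
# Hypothetical schema for a utility pole table: field name -> expected type.
EXPECTED_SCHEMA = {"POLE_ID": int, "HEIGHT_FT": float, "MATERIAL": str}


def prepost_check(rows, schema=EXPECTED_SCHEMA):
    """Return (row_index, message) tuples for rows that violate the schema."""
    problems = []
    for index, row in enumerate(rows):
        if set(row) != set(schema):
            problems.append((index, "field names differ from schema"))
            continue
        for field, expected_type in schema.items():
            if not isinstance(row[field], expected_type):
                problems.append(
                    (index, f"{field} is not {expected_type.__name__}")
                )
    return problems


def post(rows, permanent_store):
    """Post locally edited rows only if every row passes the prepost check."""
    problems = prepost_check(rows)
    if problems:
        return problems  # nothing is posted; the editor must fix the rows
    permanent_store.extend(rows)
    return []
```

Rejecting the whole batch when any row fails keeps permanent storage from receiving a partial update. Attribute domain and range validation could be added to the same gate.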
Quality Assurance Worth the Cost?
High-quality GIS databases facilitate sharing. Without some assurance of cleanliness, marketing or sharing data is difficult. A high-quality database with solid quality statistics helps break down barriers to data sharing.
Organizations that base decisions on GIS are frequently liable for those decisions. Locational analysis, such as floodplain evaluation or hazardous waste siting, can produce disastrous results when it rests on poor data. Critical customer service information, such as medical priority in facilities management or addressing in E-911 databases, can mean the difference between life and death.
In the past, issues of quality were often minimized because of the additional short-term costs associated with quality assurance. The average GIS database is very expensive to create and maintain. The reluctance to incur additional cost is understandable. However, the potential cost of poor analysis, application revision, and data reconstruction caused when QA is shortchanged in the project implementation far outweighs the initial cost of a well-designed and well-executed QA plan. Protecting the organization's investment in its data is as prudent as insuring a home or business against catastrophe.
Bill Masters is the president of Dog Creek Design and Consulting, Inc., and lives outside Milwaukee, Wisconsin. Masters has a bachelor's degree in Engineering Physics from the University of Arizona. Masters was the senior programmer on Esri's Digital Chart of the World project. Prior to that, he was a photogrammetric analyst in the Air Force, worked at Oak Ridge National Laboratory, and worked as a senior analyst at a major data conversion vendor.