ArcNews Online
 

Winter 2008/2009
 

The Geodatabase: Modeling and Managing Spatial Data

Highlights

  • The geodatabase helps maintain integrity of spatial data.
  • Users can apply sophisticated business rules and relationships to data.
  • The geodatabase supports concurrent, multiuser editing.

The geodatabase (GDB) is the common data storage and management framework for ArcGIS. Simply put, it is a container for spatial and attribute data. The geodatabase has been the primary data model for ArcGIS since the 8.0 release. The name combines geo (referring to spatial) with database—specifically, a relational database management system (RDBMS). The term promotes the idea of having all GIS data stored uniformly in a central location for easy access and management.

With a geodatabase, all of an organization's GIS data can be stored in a central location and in a uniform format for easy access and management. The geodatabase can be scaled to suit a single-user or multiuser access and editing environment.

The geodatabase supports all the different types of GIS data that can be used by ArcGIS, such as attribute data, geographic features, satellite and aerial images (raster data), CAD data, surface modeling or 3D data, utility and transportation network systems, GPS coordinates, and survey measurements. ArcGIS has a comprehensive suite of data conversion tools to easily migrate existing data into the geodatabase. By storing GIS data within a geodatabase, users can take advantage of its superior data management capabilities to leverage spatial information. This, in turn, can enhance and expand business and GIS application workflows.

The geodatabase is a more robust and extensible data model than shapefiles and coverages. While shapefiles and coverages are outstanding GIS data storage formats, they do not take advantage of the latest data storage technologies. The geodatabase is designed to make full use of the capabilities of ArcGIS Desktop and ArcGIS Server. It is not just another spatial data format that ArcGIS can use; it is an integral part of the ArcGIS system.

GIS Data Storage

Vector data is stored in the geodatabase as thematic layers called feature classes. A feature class is a collection of geographic features with the same geometry type, such as a point, line, or polygon; the same attributes; and the same coordinate system. Feature classes can be grouped together within a feature dataset—a collection of feature classes—to model geospatial relationships between them. Raster data is stored as raster datasets; each raster image is stored as its own thematic layer. Multiple rasters can be grouped into a raster catalog (a collection of raster data), or if the rasters are adjacent to each other, they can be mosaicked into a single raster dataset. Table 1 contains a list of all the different types of GIS data that can be stored in the geodatabase.
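The definition of a feature class given above (one geometry type, one set of attributes, one coordinate system) can be illustrated with a small sketch. This is plain Python written only for illustration, not the ArcGIS API; the class names, the field names, and the "EPSG:4326" identifier are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    geometry_type: str   # "point", "line", or "polygon"
    coordinates: tuple   # illustrative coordinate tuple(s)
    attributes: dict     # field name -> value


@dataclass
class FeatureClass:
    """Toy model: every feature shares geometry type, fields, and coordinate system."""
    name: str
    geometry_type: str
    field_names: frozenset
    coordinate_system: str  # hypothetical identifier, e.g. "EPSG:4326"
    features: list = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        # Reject features whose geometry type differs from the class definition.
        if feature.geometry_type != self.geometry_type:
            raise ValueError(
                f"{self.name} stores {self.geometry_type} features, "
                f"not {feature.geometry_type}"
            )
        # Reject features whose attribute fields differ from the class definition.
        if set(feature.attributes) != set(self.field_names):
            raise ValueError(f"{self.name} expects fields {sorted(self.field_names)}")
        self.features.append(feature)


roads = FeatureClass("roads", "line", frozenset({"name"}), "EPSG:4326")
roads.add(Feature("line", ((0.0, 0.0), (1.0, 1.0)), {"name": "Main St"}))
```

In this sketch, adding a point feature to the roads feature class raises a ValueError, which mirrors the idea that a feature class holds only one geometry type.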

Modeling Geospatial Relationships

Storing GIS data in the geodatabase enables users to take advantage of its advanced data modeling properties. Complex business logic can be applied to GIS data to create more detailed and accurate spatial data models that represent real-world GIS application workflows. Examples include land parcel management; natural resources management; river and stream system modeling; utility network system modeling, such as gas, water, and sewage pipelines; and three-dimensional surface modeling of the landscape.

By storing feature classes within a feature dataset, geospatial relationships can be modeled between the feature classes, enabling more advanced GIS analysis. The more common types of geospatial relationship data structures in the geodatabase are

  • Topology—Defines and enforces data integrity rules for features (for example, there should be no gaps between polygons). It supports topological relationship queries and navigation, such as finding adjacent or connected features; supports sophisticated feature editing tools; and allows features to be constructed from unstructured geometry (for example, constructing polygon features from line features).
  • Geometric Network—Consists of a set of connected edges and junctions (line and point features) that, along with connectivity rules, are used to represent and model the behavior of a common network infrastructure in the real world. Water distribution, electrical lines, gas pipelines, telephone services, and water flow in a stream are all examples of resource flows that can be modeled and analyzed using a geometric network.
  • Network Dataset—Consists of a set of connected edges and junctions, as well as turn features, along with connectivity rules, that represent and model the behavior of transportation network systems. Highways, roads, and streets in a city; rail lines; and bus routes are examples of undirected network flows that can be modeled with a network dataset.
  • Terrain—A data structure that is generated from a mass collection of elevation measurement points, typically from remote-sensing data sources. It is a triangulated irregular network (TIN)-based data structure with multiple levels of resolution and is used to represent surface morphology. A terrain is used for 3D surface modeling applications.
  • Cadastral Fabric—A continuous surface of connected parcel features that represents the record of survey for an area of land. This data structure enables GIS data to be integrated with survey data to maintain a consistent and accurate survey record.
Table 1: Datasets in the geodatabase.
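The geometric network idea described above (connected edges and junctions that model a resource flow) can be sketched with a toy directed graph and a downstream trace. This is a plain-Python illustration under simplifying assumptions, not the ArcGIS geometric network implementation; the class, method, and junction names are hypothetical, and connectivity rules are reduced to "an edge joins two junctions in the direction of flow."

```python
from collections import defaultdict, deque


class SimpleNetwork:
    """Toy network: junctions are names, edges are directed (from, to) pairs."""

    def __init__(self):
        self._downstream = defaultdict(list)

    def add_edge(self, from_junction, to_junction):
        # An edge carries flow from one junction to the next.
        self._downstream[from_junction].append(to_junction)

    def trace_downstream(self, start):
        """Return every junction reachable from start by following the flow."""
        seen = {start}
        queue = deque([start])
        while queue:
            junction = queue.popleft()
            for nxt in self._downstream.get(junction, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen


water = SimpleNetwork()
water.add_edge("reservoir", "pump")
water.add_edge("pump", "valve")
water.add_edge("valve", "house_a")
water.add_edge("valve", "house_b")

affected = water.trace_downstream("pump")
```

Here, affected holds the junctions that would lose service if the pump failed: the valve and both houses. A breadth-first search is used because it visits each junction once; any graph traversal would serve the same purpose.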

Additional business logic in the geodatabase, in the form of subtypes and attribute domains, can also be applied to GIS data. Subtypes enable categorization of data in a table or feature class. For example, the streets in a streets feature class could be categorized into three subtypes: local streets, collector streets, and arterial streets. Attribute domains are rules that describe the legal values of a field. Whenever a domain is associated with an attribute field, only the values defined by the domain are valid for the field; the field will not accept a value that is not in that domain. For example, a domain that allows values from 10 to 50 (meters) could be applied to a field that stores survey length measurements. Any value outside that range is invalid for the field and would not be allowed. Both subtypes and attribute domains can be easily customized to meet the requirements of a user's specific business and GIS application workflows.
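The two examples in this paragraph, the street subtypes and the 10-to-50-meter survey length domain, can be sketched in a few lines. This is an illustrative plain-Python model, not the ArcGIS API; the class name, the subtype codes 1 to 3, and the choice to treat both endpoints of the range as valid are assumptions.

```python
class RangeDomain:
    """Toy range domain: a value is legal if it lies between minimum and maximum."""

    def __init__(self, name, minimum, maximum):
        self.name = name
        self.minimum = minimum
        self.maximum = maximum

    def is_valid(self, value):
        # Assumption: both endpoints are treated as valid values.
        return self.minimum <= value <= self.maximum


# Hypothetical subtype codes for a streets feature class.
STREET_SUBTYPES = {1: "local street", 2: "collector street", 3: "arterial street"}


def subtype_name(code):
    """Return the subtype label for a code, or raise if the code is undefined."""
    if code not in STREET_SUBTYPES:
        raise ValueError(f"{code} is not a defined street subtype")
    return STREET_SUBTYPES[code]


survey_length = RangeDomain("SurveyLength", 10, 50)

accepted = survey_length.is_valid(32.5)   # inside the range
rejected = survey_length.is_valid(75)     # outside the range
```

In this sketch, accepted is True and rejected is False, and a call such as subtype_name(4) raises a ValueError because no such subtype has been defined.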

Collectively, these examples of business logic in the geodatabase help streamline data entry and ensure the integrity of a user's GIS data. Therefore, the geodatabase enables users to leverage and optimize their GIS data to its full potential and helps maintain a consistent, accurate repository of GIS data.

Types of Geodatabases

The geodatabase is designed to support both the individual GIS user and organizations of various types and sizes. Just like the ArcGIS system, the geodatabase architecture has been engineered to easily scale to meet the changing needs and requirements of diverse organizations. A user can start with a file geodatabase for an individual project and upgrade to a larger workgroup or enterprise geodatabase as the volume of GIS data increases or the project scope expands.

There are two main classes of geodatabase: multiuser and single user. As the names suggest, multiuser geodatabases are meant for medium to large organizations, while single-user geodatabases are intended for individual users.

Multiuser Geodatabases

Multiuser geodatabases use ArcSDE technology and are implemented on an RDBMS platform. [Note: Prior to ArcGIS 9.2, ArcSDE was a stand-alone software product. At the ArcGIS 9.2 release, ArcSDE was integrated into both ArcGIS Desktop and ArcGIS Server. ArcSDE technology manages spatial data in an RDBMS and enables it to be accessed by ArcGIS clients.] Supported RDBMS platforms include DB2, Informix, Oracle, PostgreSQL, and SQL Server. Multiuser geodatabases leverage the underlying RDBMS architecture to provide better data security, such as access permission control for individual datasets, distributed file management, backup/recovery capabilities, and data integrity. ArcSDE technology provides additional geodatabase functionality that is not available in single-user geodatabases. This includes

  • Versioning—With versioning, a multiuser geodatabase can manage and maintain multiple states while preserving integrity in the database. A version represents an alternative, independent, persistent view of the geodatabase; supports multiple concurrent editors; and does not involve data duplication. Versioning is the default editing environment in a multiuser geodatabase. It explicitly records states of individual features and objects as they are modified, added, and/or retired. It is the framework that enables multiple users to access and edit the same data simultaneously and provides long transaction (i.e., database changes that span long periods of time) support. Simple queries are used to view and work with any desired state for a particular point in time or see an individual user's current edits.
  • Geodatabase Replication—This is a data distribution method provided through the ArcGIS system. With geodatabase replication, GIS data can be distributed across two or more geodatabases in a manner that allows them to synchronize any data changes that are made. This functionality is built on top of the versioning environment and supports the full geodatabase data model, including geospatial relationships, such as topologies and geometric networks. Three types of replication workflows are available: two-way, one-way, and checkout/check-in replication. In this asynchronous model, the replication is loosely coupled; this means that each replicated geodatabase can work independently and still synchronize changes with one another. Since geodatabase replication functionality is implemented at the ArcGIS software level, the RDBMS platforms involved can be different. For example, a user could replicate data from a multiuser geodatabase implemented on Oracle to a multiuser geodatabase implemented on SQL Server. Geodatabase replication can be used in connected and disconnected environments. It works with local geodatabase connections over a network, as well as over the Internet, using geodata services available in ArcGIS Server.
  • Geodatabase Archiving—When enabled on a dataset, geodatabase archiving captures any and all changes made to the dataset in the default version of the multiuser geodatabase. The edits are recorded in an archive class, which is a duplicate copy of the original dataset. The archive class contains additional fields that record a feature's edit history in the geodatabase. This functionality can be used with historical versions in the geodatabase. A historical version provides a read-only view of the geodatabase at a specific moment in time.
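The archiving behavior described in the last item above (an archive class with extra fields that record edit history, queried through a read-only historical view) can be sketched as follows. This is a simplified plain-Python model, not the ArcGIS implementation; the names ArchiveRow, ArchiveClass, valid_from, and valid_to are hypothetical stand-ins for the additional history fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ArchiveRow:
    object_id: int
    attributes: dict
    valid_from: datetime                 # when this state became current
    valid_to: Optional[datetime] = None  # None means the state is still current


class ArchiveClass:
    """Toy archive: every edit closes the old row and appends a new one."""

    def __init__(self):
        self.rows = []

    def _close(self, object_id, when):
        for row in self.rows:
            if row.object_id == object_id and row.valid_to is None:
                row.valid_to = when

    def record_edit(self, object_id, attributes, when):
        # Covers both inserts and updates: retire any current state, add the new one.
        self._close(object_id, when)
        self.rows.append(ArchiveRow(object_id, dict(attributes), when))

    def record_delete(self, object_id, when):
        self._close(object_id, when)

    def as_of(self, moment):
        """Return a snapshot (object_id -> attributes) as it stood at the given moment."""
        snapshot = {}
        for row in self.rows:
            started = row.valid_from <= moment
            not_ended = row.valid_to is None or moment < row.valid_to
            if started and not_ended:
                snapshot[row.object_id] = dict(row.attributes)
        return snapshot


parcels = ArchiveClass()
parcels.record_edit(1, {"owner": "A"}, datetime(2008, 1, 1))
parcels.record_edit(1, {"owner": "B"}, datetime(2008, 6, 1))
parcels.record_delete(1, datetime(2008, 12, 1))

march_view = parcels.as_of(datetime(2008, 3, 1))
```

In this sketch, march_view shows parcel 1 with owner A, a view taken in July would show owner B, and a view taken after the deletion would be empty. Snapshots are returned as copies so that inspecting history cannot alter the archive, in keeping with the read-only nature of a historical version.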

There are three types of multiuser geodatabase: enterprise, workgroup, and desktop. The storage capacity and number of possible concurrent users vary with each type.

  • Enterprise Geodatabase—This is intended for large-scale enterprise application scenarios and can be implemented on DB2, Informix, Oracle, PostgreSQL, and SQL Server RDBMS platforms. It can be scaled to any size and can support any number of users, running on computers of any size and configuration. It is the most robust of all the geodatabase types. It is set up and maintained with a combination of both RDBMS software and ArcGIS and is typically administered and managed by a dedicated database administrator (DBA). It is designed to be easily integrated into an enterprise IT structure so GIS data can be shared across the entire enterprise system. It is available as part of ArcGIS Server Enterprise, all editions.
  • Workgroup Geodatabase—This is intended for small- to medium-sized departmental application scenarios. It is implemented on Microsoft SQL Server 2005 Express, and all of the setup parameters for the geodatabase are preconfigured for the user, making it easy to install and immediately use out of the box. Geodatabase and RDBMS management, such as defining users and their data access permissions, is performed entirely within ArcCatalog. A workgroup geodatabase has a maximum size limit of 4 gigabytes and supports up to 10 simultaneous users, all of whom could be editors. It is available as part of ArcGIS Server Workgroup, all editions.
  • Desktop Geodatabase—This is designed for small teams or a single user who requires the functionality of a multiuser geodatabase. It is also implemented on Microsoft SQL Server 2005 Express, has a maximum size limit of 4 gigabytes, and uses ArcCatalog for setup and management. The difference between a workgroup geodatabase and a desktop geodatabase is that the desktop geodatabase supports at most three concurrent users (one editor and two viewers). It is available with the ArcEditor and ArcInfo levels of ArcGIS Desktop, as well as ArcGIS Engine.

Single-User Geodatabases

The single-user geodatabase class has two types—the file geodatabase and the Microsoft Access personal geodatabase. Both types of geodatabase are intended for an individual GIS user, and both are available with all license levels of ArcGIS Desktop.

Table 2: Summary of multiuser and single-user geodatabase types.

File Geodatabase—This is implemented as a collection of binary files in a file system. It has no size capacity limit. By default, each table can store up to 1 terabyte of data. However, this can be changed so that a table can store up to 256 terabytes, if desired. Vector data stored within the file geodatabase can optionally be compressed into a read-only format, reducing the storage footprint and improving performance. Users can uncompress the vector data to make it editable at any time. It is also possible to have more than one editor in the file geodatabase at the same time, provided they are editing in different tables, feature classes, or feature datasets. The file geodatabase does not support versioning or geodatabase archiving. It can be used as a child geodatabase in both one-way and checkout/check-in geodatabase replication. Esri recommends that users starting new GIS projects for their own local use choose file geodatabases over Microsoft Access personal geodatabases because file geodatabases offer more functionality and better performance.

Microsoft Access Personal Geodatabase—This is implemented in a single Microsoft Access file and has a maximum size capacity of 2 gigabytes. It works for small GIS projects but does not support multiuser editing, versioning, or geodatabase archiving. Esri will continue to fully support Microsoft Access personal geodatabases for the foreseeable future.

The GIS data storage model is fully supported by all five geodatabase types. GIS datasets can be transferred between the various geodatabase types using the simple migration tools in ArcGIS Desktop, such as copy/paste and import/export.
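The limits stated above for the five geodatabase types can be collected into a small lookup to show how a project's requirements narrow the choice. This is only a sketch in plain Python: the function and field names are hypothetical, "no limit" is modeled as None, and the single-user types are simplified to one user and one editor even though a file geodatabase can have more than one editor working in different tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GdbType:
    name: str
    max_size_gb: Optional[float]  # None means no stated overall limit
    max_users: Optional[int]      # None means no stated limit
    max_editors: Optional[int]    # None means no stated limit
    versioning: bool


# Values taken from the descriptions in this article; single-user rows are simplified.
GDB_TYPES = [
    GdbType("personal", 2, 1, 1, False),
    GdbType("file", None, 1, 1, False),
    GdbType("desktop", 4, 3, 1, True),
    GdbType("workgroup", 4, 10, 10, True),
    GdbType("enterprise", None, None, None, True),
]


def _within(limit, value):
    # A missing limit (None) accepts any value.
    return limit is None or value <= limit


def candidate_types(size_gb, users, editors, needs_versioning=False):
    """Return the names of geodatabase types whose stated limits fit the request."""
    return [
        t.name
        for t in GDB_TYPES
        if _within(t.max_size_gb, size_gb)
        and _within(t.max_users, users)
        and _within(t.max_editors, editors)
        and (t.versioning or not needs_versioning)
    ]


department = candidate_types(size_gb=3, users=5, editors=5, needs_versioning=True)
large_solo = candidate_types(size_gb=100, users=1, editors=1)
```

Under these assumptions, department narrows to the workgroup and enterprise types, while large_solo narrows to the file and enterprise types, which matches the article's suggestion to start with a file geodatabase and move to a multiuser type as needs grow.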

Conclusion

The geodatabase is the primary data storage model for ArcGIS. It is a container of spatial and attribute data and enables the user to store many different types of GIS data within its structure. Its structure is implemented in an RDBMS or as a collection of files in a file system. With its comprehensive GIS data model, geospatial modeling capabilities, and scalable architecture, the geodatabase is the foundation for assembling intelligent geographic information systems that can be adapted to many different business and GIS applications.

More Information

For more information on the geodatabase, visit www.esri.com/geodatabase.

Please see the related poster [PDF].
