ArcUser Online


Enterprise Geodatabase 101
A review of design and key features for GIS managers and database administrators
By Derek Law, Esri Product Management

Summary

This article provides a general overview of the enterprise geodatabase—its key features, architecture, and implementation—designed for GIS managers and database administrators.

Because the enterprise geodatabase is one of the foundation elements for seamless, organization-wide use of GIS, management staff need a good understanding of its role and capabilities.

Figure 1: Enterprise geodatabase tiers

The geodatabase is the native data format for ArcGIS. It is a data storage container that defines how data is stored, accessed, and managed by ArcGIS. The term geodatabase combines geo (spatial data) with database (specifically a relational database management system or RDBMS). ArcGIS 9.2 has three types of geodatabases: Microsoft Access-based personal geodatabases, file geodatabases, and ArcSDE geodatabases.

Personal and file geodatabases are designed for single users and small projects. ArcSDE geodatabases are scalable and designed for larger-scale use, ranging from medium to enterprise-wide implementations. These geodatabases require ArcSDE technology and are available at three levels (in ascending order of capacity and functionality): personal geodatabase (ArcSDE Personal), workgroup geodatabase (ArcSDE Workgroup), and enterprise geodatabase (ArcSDE Enterprise). This article deals with ArcSDE enterprise geodatabases.

Understanding Enterprise Geodatabase Architecture

At a conceptual level, an enterprise geodatabase consists of a multitier architecture that implements advanced logic and behavior in the application tier (e.g., ArcGIS software) on top of a data storage tier (e.g., RDBMS software). The application tier can be further subdivided into two parts—ArcObjects and ArcSDE technology. The responsibility for managing geographic data in an enterprise geodatabase is shared between ArcGIS and whichever RDBMS is used.

On the data storage tier, RDBMS software provides a simple, formal data model for storing and managing information in tables. The schema of an enterprise geodatabase is persisted in the RDBMS as a collection of tables known as the ArcSDE Repository. Aspects related to data storage and retrieval are implemented as simple tables, and certain aspects of geographic data management, such as disk-based storage, definition of attribute types, query processing, and multiuser transaction processing, are executed by the RDBMS. ArcGIS currently supports the IBM DB2, IBM Informix, Oracle, and Microsoft SQL Server platforms. PostgreSQL support will be added at version 9.3.

ArcSDE technology forms the middle tier. Prior to ArcGIS 9.2, ArcSDE was a separate software product. At ArcGIS 9.2, ArcSDE was integrated into both ArcGIS Desktop and ArcGIS Server and is now formally known as ArcSDE technology. As the gateway between GIS clients and an RDBMS, ArcSDE serves spatial data and enables that data to be accessed and managed within an RDBMS. It is implemented as several components: a directory of executables, a set of tables and stored procedures in the database (i.e., the ArcSDE Repository), and an optional service. These components are discussed in more detail later in this article.

Table: Function

  • server_config: Contains parameters and values that define how the ArcSDE server uses memory and is read each time the ArcSDE service starts. During the ArcSDE postinstallation, its contents are populated by a file named giomgr.defs.
  • dbtune: Lists configuration keywords for data objects in the geodatabase such as feature classes, raster datasets, topologies, and networks. Configuration keywords are used during data loading and define how the datasets are stored in the geodatabase. Geodatabase administrators can use a file named dbtune.sde to manage the configuration keywords used by the enterprise geodatabase.
  • table_registry: Manages all the registered tables of the enterprise geodatabase including all geodatabase system tables and datasets (e.g., feature classes and raster datasets) registered with the geodatabase.
  • layers: Maintains data about each feature class in the geodatabase. This information helps build and maintain spatial indexes, ensures proper shape types, and maintains data integrity.
  • raster_columns: Maintains data about each raster dataset in the geodatabase and helps keep track of the supporting tables for a raster dataset such as the band, block, and auxiliary tables.

Figure 2: Key tables in the ArcSDE Repository

ArcSDE technology provides fundamental capabilities that include:

  • Access and storage of simple feature geometry in the RDBMS
  • Support for native RDBMS spatial types (if available)
  • Spatial data integrity
  • Multiuser editing environment (i.e., versioning)
  • Support for complex GIS workflows and long transactions
  • Geospatial data integration with other information technologies

The upper level of the application tier, ArcObjects, implements geodatabase application logic. This set of platform-independent software components is written in C++ and provides services to support GIS applications as thick clients on the desktop and thin clients on the server. This technology component is built into GIS clients (e.g., ArcGIS Desktop) and implements more complex object behavior and integrity constraints on simple features, such as points, lines, and polygons, stored in an RDBMS. In other words, ArcObjects implements behavior on the feature geometries. Feature classes, feature datasets, raster catalogs, topologies, networks, and terrains are all examples of geospatial data elements within the geodatabase data model for which ArcObjects provides the application logic that implements GIS behavior on top of simple spatial features stored in an RDBMS.

The three enterprise geodatabase architectural tiers are defined at a conceptual level. To most end users, working with the architectural tiers of the enterprise geodatabase is an easy, transparent process. GIS managers or database administrators most likely work directly with these tiers only during the setup and configuration of an enterprise geodatabase or when performing maintenance.

Enterprise Geodatabase Capabilities

Figure 3: Two methods for communicating with an enterprise geodatabase: an application server connection or a direct connection

Designed for large-scale systems, the enterprise geodatabase can be scaled to any size, support any number of users, and run on computers of any size and configuration. It takes full advantage of the underlying RDBMS architecture to provide high performance and support for extremely large continuous GIS datasets. RDBMS functionality supports GIS data management for scalability, reliability, security, backup, and data integrity. In addition to supporting many users with concurrent access to the same data, an enterprise geodatabase can be integrated with an organization's existing IT systems.

Some of the aspects of ArcSDE technology that contribute to these capabilities include the following.

Versioning

With versioning, the ArcSDE geodatabase can manage and maintain multiple states while preserving integrity in the database. Versioning is the default ArcSDE geodatabase editing environment that explicitly records states (i.e., versions) of individual features and objects as they are modified, added, and/or retired. It enables multiple users to access and edit the same data simultaneously and provides long transaction support. Simple queries are used to view and work with any desired state for a particular point in time or see an individual user's current edits.
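
The idea of recording states rather than overwriting rows can be illustrated with a toy model. The Python sketch below is not how ArcSDE implements versioning internally; the class name, the add and delete lists, and the notion of a "lineage" of state IDs are all assumptions made for illustration. It only shows how a base table plus per-state change records lets different editors see different views of the same data.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy model: a base table plus per-state add and delete records."""
    base: dict                                    # object id -> attributes
    adds: list = field(default_factory=list)      # (state_id, object_id, attributes)
    deletes: list = field(default_factory=list)   # (state_id, object_id)

    def insert(self, state_id, object_id, attributes):
        self.adds.append((state_id, object_id, attributes))

    def delete(self, state_id, object_id):
        self.deletes.append((state_id, object_id))

    def update(self, state_id, object_id, attributes):
        # An update retires the old row and adds the new one in the same state.
        self.delete(state_id, object_id)
        self.insert(state_id, object_id, attributes)

    def view(self, lineage):
        """Return the rows visible through an ordered list of state ids."""
        rows = dict(self.base)
        for state_id in lineage:
            for s, oid in self.deletes:
                if s == state_id:
                    rows.pop(oid, None)
            for s, oid, attrs in self.adds:
                if s == state_id:
                    rows[oid] = attrs
        return rows


# Hypothetical data: two editors work on the same table at the same time.
parcels = VersionedTable(base={1: "vacant", 2: "residential"})
parcels.update(state_id=1, object_id=1, attributes="commercial")  # editor A
parcels.insert(state_id=2, object_id=3, attributes="park")        # editor B

default_view = parcels.view(lineage=[])     # sees neither edit
editor_a_view = parcels.view(lineage=[1])   # sees only editor A's change
editor_b_view = parcels.view(lineage=[2])   # sees only editor B's change
```

Because the base rows are never modified, each editor's view is computed on demand, which is what allows long-running edit sessions to coexist in this simplified picture.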

Nonversioned Editing

Using nonversioned editing is equivalent to a standard database transaction. The transaction is performed within the scope of an ArcMap edit session and the data source is directly edited. Nonversioned edit sessions do not store changes in other tables as versioned edit sessions do.
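
The comparison to a standard database transaction can be shown with any relational database. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; SQLite is not one of the RDBMS platforms named in this article, and the parcels table and edit_session function are hypothetical. It shows the essential behavior: edits go straight to the source table and are either committed or rolled back as a unit.

```python
import sqlite3

# In-memory database standing in for the RDBMS (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, land_use TEXT)")
conn.execute("INSERT INTO parcels VALUES (1, 'vacant')")
conn.commit()


def edit_session(conn, new_value, save):
    """Edit the source table directly, then save or discard the whole session."""
    conn.execute("UPDATE parcels SET land_use = ? WHERE id = 1", (new_value,))
    if save:
        conn.commit()      # edits become permanent in the source table
    else:
        conn.rollback()    # edits are discarded; no side tables are involved
    return conn.execute("SELECT land_use FROM parcels WHERE id = 1").fetchone()[0]


discarded = edit_session(conn, "commercial", save=False)
saved = edit_session(conn, "commercial", save=True)
```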

Geodatabase Replication

With geodatabase replication, data is distributed across two or more geodatabases in a manner that allows them to synchronize any data changes that are made. It is built on top of the versioning environment and supports the full geodatabase data model including topologies and geometric networks. In this asynchronous model, the replication is loosely coupled. This means each replicated geodatabase can work independently and still synchronize changes with other replicated geodatabases.

Because geodatabase replication is implemented at the ArcObjects and ArcSDE technology tiers, the RDBMSs involved can be different. Geodatabase replication can be used in connected and disconnected environments and can also work with local geodatabase connections as well as geodataserver objects (through ArcGIS Server), which enables access to a geodatabase over the Internet.
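
A loosely coupled, asynchronous model of this kind can be sketched in a few lines. The Python example below is a simplified illustration, not the replication mechanism used by ArcGIS: the Replica class and synchronize function are hypothetical, and conflicting edits to the same object are deliberately not handled. It shows two copies of the data being edited independently and then exchanging their pending changes.

```python
class Replica:
    """Toy replica: local rows plus a log of changes made since the last sync."""

    def __init__(self, name, rows=None):
        self.name = name
        self.rows = dict(rows or {})
        self.pending = []

    def edit(self, object_id, value):
        # Edits are applied locally right away; no connection is needed.
        self.rows[object_id] = value
        self.pending.append((object_id, value))

    def export_changes(self):
        changes, self.pending = self.pending, []
        return changes

    def import_changes(self, changes):
        for object_id, value in changes:
            self.rows[object_id] = value


def synchronize(a, b):
    """Exchange pending changes in both directions (no conflict handling)."""
    changes_a = a.export_changes()
    changes_b = b.export_changes()
    b.import_changes(changes_a)
    a.import_changes(changes_b)


parent = Replica("parent", {1: "road"})
child = Replica("child", parent.rows)   # the child starts as a copy of the parent

child.edit(2, "trail")     # made while disconnected
parent.edit(3, "bridge")   # made independently on the parent
out_of_sync = parent.rows != child.rows

synchronize(parent, child)
in_sync = parent.rows == child.rows
```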

Historical Archiving

When enabled for a dataset, historical archiving captures all data changes in the DEFAULT version of the enterprise geodatabase by preserving the transactional history as an additional archive class. ArcGIS applies transaction time when changes are saved or posted to the DEFAULT version to record the moment of change to the database.
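
One common way to preserve transactional history is to stamp each row with the period during which it was current. The Python sketch below assumes that approach for illustration only; the ArchiveClass name and the from_date and to_date fields are hypothetical and are not taken from the ArcGIS schema. It shows how recording a transaction time with each change makes it possible to ask what the data looked like at a given moment.

```python
from datetime import datetime

OPEN_ENDED = datetime.max  # marks a row that is still current


class ArchiveClass:
    """Toy archive: every change closes the current row and opens a new one."""

    def __init__(self):
        self.rows = []

    def record_change(self, object_id, value, transaction_time):
        # Close whichever row is currently open for this object.
        for row in self.rows:
            if row["object_id"] == object_id and row["to_date"] == OPEN_ENDED:
                row["to_date"] = transaction_time
        # A value of None represents a delete, so no new row is opened.
        if value is not None:
            self.rows.append({
                "object_id": object_id,
                "value": value,
                "from_date": transaction_time,
                "to_date": OPEN_ENDED,
            })

    def as_of(self, moment):
        """Return the rows that were current at the given moment."""
        return {
            row["object_id"]: row["value"]
            for row in self.rows
            if row["from_date"] <= moment < row["to_date"]
        }


archive = ArchiveClass()
archive.record_change(1, "vacant", datetime(2008, 1, 1))
archive.record_change(1, "commercial", datetime(2008, 6, 1))
archive.record_change(1, None, datetime(2008, 9, 1))  # object deleted

in_march = archive.as_of(datetime(2008, 3, 1))
in_july = archive.as_of(datetime(2008, 7, 1))
in_october = archive.as_of(datetime(2008, 10, 1))
```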

Installing an Enterprise Geodatabase

The setup and configuration of a typical enterprise geodatabase has two stages. In the first stage, enterprise ArcSDE software is installed on the server. The second stage (i.e., ArcSDE postinstallation) consists of four steps.

  1. Create or configure a database with a geodatabase administrative user. Typically, this user is named sde. For SQL Server-based enterprise geodatabases, this user could also be named dbo (i.e., the database owner) instead of sde.
  2. Populate the database with the ArcSDE Repository.
  3. License the ArcSDE server.
  4. (Optionally) create the ArcSDE service.

On Windows operating systems, the ArcSDE postinstallation can be performed using a wizard. Alternatively, it can also be performed manually with ArcSDE commands. Additional parameters may also need to be specified during the postinstallation, depending on the RDBMS and operating system used. After the enterprise geodatabase has been created, database management tools can be used to create users, schemas, and indexes to customize the enterprise geodatabase.

Enterprise Geodatabase Components

A typical enterprise geodatabase installation has three main components: the ArcSDE home directory, the ArcSDE Repository, and the ArcSDE service.

The ArcSDE Home Directory

When the ArcSDE component of ArcGIS Server is installed on the server, this directory is created. It is referenced in the server operating system by an environment variable named %SDEHOME%. The directory contains the ArcSDE command line executables, ArcSDE configuration files, geocoding and language support files, log files (for troubleshooting ArcSDE server issues), help documentation, and some sample utilities.
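
An administrator's script might locate the home directory through this environment variable before looking for configuration files. The Python sketch below is a minimal example under stated assumptions: the find_sde_config function is hypothetical, and the placement of giomgr.defs and dbtune.sde in an etc subdirectory is an assumption that should be checked against the actual installation.

```python
import os
from pathlib import Path


def find_sde_config(env=None):
    """Report which configuration files exist under the ArcSDE home directory.

    Returns None when SDEHOME is not set. The 'etc' subdirectory is an
    assumed location, not something stated in this article.
    """
    env = os.environ if env is None else env
    home = env.get("SDEHOME")
    if not home:
        return None
    etc = Path(home) / "etc"
    return {name: (etc / name).is_file() for name in ("giomgr.defs", "dbtune.sde")}


# Prints None on a machine where SDEHOME is not defined.
print(find_sde_config())
```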

The ArcSDE command line executables are a collection of binary files that can be run at the command prompt by geodatabase administrators to create, configure, manage, and monitor both the enterprise geodatabase and ArcSDE service. ArcSDE command line executables include a set of commands for data import and export at the ArcSDE technology tier of the enterprise geodatabase.

The ArcSDE Repository

The internal system tables and stored procedures that are installed in the RDBMS during the ArcSDE postinstallation are owned and managed by the geodatabase administrative user created in the first step of the ArcSDE postinstallation. They are self-managed internally by both ArcGIS and the RDBMS via stored procedures and should not be edited manually.

ArcSDE Repository tables can be subdivided into ArcSDE system tables and geodatabase system tables (i.e., system tables prefixed with GDB_). ArcSDE system tables work at the ArcSDE technology tier level and contain basic metadata for ArcSDE, store feature geometry and raster data, and manage the versioning environment. The geodatabase system tables work at the ArcObjects tier level and store information on geodatabase behavior and functionality for topologies, networks, and domains. These two groups form the schema of the enterprise geodatabase.
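
The GDB_ prefix makes it straightforward to separate the two groups when reviewing a list of table names. The Python sketch below assumes such a list has already been retrieved from the RDBMS; the split_repository_tables function is hypothetical, and the GDB_ names in the example are placeholders rather than real system tables.

```python
def split_repository_tables(table_names):
    """Split names into (ArcSDE system tables, geodatabase system tables)."""
    arcsde_tables, geodatabase_tables = [], []
    for name in table_names:
        if name.upper().startswith("GDB_"):
            geodatabase_tables.append(name)
        else:
            arcsde_tables.append(name)
    return arcsde_tables, geodatabase_tables


# The first three names come from Figure 2; the GDB_ names are placeholders.
names = ["table_registry", "layers", "GDB_placeholder_a",
         "raster_columns", "gdb_placeholder_b"]
arcsde_tables, geodatabase_tables = split_repository_tables(names)
```

The comparison is case-insensitive because some RDBMS platforms report table names in uppercase and others in lowercase.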

Enterprise geodatabase administrators should be familiar with the key ArcSDE Repository tables listed in Figure 2.

The ArcSDE Service

Also commonly called the giomgr process (an abbreviation for geographic input/output manager), the ArcSDE service is a persistent service on the ArcSDE server that is dependent on the RDBMS instance. The giomgr process supports application server connections to the enterprise geodatabase.

The ArcSDE service listens for incoming client connection requests on a dedicated port and helps enable clients to connect to the geodatabase. A typical enterprise geodatabase installation has one associated ArcSDE service; however, the ArcSDE service is not required if only direct connections are made to the enterprise geodatabase.

Types of Client Connections

Clients typically communicate with an enterprise geodatabase over a network using TCP/IP protocols and can connect to an enterprise geodatabase in two ways—using an application server connection or a direct connection.

Application Server Connection

This traditional client-connect method involves the ArcSDE service, which listens for client connection requests. When a client application, such as ArcGIS Desktop, requests a connection to the enterprise geodatabase, a gsrvr (an abbreviation for geographic server) process is launched by the ArcSDE service and provides a dedicated link between the client and the geodatabase. The ArcSDE service continues to listen for connection requests.

The connection to the geodatabase is based on the user name and password submitted. Dataset access depends on the permissions established for the user by the geodatabase administrator. The gsrvr process remains connected to the geodatabase until the client releases the connection by closing the application. This connection method is commonly called a three-tier connection because it involves the client application, the geodatabase, and the giomgr and gsrvr processes. In this method, most of the work is performed on the server.
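
The division of labor between the two processes can be modeled in memory. The Python sketch below is a toy simulation only: it opens no network ports and does not reproduce ArcSDE behavior, the class and method names are hypothetical, and the credentials are made up. It shows a listener that validates each request, launches a dedicated session object for that client, and then remains available for the next request.

```python
class Gsrvr:
    """Toy stand-in for the dedicated per-client process."""

    def __init__(self, user):
        self.user = user
        self.connected = True

    def release(self):
        # Corresponds to the client closing the application.
        self.connected = False


class Giomgr:
    """Toy stand-in for the listening service."""

    def __init__(self, credentials):
        self.credentials = credentials   # user name -> password (hypothetical)
        self.sessions = []

    def request_connection(self, user, password):
        if user not in self.credentials or self.credentials[user] != password:
            raise PermissionError(f"connection refused for {user!r}")
        session = Gsrvr(user)            # one dedicated session per client
        self.sessions.append(session)
        return session                   # the listener stays available

    def active_sessions(self):
        return sum(1 for s in self.sessions if s.connected)


service = Giomgr({"editor": "secret"})
first = service.request_connection("editor", "secret")
second = service.request_connection("editor", "secret")
count_while_connected = service.active_sessions()

first.release()
count_after_release = service.active_sessions()
```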

Direct Connection

With this method, clients connect directly to the enterprise geodatabase without using the ArcSDE service. Communication between the clients and geodatabase occurs via ArcSDE direct-connect drivers, located on the client side, not through the ArcSDE service. Client machines must be configured for network access.

ArcSDE direct-connect drivers are automatically installed for the whole ArcGIS product suite, the ArcView 3.x Database Access extension, ArcIMS, ArcInfo Workstation, and MapObjects 2. For custom applications built from the ArcSDE C API, the ArcSDE direct-connect drivers need to be enabled with the application to support this functionality.

Direct-connect drivers are built from the same software code used to build the ArcSDE service. The difference is that direct-connect drivers are built as dynamic-link library files and execute in the process space of the client application, whereas the ArcSDE service is built as an executable program that runs on the ArcSDE server.

With this connection method, commonly called a two-tier connection because it only involves the client application and the geodatabase, some of the work that would have occurred on the server with the application server connection is performed on the client.

To have the ArcSDE server handle the majority of the ArcSDE processing load, use application server connections. When the client machines have enough resources to handle some of the ArcSDE processing load, use direct connections. Direct connections may cause more network traffic. Both client connection methods can be supported for the same enterprise geodatabase in any combination and configuration.

Conclusion

The enterprise geodatabase is the foundation for building a large-scale GIS with ArcGIS Server Enterprise. It uses a combination of ArcObjects, ArcSDE technology, and RDBMS software to define how data is stored, accessed, and managed by ArcGIS. Conceptually, it stores GIS data in a centralized location. However, it can be set up and configured for a variety of implementations.

The enterprise geodatabase can be used for applying sophisticated business rules and relationships to spatial data, defining advanced georelational models such as topologies and networks, and providing a multiuser access and editing environment. With these capabilities, the enterprise geodatabase spatial data can be leveraged to its full potential while maintaining a consistent, accurate GIS database.

More Resources

For more information on enterprise geodatabases, see these Esri resources:

Visit www.esri.com/training for information on these and other courses.