Raster Data Management with ArcSDE
By James Neild, Esri Product Management
Editor's note: With the release of ArcGIS 8.2 and ArcIMS 4.0, Esri has a complete solution for storing vector, raster, and tabular data and metadata in a relational database management system (DBMS). The combination of ArcSDE (Esri's gateway for storing raster information in a DBMS), the ArcGIS Desktop applications, and ArcIMS provides a comprehensive approach for managing and distributing raster data. This article briefly describes how ArcSDE stores and manages raster information and the tools available for data management. Knowledge of basic raster concepts is assumed.
Introducing Raster Data Management
A raster data management system is a collection of scalable tools that allow users to store, access, analyze, and extract raster data on demand. ArcGIS provides tools for all aspects of raster data management, from ArcGIS Desktop applications for access, management, and analysis to ArcIMS for data distribution and extraction to ArcSDE for raster management and access. Client applications that can access this raster data through ArcSDE include ArcGIS, ArcIMS, and custom applications built with the ArcSDE C API.
Secure, fast, real-time multiuser access to seamless raster datasets that can range in size from several megabytes to many terabytes requires a DBMS. Although ArcSDE may not be necessary if raster storage and access requirements are less demanding, ArcSDE can act as a central metadata storage system that provides information on data location as well as reduced resolution "quick look" images of the data. The requirements for different uses of ArcSDE for managing raster data vary depending on the type of organization and its users. ArcSDE is commonly used for managing raster data used in basemaps and as feature attributes (i.e., photographs of features such as buildings or valves) and in image repositories maintained by data providers. The following are just a few examples of how ArcSDE improves raster management for these types of applications.
A water company with a large amount of legacy data stored on paper and Mylar maps needed to supply this data as background images that would be combined with vector data into a seamless hybrid map. Although these images will eventually be replaced by vector data, this process will take time to complete. The legacy data was scanned, and 3,000 one-bit TIFF images were generated. The resulting files, about 40 GB of data, are centrally maintained using ArcSDE and supplied as background images to users running ArcMap. These users are inserting new vector data as well as updating existing vector data with new information.
Assessor's offices, water districts, and other organizations often have photographs of assets, such as buildings, that need to be linked to spatial features to provide additional information on those features. ArcSDE can be used to manage the raster data for these types of inventory applications.
An association of governments needed a central repository for imagery that could be easily accessed by members as well as citizens. To accomplish this goal, the images were converted to downloadable formats and the association set up a Web site that provided access to these image files. The association's data holdings, nearly a terabyte of data stored as eight-bit, three-band, one-meter DOQQs, are currently held by the association and managed using ArcSDE.
Raster Storage Architecture in ArcSDE
Rasters are stored in a series of business and user tables in ArcSDE. These system tables, listed in Figure 1, are maintained by ArcSDE and should not be directly modified. When storing raster information, the block table will grow the fastest and remain the largest because it stores binary large objects (BLOBs). The other tables will remain relatively small in size.
ArcSDE's storage parameters allow the user to specify how the data will be stored. The parameters for pyramids, tile size, and compression can affect storage requirements and client application performance. Determine baseline performance for ArcSDE before changing the default settings for these parameters so that any gains or losses in performance can be measured.
Figure 2: When pyramids are created, the spatial extent remains the same.
ArcSDE generates pyramids (reduced-resolution representations of the data) to speed up the display of raster data. Pyramids allow ArcSDE to fetch only the data at the resolution required for the display. Pyramid building is performed on the ArcSDE server whenever the underlying raster is modified or updated. For large datasets, this can take a long time and should be considered when deciding whether to mosaic the data or use raster catalogs. If the original data is compressed, the server will first decompress the data, build the pyramids, and then compress the data again before inserting it into the block table.
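The idea of fetching only the resolution a display needs can be illustrated with a little arithmetic. The sketch below is not ArcSDE code. It assumes, for illustration only, that each pyramid level halves the pixel dimensions of the level beneath it and that building stops once a level fits in a single tile. The function names `pyramid_levels` and `pick_level` are hypothetical.

```python
import math

def pyramid_levels(width, height, tile=128):
    """Return (level, width, height) tuples from the base level upward.

    Assumptions (not stated in the article): each level halves the pixel
    dimensions of the one below it, and building stops once a level fits
    inside a single tile.
    """
    levels = [(0, width, height)]  # level 0 is the full-resolution base
    level = 0
    while width > tile or height > tile:
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
        level += 1
        levels.append((level, width, height))
    return levels

def pick_level(levels, display_width):
    """Pick the coarsest level that still has at least display_width pixels across.

    Falls back to the base level when no level is wide enough.
    """
    chosen = levels[0]
    for entry in levels:
        if entry[1] >= display_width:
            chosen = entry
    return chosen

levels = pyramid_levels(1024, 1024)
for level, w, h in levels:
    print(f"level {level}: {w} x {h} pixels")

# A 300-pixel-wide map window does not need the 1024-pixel base layer.
print("level used for a 300-pixel display:", pick_level(levels, 300)[0])
```

Under these assumptions, a display that is 300 pixels wide is served from a 512-pixel level, which holds one quarter as many pixels as the base layer.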
The base layer of the pyramid has the highest resolution. The pyramid layers above it are created by resampling the original data. One of three supported resampling methods (nearest neighbor, bilinear interpolation, or cubic convolution) tells the server how to resample the data. The type of data determines which method is most suitable for a specific dataset.
- Nearest neighbor assignment should be used for nominal or ordinal data. For these types of data, each value represents a class, member, or classification (categorical data, such as a land use, soil, or forest type).
- Bilinear interpolation interpolates four adjacent pixels and should be used for continuous data such as elevation, slope, intensity of noise from an airport, and salinity of the groundwater near an estuary.
- Cubic convolution interpolates 16 adjacent pixels and should be used for continuous data such as satellite imagery or aerial photography.
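The reason nearest neighbor suits categorical data can be seen in a small, simplified sketch. The code below is not how ArcSDE resamples. It halves a tiny grid in two ways: by keeping one existing cell from each 2 x 2 block (a stand-in for nearest neighbor) and by averaging the four cells of each block (what bilinear interpolation yields when sampled at the block center). The grid values and function names are hypothetical.

```python
def downsample_nearest(grid):
    """Halve a grid by keeping the top-left cell of every 2 x 2 block.

    Simplification: the top-left cell stands in for the nearest neighbor.
    """
    return [row[::2] for row in grid[::2]]

def downsample_mean4(grid):
    """Halve a grid by averaging the four cells of every 2 x 2 block."""
    out = []
    for r in range(0, len(grid) - 1, 2):
        out_row = []
        for c in range(0, len(grid[r]) - 1, 2):
            total = (grid[r][c] + grid[r][c + 1]
                     + grid[r + 1][c] + grid[r + 1][c + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out

# Hypothetical land use codes: only classes 1, 3, 5, and 7 exist.
land_use = [
    [1, 1, 5, 5],
    [1, 5, 5, 5],
    [3, 3, 7, 7],
    [3, 3, 7, 5],
]

print(downsample_nearest(land_use))  # every value is still a valid class code
print(downsample_mean4(land_use))    # averaging can yield codes that were never in the data
```

Averaging the block of codes 1, 1, 1, and 5 gives 2.0, a class that does not exist in the source data. For elevation or imagery the same blending is harmless and usually gives a smoother result.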
The tile size controls the number of pixels stored in each database BLOB field and is specified in x and y pixels when loading the data. The default value of 128 pixels x 128 pixels should be satisfactory for most applications. The optimal tile size setting depends on factors such as data type (bit depth), database settings, and network settings. A smaller tile size, such as 100 x 100, will result in smaller BLOBs and more records in the raster block table, which will slow down queries. A larger tile size, such as 300 x 300, will result in larger BLOBs that require more memory to process although fewer records will be created in the block table. Experiment with tile size before changing the default setting.
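The trade-off between record count and BLOB size can be estimated before any data is loaded. The following sketch assumes one block table record per tile per band, counts only the base level (no pyramid layers), and ignores compression. Those assumptions, the `tile_stats` helper, and the 10,000 x 10,000 pixel example raster are illustrative and are not taken from ArcSDE documentation.

```python
import math

def tile_stats(width, height, tile_w=128, tile_h=128, bits_per_pixel=8, bands=1):
    """Estimate (record_count, uncompressed_blob_bytes) for the base level.

    Assumes one record per tile per band; partial tiles at the raster
    edge still occupy a full record.
    """
    tiles_x = math.ceil(width / tile_w)
    tiles_y = math.ceil(height / tile_h)
    records = tiles_x * tiles_y * bands
    blob_bytes = tile_w * tile_h * bits_per_pixel // 8
    return records, blob_bytes

# Compare the tile sizes mentioned above for a hypothetical
# 10,000 x 10,000 pixel, single-band, eight-bit raster.
for size in (100, 128, 300):
    records, blob_bytes = tile_stats(10_000, 10_000, size, size)
    print(f"{size} x {size}: {records} records, {blob_bytes} bytes per BLOB")
```

Running the numbers for candidate tile sizes gives a rough starting point, but measured query and display times on the actual database and network remain the deciding factor.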
Data compression, optional but recommended, is performed on tiles as they are stored in the database. The two compression methods available are LZ77 and JPEG. The LZ77 algorithm, the same method used for PNG image format and ZIP compression, produces a lossless compression so that the unique values of cells in a raster dataset can be recovered. JPEG compression can have very high ratios but is lossy. Using this method, the values of cells in the raster dataset may be changed slightly. JPEG compression can only be applied to eight-bit data without a color map. The user can specify a quality setting for JPEG compression that ranges from 5 to 95 with 95 producing the best quality image. The default setting is 75.
Compressed data requires less storage space and produces smaller files, resulting in better display performance for client applications. The amount of compression depends upon the data: the fewer unique cell values, the higher the compression ratio. The ArcSDE client performs compression and decompression. The ArcSDE client sends compressed data to the server at loading, and the server always returns compressed data to the client at retrieval. If retention of pixel values is important (e.g., categorical data or data used for analysis), use LZ77 compression. If individual pixel values are not important, as in the case of simple background images, use JPEG compression.
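The link between unique cell values and compression ratio can be checked with Python's standard `zlib` module, which implements DEFLATE, an LZ77-based method of the kind used by PNG and ZIP. This is a stand-in for illustration and not ArcSDE's own compression code. The two synthetic tiles below are hypothetical data generated with a fixed random seed.

```python
import random
import zlib

TILE_PIXELS = 128 * 128  # one eight-bit tile at the default tile size

def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by DEFLATE-compressed size."""
    return len(raw) / len(zlib.compress(raw))

rng = random.Random(0)  # fixed seed so the comparison is repeatable

# Categorical-style tile: only three distinct cell values.
few_values = bytes(rng.choice((1, 2, 3)) for _ in range(TILE_PIXELS))
# Noisy tile: cell values spread over the full 0-255 range.
many_values = bytes(rng.randrange(256) for _ in range(TILE_PIXELS))

# Lossless: decompressing returns exactly the original cell values.
assert zlib.decompress(zlib.compress(few_values)) == few_values

print(f"3 unique values:   {compression_ratio(few_values):.1f}:1")
print(f"256 unique values: {compression_ratio(many_values):.1f}:1")
```

The tile with three unique values compresses several times over, while the noisy tile barely shrinks at all. Real rasters usually compress better than this random test data because neighboring cells tend to repeat.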