Implementing European Metadata
by Bert Vermeij, Business Consultant for Esri Nederland B.V.
Editor's note: This article describes how a Dutch version of the European metadata standard CEN/TC287 was implemented by customizing ArcCatalog in ArcInfo 8. With ArcGIS 8.1, in addition to the Federal Geographic Data Committee (FGDC) standard, other metadata schemas are supported. Users can develop custom metadata synchronizers that read dataset properties and store those properties automatically in a metadata document according to another specified standard.
ArcCatalog has an open extensible architecture and provides a framework for the implementation of a custom metadata environment. Metadata describes and documents spatial data. Metadata assists data managers in archiving data and data users in searching for data. Spatial metadata is one of the key components of a geoinformation infrastructure. Data managers prefer to keep the metadata with the data, while data users are primarily interested in a metadata environment that can be queried. Publishing metadata in a clearinghouse provides a searchable database of information about geodata and promotes data sharing.
With ArcCatalog, Esri introduced a system that automatically associates metadata with spatial datasets. The use of content standards that determine what metadata is collected is an important issue. Out of the box, ArcCatalog currently supports the FGDC content standard used by the United States for spatial metadata. However, in Europe, the CEN/TC287 standard, sponsored by the European Committee for Standardization, has been widely adopted. Esri Nederland B.V. integrated ArcCatalog with the relatively well-developed metadata standard used in the Netherlands. Central to this effort was the design and implementation of metadata schema and an editor that supports the Dutch content standard for spatial metadata, a locally redefined subset of the European CEN/TC287 content standard.
Because spatial data is the fuel of a GIS, it is important to know if the data will meet the system's needs. Metadata describes data using terminology that defines potentially disparate data and facilitates consistent collection, indexing, querying, and publishing. Metadata documents content, quality, source organizations, data format and organization, collection schedule, uses, data currency, spatial references, and distribution mechanisms for the data.
Keeping spatial metadata records is important. From a data management perspective, metadata is important for maintaining an organization's investment in spatial data. Data users need metadata to locate appropriate datasets. Metadata provides information about the data available within an organization or from catalog services, clearinghouses, or other external sources. Metadata not only helps find data, but once data has been found, it also tells how to interpret and use data. Publishing metadata facilitates data sharing. Sharing data between organizations stimulates cooperation and a coordinated, integrated approach to spatially related policy issues.
These different uses of metadata require different levels of detail in metadata content. Data managers need very detailed information on data format, internal structures, and data definitions. Users generally require some kind of catalog that tells them where to find data, how to use it, and whom to contact. From an organizational perspective, many users need only a few metadata items, and only a few users need detailed metadata. No matter the detail required, the bottom line is that metadata makes spatial information more useful to all types of users.
Metadata in ArcCatalog
ArcCatalog, an application for locating, browsing, and managing spatial data, resembles Windows Explorer but can see into databases and quickly view data and metadata. ArcCatalog automatically associates metadata with all geographic datasets and creates metadata for any dataset supported by ArcInfo as well as any other dataset identified and cataloged by the user (e.g., text, CAD files, scripts, images).
ArcCatalog comes with support for the FGDC metadata standard, an editor for entering metadata, a storage schema, and property sheets for viewing data. The ArcCatalog environment identifies two types of metadata--properties (inherent metadata) and documentation. Inherent metadata, which is derived from the data and generated automatically, includes items such as the dataset name, feature types, the geographic extent, and the projection. Documentation, the descriptive metadata supplied by the user, can include items such as the organizations that collected the data and the quality characteristics of the data.
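The split between properties and documentation can be illustrated with a small Python sketch. It is not ArcCatalog code: the `synchronize` function and the `roads` dictionary are hypothetical stand-ins for whatever harvests dataset properties, and only the element names (`title`, `westbc`, `abstract`, and so on) follow the FGDC short-name convention. The point is that a refresh of inherent properties overwrites the harvested values and leaves user-entered documentation alone.

```python
import xml.etree.ElementTree as ET


def set_text(root, path, value):
    """Create any missing elements along path, then set the leaf text."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    node.text = str(value)


def synchronize(root, dataset):
    """Refresh inherent properties; documentation elements are not touched."""
    set_text(root, "idinfo/citation/citeinfo/title", dataset["name"])
    west, south, east, north = dataset["extent"]
    for tag, value in (("westbc", west), ("eastbc", east),
                       ("northbc", north), ("southbc", south)):
        set_text(root, "idinfo/spdom/bounding/" + tag, value)


# Existing record: the abstract is documentation typed in by a user.
record = ET.fromstring(
    "<metadata><idinfo><descript>"
    "<abstract>Road centerlines, surveyed by the province.</abstract>"
    "</descript></idinfo></metadata>"
)

# Hypothetical dataset properties (name and bounding box in degrees).
roads = {"name": "roads", "extent": (3.36, 50.75, 7.23, 53.55)}
synchronize(record, roads)
```

After the call, the record contains a title and bounding coordinates derived from the dataset, while the abstract is exactly what the user entered.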
Metadata is actually stored in XML with the data. All data management functions in ArcCatalog (e.g., copy, rename, move, and clip) honor it, so metadata always travels with the data. Users can view the metadata in any XML-aware environment. Within ArcCatalog, stylesheets present metadata to the user. A stylesheet can show all the required metadata items or give different, extracted views of the same metadata. Using different stylesheets makes it easy to support the metadata requirements of different groups of users.
Background: Metadata in the Netherlands
In the early 1990s a new term was introduced into the GIS world--metadata. If every geographic dataset was described by (and preferably accompanied by) metadata, it could be queried by data users. At first, metadata, also called meta information, was perceived as a "necessary evil," but as people became aware of the importance and the benefits of metadata, its popularity grew. Today many large, mostly governmental, organizations in the Netherlands that deal with spatial information have implemented metadata. Most of those organizations create metadata using GeoKey, a popular Dutch metadata management tool that makes metadata available to multiple users through an Intranet or the Internet.
Since GeoKey allows distributed search, several organizations can share metadatabases. Organizations in the Netherlands are currently discussing whether data that is not described by metadata should be accessible to users. Some organizations believe only data described by metadata should be stored in a data warehouse or on central network disks. This data would only be accessible through metadata.
The European CEN Metadata Standard
What information should be included in metadata for a database? In general, the description of a dataset covers the following topics.
These items describe a dataset, and they can be used in different ways. A standard provides a common set of metadata elements or variables that document geospatial data and provides definitions and a common terminology. A content standard makes metadata transparent to users.
The FGDC adopted a content standard for metadata in the United States. All federal agencies use this standard to document newly created geospatial data. In the Netherlands, the European CEN/TC287 standard has been chosen as the national standard. This European model is an official standard that is maintained by a standardization board. When the FGDC and CEN content standards are compared, the most important difference is that FGDC is more granular and mainly uses discrete variables, whereas the CEN model has some freedom in the number of metadata elements. In many cases one CEN tag maps to many FGDC tags, making writing a translator nontrivial.
The generic CEN model has been further specified in the Dutch preliminary standard NVN-ENV 12657. This model contains about 290 metadata elements. The Dutch version of the CEN/TC287 standard has been widely accepted as the metadata standard for the Netherlands and is used in the National Clearinghouse for Geo Information (NCGI) pilot project. Note that CEN will be streamlined with the future ISO standard for geospatial metadata. Many in the GIS community believe that the CEN standard and the official Dutch version of the CEN standard are not practical from a user perspective because both have too many variables. Initially, the ministries of Traffic, Water Management, and Public Works decided to define a simpler version of CEN containing a subset of 80 metadata elements. This model, known as CEN-RWS, has been implemented in the ministries' metadata management system and is being adopted by many other organizations.
Implementing CEN in ArcCatalog
GIS users sought ArcCatalog support for the CEN metadata standard. This standard was part of the vitally important National Spatial Data Infrastructure. Without support for European standards, users could not take advantage of the metadata tools in ArcCatalog.
Consequently, Esri Nederland B.V. decided to build a Dutch version of ArcCatalog. Building a custom environment in ArcCatalog requires four components--a logical model for metadata content, an implementation schema, an editor, and one or more stylesheets. The first step was determining which metadata elements to collect. These elements could be taken from a standardized set (e.g., FGDC or CEN standards) or be an internal standard defined by an organization. Initially, the CEN-RWS model was supported. Because CEN-RWS is a subset of CEN, it was relatively easy to develop it further into a complete model and to add organization-specific metadata elements in support of specific requirements.
The logical data model needs to be translated into an implementation schema in XML. The open, user-definable structure of XML makes it relatively simple to adjust a schema to specific needs. One of the powerful features of ArcCatalog is the automatic harvesting of inherent metadata (or properties), which is written to an XML file. Tags in the XML file correspond to metadata elements that follow the FGDC content standard and naming conventions. This synchronization turned out to be the Achilles' heel of the process, as it is one of the few things in ArcCatalog that cannot be customized. Synchronization always uses FGDC tags and will not store this information using a different schema such as CEN.
This problem was solved in the editor design. The core part of the customization effort is the metadata editor. In ArcCatalog, a metadata editor allows users to enter and edit metadata for data sources, following a standard or another defined model. The editor stores the metadata according to the XML schema and simplifies metadata creation with text boxes and with drop-down lists for fields that have predefined domain values in the chosen standard.
Esri Nederland B.V. built an editor for the Dutch version of CEN in Visual Basic. The editor supports the official standard "Guidelines for the Implementation of the Dutch Preliminary Norm NVN-ENV 12657 Geographic Information-Data Description-Metadata." Wizards, menu interfaces, and pick lists help the users fill in metadata. The user interface can be likened to the Dutch government's tax code--it is straightforward and well-ordered, and most users are familiar with it. Help files on the use of the guidelines complete the interface.
Instead of changing the format in which ArcCatalog saves metadata properties, the editor copies the FGDC property tags to corresponding CEN tags. Although this process causes some data redundancy, this slight disadvantage is outweighed by the benefits of automatically generating metadata tags that support the CEN model.
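The copy step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Visual Basic editor described here: the source paths use FGDC short names, but the CEN-side element names (`cen/dataset/title`, `cen/extent/west`, and so on) and the `FGDC_TO_CEN` table are invented for the example, and a real mapping would follow the Dutch schema.

```python
import xml.etree.ElementTree as ET

# FGDC path -> CEN path. The CEN element names are hypothetical.
FGDC_TO_CEN = {
    "idinfo/citation/citeinfo/title": "cen/dataset/title",
    "idinfo/spdom/bounding/westbc": "cen/extent/west",
    "idinfo/spdom/bounding/eastbc": "cen/extent/east",
    "idinfo/spdom/bounding/northbc": "cen/extent/north",
}


def ensure_path(root, path):
    """Return the element at path, creating missing elements on the way."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def copy_fgdc_to_cen(root):
    """Copy each synchronized FGDC value to its CEN tag; return the count."""
    copied = 0
    for src_path, dst_path in FGDC_TO_CEN.items():
        value = root.findtext(src_path)
        if value is None:
            continue  # property was not harvested for this dataset
        ensure_path(root, dst_path).text = value
        copied += 1
    return copied


doc = ET.fromstring(
    "<metadata><idinfo>"
    "<citation><citeinfo><title>roads</title></citeinfo></citation>"
    "<spdom><bounding><westbc>3.36</westbc><eastbc>7.23</eastbc></bounding></spdom>"
    "</idinfo></metadata>"
)
copied = copy_fgdc_to_cen(doc)
```

The FGDC elements stay in place after the copy, which is the redundancy mentioned above: each harvested value exists once under its FGDC tag and once under its CEN counterpart.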
Stylesheets let users see metadata by converting the automatically generated XML to a more easily readable HTML format. Different stylesheets support different users. The XML/XSL architecture allows multiple stylesheets for one XML schema. Dutch stylesheets in ArcCatalog enable viewing metadata in different ways. A database manager or users who need to know detailed information on the dataset will use a stylesheet that shows the complete metadata. An end user can use a simple stylesheet that exposes a subset with only a few key metadata elements.
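The idea of several views over one metadata record can be shown with a short Python sketch. It deliberately does not use XSL: each "view" is a plain list of labels and element paths, standing in for a stylesheet, and the `FULL_VIEW` and `SUMMARY_VIEW` names and the sample record are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Each view lists (label, element path) pairs; paths use FGDC short names.
FULL_VIEW = [
    ("Title", "idinfo/citation/citeinfo/title"),
    ("Originator", "idinfo/citation/citeinfo/origin"),
    ("Abstract", "idinfo/descript/abstract"),
]
SUMMARY_VIEW = [
    ("Title", "idinfo/citation/citeinfo/title"),
]


def render(root, view):
    """Render the selected elements of one record as an HTML table."""
    rows = []
    for label, path in view:
        value = root.findtext(path, default="")
        rows.append(
            "<tr><th>" + escape(label) + "</th><td>" + escape(value) + "</td></tr>"
        )
    return "<table>" + "".join(rows) + "</table>"


sample = ET.fromstring(
    "<metadata><idinfo>"
    "<citation><citeinfo><origin>Province survey office</origin>"
    "<title>roads</title></citeinfo></citation>"
    "<descript><abstract>Road centerlines.</abstract></descript>"
    "</idinfo></metadata>"
)

full_html = render(sample, FULL_VIEW)        # for data managers
summary_html = render(sample, SUMMARY_VIEW)  # for end users
```

Both outputs come from the same XML. Only the view definition changes, which is the property the XML/XSL architecture provides.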
The open extensible architecture of ArcCatalog provided a powerful framework for building a custom environment to capture metadata. Although the collection of metadata is important, the real fun starts when metadata searching can be performed using a simple metadata browser in a distributed environment.
Dr. Bert Vermeij, Business Consultant