Expanding Access to Large Geospatial Repositories
By Bonnie Burns and David E. Siegel, Harvard University
Harvard University's Geospatial Library and Center for Geographic Analysis
A university environment exerts particular pressures on the campus GIS infrastructure. The users' experience levels range from complete novice to expert. Data requirements are just as varied. Urban planning students need building footprints, social scientists need census demographic information, and environmental studies concentrators need global elevation and land cover. Finally, the types of tasks being performed run the gamut from simple cartography to complex modeling and analysis. Meeting these needs demands flexibility in data collections, user support, and data access.
The Harvard Geospatial Library (HGL), a catalog and repository of geospatial data that contains georeferenced historic maps, vector data, and satellite imagery, is continually being expanded and enhanced. It has been identified as the foundation of a university-wide platform that supports all areas of teaching and research using geospatial analysis.
To meet the expectations of a diverse user community, every aspect of the system must be extended to a broad array of applications and services without abandoning the library's roots. The Center for Geographic Analysis (CGA) at Harvard has been tasked with synthesizing data from different sources, such as the Harvard-MIT Data Center; providing community- and user-specific interfaces to the repository; and capitalizing on open communications protocols. Enhancing existing access capabilities and developing new ways for clients to interact with the repository are the center's top priorities.
While most GIS activities on campus are decentralized and handled by individual schools or departments, HGL is available to the entire university community and the public. HGL, supported by the Harvard Map Collection and the Harvard University Library, maintains 5,000 data layers such as global ecoregions; historic maps of the United States; and building footprints for the city of Boston, Massachusetts. Users can locate data by searching Federal Geographic Data Committee (FGDC)-compliant metadata using geography or text strings. This metadata is created by a library cataloger. Once located, relevant datasets can be displayed together in the browser or downloaded and clipped to an area of interest.
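The two search modes described above, by geography and by text string, can be sketched in a few lines of Python. This is a minimal illustration, not HGL's actual search code: the `LayerRecord` class, the `search` function, and the sample catalog entries (including their approximate bounding boxes) are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Bounding box as (west, south, east, north) in decimal degrees.
BBox = Tuple[float, float, float, float]


@dataclass
class LayerRecord:
    """Hypothetical, heavily simplified stand-in for one metadata record."""
    title: str
    abstract: str
    bbox: BBox


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True when two boxes share any area or edge (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records: List[LayerRecord],
           text: Optional[str] = None,
           area: Optional[BBox] = None) -> List[LayerRecord]:
    """Keep records that match the text string and/or intersect the area."""
    hits = []
    for rec in records:
        searchable = (rec.title + " " + rec.abstract).lower()
        if text and text.lower() not in searchable:
            continue
        if area and not boxes_overlap(rec.bbox, area):
            continue
        hits.append(rec)
    return hits


# Illustrative records only; the extents are rough, invented values.
CATALOG = [
    LayerRecord("Boston building footprints",
                "Building outlines for the city of Boston",
                (-71.19, 42.23, -70.99, 42.40)),
    LayerRecord("Global ecoregions",
                "Terrestrial ecoregions of the world",
                (-180.0, -90.0, 180.0, 90.0)),
    LayerRecord("Historic map of California",
                "Georeferenced nineteenth-century map",
                (-124.5, 32.5, -114.1, 42.0)),
]
```

Calling `search(CATALOG, area=(-71.1, 42.3, -71.0, 42.4))` returns both the Boston layer and the global layer, since a global extent intersects every area of interest. A production catalog would likely rank such hits by how closely the extents match.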
In this environment, CGA was created. A part of the Institute for Quantitative Social Science, CGA is now a much-needed centralized home for GIS activities on campus. CGA staff provide help desk-style support as well as consultation on longer-term projects for researchers without dedicated GIS staff in their departments. They are also researching ways to spatially enable large social science data repositories on campus. The creation of CGA has generated new demands on HGL, which is the campus data distribution infrastructure. In a time of limited resources, HGL's challenge has been to meet those demands and expand access and functionality while continuing its commitment to creating high-quality metadata and providing persistent access to historical data.
Persistent Custom Map Services
From the beginning, HGL has allowed users to create custom map services from the layers in the repository. Users can select multiple layers from search results lists, and HGL will create an ArcIMS map service of those layers and display it within the Web browser using a customized JavaServer Pages (JSP) viewer. Requests have been made for added functionality that would let users access those services from within ArcGIS Desktop and have those services persist for a specified length of time. This would provide read-only access to data through a Web service without downloading shapefiles. One of the main areas where this is very useful is in teaching labs.
A teaching fellow or instructor can preselect relevant data and create an ArcIMS service in HGL so students can access the service in class. This provides all students with the same layers at the same time and eliminates the need to install the data locally either on PCs or on department networks. It also ensures that the students in the class are working with the right data. Users can also create individual ArcIMS services, access data through ArcMap, and avoid downloading large files. This also eliminates the need for HGL to manage many user names and passwords for direct connections to the ArcSDE instance.
The main drawback to this solution is the limited cartographic options allowed by ArcIMS. The solution to this limitation is to implement an ArcMap service. Another drawback is the need to determine who is allowed to create services. Should it be limited to instructors and teaching assistants or open to anyone in the Harvard community? Services containing large datasets can eat up system resources. If many users start up many services, it could impact performance.
Direct Connections to the Repository
Another step in providing more open access to the data in HGL was to allow direct connections to the data repository, bypassing the Web interface and the download mechanism. CGA staff repeatedly needed read-only access to the same datasets for display or as a starting point for data extraction or analysis. The HGL data repository is an ArcSDE instance running on top of Oracle, and the simplest solution was to create a read-only account for the CGA staff that let them connect directly to ArcSDE. They were given a user name and password and all the necessary connection information. This provided them with the access they wanted, but there are drawbacks to this type of solution. A direct connection to ArcSDE lists all the layers in alphabetical order. Once an instance holds more than a hundred or so layers, navigating becomes difficult and browsing the list can be slow. In addition, there are limits to the number of characters that can be used in an ArcSDE layer name, which leads to some fairly cryptic entries. Naming conventions are helpful, but a user needs to be very familiar with the database to use a direct connection effectively.
There are things that can be done to make this a more useful and open method of access. First of all, creating layers as feature classes within a geodatabase allows for longer file names, which can alleviate part of the problem. Another solution to be implemented is the development of a search/browse tool within ArcMap that helps users identify the data they want by searching the HGL metadata schema.
Alternative Web Mapping Services
A third area where users have been requesting added functionality is in providing Web maps outside the HGL front end. Many collections and museums on campus want to provide interactive mapping within their Web pages but don't want to manage all the data, software, and hardware associated with a Web mapping server. The investment in the HGL infrastructure will be leveraged to serve maps to clients using open standards such as the Open Geospatial Consortium's Web Map Service (WMS) and Web Feature Service (WFS). Simple viewer code will also be provided to client departments, which can implement it (as is or with improvements) within their own Web pages.
As an example, the Milman Parry Collection at Harvard is a unique repository of South Slavic and Balkan oral traditions. It owns many valuable recordings of songs and stories collected in the 1920s and 1930s throughout the Balkan region, and each recording is tagged with the location where it was made. The notes and interviews that accompany each recording allow researchers to trace the interesting spheres of influence in the region over time. Curators of the collection would like to allow visitors to the Web site to plot on a map the location of any particular recording from their online database. Users could then display other data related to contemporary political boundaries or linguistic or religious boundaries, depending on their interests.
Client departments will be responsible for creating any additional data layers that are to be used in the maps such as the political boundaries in the Milman Parry Collection example. These layers will be added to the HGL data repository and distributed in the same way as all other layers, thus increasing the HGL collections and the visibility of the new layers. Clients will also be responsible for setting up the service on HGL and implementing the WMS client code within their own design.
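To give a sense of how little client code a WMS-based map needs, the sketch below builds a WMS 1.1.1 GetMap request URL in Python. The parameter names (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS specification; the endpoint URL, the layer names, and the bounding box are hypothetical placeholders, not actual HGL services.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),   # drawn in the order listed
        "STYLES": "",                 # empty value requests default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer names, used only for illustration.
url = getmap_url(
    "https://example.edu/wms",
    ["political_boundaries", "recording_sites"],
    (13.0, 39.0, 30.0, 48.0),   # rough, illustrative extent
    800,
    600,
)
```

A client page could use the resulting URL as the source of an image element, so the department hosts only the page while the map server renders the layers.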
In addition to improving access to the data in the HGL repository, another goal is sharing metadata within the catalog as much as possible. The HGL project staff includes a geospatial resources cataloger. HGL has built a strong catalog of FGDC Content Standard for Digital Geospatial Metadata (FGDC-STD-001-1998)-compliant documentation. This metadata is available to the public, even though the data is not available through the Web interface. This catalog of FGDC records could be used by other libraries for copy cataloging or searched by other GIS portals to help users from around the world locate interesting and useful data.
There are many protocols for sharing the metadata catalog. HGL currently has some federated cross-catalog searches enabled across various library catalogs. Users can simultaneously search the HOLLIS catalog, the Harvard online library system; the Visual Information Archive (VIA); the Online Archival Search Information System (OASIS), a database of finding aids for manuscripts; as well as HGL. In the future, the HGL catalog will be opened to Z39.50 federated searches [portal-based searches that use the Z39.50 client server protocol for searching remote databases and retrieving data] and Open Archives Initiative (OAI) harvesting of the catalog. Other databases at the university have already implemented OAI harvesting on their systems, and HGL is working with those groups to identify holdings that can be spatially enabled and whose metadata could usefully be stored in the HGL database.
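As an illustration of what OAI harvesting involves, the following Python sketch builds an OAI-PMH `ListRecords` request and reads record identifiers and the paging token from a response. The verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace are defined by the OAI-PMH 2.0 protocol; the endpoint URL, the identifiers, and the sample response are invented for the example and do not describe an existing HGL service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Made-up response fragment, trimmed to the elements the sketch reads.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.edu:layer-1</identifier></header></record>
    <record><header><identifier>oai:example.edu:layer-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_page(SAMPLE_PAGE)
next_url = list_records_url("https://example.edu/oai", resumption_token=next_token)
```

A harvester repeats the request with each returned token until the response carries no token, at which point the full set of records has been retrieved.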
To maintain maximum flexibility while still adhering to standards, all FGDC metadata files are stored complete within a single column in a single Oracle database table. Using Oracle interMedia indexing allows simplification of search syntax while allowing explicit searches for specific tags in a document. The next step is to explore using XML Path Language (XPath) instead of interMedia and XMLTypes instead of character large objects (CLOBs).
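The difference between a full-text search and an explicit search on a specific tag can be sketched with path expressions over an FGDC record. The example uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, as a stand-in for the Oracle features discussed above. The element names (`idinfo`, `citeinfo`, `title`, `themekey`, `westbc`, and so on) follow the FGDC CSDGM XML encoding, while the record content and the helper functions are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, heavily abbreviated FGDC record.
SAMPLE_FGDC = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example Originator</origin>
      <title>Boston Building Footprints</title>
    </citeinfo></citation>
    <descript><abstract>Outlines of buildings, useful for urban planning.</abstract></descript>
    <spdom><bounding>
      <westbc>-71.19</westbc><eastbc>-70.99</eastbc>
      <northbc>42.40</northbc><southbc>42.23</southbc>
    </bounding></spdom>
    <keywords><theme>
      <themekt>None</themekt>
      <themekey>buildings</themekey>
      <themekey>structures</themekey>
    </theme></keywords>
  </idinfo>
</metadata>"""


def tag_values(xml_text, path):
    """Return the text of every element matched by a path expression."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall(path) if el.text]


def full_text_match(xml_text, term):
    """Case-insensitive search across all text in the document, ignoring tags."""
    root = ET.fromstring(xml_text)
    return term.lower() in " ".join(root.itertext()).lower()


title = tag_values(SAMPLE_FGDC, "idinfo/citation/citeinfo/title")
theme_keys = tag_values(SAMPLE_FGDC, ".//themekey")
west = float(tag_values(SAMPLE_FGDC, ".//bounding/westbc")[0])
```

Here `full_text_match(SAMPLE_FGDC, "planning")` succeeds because the word appears in the abstract, while the tag-specific query on `themekey` does not return it. That is the distinction between a loose text search and an explicit search on one element.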
Through a variety of strategies, HGL is adapting to the new demands that have arisen from a more GIS-literate community. CGA is increasing awareness of the power of geospatial analysis at a university that hasn't had a geography department for 50 years. This increased awareness can only benefit the community. HGL's challenge is to keep up with the needs of this new group of users and make finding and accessing data as easy and flexible as possible.
About the Authors
Bonnie Burns is the GIS coordinator for the Harvard Map Collection, part of the Harvard College Library, a position she has held since 1999. She began her career with the National Conference of State Historic Preservation Officers, working closely with the National Park Service using GIS to aid in the preservation of cultural and natural resources around the country. Her professional interests include the application of geographic analysis techniques to questions of historic research and landscape preservation.
David E. Siegel joined the Harvard University Library, Office for Information Systems, in 2000 as a geospatial data and information software engineer. In 2006, he began sharing his time with the Center for Geographic Analysis, where he consults on several projects and institutional initiatives. His professional interests include developing Web mapping solutions for discovering and delivering geospatial data.