October - December 2007
Leon County is home to Florida's capital city, and its rich history is reflected in the area's inhabitants who range from the Apalachee Indians to the Spanish, French, and English who successively occupied it. Nowhere is this rich heritage as vivid as Leon County's legacy of unique local road names. Roads such as Chowkeebin Nene, Calle de Santos Road, Rue de Lafitte, and Old St. Augustine Road are a testament to the imprint these cultures have left on the county. Overlaying this diverse history is the footprint of ongoing development, whose architects often have vivid imaginations when it comes to street naming.
This melting pot of road names, however romantic, poses practical problems when geocoding addresses. There are several distinct types of road names for which the default Esri geocoding process will frequently fail:
Esri's geocoding process relies heavily on the recognition of keywords to properly parse an address into different parts called tokens. The objective of the Leon County Geocoding Development Kit (GDK) customizations was to improve this parsing for the addresses within the county's master address database. These known "good" addresses are the official source of parcel site addresses for Leon County and are assigned by Leon County's Growth Management Department as mandated by local ordinance. Theoretically, 100 percent of the addresses in the master address database should geocode properly against a corresponding parcel-based locator because they are both based on the same data.
A secondary objective of the GDK customizations was to improve recognition of frequently misspelled street names. The exotic spellings of some local roads make misspellings a virtual certainty (e.g., Yashuntafun Rd). Esri geocoding uses a Soundex algorithm to match input addresses against a locator service, but this phonetic matching is not infallible. Certain commonly observed mistakes in source addresses, including the confusion of prefix direction with suffix direction and some common misspellings, are beyond its ability to handle successfully.
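To make the limits of phonetic matching concrete, here is a minimal sketch of the classic American Soundex code in Python. It is an illustration only: the exact Soundex variant used inside Esri's geocoding engine is not described in this article, and the misspellings below are invented for the example.

```python
def soundex(name: str) -> str:
    """Return the classic 4-character American Soundex code for a name."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # code of the previous letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate duplicate codes
        code = codes.get(ch, "")        # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets the duplicate check
    return (result + "000")[:4]         # pad or truncate to 4 characters


# Hypothetical misspellings of a real Leon County road name.
for spelling in ("Yashuntafun", "Yashuntefun", "Yashutafun"):
    print(spelling, soundex(spelling))
```

A swapped vowel ("Yashuntefun") produces the same code as the correct spelling, so a phonetic match can still succeed. A dropped consonant ("Yashutafun") produces a different code, which is the kind of error phonetic matching alone cannot absorb.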
The GDK was downloaded from Esri's Developer Network (EDN) Web site. The kit contains everything a user needs to customize the geocoding process: the Geocoding Rule Base Developer Guide, the geocoding rule bases of the current release of ArcGIS, an interactive standardizer (STANEDIT.EXE) used for syntax checking and debugging of standardization pattern rules, and the standardizer pattern rule encryption program (ENCODEPAT.EXE). After the GDK was installed, modifications were made to the classification file, us_addr.cls, and the pattern file, us_addr.pat. These files support several of the most commonly used U.S.-style address locators (U.S. Streets, U.S. One Range, and U.S. One Address).
In general, changes to the classification file were made to
Next, changes to the pattern file provided improvements in road-specific pattern recognition. Certain road names in Leon County are intrinsically baffling to the default Esri geocoding routines (e.g., North by Northwest Rd); the GDK provides users with the tools to create custom patterns to recognize and properly parse addresses containing these confusing road names on a case-by-case basis. Edits are made to the unencrypted version of the pattern file (us_addr.xat); ENCODEPAT.EXE is then used to encrypt it into the version ArcGIS uses (us_addr.pat).
By making changes in the classification file and the pattern file, it is possible to improve geocoding rates on problematic roads without requiring changes to the master address database. A balancing act is required when making changes in these two files. For example, in Leon County the placement of street-type keywords is not always consistent (e.g., Ride is sometimes used as a street type and sometimes used as part of the street name), so judgment must be used in determining whether to retain Ride as a street-type keyword in the classification file or to remove it. This determination will, in turn, affect which specific road names must be dealt with on a case-by-case basis by adding pattern recognition routines to the pattern file. After the pattern and classification files have been modified and the pattern file has been encrypted, the files are copied over the default versions in the Program Files\ArcGIS\Geocode directory.
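The balancing act around a keyword such as Ride can be sketched with a toy classifier in Python. This is not the syntax of us_addr.cls or us_addr.pat, and it is not how ArcGIS is implemented; the class labels, the pattern set, and both street names are hypothetical and exist only to show how one keyword decision changes which addresses parse.

```python
# Toy keyword lists standing in for a classification file (hypothetical).
STREET_TYPES = {"RD", "ST", "AVE", "TRL", "RIDE"}
DIRECTIONS = {"N", "S", "E", "W"}

# Toy token-class sequences standing in for a pattern file (hypothetical).
KNOWN_PATTERNS = {
    ("NUM", "WORD", "TYPE"),
    ("NUM", "DIR", "WORD", "TYPE"),
    ("NUM", "WORD", "WORD", "TYPE"),
}


def classify(address, street_types):
    """Label each token as NUM, DIR, TYPE, or WORD by keyword lookup."""
    classes = []
    for token in address.upper().split():
        if token.isdigit():
            classes.append("NUM")
        elif token in DIRECTIONS:
            classes.append("DIR")
        elif token in street_types:
            classes.append("TYPE")
        else:
            classes.append("WORD")
    return classes


def parses(address, street_types):
    """True if the token classes match one of the known patterns."""
    return tuple(classify(address, street_types)) in KNOWN_PATTERNS


with_ride = STREET_TYPES
without_ride = STREET_TYPES - {"RIDE"}

for address in ("100 Midnight Ride", "200 Pony Ride Rd"):
    print(address,
          "| Ride kept as type:", parses(address, with_ride),
          "| Ride removed:", parses(address, without_ride))
```

In this sketch, keeping Ride as a street type parses the first address but turns the second into two consecutive type tokens that no pattern covers. Removing it reverses the outcome. Whichever choice is made, the addresses on the losing side are the ones that would need their own case-by-case pattern.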
For benchmark testing, 97,834 addresses were extracted from the Leon County parcel address layer. These addresses were geocoded against a parcel-based locator service using the Esri default classification and pattern files. The geocoding settings were left as default (spelling sensitivity=80, minimum match score=60, ties allowed). Even though all the addresses in the benchmark test had exact string matches to addresses in the locator, 1,131 of the addresses did not geocode successfully. This clearly illustrates the magnitude of improperly parsed addresses in Leon County when using the default ArcGIS classification and pattern files.
Next, these same addresses were geocoded using the customized classification and pattern files. Only 59 of the records did not geocode successfully; the customizations to the classification and pattern files resulted in a 95 percent reduction in the number of unmatched addresses. The remaining unmatched records were primarily addresses containing confusing unit numbers for apartments and condos.
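The benchmark figures can be checked with a few lines of Python. The counts come from the two test runs described above; the variable names are only for illustration.

```python
total = 97_834             # addresses extracted from the parcel address layer
unmatched_default = 1_131  # unmatched with the default Esri files
unmatched_custom = 59      # unmatched with the customized files


def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


default_match_rate = pct(total - unmatched_default, total)
custom_match_rate = pct(total - unmatched_custom, total)
reduction = pct(unmatched_default - unmatched_custom, unmatched_default)

print(f"Default match rate:      {default_match_rate:.2f}%")
print(f"Customized match rate:   {custom_match_rate:.2f}%")
print(f"Reduction in unmatched:  {reduction:.1f}%")
```

The match rate rises from roughly 98.84 percent to 99.94 percent, and the drop from 1,131 to 59 unmatched records works out to about 94.8 percent, which rounds to the 95 percent reduction reported.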
Further testing was conducted to gauge the improvements when geocoding typical user-supplied addresses in a real-world environment. A total of 7,019 addresses pulled from the Leon County Animal Services service request database were geocoded against a standard composite locator (a parcel-based locator with no ties allowed, followed by a street centerline-based locator allowing ties). In this test, the GDK customized files reduced the number of unmatched records by 29 percent.
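The fallback behavior of a composite locator like the one used in this test can be sketched as follows. This is a simplified Python model, not the ArcGIS API: the locator functions, candidate labels, scores, and addresses are all hypothetical, and resolving an allowed tie by taking the first candidate is an assumption made for the example. The minimum score of 60 mirrors the default setting mentioned earlier.

```python
from typing import Callable, Optional

Candidate = tuple[str, int]                     # (location label, match score)
Locator = Callable[[str], list[Candidate]]


def best_match(candidates, min_score, allow_ties) -> Optional[str]:
    """Pick the top-scoring candidate, rejecting ties unless allowed."""
    good = [c for c in candidates if c[1] >= min_score]
    if not good:
        return None
    top = max(score for _, score in good)
    winners = [label for label, score in good if score == top]
    if len(winners) > 1 and not allow_ties:
        return None
    return winners[0]                           # assumption: first tied candidate wins


def composite_geocode(address, stages, min_score=60):
    """Try each (name, locator, allow_ties) stage in order; first match wins."""
    for name, locator, allow_ties in stages:
        match = best_match(locator(address), min_score, allow_ties)
        if match is not None:
            return name, match
    return None


# Hypothetical stand-ins for a parcel locator and a street centerline locator.
def parcel_locator(address):
    table = {"12 OAK RD": [("parcel 12", 100)],
             "5 ELM ST": [("parcel 5A", 90), ("parcel 5B", 90)]}
    return table.get(address, [])


def street_locator(address):
    table = {"5 ELM ST": [("segment 40", 85), ("segment 41", 85)]}
    return table.get(address, [])


STAGES = [("parcel", parcel_locator, False), ("street", street_locator, True)]

for address in ("12 OAK RD", "5 ELM ST", "99 NOWHERE LN"):
    print(address, "->", composite_geocode(address, STAGES))
```

The tied parcel candidates for the second address are rejected by the first stage, so the address falls through to the street centerline stage, where ties are accepted.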
The Geocoding Development Kit enables GIS professionals to improve geocoding match rates for address datasets containing unique local address styles by modifying the Esri default classification and pattern files. The return on investment of the time required to understand the GDK and implement the necessary local modifications can be easily justified by the reduction of manual matching efforts throughout an entire organization. History and culture leave a unique signature on local road names, no matter what corner of the world you work in. So look around; you are certain to find the GDK can improve your geocoding rates too.
For more information, contact
GIS Specialist III
Tallahassee-Leon County GIS
About the Author
While implementing the GDK, Jay Johnson provided GIS support to the Tallahassee-Leon County (Florida) Interlocal GIS program. He has more than 12 years of professional GIS experience and received his master's degree in GIS from the University of Colorado at the Denver College of Engineering and Applied Science. He recently relocated to Reno, Nevada.
Esri's Geocoding Development Kit (visit edn.esri.com and search the Downloads section for geocoding.)