ArcUser Online

Decoding Addresses From Point Layers
By Valerie Raybold Yakich, IGERT Fellow in GIScience, Geography Department, University at Buffalo, State University of New York

Recent research at the University at Buffalo explores whether addresses used in geocoding can be decoded from point layers. Simulations selected sets of ten points in a ZIP Code area and recovered the offset and squeeze values used as input parameters. A final testing procedure then attempted to ascertain the addresses used to create the points. The accuracy of the author's decoding method suggests that point layers should be handled with the same level of privacy as the original address list.

Address matching maps a set of addresses onto a reference layer. Reverse address matching attempts to transform a set of coordinate locations into the original list of addresses.

Governments struggle to balance privacy and confidentiality concerns against the public trust that comes from open information. Academic researchers lean toward openness to facilitate repeatability, and thus verification, of their hypotheses. Businesses recognize the competitive value of information and tend to retain it within their organizations.

In the wake of increased terrorism, these stereotypical stances have shifted significantly toward privacy. However, the quantity and reach of information continue to grow. Determining which data to share on the Internet requires judgment on the part of a potential distributor. As GIS proliferates throughout government, academic, and corporate organizations, such decisions will also increasingly involve whether to distribute map layers. Understanding the effects of GIS transformations provides a necessary foundation for these decisions.

Address matching procedures in GIS software take a list of addresses and convert these into coordinate locations. The resulting maps visually display geographic patterns of phenomena such as disease and crime. If coordinate locations are subsequently released, the map creator takes a risk that others may reverse the matching procedure and discover original addresses. The coordinates are essentially an encoded address list. If the creator considers the encoding procedure sufficiently secure, he or she may be willing to distribute the coordinates of private or semi-private information.

The marketing advantages of mailing lists create an incentive for businesses to decode relevant point layers using reverse address matching techniques. The result may be inadvertent loss of individual privacy. Address matching encodes, but does not encrypt, addresses. Geocoded points communicate location information; encryption prevents communication except among privileged parties.

Understanding Offset and Squeeze

Geocoding transforms an address list into a set of coordinate points based on a street reference layer. If an appropriate street segment is found for an address, two parameters, offset and squeeze, help determine the coordinate locations.

Street line segments typically approximate the street centerline. Offset is the perpendicular distance, to the left or right, at which a point is placed from the street segment, so an offset of zero places all addresses along the middle of streets. To get a more realistic position for buildings, addresses are frequently offset a certain distance, such as 20 meters, from the street line segment.

A squeeze value greater than zero squeezes points in from the ends of line segments, which avoids placing the first and last addresses on each street in the middle of a cross street. Possible squeeze values range between 0 and 100 percent. A squeeze of 100 percent places all addresses on one side of any given block in exactly the same spot.
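The interaction of the two parameters can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm of any particular GIS package: the function name `geocode` and its arguments are hypothetical, addresses are assumed to be interpolated linearly along the address range, and the squeeze is assumed to be split evenly between the two ends of the segment (which is consistent with a 100 percent squeeze collapsing every address to one spot).

```python
import math

def geocode(start, end, range_from, range_to, house_number,
            offset=0.0, squeeze_pct=0.0, side="left"):
    """Place a house number along the directed segment start -> end.

    Assumptions (not taken from any specific GIS product): linear
    interpolation along the address range, squeeze split evenly
    between both ends, offset measured perpendicular to the segment.
    """
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)

    # Fraction of the way along the address range (0 = first, 1 = last).
    t = (house_number - range_from) / (range_to - range_from)

    # Squeeze: shrink the usable part of the segment toward its middle.
    s = squeeze_pct / 100.0
    t = s / 2 + t * (1 - s)

    # Offset: move perpendicular to the street, left or right of its direction.
    sign = 1.0 if side == "left" else -1.0
    x = x1 + t * dx - sign * offset * dy / length
    y = y1 + t * dy + sign * offset * dx / length
    return (x, y)

# A 100 m street running east, with addresses 1 to 101 on its left side.
middle = geocode((0, 0), (100, 0), 1, 101, 51, offset=20)
first = geocode((0, 0), (100, 0), 1, 101, 1, offset=20, squeeze_pct=10)
first_full = geocode((0, 0), (100, 0), 1, 101, 1, squeeze_pct=100)
last_full = geocode((0, 0), (100, 0), 1, 101, 101, squeeze_pct=100)
```

Under these assumptions, the middle address lands halfway along the block and 20 meters to the left of the centerline, a 10 percent squeeze pulls the first address 5 meters in from the corner, and a 100 percent squeeze puts the first and last addresses at the same midpoint.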

Reverse Address Matching

Untrained or inexperienced users may not readily notice the geometric and mathematical repeatability of address matching or consider the potential for reversing the process. In 1999, the Geographic Information and Society international conference included a paper, "Hacking: On the Use of Inverse Address-Matching to Discover Individual Identities from Point-Mapped Information Sources," by Marc P. Armstrong and Amy J. Ruggles. Although they acknowledged that variables used in the matching process could be discovered, they suggested that increasing offset and squeeze parameters can guard against deciphering addresses in ArcInfo.

Address matching to a street reference theme may include offset and squeeze parameters. Offsets greater than zero map locations to the left and right of directed streets. Squeeze values greater than zero conceptually shorten the street segment to avoid placing first and last addresses in the middle of cross streets.

Their findings necessarily rely on assumptions made by the programmer who wrote the reversal script. For example, the reverse address matching script in ArcView assumed a zero squeeze percentage even though the default for address matching in that program is five percent. Intuitively, an offset of zero will reduce decoding accuracy because equal ranges on both sides of a street create identical pairs of coordinates along the street segment. No strictly geometric analysis can determine the correct address for coincident points with greater than 50 percent probability.
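The coincident-point problem at zero offset can be shown with a small Python sketch. The street, the address ranges, and the helper `place` are hypothetical; the street is assumed to run along the x-axis with linear interpolation of house numbers and no squeeze.

```python
def place(house_number, range_from, range_to, length, offset, side):
    """Place an address on a street running from (0, 0) to (length, 0).

    Hypothetical helper: linear interpolation, no squeeze. The left
    side of the street is the positive y direction.
    """
    x = (house_number - range_from) / (range_to - range_from) * length
    y = offset if side == "left" else -offset
    return (x, y)

# Equal ranges on both sides: odd numbers 101-199 left, even 100-198 right.
odd_zero = place(151, 101, 199, 100.0, 0.0, "left")
even_zero = place(150, 100, 198, 100.0, 0.0, "right")
same_spot = (odd_zero == even_zero)

# With a 20 m offset the same two addresses separate across the street.
odd_off = place(151, 101, 199, 100.0, 20.0, "left")
even_off = place(150, 100, 198, 100.0, 20.0, "right")
separated = (odd_off != even_off)
```

In this sketch, addresses 150 and 151 share one coordinate pair when the offset is zero, so geometry alone cannot tell them apart. Any positive offset moves them to opposite sides of the centerline.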

The research by Armstrong and Ruggles prompted an analysis of address matching algorithms. Concern for the privacy implications of their results and interest in the process of discovering input variables led this author to consider alternative methods for reverse address matching. Specifically, she sought to take advantage of parameter information inherent in a set of points rather than considering each coordinate individually. Sets of 10 points each were selected randomly from the possible addresses on TIGER street segments in two suburban ZIP Code areas outside of Buffalo, New York. One hundred sets were created for each of the nine combinations of offset (0, 20, and 40 meters) and squeeze (0, 10, and 20 percent), for 900 tests in all.

These examples show the effects of several offset and squeeze values. The maps depict points for all possible addresses. In reality, only a fraction of the addresses are actually in use.

The results reinforced confidence in the ability of map hackers to determine offset and squeeze values from a small number of geocoded points. Offset was deduced perfectly in all 900 tests by calculating the most common distance between each test point and its nearest street segment. Although other methods are possible, the author's squeeze detection algorithm relied on the replicable nature of address matching and computationally intensive, or brute force, programming. Using the deduced offset, all possible addresses on every street were plotted for each squeeze value at one percent increments. Avenue scripts found the most likely squeeze value by evaluating the sum of squared distances between test points and possible address points for the tested percentages. All 900 tests perfectly deduced the input squeeze value. (Note: Restricting squeeze to integers limits the ability to generalize this finding.)
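The two detection steps can be sketched in Python. This is not the author's Avenue code. It is a minimal illustration on toy data: the `STREETS` records, all function names, the even split of squeeze between segment ends, and the rounding of distances to 0.1 meter are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical street records: (start, end, left_range, right_range).
# Each range is (first_number, last_number), numbered in steps of 2.
STREETS = [
    ((0.0, 0.0), (100.0, 0.0), (101, 119), (100, 118)),
    ((0.0, 0.0), (0.0, 200.0), (201, 239), (200, 238)),
]

def place(start, end, rng, number, offset, squeeze_pct, side):
    """Forward geocode one address (squeeze split evenly between ends)."""
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    s = squeeze_pct / 100.0
    t = s / 2 + (number - rng[0]) / (rng[1] - rng[0]) * (1 - s)
    sign = 1.0 if side == "left" else -1.0
    return (x1 + t * dx - sign * offset * dy / length,
            y1 + t * dy + sign * offset * dx / length)

def candidates(streets, offset, squeeze_pct):
    """Plot every possible address for one offset/squeeze combination."""
    out = []
    for i, (start, end, left, right) in enumerate(streets):
        for side, rng in (("left", left), ("right", right)):
            for number in range(rng[0], rng[1] + 1, 2):
                xy = place(start, end, rng, number, offset, squeeze_pct, side)
                out.append(((i, number), xy))
    return out

def segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def estimate_offset(points, streets):
    """Most common distance (rounded to 0.1) from a point to its nearest street."""
    dists = [round(min(segment_distance(p, s[0], s[1]) for s in streets), 1)
             for p in points]
    return Counter(dists).most_common(1)[0][0]

def estimate_squeeze(points, streets, offset):
    """Brute force: try 0 to 100 percent, keep the smallest sum of squared distances."""
    def sse(squeeze_pct):
        cand = [xy for _, xy in candidates(streets, offset, squeeze_pct)]
        return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in cand)
                   for px, py in points)
    return min(range(0, 101), key=sse)

# Simulate a released point layer: 10 points made with offset 20, squeeze 10.
truth = candidates(STREETS, offset=20.0, squeeze_pct=10)
sample = [xy for _, xy in truth[::6]]

found_offset = estimate_offset(sample, STREETS)
found_squeeze = estimate_squeeze(sample, STREETS, found_offset)
```

Taking the most common distance rather than the mean matters near intersections, where a point's nearest segment may be the cross street rather than the street it was matched to. In this toy sample one point falls into that case and is simply outvoted by the other nine.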

