ArcUser Online



Can Geography Rescue Text Search?
By Randy Ridley, John-Henry Gross, and John Frank, MetaCarta, Inc.

Decision makers depend on information buried in text documents. Millions of staff weeks are spent reading text. As the volume of textual information increases across an organization, users need new methods for understanding this critical information. When we communicate, we reference locations and events. We understand situations by interpreting geographic references such as "20 miles east of Média."

Clicking on these buttons opens two MetaCarta windows. The smaller window presents an input field for entering a keyword. The larger window displays the results of the geographic text search.

Until recently, software had trouble interpreting ambiguous names such as Média. Recent advances in technology are changing this. Sorting information geographically is creating a paradigm shift in organizational workflow. Latitude and longitude provide a grounded reference model for seeing patterns and correlating information. This new technology extends the power of geography to text documents and to text search and retrieval.

E-mails, reports, news feeds, presentations, Web pages, and many other document types contain critical intelligence for mission support. Analysts spend millions of hours understanding these documents. On average, about 80 percent of all documents contain at least one reference to a specific geographic location. Geography offers a powerful method of searching through these documents. Combining text search and geographic search allows analysts to see and find relevant information faster.

Geography offers a new framework for identifying and analyzing the overwhelming volume of text documents available today. Technology that automatically plots text documents on maps, based on the spatial references and addresses that appear within the text, accelerates the ability to find relevant information. Open Source Intelligence (OSINT) gathering [i.e., gathering intelligence from publicly available sources] must cope with an overwhelming number of sources.

Combining geographic search with full-text keyword search, document categorization, and temporal filtering accelerates the delivery of intelligence to decision makers in the field. Instead of missing essential information, analysts can see geographic trends as they emerge. Viewing a map with text documents represented as geographic features improves situational awareness and enables instant corroboration between disparate sources that discuss proximate locations.

Mission-Critical Application: Homeland Security

There are many mission-critical applications for geographic text search: counterterrorism, homeland security, law enforcement, environmental compliance, mission planning, and emergency planning, to name just a few. To illustrate how geographic text search works, let's use a homeland security example. In this scenario, an intelligence analyst is analyzing a bioterrorism threat to a major center of commerce in northern Virginia. He opens a standard Web browser and logs into an analyst's application powered by MetaCarta's Geographic Text Search (GTS) with Esri's ArcGIS. This application accesses a collection of several million intelligence-related documents that include e-mail messages, surveillance reports, public records, news wires, and other proprietary communications.

The analyst zooms the map into the area around Vienna, Virginia. The MetaCarta GTS plug-in adds two windows to the ArcMap interface: one for entering queries and the other for viewing results lists and accessing documents. The analyst enters a text query that includes the word "anthrax" and activates the search. Icons representing text documents about anthrax that fall within the geographic extent of the current map view are displayed as a layer on the map.
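The query described above combines two filters: a keyword and the geographic extent of the current map view. The following is a minimal sketch of that idea, not MetaCarta's implementation. The `Doc` and `Extent` types, the `search` function, the sample documents, and the approximate coordinates are all hypothetical and exist only for illustration. It assumes each document has already been assigned a single latitude and longitude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    lat: float  # latitude of the location extracted from the text (assumed given)
    lon: float  # longitude of the location extracted from the text (assumed given)


@dataclass(frozen=True)
class Extent:
    """A map view expressed as a latitude/longitude bounding box."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def search(docs, keyword, extent):
    """Keep documents that mention the keyword AND fall inside the map extent."""
    kw = keyword.lower()
    return [d for d in docs
            if kw in d.text.lower() and extent.contains(d.lat, d.lon)]


# Hypothetical sample data; coordinates are rough approximations.
DOCS = [
    Doc("Lab report", "Anthrax samples are stored at the lab.", 38.96, -77.35),
    Doc("Traffic note", "Road work planned near the mall.", 38.90, -77.27),
    Doc("Wire story", "Anthrax scare closes an office downtown.", 41.88, -87.63),
]

# A hypothetical map view covering part of northern Virginia.
northern_virginia = Extent(south=38.8, west=-77.5, north=39.1, east=-77.1)

hits = search(DOCS, "anthrax", northern_virginia)
for doc in hits:
    print(doc.title)
```

Of the three sample documents, only the first passes both filters: the second lacks the keyword, and the third lies outside the map view.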

Using geography to focus search efforts reduces information overload. Instead of returning thousands of mostly irrelevant documents that contain the search term, the search returns fewer than 50 results that are relevant both geographically and with regard to content. The results window displays a list of document summaries accompanied by query relevance and geographic confidence values to aid analysis.

The analyst selects one of the documents and reads it in its entirety by simply clicking on its hyperlink in the results section. After reading this document, he decides to fine-tune the analysis. He executes a second search that uses the term "research lab" and generates a second results layer on the map. He has identified a biological lab that studies anthrax in Reston, Virginia. The lab is a few miles from Vienna and is usually upwind of it. By running similar searches for other dangerous biological agents across a vast collection of intelligence reports, the analyst discovers similar threats in the same research park in Reston. The analyst writes a report recommending increased security measures for this park.

Beyond a Text Search

Without the ability to search for documents using a specific geographic location, analysts and other knowledge workers must spend many hours performing text searches and manually reading articles and documents looking for direct and indirect references that support missions. In the case of bioterrorism, every piece of information on a specific location provides clues that could reveal a threat. To save lives, analysts must read every document that could identify the location and extent of the threat, and they must do it as fast as possible.

MetaCarta GTS automatically extracts geographic references from text documents and plots documents on a map. Analysts can search text archives using both keywords and geographic extents as filters. Small colored square icons on the map represent documents that contain desired keywords and geographic references to the location indicated by the icon. Icons directly hyperlink to documents.

For example, different documents can refer to locations within the same area without naming that area explicitly, using various methods: regular latitude and longitude (68°46'17"N 26°18'57"E), Universal Transverse Mercator (UTM) or Military Grid Reference (34TBP950601), natural language relative reference (8 mi. northwest of Herndon, VA), street address (123 Maple St., Vienna, Virginia), and place or feature name (World Trade Center).
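To show how two of these reference styles can be reduced to the same latitude/longitude model, here is a minimal sketch that converts a degrees-minutes-seconds string to decimal degrees and estimates the point described by a relative reference. It is an illustration under stated assumptions, not the method GTS uses. The function names, the `BEARINGS` table, and the approximate coordinates for Herndon are hypothetical, and the offset uses a simple flat-earth approximation that is only reasonable over short distances.

```python
import math
import re

# Matches one coordinate such as 68°46'17"N (degrees, minutes, seconds, hemisphere).
DMS = re.compile(r"""(\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)"([NSEW])""")

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

# Compass directions as bearings in degrees clockwise from north.
BEARINGS = {"north": 0, "northeast": 45, "east": 90, "southeast": 135,
            "south": 180, "southwest": 225, "west": 270, "northwest": 315}


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


def parse_lat_lon(text):
    """Parse a 'latitude longitude' pair written in DMS notation."""
    parts = DMS.findall(text)
    if len(parts) != 2 or parts[0][3] not in "NS" or parts[1][3] not in "EW":
        raise ValueError(f"expected a latitude and a longitude in: {text!r}")
    return dms_to_decimal(*parts[0]), dms_to_decimal(*parts[1])


def offset_point(lat, lon, miles, direction):
    """Estimate the point `miles` away from (lat, lon) in a compass direction.

    Flat-earth approximation: adequate for a few miles, not for long distances.
    """
    bearing = math.radians(BEARINGS[direction])
    dlat = miles * math.cos(bearing) / EARTH_RADIUS_MI
    dlon = miles * math.sin(bearing) / (EARTH_RADIUS_MI * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


lat, lon = parse_lat_lon("68°46'17\"N 26°18'57\"E")
print(round(lat, 4), round(lon, 4))

HERNDON = (38.97, -77.39)  # rough, assumed coordinates for Herndon, VA
nw_lat, nw_lon = offset_point(*HERNDON, miles=8, direction="northwest")
print(round(nw_lat, 3), round(nw_lon, 3))
```

Military grid references, street addresses, and place names need lookup tables or projection libraries, so they are left out of this sketch.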

Traditional keyword search techniques are unable to identify these locations. Searching for the location "Vienna" illustrates a fundamental limitation. This search yields results such as "cook vienna sausages" or "Vienna Smith said..." that are not useful. A traditional text search for "Vienna" would also miss a document with the phrase "Place the bomb at Tysons Corner," where Tysons Corner is another name for part of the Vienna, Virginia, area.
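The limitation can be reproduced in a few lines. The sketch below contrasts a plain keyword match with a lookup against a tiny place-name table (a gazetteer). The `GAZETTEER` contents and both function names are hypothetical, and the lookup is a deliberately simplified stand-in: it shows why aliases matter, not how a production system resolves ambiguous names.

```python
DOCS = [
    "Cook vienna sausages for ten minutes.",
    "Vienna Smith said the meeting went well.",
    "Place the bomb at Tysons Corner.",
]


def keyword_search(docs, term):
    """Traditional search: return every document that contains the term."""
    return [d for d in docs if term.lower() in d.lower()]


# Hypothetical gazetteer: names and aliases (lowercase) mapped to an area.
GAZETTEER = {
    "vienna, virginia": "Vienna, Virginia",
    "tysons corner": "Vienna, Virginia",
}


def area_search(docs, area):
    """Return documents that mention any known name for the given area."""
    names = [name for name, mapped in GAZETTEER.items() if mapped == area]
    return [d for d in docs if any(name in d.lower() for name in names)]


keyword_hits = keyword_search(DOCS, "Vienna")
area_hits = area_search(DOCS, "Vienna, Virginia")

print("keyword:", keyword_hits)
print("area:   ", area_hits)
```

The keyword search returns the sausage recipe and the quote from Vienna Smith while missing the Tysons Corner document. The area lookup returns only the Tysons Corner document.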

Continued on page 2
