Mountains of data
In today’s world, data grows more rapidly than ever before. Mountains of data are created daily: by some estimates, as much as 2.5 quintillion bytes, and that figure keeps growing. Eighty percent of this information, if not more, is classified as unstructured text, meaning data that has no predetermined data model. It comes in the form of Word documents, PowerPoint documents, emails, social media, and even web pages. The other 20 percent of data created daily is structured data, such as tabular data, which is clearly organized and readily interpreted. All this information is created at an unprecedented pace, and its sheer volume makes finding valuable information ever more burdensome. Often, the problem we face isn’t acquiring information but making sense of it, and making sense of it fast.
Finding the needle in a haystack
ArcGIS LocateXT is entity extraction software developed to search unstructured text for geolocations and custom lines of text. For example, it can search a web page for city names, GPS coordinates, or other specific placenames that you set, such as Colorado. LocateXT swiftly scans through unstructured text and extracts each location it finds as a feature on the map. LocateXT supports multiple coordinate formats, including the following:
- DD—Decimal Degree
- DM—Degrees Decimal Minutes
- DMS—Degrees, Minutes, and Seconds
- UTM—Universal Transverse Mercator
- MGRS—Military Grid Reference System
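To make the idea of scanning text for coordinates concrete, here is a minimal Python sketch that finds decimal degree pairs and converts degrees, minutes, and seconds values to decimal degrees. It is an illustration only, not how LocateXT works internally: the patterns, the function names, and the sample strings are all assumptions, and real-world text needs far more forgiving matching than this.

```python
import re

# Hypothetical pattern for a "lat, lon" decimal degree pair, e.g. "39.9614, -105.1477".
DD_PAIR = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")

# Hypothetical pattern for a single DMS value, e.g. 39°30'0"N.
DMS = re.compile(r"""(\d{1,3})°\s*(\d{1,2})'\s*(\d{1,2}(?:\.\d+)?)"\s*([NSEW])""")


def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, and seconds to signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and west are negative in decimal degrees.
    return -value if hemisphere in "SW" else value


def find_dd_pairs(text):
    """Return (lat, lon) tuples for decimal degree pairs within valid ranges."""
    pairs = []
    for lat, lon in DD_PAIR.findall(text):
        lat, lon = float(lat), float(lon)
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs


def find_dms_values(text):
    """Return every DMS value found in the text as decimal degrees."""
    return [dms_to_dd(*match) for match in DMS.findall(text)]
```

Calling `find_dd_pairs` on a sentence such as "Washout reported at 39.9614, -105.1477 near the bridge." would pick out the coordinate pair and ignore the surrounding words. UTM and MGRS strings would need their own patterns and a projection library to convert, which this sketch leaves out.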
In addition to these coordinate formats, the custom locations option allows LocateXT to extract placenames from your text, sparing you tedious manual data filtering. For instance, suppose an analyst wants to plot features that are in a specific country or region. By using a Custom Location File (.lxtgaz), the analyst can trim hours off their workflow and use that time to conduct further investigations using a comprehensive suite of tools. Another time-saving option provided in LocateXT is the Custom Attribute File (.lxtca), which extracts specific text from the unstructured text document. For example, it can extract not only the coordinates of a maintenance issue but also the associated description of the issue. Combining these files with the templates in the tool allows you to customize each scan and saves the hours you would otherwise spend reading through and making sense of all the data. After extracting the entities from your unstructured text, you can begin bringing order and structure to your data.
Once the entities have been extracted, they are plotted on the map as point features, and you can customize their symbology. Furthermore, you can illustrate nonspatial information like time and quantity using timelines, charts, and graphs. For instance, you can record when each maintenance issue is reported and identify which months will require the most attention. Visualizing what would otherwise be text brings order and simplicity to chaotic volumes of information.
Putting LocateXT to the test
Now we will walk through an example of how powerful LocateXT can be. In this example, we are recording locations in need of maintenance along the Coal Creek Trail in Colorado. As hikers traveled this trail, they reported maintenance issues, along with descriptions of the repairs needed, to the department. The information was submitted in multiple formats, ranging from a text file to a PowerPoint presentation.
We begin with one of our unstructured text documents: a text file of phone call reports that contains locations. In addition to extracting coordinates, LocateXT can extract the pre-text or post-text of a keyword from the document using a Custom Attribute File. In the reports from the hikers, we can see they used terms like maintenance issues and description. We create a Custom Attribute File that defines maintenance issues and description as keywords so that we can extract the associated post-text. Now when we extract the information in our documents, the text that follows each defined keyword appears in the feature class’s attribute table.
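The post-text idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the Custom Attribute File format or LocateXT's own logic: the function name, the keyword spellings, and the sample report are hypothetical, and the sketch assumes the text of interest follows the keyword on the same line.

```python
import re


def extract_post_text(text, keywords):
    """Return {keyword: [text that follows the keyword on the same line, ...]}."""
    results = {keyword: [] for keyword in keywords}
    for keyword in keywords:
        # Keyword, an optional ":" or "-" separator, then the rest of the line.
        pattern = re.compile(
            re.escape(keyword) + r"[ \t]*[:\-]?[ \t]*(.+)", re.IGNORECASE
        )
        for match in pattern.finditer(text):
            results[keyword].append(match.group(1).strip())
    return results


# Hypothetical phone call report, invented for illustration.
report = """Location: 39.9614, -105.1477
Maintenance issue: Fallen tree
Description: Large cottonwood blocking the trail
"""

attributes = extract_post_text(report, ["Maintenance issue", "Description"])
```

Each keyword becomes a column of sorts, and the text after it becomes the value, which mirrors how the extracted post-text ends up as attributes alongside each point feature.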
After the extraction, we notice that more data has come in, this time in formats other than a text file. We will add these files to our input and extract them using the same Custom Attribute File.
Our finished product is a customized map containing all the information we need, and we can continue working by creating features or adding more data to our layer. With large quantities of data available to everyone, ArcGIS LocateXT can save hours of time by finding, in a matter of seconds, information that would otherwise be lost in the volume, making molehills out of mountains.
To learn more about ArcGIS AllSource, visit the ArcGIS AllSource website.
To see the web course used in this blog, visit Mapping Locations from Unstructured Text.