Geospatial data doesn’t always come neatly packaged in the form of file geodatabases and shapefiles. Often, data is hidden away in an unstructured format, such as text-based reports.
To use this data with ArcGIS, you need to convert it into a structured, standardized format. However, reading and converting unstructured text by hand is difficult and time-consuming.
Lauren analyzed thousands of unstructured text files containing police reports from Madison, Wisconsin, and created a map of the crime locations.
You can watch the presentation below. Then read the rest of the blog for a summary of the information you need to implement the same type of workflow in your organization.
Prepare training data
First, Lauren labeled the contents of a subset of the text files to define important entities related to crime data. The entities in the presentation included the location, time, and type of crime; the time the crime was reported; the reporting officer; and the weapons used.
An open source text annotation tool named Doccano was used to label the entities.
These labeled text reports served as training data to train an AI model to extract these entities from unstructured text.
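To make the shape of such training data concrete, here is a minimal sketch that reads labeled spans from a Doccano-style JSON Lines export. It assumes each line holds a "text" field plus a list of [start, end, label] spans under "labels" (some Doccano versions use "label" instead). The sample report and the label names are invented for illustration and do not come from the presentation.

```python
import json


def read_doccano_spans(jsonl_text):
    """Return one record per labeled document, with its entity spans resolved to text."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        doc = json.loads(line)
        text = doc["text"]
        # Assumption: spans are stored as [start, end, label] under "labels" or "label".
        spans = doc.get("labels") or doc.get("label") or []
        entities = [
            {"label": label, "text": text[start:end], "start": start, "end": end}
            for start, end, label in spans
        ]
        records.append({"text": text, "entities": entities})
    return records


# Hypothetical one-line export; the label names are illustrative only.
sample = (
    '{"text": "Theft reported at 100 State St at 9:15 PM.", '
    '"labels": [[0, 5, "CRIME"], [18, 30, "ADDRESS"], [34, 41, "REPORTED_TIME"]]}'
)

records = read_doccano_spans(sample)
for entity in records[0]["entities"]:
    print(entity["label"], "->", entity["text"])
```

Checking the resolved span text against the labels like this is a quick way to catch offset errors before training.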
Train the model
Next, Lauren used the arcgis.learn module and the training data to train an EntityRecognizer model.
Training natural language processing models like this one follows the same pattern as training computer vision models with the arcgis.learn module: you create the model, fit it to the training data, visualize the results, and save the model for later use.
Once satisfied that the model could identify the information she needed, Lauren used it to extract the entities from all the text files. This resulted in a pandas DataFrame containing the extracted entities for each police report.
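The training and extraction steps above might look roughly like the sketch below. It is not the code from the presentation: the function name, paths, model name, epoch count, and the "address_tag"/"Address" class mapping are all placeholders, and the arcgis.learn calls reflect recent releases of the ArcGIS API for Python (older releases prepared text data with prepare_data rather than prepare_textdata). Check the API reference for the version you have installed.

```python
def train_and_extract(labeled_json_path, reports_folder, model_name="crime_entity_model"):
    """Train an EntityRecognizer on labeled reports, then run it on a folder of reports.

    All argument values are hypothetical; adapt them to your own data.
    """
    # Imported here so this sketch can be loaded where arcgis is not installed.
    from arcgis.learn import prepare_textdata
    from arcgis.learn.text import EntityRecognizer

    # Assumption: the labeled data was exported in the JSON format that
    # arcgis.learn reads as 'ner_json', and the address label is named "Address".
    data = prepare_textdata(
        path=labeled_json_path,
        task="entity_recognition",
        dataset_type="ner_json",
        class_mapping={"address_tag": "Address"},
    )

    model = EntityRecognizer(data)   # create the model
    model.fit(epochs=30)             # fit it to the training data (epoch count is a guess)
    model.show_results()             # inspect predictions on sample documents
    model.save(model_name)           # save it for later use

    # Returns a pandas DataFrame with one row per report and one column per entity.
    return model.extract_entities(reports_folder)
```

Calling train_and_extract requires the ArcGIS API for Python with its deep learning dependencies installed.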
Create a feature layer
With the data in a structured data frame, she could use ArcGIS API for Python to geocode the locations and create a point feature layer. Each point represented a crime location.
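As a rough illustration of that step, the sketch below turns rows of extracted entities into point features. It is a stand-in, not the ArcGIS workflow itself: it writes plain GeoJSON rather than publishing a feature layer, and the geocoder is passed in as an ordinary function so the example runs without a portal connection. The "Address" column name, the fake geocoder, and its coordinates are placeholders; with ArcGIS you would use the geocoding tools in the ArcGIS API for Python instead.

```python
import json


def reports_to_geojson(rows, geocode_address):
    """Build a GeoJSON FeatureCollection with one point per successfully geocoded row.

    rows            -- iterable of dicts holding the extracted entities for one report
    geocode_address -- function mapping an address string to (lon, lat), or None
    """
    features = []
    for row in rows:
        location = geocode_address(row["Address"])
        if location is None:
            continue  # skip reports whose address could not be located
        lon, lat = location
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),  # keep the extracted entities as attributes
        })
    return {"type": "FeatureCollection", "features": features}


def fake_geocoder(address):
    """Placeholder geocoder with one hard-coded, approximate location."""
    lookup = {"100 State St": (-89.39, 43.07)}
    return lookup.get(address)


rows = [
    {"Address": "100 State St", "Crime": "Theft"},
    {"Address": "Unknown location", "Crime": "Burglary"},
]
collection = reports_to_geojson(rows, fake_geocoder)
print(json.dumps(collection, indent=2))
```

Keeping the extracted entities as feature properties is what lets a map pop-up show them for each point.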
When the layer was added to a map, clicking a point showed the police report and the specific entities extracted for that crime.
Additionally, the extracted crimes were clustered into different categories using scikit-learn, a popular machine learning library.
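The blog does not say which scikit-learn algorithm was used for the clustering. One plausible approach, shown here purely as an assumption, is to convert the extracted crime descriptions to TF-IDF vectors and group them with k-means. The descriptions and the number of clusters are made up for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_crime_descriptions(descriptions, n_clusters=2):
    """Assign each description to a cluster and return the cluster ids as a list."""
    # Represent each description as a TF-IDF vector, ignoring common English words.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    # Fixed random_state so repeated runs produce the same grouping.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return model.fit_predict(vectors).tolist()


# Hypothetical extracted crime descriptions.
crimes = [
    "theft of bicycle",
    "bicycle theft downtown",
    "armed robbery with handgun",
    "robbery at gunpoint with handgun",
]
labels = cluster_crime_descriptions(crimes)
for text, label in zip(crimes, labels):
    print(label, text)
```

The cluster ids are arbitrary integers, so in practice each cluster would still need a human-readable category name.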
Try it for yourself
Follow these links to additional resources to help you use the arcgis.learn module to extract and map data from unstructured text files: