Dev Summit 2021: Data Engineering in ArcGIS Pro

Data analysis can help address some of today’s most pressing challenges. However, before data can be leveraged to tell powerful stories, it needs to be thoroughly explored, cleaned, and transformed. These data engineering processes are often labor-intensive; therefore, there is an enormous need for tools to facilitate them.

At the Developer Summit 2021 plenary, Lakeisha Coleman demonstrated how you can use ArcGIS Pro’s new Data Engineering experience to effortlessly turn messy data into analysis-ready data to address hunger and food insecurity issues in the United States.

Watch the plenary video below, and then read the rest of the blog for a summary of the processes that Lakeisha explored in her demo.

 

First, Lakeisha opened the Data Engineering view for the SNAP Participation layer, which contained data about the Supplemental Nutrition Assistance Program (SNAP) benefit, by right-clicking the layer and clicking the Data Engineering button. The resizable Data Engineering view automatically snaps to the lower half of the map view and contains two panels: the fields panel that lists the fields in the layer and the statistics panel that displays a statistics table for the fields.

The Data Engineering view contains two panels: the fields panel (left) and the statistics panel.

She then explored the fields in the fields panel and clicked the Update Symbology button for the Median Household Income field and then for the Average Household Size field, symbolizing the layer by each field in turn.

The fields were visualized from the fields panel.

After visualizing the fields in the layer, Lakeisha selected all the fields from the fields panel and calculated statistics by right-clicking the selected fields and clicking the Add to Statistics and Calculate button. This immediately populated the statistics panel with each field’s descriptive statistics and metrics in a table format.

The statistics panel was populated with the descriptive statistics and metrics for each field in a table format.
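
To make the idea behind the statistics panel concrete, here is a minimal sketch in plain Python of counting nulls and computing a few descriptive statistics per field. It is not how ArcGIS Pro computes its statistics; the sample rows, the field names, and the `summarize` and `select_nulls` helpers are all hypothetical.

```python
import statistics

# Hypothetical sample rows; None stands in for a null cell.
rows = [
    {"county": "A", "participants": 1200, "median_income": 48000},
    {"county": "B", "participants": None, "median_income": 52000},
    {"county": "C", "participants": 950, "median_income": 61000},
    {"county": "D", "participants": 4300, "median_income": None},
    {"county": "E", "participants": None, "median_income": 45500},
]

def summarize(rows, field):
    """Return the null count and basic descriptive statistics for one numeric field."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "nulls": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

def select_nulls(rows, field):
    """Return the rows whose value for the field is null."""
    return [r for r in rows if r[field] is None]

# One summary per field, similar in spirit to one row of a statistics table.
summary = {f: summarize(rows, f) for f in ("participants", "median_income")}
null_counties = [r["county"] for r in select_nulls(rows, "participants")]
```

Here `summary["participants"]["nulls"]` gives the number of missing values, and `null_counties` lists the records you would want to inspect on a map before deciding how to fill them.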

Using the generated statistics, she easily identified the number of null values in each field. She then right-clicked the Number of Nulls cell for the Participants field and clicked Select Null Values to visualize its missing values on the map.

The missing values for the Participants field were visualized on the map.

After confirming that the locations of the missing values did not exhibit any obvious patterns, Lakeisha used the Fill Missing Values tool to replace the missing values in the field with estimated values based on spatial neighbors.

The Fill Missing Values tool was run to replace the missing values with estimated values based on spatial neighbors.
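
As a rough illustration of estimating a missing value from spatial neighbors, the sketch below fills each null with the mean of its non-null neighbors. This is an assumption-laden simplification, not the tool's implementation: the values, the `neighbors` adjacency table, and the `fill_from_neighbors` helper are hypothetical, and the mean is only one possible way to combine neighboring values.

```python
import statistics

# Hypothetical values per county; None marks a missing value.
participants = {"A": 1200, "B": None, "C": 950, "D": 4300, "E": None}

# Hypothetical adjacency table: which counties border each other.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

def fill_from_neighbors(values, neighbors):
    """Fill each missing value with the mean of its non-missing neighbors.

    Estimates use only the original values, so one filled cell never feeds
    another fill. Cells with no usable neighbors stay None.
    """
    filled = dict(values)
    for key, value in values.items():
        if value is not None:
            continue
        known = [values[n] for n in neighbors.get(key, []) if values.get(n) is not None]
        if known:
            filled[key] = statistics.mean(known)
    return filled

filled = fill_from_neighbors(participants, neighbors)
```

The input dictionary is left unchanged, so the original and filled values can be compared side by side before the estimates are accepted.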

Next, she opted to change the skewed distribution of the Participants field to aid in the analysis of the data. She transformed the skewed distribution into a normal distribution using the Transform Field tool and then recalculated the statistics.

The Transform Field tool was run to transform the skewed distribution into a normal distribution.
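
The post does not say which transformation method was used in the demo. As one common example, the sketch below applies a natural-log transform to hypothetical right-skewed counts and compares skewness before and after, which is the kind of check that recalculating the statistics supports. The data and the `skewness` helper are assumptions for illustration only.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of cubed z-scores (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed counts: many small values and a few very large ones.
participants = [120, 150, 180, 200, 240, 300, 420, 650, 1800, 9500]

# Natural-log transform; valid only for strictly positive values.
log_participants = [math.log(v) for v in participants]

skew_before = skewness(participants)
skew_after = skewness(log_participants)
```

For this sample the log transform lowers the skewness without removing it entirely, so it is worth checking the transformed distribution rather than assuming it is normal.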

Then, Lakeisha ran the Dimension Reduction tool from the Construct tools in the Data Engineering ribbon to reduce the number of population variables by capturing as much of their variance as possible in fewer components. She selected Principal Component Analysis (PCA) as the dimension reduction method.

The Dimension Reduction tool was run to reduce the number of population variables into fewer components.
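
To show what PCA does with correlated variables, here is a small standard-library sketch that standardizes three hypothetical population variables and finds the first principal component. It uses power iteration to approximate the leading eigenvector of the covariance matrix, which is a stand-in chosen for brevity and not necessarily what the Dimension Reduction tool does internally. All data and function names are assumptions.

```python
import math

# Hypothetical population variables for six areas (columns are strongly correlated).
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 1.4],
    [4.0, 7.9, 2.1],
    [5.0, 10.1, 2.4],
    [6.0, 12.0, 3.1],
    [7.0, 13.8, 3.5],
]

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector and eigenvalue by power iteration."""
    p = len(cov)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the matching eigenvalue.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return vec, eigenvalue

z = standardize(data)
cov = covariance(z)
loadings, eigenvalue = first_component(cov)
# Share of total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
# Component scores: one new value per area, usable as a new field.
scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]
```

Because the three columns move together, a single component captures most of their variance here, and `scores` plays the role of the new component field that replaces the original variables in later analysis.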

The resulting components were added to the data, listed in the fields panel of the Data Engineering view, and made available for analysis to study hunger and food insecurity issues in the United States.

Finally, Lakeisha showed an ArcGIS Notebook in which she had recorded the details of the aforementioned processes. The Notebook simplifies sharing of code and allows for automation of the data preparation process.

ArcGIS Notebook detailing the data preparation process.

Learn more

Lakeisha’s demo showed how the new Data Engineering experience in ArcGIS Pro can simplify the otherwise tedious task of preparing messy data for processing and analysis. Visit the ArcGIS Pro documentation to learn more about how you can use the Data Engineering experience to help you better understand your data and prepare it for GIS workflows.

About the authors

Aawaj is a product engineer on the ArcGIS Enterprise team.

Esri Solution Engineer, helping to craft solutions using GIS
