
Migrating data into ArcGIS Hub - Part 2

ArcGIS Hub gets a lot of love, and rightfully so.  It’s a tool that makes it easy to show the extent of an issue or progress made towards a goal.  It also makes it simple to access the raw data behind these visualizations, which increases transparency and builds trust with your community.  The challenge for most people is getting (and keeping) high-quality, actionable, and relevant data in their ArcGIS Hub site.

As we talked about in Part 1 of this blog series, there are a variety of ways that you can get data into Hub.  The tool you choose largely depends upon two things: how often your data updates and your technical skill level.  If your data never changes (imagine a dataset that has only building permits from 2018), then it’s likely that you don’t need a formal extract, transform, and load (ETL) process.  On the other hand, if your dataset updates on a regular schedule (imagine a table of E911 calls), then it’s much more likely that you will need an ETL process to keep your data in Hub fresh.

Of the five most common tools that were outlined previously, the one I tend to rely on the most is Data Interoperability (also known as FME Desktop).  It offers several key benefits that push it to the top of my list which may or may not apply to you:

The Data Interoperability (DI) extension for ArcGIS Pro (again, also called FME Desktop) has a very intuitive interface.  If you’ve ever seen a flow chart diagram where there’s a start, some stuff in the middle, and an end result, then you already have a basic understanding of how DI works.  For clarity, here are the key terms you’ll need to know:

Scenario

To help illustrate the usefulness of Data Interoperability, here’s a hypothetical scenario: Imagine you work for a city that wants to share building permit information.  The source system overwrites a csv file each night with all the new building permits that were processed that day.  Your task is to create an automated process that takes this new file and adds the records within it to a hosted feature layer in your city’s ArcGIS Hub site.  The overall process will look like this:

  1. Examine the Data
  2. Build the workbench to create the hosted feature layer for the first time:  Read, Transform, and then Write the Data
  3. Confirm that the data written is correct and fill out remaining item information
  4. Alter the workbench so that it appends future records to the (now) existing hosted feature layer:  Update the reader csv source to a new csv file and then update the writer settings to append to the existing hosted feature layer

Step 1:  Examine the data

The first and most important step in any ETL process is to examine the dataset.  This can take many forms, but often it’s easiest to look at a csv file in a spreadsheet program like Microsoft Excel.  Look at the values present in each column.  Common things to ask are:

Standardization: Are the values consistent in the fields? This applies most to fields that have

Null or blank values: Are there null values where you expect to see information? Similarly, are there blank values where there should really be a null?  A blank field could mean missing data whereas a null value typically means no data.  Handling this consistently in your dataset is important because it affects how usable the data is for users as well as in charts and maps.

Dates and times: Are date fields using a consistent format? Are the values in a particular time zone?  Do time fields contain offsets for daylight saving time?  Do time fields use a 12- or 24-hour clock?

Special characters: Are there values that contain commas, dashes, ampersands, or other characters that could complicate the reading/writing of data into certain formats?  For example, if there are commas in a field, this could create issues when trying to work with the data in a csv (comma-separated values) file.

Location: Do the records in the table have location information? Is it in a consistent format like an address or is there a mix of addresses, street intersections, and parcel numbers?

Links to other datasets: Is there a foreign key present in the data that can be used to associate it to another dataset (as in a parcel APN number for each associated building permit)?

Personally identifiable or restricted data: Perhaps most importantly, does the table contain any information that is restricted by laws or policies? Does the data expose personally identifiable information (like names, addresses, social security numbers, or victim information) that needs to be removed or obfuscated prior to publishing?

Examining your dataset before you begin building an ETL process is extremely important and will guide the transformers you will need to use in the following steps.
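If you'd rather script this first pass than eyeball it in a spreadsheet, the questions above can be roughly sketched in plain Python.  This is only an illustration and is not part of Data Interoperability: the sample rows, the column names, and the list of "null-like" strings are all assumptions made up for the example.

```python
import csv
import io
from collections import Counter

# Strings treated as "null-like" text; this list is an assumption for the example.
NULL_LIKE = {"null", "none", "n/a"}

def profile_csv(text):
    """Summarize each column: blank count, null-like text count, distinct values."""
    reader = csv.DictReader(io.StringIO(text))
    profile = {}
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header; skipped in this simple sketch.
                continue
            stats = profile.setdefault(
                column, {"blank": 0, "null_text": 0, "values": Counter()}
            )
            cleaned = (value or "").strip()
            if cleaned == "":
                stats["blank"] += 1
            elif cleaned.lower() in NULL_LIKE:
                stats["null_text"] += 1
            else:
                stats["values"][cleaned] += 1
    return profile

# Hypothetical sample rows, not the real permit data.
SAMPLE = """permit_id,permit_type,date_issued,address
1001,Residential,2018-01-05,"12 Main St, Unit 4"
1002,residential,01/07/2018,
1003,Commercial,NULL,45 Elm St
"""

report = profile_csv(SAMPLE)
for column, stats in report.items():
    print(column, "blank:", stats["blank"],
          "null-like:", stats["null_text"],
          "distinct:", len(stats["values"]))
```

Even on three rows, a summary like this surfaces the issues from the checklist: "Residential" and "residential" count as different values, one address is blank, one date is the literal text NULL, and the quoted address contains a comma.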

Step 2:  Build the workbench to create the hosted feature layer for the first time

After you have examined the raw dataset and understand its nuances and challenges, we can begin building the workbench in Data Interoperability.  We are assuming in this step that we are creating a hosted feature layer (which will be hosted in your organization’s ArcGIS Online account).  The same process can be used if you are relying on services hosted in ArcGIS Enterprise in your infrastructure.

Step 2a:  Read the Data

Since our data is accessible through a csv that is updated nightly, our reader will connect to the csv itself.  The first step here is to open Data Interoperability and open the “Reader” tool.

From within the Reader prompt, we can search for whatever format we need to read (in this case we’re looking for csv) and then start filling in other information like file location.

reader dialog box in FME Desktop

Once we save the configuration, Data Interoperability will attempt to read in the data from the csv.  It’s best to have the “enable feature caching” option turned on when building a workbench for the first time.  This option is found in the dropdown next to the Run button.  The interface will show us a cached version of the data from the csv that we can use to visually inspect the data.

Records from FME Desktop reader component

It’s important to note that Data Interoperability can also connect directly to a variety of databases (the “system of record”).  That process requires a read-only login to the database and an understanding of its table structure, both of which are beyond the scope of this introductory blog.

Step 2b:  Transform the Data

There are a variety of transformers available in Data Interoperability and even more in the community that you can utilize.  The most common ones that I use and, coincidentally, what we’ll use in this example are:

AttributeManager: This transformer is my go-to multi-tool that can change field names, adjust values, and much more.  We’ll use it to change the field names and remove the underscores in the data (because I can’t stand underscores in data… I don’t know why).

DateTimeConverter: Have weird datetime values (or even text strings) in your data that you need to standardize?  We do in this dataset so we’ll convert the data in the “date issued” field into a format that works with ArcGIS Online.

CenterPointReplacer: If you need to create a point feature from a polygon (centroid), this transformer will do it with ease.  We’ll use the CenterPointReplacer to convert a spatial parcel layer (which is polygons) into single points.  As an aside, the VertexCreator is a better choice if you have a coordinate pair and need to create a point.

FeatureMerger: If you need to perform a join of one dataset to another using a key, this is the tool to use.  You can do inner, outer, left and right joins with this transformer.  We’ll use it to join the building permit data to a spatial parcel dataset which will:

ArcGISOnlineBatchGeocoder: Sometimes you just need to sling a bunch of records against a geocoder and see what sticks… this is the transformer for doing just that.  In our case, we’ll use it to create a point for any remaining building permits that didn’t match a parcel.
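To make the logic behind these transformers a little more concrete, here is a rough plain-Python analogy of what the chain does to each record.  It is not how Data Interoperability implements anything, and it doesn't call any FME or ArcGIS API.  The field names, the `apn` join key, the two date formats, and the sample coordinates are all assumptions for illustration.

```python
from datetime import datetime

# Input date formats we assume the source system might produce.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def clean_field_names(record):
    """Rename fields the way we use AttributeManager: drop underscores."""
    return {name.replace("_", " ").title(): value for name, value in record.items()}

def normalize_date(text):
    """Standardize a date string to ISO format, like a DateTimeConverter step."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognized format; leave it for manual review

def join_to_parcels(permits, parcel_points, key="apn"):
    """Join permits to parcel center points on a key, like a FeatureMerger step.

    Returns (matched, unmatched); unmatched permits would go on to geocoding.
    """
    matched, unmatched = [], []
    for permit in permits:
        point = parcel_points.get(permit.get(key))
        if point is None:
            unmatched.append(permit)
        else:
            matched.append({**permit, "x": point[0], "y": point[1]})
    return matched, unmatched

# Hypothetical records and parcel center points.
permits = [
    {"apn": "A1", "date_issued": "01/07/2018"},
    {"apn": "Z9", "date_issued": "2018-01-05"},
]
parcel_points = {"A1": (-71.80, 42.26)}

matched, unmatched = join_to_parcels(permits, parcel_points)
print(len(matched), "matched;", len(unmatched), "left for the geocoder")
```

The part worth noticing is the split at the end: records that join to a parcel pick up a point from the parcel, and only the leftovers need to be sent to the geocoder.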

Once all the transformers in the workbench are wired together, it will appear like this:

transformers in the workbench

There are a few important things to note about our workbench as it currently exists:

Step 2c:  Write the Data

Now that we’ve successfully read and transformed the data, we need to write it to our ArcGIS Online account.  Since this data doesn’t exist as a hosted feature service, this first writer process will create the layer and load the records.  Later, we’ll go into ArcGIS Online and update the item’s details including a thumbnail, summary, description, tags, and more so that users will understand what to expect in the dataset when they find it in your ArcGIS Hub site.

To start this process, we launch the “Writer” dialog and then search for “Esri ArcGIS Online (AGOL) Feature Service” as our output format.  We’ll also configure the output hosted feature service name, ArcGIS Online account to use, target projection, and more.  Inputs with a red exclamation mark (!) or that are highlighted in pink must be filled in and any other inputs are optional.

writer dialog box in the workbench

Once the configuration is done, we’ll run the entire workbench and publish the data to ArcGIS Online.  The workbench now looks like this with all of the readers, transformers, and writer configured:

completed sample workbench

When we run the workbench, the translation log at the bottom of the screen will show messages that give us insight into what’s happening.  Messages highlighted in red typically indicate failed processes and warrant a deeper look.

Step 3:  Confirm that the data written is correct and fill out remaining item information

At this point, we need to log in to our ArcGIS Online account and make sure that the data has been written correctly to the hosted feature layer.  This means that we will look at the data in both a table and a map.  We’re looking for general things like improperly written values, unexpectedly null or blank cells where data should be, and even data that appears way outside of the city limits (possibly indicating either incorrect projection settings or poor geocoding results).
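The "way outside of the city limits" check is easy to reason about as a simple bounding box test.  Here is a minimal sketch, assuming you have exported the records with `x` and `y` values in longitude and latitude.  The bounding box and the sample records are hypothetical, so substitute your own city's extent.

```python
def find_suspect_records(records, bounds):
    """Return (record, reason) pairs for missing or out-of-bounds coordinates.

    bounds is (min_x, min_y, max_x, max_y) in the same coordinate system
    as the records' x and y values.
    """
    min_x, min_y, max_x, max_y = bounds
    suspects = []
    for record in records:
        x, y = record.get("x"), record.get("y")
        if x is None or y is None:
            suspects.append((record, "missing coordinates"))
        elif not (min_x <= x <= max_x and min_y <= y <= max_y):
            suspects.append((record, "outside city bounds"))
    return suspects

# Hypothetical bounding box (longitude/latitude) and sample records.
CITY_BOUNDS = (-71.90, 42.20, -71.70, 42.35)
records = [
    {"permit": "1001", "x": -71.80, "y": 42.26},
    {"permit": "1002", "x": 0.0, "y": 0.0},
    {"permit": "1003", "x": None, "y": None},
]

suspects = find_suspect_records(records, CITY_BOUNDS)
for record, reason in suspects:
    print(record["permit"], reason)
```

A point sitting at 0, 0 is a classic sign of a projection or geocoding problem, which is exactly the kind of record this check is meant to surface.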

Additionally, we need to complete the dataset’s item page details.  These details include:

Making sure that these components of the item page are filled out helps users of our ArcGIS Hub site understand the provenance of the data at a glance while also helping to power the search and filtering capabilities of the site.

Step 4:  Alter the workbench so that it appends future records to the newly created layer

Now that we have a hosted feature layer that has all the “old” building permit records and contains the critical item page information, we can adjust our workbench to keep our data fresh.  There are a few things we need to do to accomplish this.

Step 4a:  Update reader csv source to new csv file

At the beginning of this process, we discovered that our source system that holds all building permit data can create a new csv file nightly that contains only the new building permits processed for that day.  To adjust our existing workbench to take advantage of this, all we need to do is change the dataset source to the new file.  This assumes that the schema of the file is the same as what we used previously and that the file name won’t change (the source system will simply overwrite the same file).

writer being reconfigured
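Since this step leans on the assumption that the nightly file keeps the same schema, it can be worth confirming that before pointing the workbench at it.  Below is a minimal sketch that compares a csv header against the fields you expect.  The field names and sample text are hypothetical.

```python
import csv
import io

def schema_differences(expected_fields, csv_text):
    """Compare a csv header row to the expected field names.

    Returns (missing, unexpected) lists of field names.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [field for field in expected_fields if field not in header]
    unexpected = [field for field in header if field not in expected_fields]
    return missing, unexpected

# Hypothetical expected schema and a nightly file whose header has drifted.
EXPECTED = ["permit_id", "permit_type", "date_issued"]
nightly_csv = "permit_id,permit_type,issued\n1004,Residential,2018-01-08\n"

missing, unexpected = schema_differences(EXPECTED, nightly_csv)
print("missing:", missing, "unexpected:", unexpected)
```

If either list comes back non-empty, the reader and transformers that depend on those field names probably need another look before the next run.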

Step 4b:  Update writer settings to append to existing hosted feature layer

When we originally built the workbench, the writer was configured to “Insert” records and “Use Existing” as shown below.  Frequently when I build a workbench, I’ll change the feature type handling to “Truncate Existing” since I’m testing the process and want the workbench to delete all the records from the hosted feature layer before writing new ones.  Make sure that you switch these settings back so you append new data rather than wipe out all your old data.

writer reconfigured to update existing layer
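The difference between appending and truncating can be sketched in a few lines of Python.  This is only a toy model of the behavior described above, not the writer's actual implementation.  The `truncate_existing` flag is a made-up stand-in for the feature type handling setting, and the permit numbers are hypothetical.

```python
def write_records(existing, incoming, truncate_existing=False):
    """Toy model of the writer: append incoming records, optionally wiping first."""
    base = [] if truncate_existing else list(existing)
    return base + list(incoming)

# Hypothetical layer contents and a nightly batch.
layer = ["permit 1001", "permit 1002"]
nightly = ["permit 1003"]

appended = write_records(layer, nightly)                           # keeps history
truncated = write_records(layer, nightly, truncate_existing=True)  # loses history
print(len(appended), "records when appending;", len(truncated), "after truncating")
```

Because the nightly file only holds that day's permits, leaving the truncate option on would shrink the layer to a single day of data on every run.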

Conclusion

Data Interoperability is a great tool to help you more easily migrate your data into your ArcGIS Hub site.  It takes a little bit of time to learn the terms and to figure out how to build a process that works for your data, but the reward is well worth the effort.  If your goal with ArcGIS Hub is to drive transparency or engage with your community, the best way to do it is to make fresh data available.  Your users will thank you and, more importantly, they’ll trust the data and visualizations that you create.

If you’re interested in learning more about Data Interoperability and the many, many things it can do for you, check out the ArcGIS Data Interoperability blog.

Explore this Workbench

Click here to get a copy of the workbench used in this blog.  When deploying this on your computer make sure you have a license (even if it’s a trial license) for ArcGIS Data Interoperability or FME Desktop.

Steps to deploy this workbench on your machine:

Acknowledgement

Special thanks to the City of Worcester, MA’s “Informing Worcester” open data site built on ArcGIS Hub and their building permits and parcel data used in this blog.

 


About the author

Nick O’Day is a senior consultant on Esri’s Professional Services team. He focuses on helping municipal customers with community engagement and organizational collaboration tools. He enjoys building cool stuff with really smart people using the best GIS tools on the planet. Before joining Esri, Nick worked as a Chief Data Officer where he used ArcGIS to "squeeze" locally-grown data into a delicious juice for citizens and staff each day. When he’s not GIS'ing, he’s usually cooking, eating, or learning something new in sunny LA.
