ArcGIS Blog

Introducing Data Engineering in ArcGIS Insights

By Jen Taylor and Maitreyi Gupta

Introducing data engineering in ArcGIS Insights desktop 2022.2. Replace those empty strings and nulls, convert incorrect column types, explore your data, and get your data ready before you dive into analysis.

What is data engineering?

While you have always been able to manipulate your data in Insights (for example, to change the data type or filter your data), data engineering adds more data management functionality. By enabling you to process your data up front, it streamlines your analysis. In Insights desktop 2022.2, you’ll notice a new section called Data Engineering, which is in preview. The data engineering preview offers you a fully functioning, non-beta experience. However, it is only available in Insights desktop and will be enhanced with more tools in future releases.

New workbooks designed specifically for data engineering are now available on the home page. Use them to clean and prepare your data before you start your analysis.

Data engineering homepage

How to perform data engineering

On opening a new data workbook, you will be greeted with the Add to page dialog box, which has been expanded to allow you to sample and filter your data before it is loaded into the data workbook. To filter out specific columns or apply advanced filters, open the import options. To make it easier to work with the data, a preview subset of the data is shown in the table.

Preview of the dataset is shown while filtering data under import options.

Data engineering is always run on your entire dataset. However, to ensure faster processing as you work with your data, sampling is used to reduce what is shown in the workbook when the data is over a certain threshold (250,000 in the 2022.2 release). Different sampling methods are available, and the sampling value can be increased.
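The idea of previewing a sample while the full dataset is still processed can be sketched in Python. This is a conceptual illustration only, not how Insights implements sampling: the function name `preview_sample`, the choice of simple random sampling, and the reading of the threshold as a row count are all assumptions.

```python
import random


def preview_sample(rows, threshold=250_000, seed=0):
    """Return (rows_to_show, sampled_flag) for display purposes only.

    The full `rows` list is left untouched, so later processing can
    still run against every record.
    """
    if len(rows) <= threshold:
        return list(rows), False
    rng = random.Random(seed)
    # Pick row positions at random, then sort them so the preview
    # keeps the original row order.
    positions = sorted(rng.sample(range(len(rows)), threshold))
    return [rows[i] for i in positions], True
```

The returned flag plays the role of the sampled tag: it is only set when the data exceeds the threshold.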

The data workbook creates the model and displays the data table with the sampled tag (if applicable) for the dataset that was added.

The data model and table for the added dataset.

Datasets will be displayed as a tab in the data table section and, based on the data type, different column tools are available from the dropdown menu to explore your data.

Show column summary shows a chart of the column data. A statistical summary below the chart provides information such as nulls, empty strings, and mean. With the column summary, you can learn more about the data before you start the preparation process.

Different charts are created, depending on the data type. String columns create a bar chart showing the count of each unique value in the column. Date/Time columns create a time series graph showing the count of features by date or time. Finally, number columns create a histogram showing the distribution of values in the column.

The column summary updates while switching between data types.
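As a rough illustration of how a summary can depend on the column's data type, here is a minimal Python sketch. It is not the Insights implementation; the function name `summarize_column` and the dictionary layout are hypothetical, and `None` stands in for a null.

```python
from collections import Counter
from datetime import date
from statistics import mean


def summarize_column(values):
    """Count nulls and empty strings, then pick a chart type by data type."""
    summary = {
        "nulls": sum(v is None for v in values),
        "empty_strings": sum(v == "" for v in values),
    }
    present = [v for v in values if v is not None and v != ""]

    def is_number(v):
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if present and all(is_number(v) for v in present):
        summary["chart"] = "histogram"
        summary["mean"] = mean(present)
    elif present and all(isinstance(v, date) for v in present):
        # datetime is a subclass of date, so both are covered here
        summary["chart"] = "time series"
        summary["counts"] = Counter(present)
    else:
        summary["chart"] = "bar chart"
        summary["counts"] = Counter(present)
    return summary
```

Calling `summarize_column(["a", "", "a", None, "b"])` reports one null, one empty string, and a bar chart with a count per unique value.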

After seeing the data distribution, you may want to fix incorrect values, and this can easily be done with the Find and replace tool. Replace those incorrect spellings, nulls, and empty strings.

The find and replace tool can be used for strings, empty strings or nulls.
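The same kind of cleanup can be expressed in a few lines of Python. This is a sketch of the concept rather than the tool's actual logic; `find_and_replace` is a hypothetical helper, and `None` is used to represent a null.

```python
def find_and_replace(values, find, replace_with):
    """Return a new list with every value equal to `find` swapped out.

    `find` may be a string, the empty string "", or None (a null).
    """
    def is_match(value):
        if find is None:
            return value is None
        return value is not None and value == find

    return [replace_with if is_match(v) else v for v in values]


# Example: fix a misspelling, then fill nulls and empty strings.
states = ["Calfornia", None, "", "Texas"]
states = find_and_replace(states, "Calfornia", "California")
states = find_and_replace(states, None, "Unknown")
states = find_and_replace(states, "", "Unknown")
```

Nulls and empty strings are matched separately here, mirroring the way the column summary reports them as separate counts.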

In addition to changing column values, you can change column data types. For example, the temperature in a dataset may be stored as a string; converting it to a double means you can perform statistical analysis on it.

In data engineering you have even more control when converting data types. Date/time accepts custom formats that match your data. In the custom format parameter box, enter the format of your data.

Ability to supply custom date formats when converting strings to datetime.
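To show what matching a custom format to your data means in practice, here is a small Python sketch using the standard library's `datetime.strptime`. Note the assumptions: the format directives (`%d`, `%m`, and so on) are Python's and may differ from the tokens Insights accepts, and `parse_dates` is a hypothetical helper that returns `None` for values that do not match.

```python
from datetime import datetime


def parse_dates(values, custom_format):
    """Convert strings to datetime using a caller-supplied format.

    Values that are null or do not match the format become None
    instead of stopping the whole conversion.
    """
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, custom_format))
        except (TypeError, ValueError):
            parsed.append(None)
    return parsed


# Day-first dates with a 24-hour time
result = parse_dates(["21/04/2020 15:18", "not a date", None], "%d/%m/%Y %H:%M")
```

Only the first value matches the day-first format, so it is the only one converted.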

Numeric data types can be integer (no decimal places) or double (decimal places), and you can choose the decimal separator (point or comma).

Ability to specify the separator used in the input data when converting to Double.
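Why the separator matters is easy to see in code: the string "21,5" is only a valid number if the comma is read as the decimal mark. The sketch below is an illustration under that assumption, with a hypothetical `to_double` helper; it does not handle thousands separators.

```python
def to_double(text, decimal_separator="."):
    """Convert a numeric string to float using the given decimal separator."""
    if decimal_separator not in (".", ","):
        raise ValueError("decimal_separator must be '.' or ','")
    cleaned = text.strip()
    if decimal_separator == ",":
        # Python's float() only understands a point, so swap the comma.
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)
```

With the comma selected, `to_double("21,5", ",")` gives `21.5`; with the default point, the same input raises a `ValueError` because the comma is not recognized.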

The Advanced filter and the Column filter can be used to limit your data to just the records needed for your analysis.

Filter individual columns in the data table.

In addition to removing columns on import, you can also remove them in the workbook. As with Insights workbooks, new columns of data can also be calculated.

To exclude columns from the output, remove them from the data table.

The Create relationships dialog box has been revamped and now supports cross-database joins. Results of the relationship can be previewed before it is run.

Preview the relationship result before finalizing the join.
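Previewing a join before committing to it can be thought of as computing only the first few matched rows. The following Python sketch illustrates that idea for an inner join on two lists of dictionaries; it is not the Insights implementation, and `preview_inner_join`, its parameters, and the sample field names are hypothetical.

```python
from itertools import islice


def preview_inner_join(left, right, left_key, right_key, limit=5):
    """Return up to `limit` joined rows without building the full result."""
    # Index the right-hand rows by their key for quick lookups.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    def matches():
        for left_row in left:
            for right_row in index.get(left_row[left_key], []):
                # On a field-name clash, the left-hand value wins.
                yield {**right_row, **left_row}

    return list(islice(matches(), limit))


stations = [{"station_id": 1, "name": "North"}, {"station_id": 2, "name": "South"}]
readings = [{"sid": 1, "temp": 21.5}, {"sid": 1, "temp": 19.0}, {"sid": 3, "temp": 7.2}]
preview = preview_inner_join(stations, readings, "station_id", "sid", limit=1)
```

Because the matches are generated lazily, the preview stops as soon as `limit` rows have been produced.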

If you make a mistake or just want to make changes to your model, you can always edit the model tools to either remove them (Delete button) or change the criteria used (Edit button).

Model tools can be edited or removed.

Creating the output dataset

Having cleaned and prepared your data, be sure to run the model to create your new dataset, ready for analysis. Output data can be stored locally or in a database. Local datasets can be exported to CSV, shapefile, and GeoJSON files.

Data engineering outputs can be saved to the Datasets tab and then used for analysis.
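For readers curious about what two of the export formats contain, here is a minimal Python sketch that writes the same records as CSV text and as a GeoJSON FeatureCollection of points. It uses only the standard library and is not related to how Insights performs the export; the helper names and the `lon` and `lat` field names are assumptions.

```python
import csv
import io
import json


def to_csv(records):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_geojson(records, lon_field="lon", lat_field="lat"):
    """Serialize records to a GeoJSON FeatureCollection of Point features."""
    features = []
    for record in records:
        properties = {k: v for k, v in record.items() if k not in (lon_field, lat_field)}
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude, then latitude.
            "geometry": {"type": "Point", "coordinates": [record[lon_field], record[lat_field]]},
            "properties": properties,
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


records = [{"name": "North", "lon": -117.19, "lat": 34.05}]
csv_text = to_csv(records)
geojson_text = to_geojson(records)
```

The CSV keeps the coordinates as ordinary columns, while the GeoJSON output moves them into each feature's geometry.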

The data engineering preview in ArcGIS Insights 2022.2 offers you new ways to manage your data and a glimpse into the types of features that will arrive in future releases.

If you’d like to learn more, check out the documentation that describes this new feature in detail.

Michael Robb(@mike-robb)
April 21, 2020 3:18 pm

How does one Disable http or have http redirect to https with WABde?

Craig Cleveland(@ccleveland)
May 7, 2020 5:12 am
Reply to  Michael Robb

Web AppBuilder Developer Edition inherits these settings from the portal it is associated with. To achieve the desired effect you’ll need to disable HTTP in your portal settings. Here is a help link that describes that process for an ArcGIS Enterprise portal – https://enterprise.arcgis.com/en/portal/latest/administer/windows/configure-https.htm. It is a similar process in ArcGIS Online (if the option exists in your Organization…depending on a few factors it may not be there as all new Organizations are HTTPS only at this point).

GIS Support(@cprail)
May 11, 2020 3:05 pm

Hi Craig, I noticed I am not able to reply. the Blog returns a ‘sorry , that user name already exists!‘ when I try to sign in. I am using one of our support accounts instead. Our portal is already ‘https only’ enabled and yes, a redirect occurs when first hitting the 3344 port where WABde resides – against the portal chosen, however, one can modify the URI from https to http in the browser, hit enter and WABde loads non SSL (not secure) and does not redirect at this moment: [uri] :3344/webappbuilder/?action=setportalurl. Though if one continues signing into the…

Craig Cleveland(@ccleveland)
May 14, 2020 10:31 am
Reply to  GIS Support

Hi Michael, I believe I’m following what you’re saying and can replicate the behavior on this end. I’ll inquire with the core development team as to whether there’s an alternative solution to this (i.e. to completely disable HTTP) and post the response here if there is. In the meantime I’d encourage you to open a tech support incident on this subject as well.

Amr Eldib(@aeldib)
May 28, 2020 2:36 pm

testing an error when posting comment to the blog

Ingrid Mans(@mansij_montva_gis)
July 30, 2020 8:25 am

I…am so lost. Just to install OpenSSL it appears I also need Perl, among other things?

Craig Cleveland(@ccleveland)
July 31, 2020 4:08 am
Reply to  Ingrid Mans

Hi Ingrid, I’m not intimately familiar with the prerequisites for openssl. I have never been prompted for Perl, but that may simply mean it already exists on my machine. In doing some quick research you may want to look into using an installer like this – https://www.ssl.com/how-to/install-openssl-on-windows-with-cygwin/. It appears you’d be able to install the Perl prerequisites along with openssl. Alternatively it might be a good idea to work with your IT department to help you get what you need.

Matthew Fletcher(@mfletcher18)
August 11, 2020 12:51 pm

The directions in this blog post didn’t work for me, I think it was something funky with the files created in openssl as I already had a signed CA Cert. The directions in an ESRI technical support article named “How To: Use a CA-signed SSL certificate for Web AppBuilder for ArcGIS (Developer Edition)” allowed me to get the WAB Dev Edition secured. Search for it if you get stuck, here’s the current link. https://support.esri.com/en/technical-article/000014185