ArcGIS API for Python

Deep learning models in arcgis.learn

Artificial Intelligence (AI) has arrived. It is not science fiction anymore. Computers already recognize objects in images and understand speech and language at least as well as, if not better than, humans. This has been made possible by rapid advances in hardware, vast amounts of training data, and innovations in machine learning algorithms such as deep neural networks. Deep learning is the driving force behind the current AI revolution: it gives intelligence to today’s self-driving cars, smartphones and smart speakers, and is making deep inroads into radiology and even gaming. GIS and remote sensing are no different: many tasks that were done using traditional means can now be done more accurately using deep learning.

So… what is deep learning?

Deep learning is a machine learning technique that uses deep neural networks to learn by example. Just like traditional supervised image classification, these models rely upon training samples to “learn” what to look for. However, unlike traditional segmentation and classification, deep learning models don’t just look at individual pixels or groups of pixels. They have higher learning capacity and can learn to recognize complex shapes, patterns and textures at various scales within images. This enables deep learning models to learn from vast amounts of training data in varying conditions. The trained models can then be applied to a wide variety of images at a much lower computational cost and be reused by others.

Deep learning in ArcGIS

One of the things I’m very excited about is the rapidly growing support for deep learning in ArcGIS. The Image Analyst extension in ArcGIS Pro includes a Deep Learning toolset built just for analysts. A simplified deep learning installer packages the necessary dependencies and streamlines setup. Data scientists can use Python notebooks in ArcGIS Pro, Enterprise and Online to train these models.

ArcGIS API for Python includes the arcgis.learn module, which makes it simple to train a wide variety of deep learning models on your own datasets and solve complex problems. It includes over fifteen deep learning models that support advanced GIS and remote sensing workflows. These models also support a variety of data types: overhead and oriented imagery, point clouds, bathymetric data, LiDAR, video, feature layers, tabular data and even unstructured text.

All models in the arcgis.learn module can be trained with a simple, consistent API and intelligent defaults. The models consume exported training data from ArcGIS with no messy pre-processing, and the trained models are directly usable in ArcGIS without needing post-processing of the model’s output. ArcGIS automatically handles the necessary image space to map space conversion.
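
To make the "simple, consistent API" concrete, here is a minimal sketch of the usual training steps. It assumes the prepare_data, lr_find, fit and save calls from arcgis.learn; the helper names train_and_save and run_with_arcgis, the batch size and the model name are hypothetical, and the arcgis import is deferred so the helper can be read and tried without ArcGIS installed.

```python
def train_and_save(model_class, data, epochs=10, name="my_model"):
    """Train any arcgis.learn-style model with the same four steps.

    `model_class` is assumed to follow the arcgis.learn pattern:
    constructed from prepared data, with lr_find(), fit() and save().
    """
    model = model_class(data)   # build the model around the training data
    lr = model.lr_find()        # ask for a suggested learning rate
    model.fit(epochs, lr=lr)    # train for the requested number of epochs
    model.save(name)            # write the trained model to disk
    return model


def run_with_arcgis(data_path):
    """Hypothetical usage; needs the arcgis package and exported training data."""
    from arcgis.learn import prepare_data, UnetClassifier  # deferred import

    data = prepare_data(data_path, batch_size=8)  # folder exported from ArcGIS
    return train_and_save(UnetClassifier, data, epochs=10, name="landcover_unet")
```

Because the steps are the same for every model, swapping UnetClassifier for another arcgis.learn model class should be the only change needed in a sketch like this, provided the training data was exported in a format that model accepts.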

In this blog post, let’s look at how the deep learning models in arcgis.learn can be tapped into, to perform various GIS and remote sensing tasks.

Let’s start with imagery tasks. One area where deep learning has done exceedingly well is computer vision: the ability of computers to see, or recognize objects within images. This is particularly useful for GIS because satellite, aerial, and drone imagery is being produced faster than it can be analyzed and mined for insight by traditional means.

1. Object Classification

The FeatureClassifier model in arcgis.learn can be used to classify geographical features or objects based on how they appear within imagery. For those of you who are familiar with deep learning, this leverages image classification models like ResNet, Inception or VGG.

Damage assessment using object classification. Damaged houses are shown in red and undamaged ones in blue.

In GIS, such models can be used to perform automated damage assessment after wildfires or to classify swimming pools as clean or as green, algae-infested pools.

A sample notebook outlining the damage assessment workflow can be found here.

In addition to being applied to satellite imagery, this model can be used out in the field for data collection workflows. In the example below, a plant species identification model is being used to perform a tree inventory using Survey123 and its support for integrating such TensorFlow Lite models (currently in beta).

Automated plant species identification for field data collection

2. Pixel Classification

The next task we’ll look at is Pixel Classification – where we label each pixel in an image.

Known as ‘semantic segmentation’ in the deep learning world, pixel classification comes to you in the ArcGIS Python API with the time-tested UnetClassifier model and more recent models like PSPNetClassifier and DeepLab (v3).

Building footprints extracted using arcgis.learn's UnetClassifier model

These models can be used for extracting building footprints and roads from satellite imagery, or performing land cover classification. In the example above, training the deep learning model took only a few simple steps, but the results are a treat to see.
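
As a rough illustration of what "labeling each pixel" means, the sketch below takes per-class scores for every pixel and keeps the highest-scoring class. The classify_pixels helper, the class names and the tiny score grid are made up for illustration; this is not the arcgis.learn implementation, where a neural network produces such scores.

```python
def classify_pixels(scores):
    """Return a label grid from scores[row][col] = list of per-class scores."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# 2 x 2 image, 3 classes (0 = background, 1 = building, 2 = road)
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]],
]
labels = classify_pixels(scores)
print(labels)
```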

This sample notebook uses the UnetClassifier model trained on high-resolution land cover data provided by the Chesapeake Conservancy. While it works well, having human annotators label every pixel in such high-quality training data can be time-consuming and expensive.

Land cover classification using sparsely labeled data

This is where the support we’ve added to the Python API for training on sparsely labeled data comes in. Here we only need to label a few areas as belonging to each land cover class. We can then train a pixel classification model to find the land cover for each pixel in the image.
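
One common way to train on sparsely labeled data is to compute the loss only over the labeled pixels and skip the rest. The sketch below shows that idea with a plain cross-entropy average. The IGNORE value and the masked_cross_entropy helper are assumptions for illustration, and the article does not say whether arcgis.learn does it this way.

```python
import math

IGNORE = -1  # hypothetical marker for pixels that nobody labeled


def masked_cross_entropy(probs, labels):
    """Average -log p(true class) over labeled pixels only.

    probs[i] is a list of class probabilities for pixel i;
    labels[i] is the class index, or IGNORE if the pixel is unlabeled.
    """
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y != IGNORE]
    if not losses:
        return 0.0  # nothing labeled, so nothing to learn from
    return sum(losses) / len(losses)


probs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
labels = [0, IGNORE, 1]  # the middle pixel has no label and is skipped
loss = masked_cross_entropy(probs, labels)
```

The unlabeled pixel contributes nothing to the loss, so the model is only corrected where an annotator actually supplied a class.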

3. Object Detection

Time to check out another important task in GIS – finding specific objects in an image and marking their location with a bounding box. Better known as object detection, these models can detect trees, well pads, swimming pools, brick kilns, shipwrecks from bathymetric data and much more. The arcgis.learn module includes several object detection models such as SingleShotDetector, RetinaNet, YOLOv3 and FasterRCNN.
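
A detection is reported as a bounding box in image (pixel) space, and ArcGIS converts it to map space for you. To show what that conversion involves, here is a minimal sketch for a north-up image with square pixels. The pixel_box_to_map helper, the origin and the cell size are hypothetical, and real rasters may need a full affine transform.

```python
def pixel_box_to_map(box, origin_x, origin_y, cell_size):
    """Convert (col_min, row_min, col_max, row_max) to (xmin, ymin, xmax, ymax).

    Assumes a north-up raster: the origin is the top-left corner, x grows
    with columns and y shrinks as rows increase.
    """
    col_min, row_min, col_max, row_max = box
    xmin = origin_x + col_min * cell_size
    xmax = origin_x + col_max * cell_size
    ymax = origin_y - row_min * cell_size
    ymin = origin_y - row_max * cell_size
    return (xmin, ymin, xmax, ymax)


# A 20 x 10 pixel detection in an image whose top-left corner sits at
# (500000, 4200000) with 0.5 m pixels (all values invented for the example).
extent = pixel_box_to_map((100, 40, 120, 50), 500000.0, 4200000.0, 0.5)
print(extent)
```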

FasterRCNN is the most accurate of these models but is slower to train and to run inference with. SingleShotDetector and RetinaNet are faster because they detect objects in a single stage, as opposed to the two-stage approach used by FasterRCNN.

YOLOv3 is the newest object detection model in the arcgis.learn family. It’s fast and accurate at detecting small objects, and what’s great is that it’s the first model in arcgis.learn that comes pre-trained on the 80 common object types in the Microsoft Common Objects in Context (COCO) dataset.

Detected catfish in full motion video captured from drone

Don’t think you are limited to just images: these models can even detect objects in videos! Take a look at locating catfish in drone videos or cracks on roads in videos from vehicle-mounted smartphones.

Next, let’s look at a different kind of Object Detection. Now we’re going to detect and locate objects not just with a bounding box, but with a precise polygonal boundary or raster mask covering that object. In the deep learning world, we call this task ‘instance segmentation’ because the task involves finding each instance of an object and segmenting it.
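
The difference from pixel classification can be shown with a toy example: a single "building" mask becomes one label per building once connected pixels are grouped. The sketch below uses a 4-connected flood fill purely for illustration. It is not how an instance segmentation model works (such a model predicts a separate mask for each detected object), and label_instances is a hypothetical helper.

```python
def label_instances(mask):
    """Give each 4-connected group of 1-pixels its own positive integer id."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already part of an instance
            next_id += 1
            labels[r][c] = next_id
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_id
                        stack.append((ny, nx))
    return labels


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
instances = label_instances(mask)
```

Simple grouping like this merges any objects whose pixels touch, which is one reason a model that predicts a mask per object is useful.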

The most popular model for this is MaskRCNN, and arcgis.learn puts it in your grasp. See it in action in the building footprint extraction sample, which highlights how the model is particularly suited for finding buildings, especially when they are right next to each other.

3D reconstruction of building made from masks produced by MaskRCNN

We’ve also used MaskRCNN to reconstruct 3D buildings from aerial LiDAR data. Don’t miss this sample.

4. Point Cloud Segmentation

Talking about 3D, we now have support for true 3D deep learning in the arcgis.learn module.

The PointCNN model can be used for point cloud segmentation. In this task, each point in the point cloud is assigned a label, representing a real-world entity.

3D scene created by classifying buildings and trees in point cloud using PointCNN model.

This model can be used to create 3D basemaps by extracting buildings, ground and trees from raw point clouds. Another example is extracting power lines and utility poles from airborne LiDAR point clouds. Previously, this was the most labor-intensive part of identifying an electric utility line’s safety corridor for monitoring vegetation and encroachments.
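
In data terms, point cloud segmentation turns a list of x, y, z points into the same list with a class code per point. Here is a minimal sketch, assuming the standard ASPRS LAS codes (2 = ground, 5 = high vegetation, 6 = building). The sample points and the group_by_class helper are invented for illustration, and in practice the codes would come from the trained model.

```python
from collections import defaultdict

# Standard ASPRS LAS classification codes used in this example
CLASS_NAMES = {2: "ground", 5: "high vegetation", 6: "building"}


def group_by_class(points, class_codes):
    """Group (x, y, z) points by the class code assigned to each one."""
    groups = defaultdict(list)
    for point, code in zip(points, class_codes):
        groups[CLASS_NAMES.get(code, "other")].append(point)
    return dict(groups)


points = [(0.0, 0.0, 101.2), (1.0, 0.5, 109.8), (1.2, 0.4, 110.1), (3.0, 2.0, 115.6)]
class_codes = [2, 6, 6, 5]  # one label per point, as a segmentation model would output
groups = group_by_class(points, class_codes)
print({name: len(pts) for name, pts in groups.items()})
```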

5. Image Enhancement

So far, we’ve seen several examples of extracting information from imagery and point clouds, but I’m really excited to tell you about synthesizing better data from poor quality data. The SuperResolution model in arcgis.learn does just that, and can be used to improve not just how imagery looks but also how easily it can be interpreted.

SuperResolution on aerial imagery

This model brings “Zoom in… Enhance” from Hollywood to ArcGIS! It can take low resolution and blurred images as input and turn them into stunning high quality, high resolution images. The model adds realistic texture and details, and produces simulated high resolution imagery. Don’t just take my word for it: check out the screenshot above and the sample notebook that does this magic.

6. Classification and Regression on Tabular Data

Now you might be thinking that deep learning only works on imagery and 3D data, but that’s just not true. Deep neural networks also work well on feature layers and tabular data.

The FullyConnectedNetwork model feeds feature layer or raster data into a fully connected deep neural network. These models can classify areas susceptible to a disease based on bioclimatic factors or predict the efficiency of solar power plants based on weather factors.
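
For readers new to the term, a fully connected network passes each row of attributes through layers in which every input feeds every output. The sketch below runs one forward pass of a tiny two-layer network with hand-picked weights. The weights, the input values and the dense helper are all made up, and the real FullyConnectedNetwork learns its weights during training.

```python
def dense(inputs, weights, biases, relu=True):
    """One fully connected layer.

    outputs[j] = biases[j] + sum over i of inputs[i] * weights[i][j],
    optionally followed by a ReLU (negative values clipped to zero).
    """
    outputs = []
    for j, b in enumerate(biases):
        total = b + sum(x * w[j] for x, w in zip(inputs, weights))
        outputs.append(max(0.0, total) if relu else total)
    return outputs


# Two input features (say, scaled irradiance and temperature),
# three hidden units, one predicted value out.
features = [1.0, 2.0]
hidden = dense(features, [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]], [0.0, 0.0, -1.0])
prediction = dense(hidden, [[1.0], [2.0], [0.5]], [0.1], relu=False)
```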

Actual vs predicted Solar Energy generation

In the plot above, the blue line indicates actual solar power generation and the orange line shows the predicted values from the FullyConnectedNetwork model.

7. Entity Extraction from Unstructured Text

Geospatial data doesn’t always come neatly packaged in the form of file geodatabases and shapefiles. Often it’s hidden away in an unstructured format, such as text-based reports. To use this data for spatial analysis, you need to convert it into a structured, standardized format such as feature layers. However, reading and converting unstructured text by hand is difficult and time-consuming.

Crime incident report with labelled entities, highlighting entities such as the type of crime, where it occurred, time of incident and when it was reported.

Deep learning has made a lot of progress in natural language processing, and with the EntityRecognizer model in arcgis.learn you can extract meaningful geospatial information from unstructured text.
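
The output of entity extraction can be pictured as labeled character spans that are then collected into one structured record per report. The sketch below only shows that last step. The sample sentence, the label names and the spans_to_record helper are invented, the spans are written by hand rather than predicted, and this is not the EntityRecognizer API.

```python
def spans_to_record(text, spans):
    """Collect (start, end, label) spans into {label: [matched text, ...]}."""
    record = {}
    for start, end, label in spans:
        record.setdefault(label, []).append(text[start:end])
    return record


text = "Burglary reported at 100 Main St on 05/02 at 9:15 PM."
spans = [  # hand-written stand-ins for what a trained model would predict
    (0, 8, "Crime"),
    (21, 32, "Address"),
    (36, 41, "Reported_Date"),
    (45, 52, "Reported_Time"),
]
record = spans_to_record(text, spans)
print(record)
```

A record like this is what makes mapping possible: once the address is isolated as its own field, it can be geocoded and written to a feature layer.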


Feature layer of crime incidents

This sample notebook shows how we used this model to extract information from thousands of unstructured text files containing police reports from Madison, Wisconsin, and created a map of the crime locations.


But, what about that model?

Now, you might be thinking that it’s great that arcgis.learn has support for so many models, but what about that latest and greatest deep learning model that just came out last week? Don’t worry… we’ve got you covered!

We’re adding extensibility support to arcgis.learn so you can integrate external models. The ModelExtension class allows you to bring in any object detection model (pixel classification is next in the pipeline) and integrate it with arcgis.learn. The model is then able to directly use training data exported by ArcGIS and the saved models are ready to use as ArcGIS deep learning packages. Integrating external models with arcgis.learn will help you train such models with the same simple and consistent API used by the other models.

Additionally, arcgis.learn lets you integrate ArcGIS with any prediction or classification model from the popular scikit-learn library using the new MLModel class.


Why are there so many models?

Deep learning is a rapidly evolving field, with innovations and new models coming out each month. We’re keen on bringing these innovations to ArcGIS at an equally fast pace, so that you have the latest models and can stay at the cutting edge in applying deep learning to GIS.

Each model has its strengths and is better suited for particular tasks. Take object detection, for example: FasterRCNN gives the best results, YOLOv3 is the fastest, SingleShotDetector gives a good balance of speed and accuracy, and RetinaNet works very well with small objects.

Different models have differing requirements for memory, and differ in their speed of training and inferencing. Deeper neural networks in larger models give more accurate results but need more memory and longer training regimes. They also require larger datasets to train adequately. Some models are lightweight and better suited for deployment on mobile phones.

Just as skilled craftsmen know about each tool in their toolbox, skilled data scientists understand each model based on its unique characteristics, and apply them in the context of the problem that needs to be solved.

Attending the virtual Esri UC? We’ve put together a number of sessions on deep learning with ArcGIS to show you several of these models in action. Check out this blog post to learn more!


About the author

Director of Esri R&D Center, New Delhi & development lead of ArcGIS AI technologies and ArcGIS API for Python. Applying deep learning to the Science of Where!

