ArcGIS Pro

Performing Feature Extraction & Classification Using Deep Learning with ArcGIS Pro

One of the most powerful uses of imagery, whether satellite, aerial, or street view, is feature extraction and classification: finding objects such as cars or planes, classifying structures as damaged or undamaged, detecting changes in a landscape, or identifying land cover types.

This has traditionally been a cumbersome and error-prone task, with people having to pore over many images and make sense of them by hand.

With the advent of AI and deep learning, this process can be automated, with machines performing the feature extraction to solve real-world problems. Deep learning is a type of machine learning, itself a branch of AI, that uses neural networks with multiple layers, each able to extract one or more features from an image.

With ArcGIS Pro, you can now perform the entire end-to-end deep learning workflow. That workflow consists of three main steps:

  1. Generate training data: Create training samples of the features or objects of interest so the machine can learn from them.
  2. Train model: Use those training samples to train a deep learning model.
  3. Perform inference: Use the resulting deep learning model to analyze new data. Inferencing is the process in which information learned during training is put to work detecting similar features in other datasets.
The various steps in the Deep Learning workflow

One of the most important aspects of wildlife management is knowing how many animals there are in a defined area. This information is crucial to creating regulations that ensure a sustainable harvest as well as the long-term sustainability of the species. However, given the size and remote terrain of many parts of Alaska, counting animals is no easy feat.

Each animal species presents its own challenges and requires different tracking methods. In the case of caribou, each summer, when the animals are harassed by insects, a herd groups together and escapes to ridgelines, snowfields, and coastlines. This grouping behavior allows biologists to monitor caribou by flying over herds in small aircraft with mounted cameras and photographing the groups. They can then use these images to take a count, or census, of the herd.

While this might sound easy, getting these images of caribou congregations takes time. To learn more about how it is done, see the detailed documentation on the Alaska Department of Fish and Game site.


A herd of caribou in Alaska (image: Alaska Department of Fish and Game)

Once the images are collected, the caribou need to be counted. Some herds number as many as 400,000 animals, so this has been a long, excruciating task that requires a lot of personnel and is prone to human error. Using ArcGIS deep learning technology, biologists can do it in far less time and much more efficiently, in three steps.

To showcase the technology, we are going to use one of the aerial images of a large herd captured from their aircraft, shown below.

Single image from aircraft of a herd of Caribou

Step 1 : Generating Training Data 

Before training a deep learning model, training samples must be created to represent the features of interest. In this case, we label individual caribou and classify each one as an adult or a calf.

We do this with the Label Objects for Deep Learning tool in ArcGIS Pro, assigning each animal to one of two classes: Adult or Calf.

Training Samples collected from image

In this case, we collected 1,273 features in total: 870 adults and 403 calves.

Once the training samples are collected, we export them in a format that can be used to train a deep learning model. Using the Export Training Data tab of the Label Objects for Deep Learning tool, the samples were converted to training data. Several metadata formats are available for object detection, such as Pascal VOC, KITTI Rectangle, and more. For this case we used RCNN_Masks, which is used for instance segmentation.

Converting training samples to training data for training a model

Step 2 : Training a Deep Learning Model

Once we had the required training data, we trained a deep learning model using the Train Deep Learning Model tool (in the Image Analyst Tools toolbox). The model type was Mask R-CNN, with a 90/10 split between training and validation data.
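
To make the 90/10 ratio concrete, here is a minimal Python sketch of holding out 10 percent of a set of samples for validation. It is only an illustration: the tool performs its own split internally, and the function name `split_samples`, the random seed, and the 500 chip ids are assumptions made up for this example, not values from our project.

```python
import random


def split_samples(sample_ids, validation_fraction=0.10, seed=42):
    """Shuffle sample ids and split them into (training, validation) lists."""
    ids = list(sample_ids)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * validation_fraction)
    return ids[n_val:], ids[:n_val]


# Hypothetical chip ids; the real number of chips depends on the export settings.
chip_ids = [f"chip_{i:04d}" for i in range(500)]
train_ids, val_ids = split_samples(chip_ids)
print(len(train_ids), "training chips,", len(val_ids), "validation chips")
```

The held-out chips are never used to update the model. They are only used to measure how well it generalizes after each epoch.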

The tool prepares the data, performs data augmentation, and sets suitable hyperparameters to produce a good model. Normalization and augmentations such as contrast, brightness, and rotation are performed automatically as part of the training process.

Because training can be time consuming, it is always a good idea to start with a small number of epochs, for example 10 to 30. We first trained the model for 30 epochs. That gave fairly good results, but to improve accuracy we raised the maximum to 200 epochs. We also wanted training to stop once the model's accuracy stopped improving, so we enabled the tool's “Stop when model stops improving” parameter. The tool stopped training before 100 epochs, in under 2 hours.
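
The idea of stopping early can be sketched in a few lines of Python. This is not the tool's actual logic, which we have not inspected. It is a common "patience" rule, shown under the assumption that training stops after a fixed number of epochs without a new best validation loss. The function name, the `patience` and `min_delta` parameters, and the loss values are all made up for illustration.

```python
def simulate_early_stopping(val_losses, max_epochs=200, patience=5, min_delta=0.0):
    """Return (epochs_run, best_epoch, best_loss) for per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve the
    best validation loss by more than `min_delta`, or at `max_epochs`.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch, best_loss


# Made-up validation losses that flatten out and then drift upward.
losses = [1.0, 0.8, 0.65, 0.6, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64]
result = simulate_early_stopping(losses)
print(result)
```

With a rule like this, a generous epoch limit such as 200 acts only as an upper bound, which is consistent with our run ending before 100 epochs.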

Train Deep Learning Model tool with parameters

Once training was complete, we analyzed the model metrics to verify the model's accuracy. Based on the training and validation loss curves, the model performed well, with no sign of overfitting.
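
For readers who want to check loss curves numerically rather than by eye, here is a rough Python heuristic. It is a sketch under our own assumptions, not a metric reported by the tool: it flags possible overfitting when the recent validation loss sits well above the recent training loss and has also rebounded from its best value. The function name, the `window` and `tolerance` values, and the example curves are invented for illustration.

```python
def diagnose_loss_curves(train_losses, val_losses, window=3, tolerance=0.05):
    """Compare the last `window` epochs of training and validation loss."""
    if len(train_losses) != len(val_losses) or len(val_losses) < window:
        raise ValueError("need two equally long curves with at least `window` epochs")
    recent_train = sum(train_losses[-window:]) / window
    recent_val = sum(val_losses[-window:]) / window
    gap = recent_val - recent_train        # how far validation lags training
    rebound = recent_val - min(val_losses)  # how far validation rose from its best
    return {
        "gap": gap,
        "rebound": rebound,
        "overfitting_suspected": gap > tolerance and rebound > tolerance,
    }


# Made-up curves: both losses keep falling together.
healthy = diagnose_loss_curves(
    [1.0, 0.7, 0.5, 0.4, 0.35, 0.33],
    [1.1, 0.8, 0.6, 0.5, 0.45, 0.43],
)
# Made-up curves: training loss keeps falling while validation loss climbs.
overfit = diagnose_loss_curves(
    [1.0, 0.6, 0.4, 0.25, 0.15, 0.1],
    [1.1, 0.7, 0.6, 0.7, 0.8, 0.9],
)
print(healthy["overfitting_suspected"], overfit["overfitting_suspected"])
```

A small, steady gap between the two curves is normal. The warning sign is a validation curve that turns upward while the training curve keeps dropping.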

Training and Validation loss graph

Step 3 : Perform Inference

Once the model was created, we used the Detect Objects Using Deep Learning tool to perform inference. The analysis was run on one of the image mosaics, covering one of the herd gatherings.


Detect Objects Using Deep Learning tool with parameters

For inferencing we leveraged the GPU, which let us process the data and output the result in a couple of minutes. As you can see below, the detection and classification of adult and calf caribou appears very accurate.

Caribou - Detect Objects output


This workflow detected and classified 4,788 adults and 2,414 calves in our area of interest, which was about 7 acres. To improve model accuracy, consider adding more training samples with more variety, covering different geographic and weather conditions. Transfer learning and data preparation greatly reduce the time needed to train effective models.

Using ArcGIS Pro, you can perform end-to-end deep learning workflows for image analysis tasks such as feature extraction and classification, to name just a few.
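
As a quick illustration of how counts like these can be summarized, the Python sketch below totals the two classes and derives a calf share and an approximate density. The counts are the ones reported above. The function name `summarize_detections` is ours, and the density is only as good as the "about 7 acres" estimate, so treat it as a rough figure.

```python
def summarize_detections(counts, area_acres):
    """Return the total, the per-class share, and the animals per acre."""
    total = sum(counts.values())
    return {
        "total": total,
        "share": {name: n / total for name, n in counts.items()},
        "per_acre": total / area_acres,
    }


# Counts reported for our area of interest; the area is approximate.
detections = {"Adult": 4788, "Calf": 2414}
summary = summarize_detections(detections, area_acres=7.0)

print(f"Total caribou detected: {summary['total']}")
print(f"Calf share of the herd: {summary['share']['Calf']:.1%}")
print(f"Approximate density: {summary['per_acre']:.0f} animals per acre")
```

The same tally could be produced from the attribute table of the detection output by grouping features on their class field.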

Final Result

Resource links

Deep Learning with ArcGIS Pro Tips and Tricks: Part 1

Introducing ready-to-use geospatial deep learning models



About the authors

Sangeet Mathew

Senior Product Engineer on the Imagery team at Esri, with a focus on AI & Image Analysis.


Pavan Yadav

Pavan Yadav is a product engineer with the ArcGIS Imagery team. He is passionate about GeoAI, more specifically deep learning, and applying it to solve GIS, image analysis, and image processing problems.
