Where Deep Learning Meets GIS

The field of artificial intelligence (AI) has progressed rapidly in recent years, matching or, in some cases, even surpassing human accuracy at tasks such as image recognition, reading comprehension, and text translation. The intersection of AI and GIS is creating massive opportunities that weren’t possible before. AI, machine learning, and deep learning are helping us make the world better by, for example, increasing crop yield through precision agriculture, revealing crime patterns, and predicting when the next big storm will hit so that we are better equipped to handle it.

Broadly speaking, AI is the ability of computers to perform a task that typically requires some level of human intelligence. Machine learning is one type of engine that makes this possible. It uses data-driven algorithms to learn from data to give you the answers that you need. One type of machine learning that has emerged recently is deep learning. Deep learning uses computer-generated neural networks, which are inspired by and loosely resemble the human brain, to solve problems and make predictions.

Machine Learning in ArcGIS

Machine learning has been a core component of spatial analysis in GIS. Its tools and algorithms are built into geoprocessing tools that solve problems in three broad categories. With classification, you can use support vector machine algorithms to create land-cover classification layers. With clustering, you can process large quantities of input point data, identify the meaningful clusters within them, and separate them from the sparse noise. Prediction algorithms, such as geographically weighted regression, give you the ability to model spatially varying relationships. These methods work well in several areas, and their results are interpretable, but they need experts to identify or feed in the factors (or features) that affect the outcome we’re trying to predict.
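To make the clustering idea concrete, here is a minimal sketch of DBSCAN, one well-known density-based algorithm that groups dense points and labels sparse ones as noise. It is an illustration only and is not how any ArcGIS geoprocessing tool is implemented; the function name, the parameter names (eps, min_pts), and the sample coordinates are assumptions made for this example.

```python
from math import dist

NOISE = -1  # label given to points that belong to no dense cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # too sparse to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, so keep growing the cluster
    return labels


# Hypothetical sample: two tight groups of four points and one isolated point.
sample_points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
sample_labels = dbscan(sample_points, eps=1.5, min_pts=3)
print(sample_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the analyst still chooses eps and min_pts by hand, which is a small instance of the expert input that these classic methods depend on.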

The Rise of Deep Learning

Wouldn’t it be great if the machine figured out what those factors/features should be just by looking at the data? That’s where deep learning comes in. It’s inspired by and loosely resembles the human brain. In a deep neural network, there are neurons that respond to stimuli and are connected to each other in layers. Neural networks have been around for decades, but it has been a challenge to train them. Watch the webinar: Integrating Deep Learning with ArcGIS Using Python.

The advent of deep learning can be attributed to three primary developments in recent years—availability of data, fast computing, and algorithmic improvements.

Data: We now have vast quantities of data, thanks to the Internet, the sensors all around us, and the numerous satellites that are imaging the whole world every day.

Computing: We have powerful computational resources, thanks to cloud computing and to graphics processing units (GPUs), which, driven by the gaming industry, have become more powerful than ever and have come down in price.

Algorithmic improvements: Finally, researchers have now cracked some of the most challenging aspects of training the deep neural networks through algorithmic improvements and network architectures.

Applying Computer Vision to Geospatial Analysis

One area of AI where deep learning has done exceedingly well is computer vision, or the ability for computers to see. This is particularly useful for GIS, as satellite, aerial, and drone imagery is being produced at a rate that makes it impossible to analyze and derive insight through traditional means.

The figure below shows some of the most important computer vision tasks or use cases and how they can be applied to GIS:

Important computer vision tasks are applied to GIS.

The simplest is image classification, in which the computer assigns a label, such as “cat” or “dog,” to an image. This can be used in GIS to categorize geotagged photos. One of the images above has been classified as a dense crowd. This pedestrian activity classification can be used for pedestrian and traffic management planning during public events. An example of this was demonstrated at the 2018 Esri User Conference Plenary Session by the staff at Cobb County, Georgia.

Cobb County, Georgia, conducts traffic and pedestrian movement planning.

With object detection, the computer needs to find objects within an image as well as their location. This is a very important task in GIS—finding what is in satellite, aerial, or drone imagery, locating it, and plotting it on a map. This can be used for infrastructure mapping, anomaly detection, and feature extraction.
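The step from "found in the image" to "plotted on a map" is a coordinate conversion. The sketch below shows one common way to do it, assuming the image is georeferenced with a six-coefficient affine geotransform in the order GDAL uses; the coefficient values, the bounding box, and the function names are hypothetical and chosen only for illustration.

```python
def pixel_to_map(col, row, gt):
    """Convert a pixel position to map coordinates with an affine geotransform.

    gt = (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height),
    where pixel_height is negative for north-up imagery.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def bbox_to_map_polygon(bbox, gt):
    """Turn a detection box (col_min, row_min, col_max, row_max) into a closed map ring."""
    c0, r0, c1, r1 = bbox
    corners = [(c0, r0), (c1, r0), (c1, r1), (c0, r1), (c0, r0)]
    return [pixel_to_map(c, r, gt) for c, r in corners]


# Hypothetical north-up image: 0.5 map units per pixel, top-left corner at (500000, 4200000).
geotransform = (500000.0, 0.5, 0.0, 4200000.0, 0.0, -0.5)

# Hypothetical detection, for example a swimming pool, in pixel coordinates.
detection = (100, 200, 140, 260)

ring = bbox_to_map_polygon(detection, geotransform)
print(ring[0])  # (500050.0, 4199900.0)
print(ring[2])  # (500070.0, 4199870.0)
```

The resulting ring can then be written out as a polygon feature in whatever spatial reference the source image uses.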

Swimming pools are detected within residential parcels.

For an example of using deep learning to detect and classify swimming pools, see the detailed blog post “Swimming Pool Detection and Classification Using Deep Learning” on Medium or “How We Did It: Integrating ArcGIS and Deep Learning at UC 2018” on the ArcGIS blog.

Another important task in computer vision is semantic segmentation, in which each pixel of an image is classified as belonging to a particular class. For instance, in the first image in this article, the cat is in the yellow pixels, the green pixels belong to the ground class, and the sky is in blue. In GIS, semantic segmentation can be used for land-cover classification or to extract road networks from satellite imagery.
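In practice, a segmentation model usually outputs a score for every class at every pixel, and the label map comes from picking the highest-scoring class per pixel. The following sketch shows only that final step on a tiny hand-written grid; the class list and the score values are made up for this example and do not come from a trained model.

```python
CLASSES = ["ground", "cat", "sky"]  # hypothetical class list, indexed 0, 1, 2


def segment(scores):
    """scores[row][col] is a list of per-class scores; return the winning class index per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def class_fractions(label_map, n_classes):
    """Share of the image covered by each class, a simple land-cover style summary."""
    counts = [0] * n_classes
    for row in label_map:
        for k in row:
            counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]


# A made-up 2 x 2 image with three class scores per pixel.
example_scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],
]

label_map = segment(example_scores)
print(label_map)  # [[2, 2], [0, 1]]
print([[CLASSES[k] for k in row] for row in label_map])
print(class_fractions(label_map, len(CLASSES)))  # [0.25, 0.25, 0.5]
```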

Land-cover classification uses deep learning.

A nice early example of this work and its impact is the success the Chesapeake Conservancy has had in combining Esri GIS technology with the Microsoft Cognitive Toolkit (CNTK) AI tools and cloud solutions to produce the first high-resolution land-cover map of the Chesapeake watershed. This work is now also available as a tutorial and can be deployed on a Microsoft Data Science Virtual Machine (DSVM) on Azure.

Another type of segmentation is instance segmentation. You can think of this as a more precise object detection in which the precise boundary of each object instance is marked out. Instance segmentation can be used for tasks like improving basemaps. This can be done by adding building footprints or reconstructing 3D buildings from lidar data.

The building above (left) was reconstructed in 3D using masks digitized by human editors. The same building (right) was reconstructed in 3D from the masks produced by the Mask R-CNN model.

Esri recently collaborated with NVIDIA to use deep learning to automate the manually intensive process of creating complex 3D building models from aerial lidar data for Miami-Dade County in Florida. See this detailed post “Reconstructing 3D Buildings from Aerial Lidar with AI: Details” on Medium or “Restoring 3D Buildings from Aerial Lidar with Help of AI” on the ArcGIS Blog to learn how this was done.  

Deep Learning for Mapping

In working with satellite imagery, one important application of deep learning is creating digital maps by automatically extracting road networks and building footprints.

Imagine applying a trained deep learning model on a large geographic area and arriving at a map containing all the roads in the region, then having the ability to create driving directions using this detected road network. This can be particularly useful for developing countries that do not have high-quality digital maps or in areas where newer developments have been built.
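Once detected roads have been converted into a connected network, producing directions becomes a shortest-path problem. The sketch below uses Dijkstra's algorithm on a small hand-built graph; the junction names and segment lengths are invented, and a production routing service would also account for one-way streets, turn restrictions, and travel times.

```python
import heapq


def build_graph(segments):
    """segments: iterable of (junction_a, junction_b, length); roads are treated as two-way."""
    graph = {}
    for a, b, length in segments:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph


def shortest_route(graph, start, goal):
    """Return (total_length, [junctions]) for the shortest route, or (inf, []) if none exists."""
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            # Walk back through the predecessors to rebuild the route.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return d, path[::-1]
        if d > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter way to this junction was already found
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf"), []


# Hypothetical road segments extracted from imagery, with lengths in meters.
detected_roads = [("A", "B", 120.0), ("B", "C", 80.0), ("A", "C", 250.0), ("C", "D", 60.0)]
road_graph = build_graph(detected_roads)

print(shortest_route(road_graph, "A", "D"))  # (260.0, ['A', 'B', 'C', 'D'])
```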


Roads can be detected using deep learning and then converted to geographic features.

Good maps need more than just roads; they also need buildings. Instance segmentation models like Mask R-CNN are particularly useful for building footprint segmentation and can help create building footprints without any need for manual digitizing. However, these models typically produce irregular building footprints that look more like Antoni Gaudí masterpieces than regular buildings with straight edges and right angles. Using the Regularize Building Footprint tool in ArcGIS Pro can help restore the straight edges and right angles necessary for an accurate representation of building footprints.

Building footprints extracted from satellite imagery and regularized using the Regularize Building Footprint tool in ArcGIS Pro are shown.

Integrating ArcGIS with AI

ArcGIS has tools to help with every step of the data science workflow, including data preparation and exploratory data analysis; training the model; performing spatial analysis; and finally, disseminating results using web layers and maps and driving field activity. To add context and depth to your analysis, you can use content from Esri’s ArcGIS Living Atlas of the World. This large collection of Esri-curated and partner-provided imagery can be critical to a deep learning workflow.

ArcGIS Pro includes tools for helping with data preparation for deep learning workflows and has been enhanced for deploying trained models for feature extraction or classification. ArcGIS Image Server in the ArcGIS Enterprise 10.7 release has similar capabilities, providing the ability to deploy deep learning models at scale by leveraging distributed computing. The arcgis.learn module in ArcGIS API for Python enables GIS analysts and data scientists to train deep learning models with a simple, intuitive API. ArcGIS Notebooks provides a ready-to-use environment for training deep learning models. ArcGIS includes built-in Python raster functions for object detection and classification workflows using CNTK, Keras, PyTorch, and TensorFlow. Additionally, you can write your own Python raster function that uses your deep learning library of choice or a specific deep learning model/architecture. See this handy guide to get started.
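As a rough idea of what training with arcgis.learn can look like, the sketch below wraps a typical sequence (load exported training chips, build a detector, train it, save it) in one function. Treat it as an outline under assumptions rather than a reference: the folder path, model name, and default values are hypothetical, the choice of SingleShotDetector is just one of the module's model types, and the exact arguments available depend on the installed version of ArcGIS API for Python.

```python
def train_detector(chips_folder, model_name, epochs=10, batch_size=16):
    """Train an object detection model on exported image chips and save it.

    chips_folder and model_name are hypothetical placeholders supplied by the caller.
    """
    # Imported inside the function so this file can still be loaded on a machine
    # that does not have the arcgis package and its deep learning dependencies.
    from arcgis.learn import SingleShotDetector, prepare_data

    # Read the exported image chips and their labels into a data object
    # that holds both the training and validation sets.
    data = prepare_data(chips_folder, batch_size=batch_size)

    # Build a single-shot detector configured for the classes found in the data.
    model = SingleShotDetector(data)

    # Train for the requested number of epochs using the library's default
    # learning-rate handling.
    model.fit(epochs=epochs)

    # Save the trained model under the given name so it can be deployed later.
    model.save(model_name)
    return model
```

A call such as train_detector(r"C:\data\pool_chips", "pool_detector") would only work where the arcgis package, a supported deep learning backend, and a folder of exported training data are all available; both arguments shown here are invented.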

Deep learning is a rapidly evolving field, and integrating it with ArcGIS allows data scientists to leverage cutting-edge research while taking advantage of an industrial-strength GIS. Python has emerged as the lingua franca of the deep learning world, with popular libraries such as TensorFlow, PyTorch, and CNTK choosing it as their primary programming language. ArcGIS API for Python and ArcPy are a natural fit for integrating with these deep learning libraries, giving you more capabilities.

While the examples in this article have focused on imagery and computer vision, deep learning can be used equally well for processing large volumes of structured data such as observations from sensors, or attributes from a feature layer. Applications of such techniques to structured data include predicting the probability of accidents, sales forecasting, and natural language routing and geocoding.

Esri is investing heavily in these emerging technologies and has started a new R&D center in New Delhi, focused on AI and deep learning on satellite imagery and location data. Visit the Esri R&D Center—New Delhi to learn more about our work.


About the author

Rohit Singh

Rohit Singh is the managing director of Esri's R&D Center in New Delhi and leads the development of data science, deep learning, and geospatial AI solutions in the ArcGIS platform. He is passionate about deep learning and its intersection with geospatial data and satellite imagery and has been recognized as an Industry Distinguished Lecturer for the IEEE Geoscience and Remote Sensing Society (GRSS). Rohit is a graduate of the Indian Institute of Technology, Kharagpur, and worked at computer vision startups and IBM before joining Esri. He conceptualized, designed, and developed the ArcGIS API for Python, ArcObjects Java, ArcGIS Engine Java API, and ArcGIS Enterprise (Linux) while at Esri.