ArcGIS Blog

Adding Spatial Context to Predictive Models with Embeddings

By Aawaj Joshi and Karthik Dutt

Predicting housing prices has always been a balancing act between data and context. Traditional models rely on structured predictors such as a house’s age, the number of rooms, or the population of the surrounding census block group. These variables capture important characteristics of a property, but they only tell part of the story.

What they often miss is the broader spatial context. Two homes with nearly identical characteristics can have very different values simply because they exist in different social, economic, and geographic environments.

At this year’s Developer and Technology Summit plenary, Karthik Dutt demonstrates how geodemographic embeddings are changing that paradigm. By representing each location as a learned spatial signature derived from rich demographic, business, and economic data, these embeddings allow predictive models to access layers of geographic context that were previously difficult to encode.

Before introducing embeddings, Karthik walks through a classic approach to predicting housing prices in the Greater Los Angeles area: feeding structured predictors into a standard gradient boosting model. The results are familiar. In some neighborhoods, predictions are close to reality, but across the map, errors quickly grow. Certain block groups deviate by 10%, 15%, or even 20% or more, highlighting how much variance traditional predictors leave unexplained. The model is doing the best it can with the information available, but the information itself is incomplete.
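
To make the error bands concrete, here is a minimal sketch of how per-block-group deviations like these could be computed and bucketed. The block group IDs, prices, and the `percent_error` and `error_band` helpers are all hypothetical and are not taken from the demo.

```python
# Hypothetical records: (block_group_id, actual_price, predicted_price).
# IDs and prices are invented for illustration only.
records = [
    ("bg_001", 800_000, 780_000),
    ("bg_002", 650_000, 740_000),
    ("bg_003", 1_200_000, 950_000),
    ("bg_004", 500_000, 585_000),
]


def percent_error(actual, predicted):
    """Absolute prediction error as a percentage of the actual price."""
    return abs(predicted - actual) / actual * 100.0


def error_band(pct):
    """Bucket a percentage error into the bands mentioned in the text."""
    if pct >= 20:
        return "20%+"
    if pct >= 15:
        return "15-20%"
    if pct >= 10:
        return "10-15%"
    return "<10%"


# Map each block group to its error band, e.g. for coloring a map layer.
bands = {bg: error_band(percent_error(a, p)) for bg, a, p in records}
for bg, band in bands.items():
    print(bg, band)
```

A mapping like `bands` is the kind of value that could drive the orange-to-red symbology described in the post.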

This is where geodemographic embeddings come in.

Karthik incorporates embeddings generated by the Geo-Demographic Foundation Model into his housing price model. These embeddings are computed at the H7 hexagonal resolution, where each hexagon covers roughly 2 square kilometers. Across the United States, this creates a grid of about 1.8 million hex bins, each with its own learned spatial representation derived from signals such as census characteristics, demographic patterns, business presence, and retail demand.
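
One simple way to picture this step is as a join: each record looks up the embedding of the hexagon it falls in, and that vector is appended to the structured predictors. The sketch below assumes a hypothetical lookup keyed by hex ID and uses made-up three-dimensional vectors. The real embeddings, their dimensionality, and how ArcGIS will expose them are not described in the post.

```python
# Hypothetical lookup: hex ID -> embedding vector.
# Real embeddings would likely have many more dimensions; 3 keeps this readable.
hex_embeddings = {
    "hex_a": [0.12, -0.40, 0.88],
    "hex_b": [-0.35, 0.27, 0.05],
}

# Hypothetical records: structured predictors plus the ID of the containing hexagon.
houses = [
    {"house_age": 34, "rooms": 5, "population": 1800, "hex_id": "hex_a"},
    {"house_age": 12, "rooms": 7, "population": 2400, "hex_id": "hex_b"},
    {"house_age": 50, "rooms": 4, "population": 900, "hex_id": "hex_a"},
]

STRUCTURED = ["house_age", "rooms", "population"]


def build_feature_row(house, embeddings):
    """Concatenate structured predictors with the embedding of the house's hexagon."""
    structured = [float(house[name]) for name in STRUCTURED]
    return structured + embeddings[house["hex_id"]]


# Each row now carries both property-level and neighborhood-level features.
feature_matrix = [build_feature_row(h, hex_embeddings) for h in houses]
for row in feature_matrix:
    print(row)
```

Records in the same hexagon share the same embedding columns, so the model sees identical neighborhood context for them while the structured predictors still differ.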

With the embeddings, the model gains access to spatial features that capture the broader neighborhood context, which traditional predictors alone cannot fully describe. The improvements are immediately visible.

Locations that previously appeared in darker shades of orange and red, indicating large prediction errors, now appear lighter orange and yellow, reflecting smaller deviations from actual prices. Quantitatively, the R² on the test set improves by roughly 15%, demonstrating how spatial context can meaningfully improve predictive accuracy.
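
For readers who want to run a similar comparison, the sketch below computes R² for a baseline and an embedding-augmented set of predictions. All values are invented (prices in thousands) and `r_squared` is a hypothetical helper. The post does not say whether the 15% figure is a relative or an absolute change, so both are computed here.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented held-out prices and predictions, in thousands of dollars.
actual = [500.0, 650.0, 800.0, 1200.0]
baseline_pred = [585.0, 740.0, 780.0, 950.0]    # structured predictors only
augmented_pred = [540.0, 690.0, 790.0, 1100.0]  # structured predictors + embeddings

r2_baseline = r_squared(actual, baseline_pred)
r2_augmented = r_squared(actual, augmented_pred)

# Two common ways to report an "improvement" in R².
absolute_gain = r2_augmented - r2_baseline
relative_gain_pct = absolute_gain / r2_baseline * 100.0

print(f"baseline R2:  {r2_baseline:.3f}")
print(f"augmented R2: {r2_augmented:.3f}")
print(f"absolute gain: {absolute_gain:.3f}, relative gain: {relative_gain_pct:.1f}%")
```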

While Karthik demonstrates the power of embeddings, he emphasizes that they do not replace traditional predictors. Variables such as house age, number of rooms, and population remain critical because they describe the property itself and its immediate surroundings. Embeddings extend that understanding by capturing the character of the surrounding neighborhood. Together, they can transform ordinary models into spatially aware ones.

Esri is actively working to make embeddings available for use in ArcGIS. Once available, how will you use them to uncover patterns in the places you study?
