ArcGIS Blog

Adding Spatial Context to Predictive Models with Embeddings

By Aawaj Joshi and Karthik Dutt

Predicting housing prices has always been a balancing act between data and context. Traditional models rely on structured predictors such as a house’s age, the number of rooms, or the population of the surrounding block. These variables capture important characteristics of a property, but they only tell part of the story.

What they often miss is the broader spatial context. Two homes with nearly identical characteristics can have very different values simply because they exist in different social, economic, and geographic environments.

At this year’s Developer and Technology Summit plenary, Karthik Dutt demonstrates how geodemographic embeddings are changing that paradigm. By representing each location as a learned spatial signature derived from rich demographic, business, and economic data, these embeddings allow predictive models to access layers of geographic context that were previously difficult to encode.

 

Before introducing embeddings, Karthik walks through a classic approach to predicting housing prices in the Greater Los Angeles area: feeding structured predictors into a standard gradient boosting model. The results are familiar. In some neighborhoods, predictions are close to reality, but across the map, errors quickly grow. Certain blocks deviate by 10%, 15%, or even 20% or more, highlighting how much variance traditional predictors leave unexplained. The model is doing the best it can with the information available, but the information itself is incomplete.
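To make those deviation figures concrete, here is a minimal sketch of how a per-block percentage error could be computed and grouped into the bands mentioned above. The block IDs and prices are made-up illustrative values, and the function names are hypothetical; this is not the code used in the demo.

```python
def percent_error(predicted, actual):
    """Absolute prediction error as a percentage of the actual price."""
    if actual == 0:
        raise ValueError("actual price must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


def error_band(pct):
    """Group a percentage error into the bands mentioned in the text."""
    if pct < 10:
        return "under 10%"
    if pct < 15:
        return "10-15%"
    if pct < 20:
        return "15-20%"
    return "20% or more"


# Hypothetical (block_id, predicted_price, actual_price) rows.
rows = [
    ("block_a", 510_000, 500_000),
    ("block_b", 440_000, 500_000),
    ("block_c", 625_000, 500_000),
]

bands = {
    block_id: error_band(percent_error(predicted, actual))
    for block_id, predicted, actual in rows
}

for block_id, band in bands.items():
    print(block_id, band)
```

Mapping the resulting bands to a color ramp is one way to produce the kind of error map described in the demo.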

This is where geodemographic embeddings come in.

Karthik incorporates embeddings generated by the Geodemographic Foundation Model into his housing price model. These embeddings, now available as an alpha layer in ArcGIS Online, are computed at the H7 hexagonal resolution, where each hexagon covers roughly 2 square kilometers. Across the United States, this creates a grid of about 1.8 million hex bins, each with its own learned spatial representation derived from signals such as census characteristics, demographic patterns, business presence, and retail demand.
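One way to picture how an embedding reaches the model is as extra columns appended to each house's structured predictors, looked up by the hexagon the house falls in. The sketch below assumes each house already carries a hexagon ID. The IDs, the four-value vectors, and the `build_feature_row` helper are all hypothetical, and the real embedding length is not stated in this article.

```python
EMBEDDING_DIM = 4  # illustrative only; the real embedding length is an unknown here

# Hypothetical lookup from hexagon ID to its learned embedding vector.
hex_embeddings = {
    "hex_001": [0.12, -0.40, 0.88, 0.05],
    "hex_002": [-0.31, 0.27, 0.10, 0.64],
}

# Two houses with identical structured predictors in different hexagons.
houses = [
    {"hex_id": "hex_001", "age": 35, "rooms": 6, "population": 1200},
    {"hex_id": "hex_002", "age": 35, "rooms": 6, "population": 1200},
]


def build_feature_row(house, embeddings, dim=EMBEDDING_DIM):
    """Concatenate structured predictors with the embedding of the house's hexagon."""
    structured = [house["age"], house["rooms"], house["population"]]
    embedding = embeddings.get(house["hex_id"])
    if embedding is None:
        raise KeyError(f"no embedding for hexagon {house['hex_id']!r}")
    if len(embedding) != dim:
        raise ValueError("embedding has an unexpected length")
    return structured + list(embedding)


feature_rows = [build_feature_row(house, hex_embeddings) for house in houses]

for row in feature_rows:
    print(row)
```

The two rows share the same first three values and differ only in the appended embedding columns, which is the extra signal a model can use to tell otherwise identical homes apart.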

With the embeddings added, the model gains spatial features that capture neighborhood context that traditional predictors alone cannot fully describe, and the improvements are immediately visible.

Locations that previously appeared in darker shades of orange and red, indicating large prediction errors, now appear in lighter orange and yellow, reflecting smaller deviations from actual prices. Quantitatively, the test R² improves by roughly 15%, demonstrating how spatial context can meaningfully improve predictive accuracy.
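For readers who want to run the same kind of comparison on their own models, the following sketch computes R² on a held-out test set for two sets of predictions and reports the change in both absolute and relative terms. The prices are toy values chosen for illustration and do not reproduce the demo's results.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy test-set prices in thousands of dollars (illustrative values only).
actual = [400, 500, 600, 700]
baseline_pred = [450, 480, 560, 650]    # structured predictors only
embedding_pred = [420, 490, 580, 680]   # structured predictors plus embeddings

r2_baseline = r_squared(actual, baseline_pred)
r2_embedding = r_squared(actual, embedding_pred)

absolute_gain = r2_embedding - r2_baseline
relative_gain = absolute_gain / r2_baseline * 100.0

print(f"baseline R^2:   {r2_baseline:.3f}")
print(f"embeddings R^2: {r2_embedding:.3f}")
print(f"absolute gain:  {absolute_gain:.3f}")
print(f"relative gain:  {relative_gain:.1f}%")
```

Reporting both figures is a deliberate choice: a percentage improvement in R² can be read either as a relative change or as a change in percentage points, so stating which one is meant avoids ambiguity.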

While Karthik demonstrates the power of embeddings, he emphasizes that they do not replace traditional predictors. Variables such as house age, number of rooms, and population remain critical because they describe the property itself. Embeddings extend that understanding by capturing the character of the surrounding neighborhood.

Together, they transform ordinary models into spatially aware ones. How will you use embeddings to uncover patterns in the places you study?
