ArcGIS Blog

Fast, accurate water body extraction for Reality Mapping using SAM3

By Ashleigh Sier

Water remains one of the more challenging features in Reality Mapping workflows. Its reflective nature, lack of texture, and seasonal variability can introduce noise and artefacts if not handled correctly.

In an earlier blog, we explored how to create water body geometries using traditional image classification workflows in ArcGIS Pro. While that approach is still valid, recent advances in foundation models now make it possible to significantly simplify the process.

In this post, I demonstrate a streamlined workflow using the SAM3 deep learning model and simple text prompts to automatically extract water bodies ready for use in Reality Mapping reconstruction workflows.

Why use SAM3 for water bodies?

SAM3 (Segment Anything Model v3) enables object detection and segmentation using natural‑language prompts rather than manually defined training samples or class schemas. For Reality Mapping projects, this offers some clear advantages:

  • Fewer steps: no training data collection or classifier tuning
  • Consistent results across rivers, lakes, and reservoirs
  • Scalable processing for large project areas

For water extraction, prompts such as:

“water, lake, river, dam, creek”

are often sufficient to generate reliable results across a wide variety of environments.

The SAM3 model and associated metadata are available directly from ArcGIS content, making it straightforward to integrate into existing workflows.

The SAM3 model is available on Esri’s Living Atlas. Thanks to Eagle Technology Group Ltd in New Zealand for this great submission!

Project context and data characteristics

For this example, I applied the workflow to a Reality Mapping dataset covering approximately 50 square kilometres in Stuttgart, Germany. This is the same dataset used in the previous blog: data provided by GeoFly GmbH (Vexcel Osprey 4.1), processed with ArcGIS Reality by Esri. Copyright © GeoFly GmbH/Esri.

Key characteristics of the data and processing setup include:

  • Imagery resolution: approximately 5 cm
  • SAM3 inference: run at full native resolution
  • Batch size: 32 (this could probably have been increased on this GPU)
  • GPU: NVIDIA GeForce RTX 4090
  • Input: adjusted imagery following Reality Mapping alignment

Running SAM3 at native resolution ensures that narrow rivers, complex shorelines, and subtle water boundaries are preserved, which is an important factor in achieving high‑quality downstream reconstruction results.
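To put native resolution in perspective, the short calculation below shows how many pixels describe a narrow channel at different ground sample distances (GSD). The 2 m creek width and the coarser GSD values are hypothetical, chosen only for illustration.

```python
def pixels_across(feature_width_m: float, gsd_m: float) -> float:
    """Return how many pixels span a feature of the given ground width."""
    if gsd_m <= 0:
        raise ValueError("GSD must be positive")
    return feature_width_m / gsd_m


# Hypothetical 2 m wide creek imaged at native 5 cm and at coarser resolutions.
creek_width_m = 2.0
for gsd_m in (0.05, 0.10, 0.25, 0.50):
    px = pixels_across(creek_width_m, gsd_m)
    print(f"GSD {gsd_m * 100:.0f} cm: about {px:.0f} px across")
```

At 5 cm the creek spans about 40 pixels; downsampled to 50 cm it spans only 4, which leaves the model very little to separate from banks and overhanging vegetation.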

Workflow overview

This workflow begins after image alignment, making it well suited for refining inputs before dense reconstruction.

1. Confirm alignment in Reality Studio

After completing image alignment in a Reality Mapping solution (I used ArcGIS Reality Studio this time), I reviewed the results to ensure consistent alignment across water bodies, shorelines, and river corridors.

Camera Frustums in ArcGIS Reality Studio

2. Prepare the imagery in ArcGIS Pro

Next, I opened the aligned imagery in ArcGIS Pro via a mosaic dataset created from the adjusted frame and camera tables.

For simplicity and performance, I exported a single Cloud Raster Format (CRF) covering the project area. This avoids processing large image collections tile‑by‑tile and provides an optimised raster input for deep learning inference.
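If you prefer to script this step, one way to write the mosaic dataset out as a single CRF is the Copy Raster tool, which writes Cloud Raster Format when the output path ends in .crf. This is a minimal sketch assuming ArcGIS Pro's arcpy is available; the helper names and paths are hypothetical, and it is not necessarily how the export was done for this project.

```python
import os


def crf_output_path(folder: str, name: str) -> str:
    """Return folder/name with a .crf extension so Copy Raster writes CRF."""
    stem, _ = os.path.splitext(name)
    return os.path.join(folder, stem + ".crf")


def export_mosaic_to_crf(mosaic_dataset: str, out_crf: str) -> str:
    """Copy the mosaicked output of a mosaic dataset to a single CRF."""
    import arcpy  # deferred import: only this function needs ArcGIS Pro

    if not out_crf.lower().endswith(".crf"):
        raise ValueError("out_crf must end in .crf")
    arcpy.env.overwriteOutput = True
    arcpy.management.CopyRaster(mosaic_dataset, out_crf)
    return out_crf


# Example call (hypothetical paths):
# export_mosaic_to_crf(
#     r"C:\Projects\Stuttgart\Stuttgart.gdb\AdjustedImagery",
#     crf_output_path(r"C:\Projects\Stuttgart", "stuttgart_imagery"),
# )
```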

3. Run Detect Objects Using Deep Learning

I then used the Detect Objects Using Deep Learning geoprocessing tool with the SAM3 model.

Key parameters included:

  • Input raster: exported CRF
  • Model: SAM3
  • Prompt: water, lake, river, dam, creek
  • Batch size: 32
  • Output: polygon features
Detect Objects Using Deep Learning
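The same run can be scripted with arcpy.ia.DetectObjectsUsingDeepLearning. This is a hedged sketch, not the configuration used for this project: it assumes the Image Analyst extension, a local copy of the SAM3 model package (.dlpk), and a model argument named text_prompt, which is a guess. Check the argument names the SAM3 package actually exposes in the tool dialog before relying on them.

```python
from typing import List


def sam3_arguments(prompt: str, batch_size: int) -> List[List[str]]:
    """Build the model-argument value table as [name, value] rows.

    "text_prompt" is an assumed argument name; confirm it in the tool dialog.
    """
    terms = [t.strip() for t in prompt.split(",") if t.strip()]
    if not terms:
        raise ValueError("prompt must contain at least one term")
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [["text_prompt", ", ".join(terms)], ["batch_size", str(batch_size)]]


def detect_water(in_crf: str, model_dlpk: str, out_features: str,
                 prompt: str = "water, lake, river, dam, creek",
                 batch_size: int = 32) -> str:
    """Run Detect Objects Using Deep Learning on the GPU (needs Image Analyst)."""
    import arcpy  # deferred import: sam3_arguments() works without ArcGIS Pro

    arcpy.CheckOutExtension("ImageAnalyst")
    arcpy.env.processorType = "GPU"
    try:
        arcpy.ia.DetectObjectsUsingDeepLearning(
            in_crf,                              # exported CRF
            out_features,                        # output polygon features
            model_dlpk,                          # SAM3 model package
            sam3_arguments(prompt, batch_size),  # model arguments
        )
    finally:
        arcpy.CheckInExtension("ImageAnalyst")
    return out_features
```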

Post‑processing and cleanup

Because SAM3 processes imagery in tiles, the initial polygon output can contain small artefacts or discontinuities along tile boundaries. To regularise the geometry and produce cleaner inputs for Reality Mapping, I applied a short post‑processing sequence.

The following steps were used:

  1. Apply a positive buffer of 0.1 m to the detected water body polygons
  2. Run Polygon to Raster
  3. Convert back using Raster to Polygon
  4. Apply Eliminate Polygon Part to remove small internal artefacts
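Scripted in arcpy, the four cleanup steps might look like the sketch below. The 0.1 m buffer comes from the steps above; dissolving the buffer output into a single feature, the 0.5 m rasterisation cell size, and the 100 m² hole threshold are assumptions for illustration, so tune them to your own data. Dissolving gives Polygon to Raster a single value, so fragments that touch across tile seams become one region when converted back to polygons.

```python
import os
from typing import Dict


def cleanup_paths(gdb: str, base: str) -> Dict[str, str]:
    """Hypothetical names for each intermediate output in the geodatabase."""
    return {
        step: os.path.join(gdb, f"{base}_{suffix}")
        for step, suffix in (("buffered", "buf"), ("raster", "ras"),
                             ("polygons", "poly"), ("final", "clean"))
    }


def clean_water_polygons(detections: str, gdb: str, base: str = "water",
                         cell_size_m: float = 0.5,
                         min_hole_m2: float = 100.0) -> str:
    """Buffer, rasterise, re-polygonise and remove small holes (needs arcpy)."""
    import arcpy  # deferred import: only this function needs ArcGIS Pro

    paths = cleanup_paths(gdb, base)
    arcpy.env.overwriteOutput = True

    # 1. Small positive buffer; dissolving into one feature (an assumption)
    #    lets fragments that touch across tile seams merge.
    arcpy.analysis.Buffer(detections, paths["buffered"], "0.1 Meters",
                          dissolve_option="ALL")

    # 2. Rasterise; after dissolve ALL there is one feature, so every water
    #    cell receives the same OBJECTID value.
    arcpy.conversion.PolygonToRaster(paths["buffered"], "OBJECTID",
                                     paths["raster"], cellsize=cell_size_m)

    # 3. Convert back to polygons; SIMPLIFY smooths stair-stepped cell edges.
    arcpy.conversion.RasterToPolygon(paths["raster"], paths["polygons"],
                                     "SIMPLIFY")

    # 4. Remove interior holes smaller than the area threshold.
    arcpy.management.EliminatePolygonPart(
        paths["polygons"], paths["final"], "AREA",
        f"{min_hole_m2} SquareMeters", part_option="CONTAINED_ONLY")
    return paths["final"]
```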

This sequence smooths tile seams and merges fragmented shapes into clean, contiguous water body geometries that are far more suitable for downstream use.

From here, only minor manual cleanup was required. As expected, there were some false positives and a few missing areas, but the most challenging components, like long and reflective river sections, were handled extremely well.

First Results
Let's take a closer look

Performance and processing time

For this 50 km² dataset, running SAM3 at 5 cm resolution with a batch size of 32 on a GeForce RTX 4090 took approximately 10 hours end‑to‑end.

This reflects a full‑area run without constraining the analysis to a smaller target polygon. Processing time can be reduced by limiting the spatial extent, but even the full‑area run fits well into a practical production schedule.
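The timing figures above translate into a rough average throughput, which can help when estimating runs over other extents. This sketch assumes runtime scales linearly with pixel count and ignores tile overlap and I/O, so treat its output as a planning estimate only; the 5 km² sub-area is hypothetical.

```python
def pixel_count(area_km2: float, gsd_m: float) -> float:
    """Pixels needed to cover an area at a given ground sample distance."""
    return area_km2 * 1_000_000 / gsd_m ** 2


def throughput_px_per_s(pixels: float, hours: float) -> float:
    """Average pixels processed per second over an end-to-end run."""
    return pixels / (hours * 3600)


def estimate_hours(area_km2: float, gsd_m: float, px_per_s: float) -> float:
    """Runtime estimate assuming time scales linearly with pixel count."""
    return pixel_count(area_km2, gsd_m) / px_per_s / 3600


# Figures from this project: 50 km² at 5 cm in about 10 hours.
full_px = pixel_count(50, 0.05)          # about 2e10 pixels
rate = throughput_px_per_s(full_px, 10)  # about 5.6e5 pixels per second
print(f"{full_px:.2e} px, {rate:,.0f} px/s")

# Hypothetical 5 km² target polygon at the same resolution and settings.
print(f"5 km² estimate: {estimate_hours(5, 0.05, rate):.1f} h")
```

Under the linear assumption, a target polygon one tenth the size of the full project area would finish in roughly an hour.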

In day-to-day production, this works well: start the process as you leave on Friday afternoon, then return to your desk on Monday morning to review the results and perform minor cleanup.

Using the results in Reality Mapping

Once validated, the extracted water body polygons can be:

  • Imported into your preferred Reality Mapping solution for your reconstruction
  • Reused across multiple production runs for the same area

At this stage, you can choose between precise or coarse water body workflows. While both are valid, I generally recommend coarse water bodies:

  • Less editing effort
  • More tolerant of small shoreline or seasonal changes
  • Often reusable across multiple projects
  • Effectively handled during final reconstruction in Reality Studio

Final thoughts

By combining SAM3, text prompting, and Reality Mapping‑ready imagery, water body extraction becomes faster, simpler, and far more scalable without sacrificing output quality.

Although some light cleanup is still required, this workflow removes much of the most time‑consuming and error‑prone work, particularly for large areas and complex river systems. As foundation models continue to mature, approaches like this are quickly becoming a standard component of modern Reality Mapping production pipelines.
