The R-ArcGIS Bridge at the 2021 Esri Developer Summit Plenary

The R-ArcGIS Bridge is the R integration for ArcGIS Pro. It lets you enrich GIS workflows with the statistical analysis packages of the R language. arcgisbinding, the R package developed by the R-ArcGIS Bridge team, has new enhancements that we are excited to share at the 2021 Esri Developer Summit. In addition to our tech workshop on leveraging the R-ArcGIS Bridge, we will be showcasing the new features of arcgisbinding at the plenary.

Technology Highlights

  1. Working with R notebooks alongside ArcGIS Pro
  2. Creating interactive maps in R notebooks
  3. Calling geoprocessing tools from R

Problem Definition

Ecoregions are geographic regions of ecological systems based on vegetation, climate conditions, and land cover.

In Part 1 of this blog series, we learned about the famous Bailey’s Ecoregions map.  This map is an expert-driven interpretation of the geography of US ecoregions, which was hand-drawn by US Forest Service researchers in 1994.

Bailey’s Ecoregions are organized hierarchically. The largest regions are domains, which group spatial units by similarity in precipitation and temperature. Divisions are subgroups within domains and are defined by similarity in precipitation and temperature levels and patterns. Lastly, divisions are made up of provinces, which are differentiated by vegetation and other natural land cover.

Ecoregion provinces defined by Bailey are given in the map below:

Bailey's ecoregion provinces for the conterminous United States

Using datasets representing climate and vegetation conditions from the same period of the 1990s, we applied a series of regionalization (clustering) algorithms in ArcGIS and R to create several data-driven interpretations of these US ecoregions.

Working with R Notebooks Alongside ArcGIS Pro

We will use R notebooks inside an ArcGIS Pro Conda environment. The Conda package r-arcgis-essentials will be used to set up this computational environment. If you would like to learn the detailed steps of setting up an R notebook environment to run alongside ArcGIS Pro, please visit this blog.

Once the r-arcgis-essentials package is installed, it enables R notebooks, the R-ArcGIS Bridge, and commonly used spatial R packages, such as sf, sp, and raster.

With our Conda environment correctly configured to power an R notebook, we can bring in the Bailey’s Ecoregions as a spatial R data frame for visualization and analysis.

Working with Cloud-Based Data Sources

Feature services and image services can be directly read in as spatial R data frames using the R-ArcGIS Bridge.  A detailed explanation about accessing remote data sources in R can be found in this blog. The Bailey’s Ecoregions feature service will be directly brought in as follows:

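library(arcgisbinding)  # R-ArcGIS Bridge package
arc.check_product()     # bind the R session to the local ArcGIS license
base_ecoregion_url <- 'https://services3.arcgis.com/oZfKvdlWHN1MwS48/arcgis/rest/services/Ecoregions/FeatureServer/0'
base_ecoregion_obj <- arc.open(base_ecoregion_url)  # open the feature service layer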

Using the R-ArcGIS Bridge’s conversion functions, the arc type data frame can be converted to commonly used spatial R data types such as sf or sp using arc.data2sf or arc.data2sp, respectively.

The Bailey’s Ecoregions dataset can be mapped interactively using the R-ArcGIS Bridge’s integration with esri-leaflet. The script below creates an interactive map of Bailey’s Ecoregions:

1. First, the data is converted to an sf object. The original arc object can also be used; however, the current version of leaflet requires the WGS84 projection:

base_ecoregion_arc <- arc.select(base_ecoregion_obj)
base_ecoregion_sf <- arc.data2sf(base_ecoregion_arc)

2. A color palette is defined for every unique province:

num.clust <- length(unique(base_ecoregion_sf$PROVINCE))
cluster.pal <- colorFactor(rainbow(num.clust), domain=base_ecoregion_sf$PROVINCE)

3. Lastly, a leaflet object is created using the sf object for Bailey’s Ecoregions and the associated color palette:

L <- leaflet(elementId = 'ecoregion_map') %>%
  addProviderTiles(providers$Esri) %>%
  addPolygons(data = st_transform(base_ecoregion_sf, 4326),
              fillOpacity = 1,
              color = ~cluster.pal(base_ecoregion_sf$PROVINCE),
              label = ~sprintf("Ecoregion Province: %s", base_ecoregion_sf$PROVINCE))

Note that the label parameter defines interactive, data-driven text to be displayed, which provides information on the province that is being hovered over.
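To make the palette and label steps concrete, here is a minimal base R sketch of the same idea: one color per unique category, looked up by name, plus one label string per feature. The province names are made up for illustration, and the lookup is a simplified stand-in for what leaflet's colorFactor does, not a description of its internals.

```r
# Hypothetical province labels standing in for base_ecoregion_sf$PROVINCE
province <- c("Prairie", "Desert", "Prairie", "Forest", "Desert")

# One color per unique category, named by category so it can be looked up
categories <- sort(unique(province))
palette_lookup <- setNames(rainbow(length(categories)), categories)

# Map every feature to the color of its province
feature_colors <- unname(palette_lookup[province])

# sprintf is vectorized, so it yields one label per feature
feature_labels <- sprintf("Ecoregion Province: %s", province)

print(data.frame(province, color = feature_colors, label = feature_labels))
```

Features that share a province receive the same color, which is what makes the provinces readable on the map.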

Calling ArcPy Geoprocessing Tools from R

One of the R packages that we imported into our R notebook was reticulate. Reticulate is a commonly used package for calling Python functions from R. It is frequently used to call low-level Python functions and return their results as R data types, which allows Python analysis to be performed from an R session. The new reticulate integration in the R-ArcGIS Bridge allowed us to import ArcPy, Esri’s Python package containing hundreds of functions for spatial data science, data conversion and management, and map automation. Importing ArcPy allows us to call and execute geoprocessing tools directly in the R notebook, side-by-side with our R code.

The geoprocessing tool we used to perform our first data-driven interpretation of the Bailey’s Ecoregions map is ArcPy’s Spatially Constrained Multivariate Clustering tool. This tool defines spatially contiguous clusters (regions) based on a set of attributes, by assigning spatial units with similar attribute values to the same cluster.  It also allows the user to force a “spatial constraint” on the clusters, which ensures that each cluster is spatially contiguous.
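As a rough illustration of what "spatially contiguous" means here, the base R sketch below checks whether every cluster in an assignment forms one connected piece. The neighbor list, the cluster labels, and the helper names (is_contiguous, check_all) are all hypothetical. This is only a check of the contiguity property, not the algorithm the geoprocessing tool uses.

```r
# Hypothetical neighbor list for six spatial units laid out in a row:
# unit i touches units i - 1 and i + 1
neighbors <- list(c(2), c(1, 3), c(2, 4), c(3, 5), c(4, 6), c(5))

# TRUE when the units in `members` form one connected group, moving only
# between neighbors that are also in `members` (breadth-first search)
is_contiguous <- function(members, neighbors) {
  if (length(members) == 0) return(TRUE)
  visited <- members[1]
  frontier <- members[1]
  while (length(frontier) > 0) {
    current <- frontier[1]
    frontier <- frontier[-1]
    reachable <- intersect(neighbors[[current]], members)
    new_units <- setdiff(reachable, visited)
    visited <- c(visited, new_units)
    frontier <- c(frontier, new_units)
  }
  setequal(visited, members)
}

# TRUE only if every cluster in the assignment is contiguous
check_all <- function(labels, neighbors) {
  all(vapply(unique(labels),
             function(k) is_contiguous(which(labels == k), neighbors),
             logical(1)))
}

contiguous_labels    <- c(1, 1, 1, 2, 2, 2)  # two unbroken runs of units
noncontiguous_labels <- c(1, 2, 1, 2, 2, 2)  # cluster 1 is split by unit 2

print(check_all(contiguous_labels, neighbors))
print(check_all(noncontiguous_labels, neighbors))
```

A spatial constraint rules out assignments like the second one, where units 1 and 3 share a cluster but do not touch.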

The attributes used to create the clusters represent a time-series of different climatic and land-cover variables summarized in each spatial unit for the year 1994.  The most impactful variables for defining ecoregions were determined through trial-and-error, and are discussed in more detail in the previous blog in this series:

1. Maximum FAPAR (fraction of absorbed photosynthetically active radiation)
2. Mean FAPAR
3. Minimum FAPAR
4. Range of FAPAR
5. Maximum LAI (leaf area index)
6. Mean LAI
7. Minimum LAI
8. Range of LAI
9. Maximum Precipitation
10. Mean Precipitation
11. Minimum Precipitation
12. Range of Precipitation
13. Maximum Temperature
14. Minimum Temperature
15. Standard Deviation of Temperature
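The sketch below shows, under stated assumptions, how a time series can be collapsed into summary attributes like those listed above. The monthly precipitation values are random and purely illustrative, and the column names are modeled on the field names used with vegclust later in this post. The actual variables were presumably derived from gridded data in ArcGIS, which this sketch does not attempt to reproduce.

```r
set.seed(42)

# Hypothetical monthly precipitation for 3 spatial units over 12 months
# (rows = spatial units, columns = months)
monthly_precip <- matrix(runif(3 * 12, min = 0, max = 200), nrow = 3,
                         dimnames = list(paste0("unit_", 1:3), month.abb))

# Collapse one unit's time series into five summary attributes
summarize_series <- function(x) {
  c(MAX = max(x), MEAN = mean(x), MIN = min(x),
    RANGE = max(x) - min(x), STD = sd(x))
}

# apply() returns one column per unit, so transpose to one row per unit
precip_summary <- as.data.frame(t(apply(monthly_precip, 1, summarize_series)))
names(precip_summary) <- paste0("PRECIP_", names(precip_summary), "_ZONAL")

print(precip_summary)
```

Each spatial unit ends up as a single row of attributes, which is the shape a clustering tool expects.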

The Spatially Constrained Multivariate Clustering function is reached through the imported ArcPy module object, named ARCPY here:

ARCPY$stats$SpatiallyConstrainedMultivariateClustering
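The post does not show the arguments passed to the tool, so the following is only a sketch of what a full call might look like. The wrapper name, the paths, and the field choices are hypothetical. The argument names and the "CONTIGUITY_EDGES_CORNERS" option are assumptions based on the tool's Python signature and should be checked against the ArcGIS Pro tool reference. A stand-in object replaces the ArcPy module so the sketch runs without ArcGIS Pro.

```r
# Wrapper around the tool call. `arcpy` is the imported ArcPy module object
# (ARCPY in this post). Argument names are assumptions, see the note above.
run_constrained_clustering <- function(arcpy, in_fc, out_fc, fields, n_clusters) {
  arcpy$stats$SpatiallyConstrainedMultivariateClustering(
    in_features = in_fc,
    output_features = out_fc,
    analysis_fields = fields,
    number_of_clusters = n_clusters,  # pass an R integer (e.g. 5L)
    spatial_constraints = "CONTIGUITY_EDGES_CORNERS"
  )
}

# Stand-in for the ArcPy module: it only echoes the arguments it receives
fake_arcpy <- list(stats = list(
  SpatiallyConstrainedMultivariateClustering = function(...) list(...)
))

call_args <- run_constrained_clustering(
  fake_arcpy,
  in_fc = "memory/ecoregion_inputs",   # hypothetical input feature class
  out_fc = "memory/skater_regions",    # hypothetical output feature class
  fields = c("PRECIP_MEAN_ZONAL", "TEMP_MAX_ZONAL"),
  n_clusters = 5L
)
str(call_args)
```

In a real session, the first argument would be the ARCPY object instead of fake_arcpy, and reticulate would translate the named R arguments into Python keyword arguments.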

The R-ArcGIS Bridge’s reticulate integration allows writing the result out to an in-memory feature class and seamlessly reading it in as a spatial R data frame as follows:

skater_regions <- arc.select(arc.open(out.fc), fields = c('CLUSTER_ID'))
skater_regions.sf <- arc.data2sf(skater_regions)

Performing Ecological Regionalization Using vegclust

Our second data-driven interpretation of the Bailey’s Ecoregions map was created using the R package vegclust.  The vegclust package provides methods for performing clustering on ecological data, so it is appropriate for this analysis on ecoregions.

Like the Spatially Constrained Multivariate Clustering tool, the vegclust function requires us to specify the attributes of interest.  We define a generic (non-spatial) R data frame from the original Bailey’s Ecoregions:

vars <- c("FAPAR_MAX_ZONAL", "FAPAR_MEAN_ZONAL", "FAPAR_MIN_ZONAL", "FAPAR_RANGE_ZONAL",
          "FAPAR_STD_ZONAL", "LAI_MAX_ZONAL", "LAI_MEAN_ZONAL", "LAI_MIN_ZONAL", "LAI_RANGE_ZONAL",
          "LAI_STD_ZONAL", "PRECIP_MAX_ZONAL", "PRECIP_MEAN_ZONAL", "PRECIP_MIN_ZONAL", "PRECIP_RANGE_ZONAL",
          "PRECIP_STD_ZONAL", "TEMP_MAX_ZONAL", "TEMP_MEAN_ZONAL", "TEMP_MIN_ZONAL", "TEMP_RANGE_ZONAL",
          "TEMP_STD_ZONAL")

eco.vars <- st_set_geometry(ecoregions_data.sf[vars], NULL)

Clusters (regions) are defined using the following function, which specifies the number of clusters to create, and the clustering model.  Given that we want each input polygon to be a member of only one cluster, we chose Hard C-Medoids (KMdd) as the clustering method:

eco_groups <- vegclust(x = eco.vars, mobileCenters=num.clust, method="KMdd", nstart=20)
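To illustrate what "hard" membership around medoids means, here is a small base R sketch of the general hard C-medoids idea: every unit belongs to exactly one cluster, and each cluster is represented by one of its own members. The data, the function name hard_c_medoids, and the starting medoids are hypothetical. This is not vegclust's implementation, which among other things supports multiple random starts through nstart.

```r
# Hypothetical attribute table: 6 units in two well-separated groups
x <- rbind(c(0, 0), c(1, 0), c(2, 0),
           c(10, 0), c(11, 0), c(12, 0))
d <- as.matrix(dist(x))  # pairwise Euclidean distances

hard_c_medoids <- function(d, medoids, max_iter = 50) {
  for (iter in seq_len(max_iter)) {
    # Assignment step: each unit joins its single nearest medoid
    cluster <- unname(apply(d[, medoids, drop = FALSE], 1, which.min))
    # Update step: the new medoid of a cluster is the member with the
    # smallest total distance to the other members
    new_medoids <- vapply(seq_along(medoids), function(k) {
      members <- which(cluster == k)
      totals <- colSums(d[members, members, drop = FALSE])
      unname(members[which.min(totals)])
    }, integer(1))
    if (all(new_medoids == medoids)) break
    medoids <- new_medoids
  }
  list(cluster = cluster, medoids = medoids)
}

fit <- hard_c_medoids(d, medoids = c(1L, 4L))
print(fit$cluster)  # 1 1 1 2 2 2
print(fit$medoids)  # 2 5
```

The middle unit of each group becomes its medoid, and no unit has partial membership in more than one cluster.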

We then use arc.write to convert our R ecoregions to a local feature class for further analysis in ArcGIS Pro:

arc.write(out.fc.vegclust, ecoregions_data.sf, overwrite=TRUE)

Lastly, both data-driven ecoregion maps created using ArcGIS and the vegclust package in R are compared against the original, expert-driven Bailey’s Ecoregions map using the new Spatial Association Between Zones tool. For details, please refer to the original blog post on defining data-driven ecoregions.

Inspired? Come Join Us at 2021 Dev Summit

New enhancements to the R-ArcGIS Bridge make it possible to work with multiple programming languages, leverage functionality from ArcGIS Pro directly, and create interactive maps that visualize spatial data to its full potential. If you are interested in learning more, come visit the virtual booth at the 2021 Developer Summit, and check out our product page.

About the authors

Orhun is a senior researcher for the Spatial Statistics team. His role at Esri includes conducting applied and theoretical research into spatial and spatio-temporal machine learning methods, maintaining and adding functionality to the R-ArcGIS Bridge, and creating educational resources such as learn lessons. He is also a member of the virtual Esri Science Team, where he works on spatial data science applications for solving problems pertaining to Earth systems. In addition to his role at Esri, Orhun serves as a lecturer at the University of Southern California's Spatial Sciences Institute. He holds a Master's and a PhD in Geostatistics, and a PhD minor in Geology, from Stanford University's School of Earth, Energy and Environmental Sciences.

Nick Giner is a Product Manager for Spatial Analysis and Data Science. Prior to joining Esri in 2014, he completed Bachelor’s and PhD degrees in Geography from Penn State University and Clark University, respectively. In his spare time, he likes to play guitar, golf, cook, cut the grass, and read/watch shows about history.
