ArcGIS Blog

Building a Scalable and Observable System Using ArcGIS Enterprise on Kubernetes

By Moses Kim

In today’s continuously active environment, organizations rely on ArcGIS Enterprise on Kubernetes to power critical, public-facing applications that can experience sudden and unpredictable traffic surges. As administrators, our mission is clear: keep the system highly available, scalable, and performant, no matter what the workload throws at us.

In this year’s Developer and Technology Summit plenary, Andrew Sakowicz demonstrates how to design, load-test, and monitor a scalable system.

Scalability

Figure: A US wildfire activity web map hosted in ArcGIS. Apps like the wildfire map Andrew uses for the demo must be able to respond to new activity quickly and reliably.

Andrew showcases a fire warning application. Traffic for this application can spike dramatically, or even go viral, and in those moments the system must scale rapidly to maintain performance.

The key to doing this effectively lies in two components:

  1. Pods – the smallest deployable compute units
  2. Nodes – the actual machines hosting those pods

Figure: Pod auto-scaling settings highlighted in ArcGIS Manager. ArcGIS Manager lets administrators set the conditions for auto-scaling.

Andrew opens ArcGIS Manager and accesses the fire warning service’s settings. He enables Auto-scaling, which allows the Horizontal Pod Autoscaler (HPA) to add pod replicas when average CPU usage reaches 50%. The service starts from a minimum of 2 replicas, which maintains standby capacity, and can grow to a maximum of 100.
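
As a rough illustration of how that CPU threshold turns into a replica count, the sketch below follows the scaling rule described in the Kubernetes documentation for the HPA: the desired count is the current count multiplied by the ratio of observed to target utilization, rounded up, and left unchanged when the ratio is within a small tolerance (0.1 by default). The function name and parameters are hypothetical, and the sketch ignores stabilization windows, unready pods, and other details of the real controller.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the replica count an HPA would request (simplified sketch)."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, desired))


# Two standby replicas running at 90% average CPU against a 50% target.
print(desired_replicas(2, 90, 50, min_replicas=2, max_replicas=100))
```

Because the replica count is multiplied by the utilization ratio on each evaluation, a sustained spike can grow the service from its standby capacity to its ceiling within a handful of scaling cycles.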

An administrator can configure the system to deploy up to 100 replicas of a service, each requesting at least 1 CPU core. In Andrew’s case, this translates to around 100 cores, which requires approximately 13 additional nodes. The system must therefore also be able to scale the number of nodes in a timely manner.
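
The arithmetic behind that node estimate can be sketched as follows. The article does not state the node size, so the figure of 8 allocatable cores per node is an assumption, chosen only because it reproduces the count of roughly 13 nodes; the function name is hypothetical.

```python
import math


def nodes_needed(replicas, cores_per_replica, allocatable_cores_per_node):
    """Return (total cores requested, nodes required to host them)."""
    total_cores = replicas * cores_per_replica
    return total_cores, math.ceil(total_cores / allocatable_cores_per_node)


# ASSUMPTION: 8 allocatable cores per node (not stated in the article).
total_cores, nodes = nodes_needed(replicas=100, cores_per_replica=1,
                                  allocatable_cores_per_node=8)
print(total_cores, nodes)  # 100 13
```

In practice, a node's allocatable CPU is somewhat lower than its raw core count because of system and Kubernetes reservations, so the real number of nodes may be slightly higher.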

Figure: Various Karpenter node pools, as viewed through Lens. Creating dedicated node pools with Karpenter helps optimize node scaling.

To scale nodes quickly, Andrew uses Karpenter, a high-performance node autoscaler built for Kubernetes. In addition to quick scaling, Karpenter offers intelligent provisioning and supports creating workload-specific, dedicated node pools for services with volatile traffic patterns. Isolating such workloads in dedicated node pools minimizes resource contention between services and keeps the rest of the cluster unaffected, so the performance of other services stays predictable.

After defining the pool, Andrew assigns the high-traffic service to that pool directly in ArcGIS Manager.

Observability

To confirm that auto-scaling is working as intended and not negatively affecting the system, administrators need to gather the relevant data and observe changes as they happen. For this purpose, Andrew creates an observability dashboard with Grafana, using ArcGIS Enterprise metrics and Kubernetes cluster utilization metrics stored in Prometheus.

Figure: Performance and cluster utilization graphs shown in a Grafana dashboard. Observing key metrics through Grafana visualizations is critical to understanding how the system behaves in practice.

In this observability dashboard, Andrew can monitor how the system behaves under increasing demand, gaining clear insight into each stage of scaling and performance stabilization. Key aspects to monitor include:

  • Load: Increases in the rate of service requests should correlate with the scaling up of pods and nodes.
  • Pod scaling: As CPU usage increases from additional load, the number of service pods running should increase as the Horizontal Pod Autoscaler adds pod replicas.
  • Node scaling: As nodes reach capacity from the added pods, the number of nodes running per pool should increase as Karpenter provisions additional nodes within the dedicated pool.
  • Performance: Service response time should adjust to additional load and stabilize quickly.
  • Errors: Error rate should not significantly increase as the load increases.
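
The checks in the list above can also be expressed programmatically, for example when reviewing the results of a load test. The sketch below is a minimal illustration, not part of the demo: the `Sample` fields, the thresholds, and the sample values are all hypothetical, and in practice the numbers would come from the metrics stored in Prometheus.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    request_rate: float     # requests per second
    pods: int               # running service pods
    nodes: int              # running nodes in the dedicated pool
    response_time_s: float  # service response time in seconds
    error_rate: float       # fraction of failed requests (0.0 to 1.0)


def review_scaling(samples, max_error_rate=0.01, settle_window=3,
                   settle_tolerance=0.2):
    """Check a time-ordered list of samples against the five criteria."""
    baseline = samples[0]
    peak_rate = max(s.request_rate for s in samples)
    # Response times over the last few samples, used to judge stabilization.
    tail = [s.response_time_s for s in samples[-settle_window:]]
    return {
        "load_increased": peak_rate > baseline.request_rate,
        "pods_scaled": max(s.pods for s in samples) > baseline.pods,
        "nodes_scaled": max(s.nodes for s in samples) > baseline.nodes,
        "response_time_stable":
            max(tail) - min(tail) <= settle_tolerance * min(tail),
        "errors_contained":
            max(s.error_rate for s in samples) <= max_error_rate,
    }


# Made-up values for illustration only; these are not measurements from the demo.
samples = [
    Sample(10, 2, 3, 0.20, 0.000),
    Sample(200, 2, 3, 0.90, 0.002),
    Sample(400, 20, 6, 0.60, 0.004),
    Sample(400, 45, 9, 0.30, 0.001),
    Sample(400, 48, 9, 0.28, 0.000),
    Sample(400, 48, 9, 0.29, 0.000),
]
for check, passed in review_scaling(samples).items():
    print(f"{check}: {'ok' if passed else 'investigate'}")
```

A failed check does not necessarily mean the system is misconfigured; it indicates which panel of the dashboard deserves a closer look.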

Conclusion

In his demonstration, Andrew shows how ArcGIS Enterprise on Kubernetes can:

  • Scale intelligently in response to sudden, viral workloads to maintain stable performance under pressure
  • Provide the metrics necessary to fully visualize and observe how a service responds to an increase in load

Through smart application of the scalability and observability tools ArcGIS Enterprise on Kubernetes makes available, you can run a resilient, high-performance system that scales exactly when and how you need it.
