{"id":2282342,"date":"2024-03-13T10:00:46","date_gmt":"2024-03-13T17:00:46","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2282342"},"modified":"2024-03-15T08:54:40","modified_gmt":"2024-03-15T15:54:40","slug":"dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","title":{"rendered":"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on Kubernetes"},"author":15561,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[37501],"tags":[759712,771782,376702,770992],"industry":[],"product":[36571],"class_list":["post-2282342","blog","type-blog","status-publish","format-standard","hentry","category-administration","tag-arcgis-enterprise-on-kubernetes","tag-business-operations","tag-high-availability","tag-performance-monitoring","product-arcgis-enterprise"],"acf":{"authors":[{"ID":15561,"user_firstname":"Tori","user_lastname":"O hara","nickname":"Tori O'Hara","user_nicename":"tohara","display_name":"Tori O'Hara","user_email":"TOhara@esri.com","user_url":"","user_registered":"2020-03-09 20:04:56","user_description":"Tori is a technical writer on the ArcGIS Enterprise team.","user_avatar":"<img alt='' src='https:\/\/secure.gravatar.com\/avatar\/ffe337945f9ef5fd76f657da94c70dbad1901aff84ab86f0c094ffaf63bb737e?s=96&#038;d=blank&#038;r=g' srcset='https:\/\/secure.gravatar.com\/avatar\/ffe337945f9ef5fd76f657da94c70dbad1901aff84ab86f0c094ffaf63bb737e?s=192&#038;d=blank&#038;r=g 2x' class='avatar avatar-96 photo' height='96' width='96' loading='lazy' 
decoding='async'\/>"},{"ID":7561,"user_firstname":"Bill","user_lastname":"Major","nickname":"bmajor","user_nicename":"bmajor","display_name":"Bill Major","user_email":"bmajor@esri.com","user_url":"","user_registered":"2018-03-22 19:23:54","user_description":"Bill is a lead software development engineer on the ArcGIS Enterprise team, focused on ArcGIS Notebooks, security, Kubernetes, and framework development.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2022\/03\/BillMajor-213x200.jpg' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"short_description":"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and operational.","flexible_content":[{"acf_fc_layout":"content","content":"<p>When the unexpected happens in your production environment, you hope that it is resilient enough to remain operational and recover as quickly as possible. With an ArcGIS Enterprise on Kubernetes deployment, organizations have access to multiple features that can provide this level of resiliency. In their plenary demonstration, Chris and Bill highlight some of these features, including ArcGIS Enterprise on Kubernetes\u2019s support for multiple availability zone deployments.<\/p>\n"},{"acf_fc_layout":"kaltura","video_id":"1_312rzjmx","time":true,"start":"387","stop":"752"},{"acf_fc_layout":"content","content":"<h2><strong>Configuring a highly available deployment<\/strong><\/h2>\n<p>For their demonstration, Chris and Bill have set up a highly available deployment that will undergo a chaos test. 
While an ArcGIS Enterprise on Kubernetes deployment has <a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/administer\/minimize-data-loss-and-downtime.htm#ESRI_SECTION1_6AD6913B832B4600A73985C40A47E14F\">built-in high availability<\/a>, Chris and Bill also use several other features to reduce downtime and data loss and to improve resiliency.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282412,"id":2282412,"title":"grafana dashboard","filename":"grafana-dashboard.png","filesize":259222,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/grafana-dashboard","alt":"","author":"15561","description":"","caption":"The current Grafana dashboard view of Chris and Bill's Kubernetes cluster ahead of the chaos test.","name":"grafana-dashboard","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:46:59","modified":"2024-03-12 
16:57:33","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1429,"height":574,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","medium-width":464,"medium-height":186,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","medium_large-width":768,"medium_large-height":308,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","large-width":1429,"large-height":574,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","1536x1536-width":1429,"1536x1536-height":574,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","2048x2048-width":1429,"2048x2048-height":574,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard-826x332.png","card_image-width":826,"card_image-height":332,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/grafana-dashboard.png","wide_image-width":1429,"wide_image-height":574}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h3><em>Cloud native object stores<\/em><\/h3>\n<p>Starting at the 11.2 release, ArcGIS Enterprise on Kubernetes organizations can use cloud native services to <a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/deploy\/cloud-integrations.htm\">integrate cloud object stores.<\/a> Cloud object stores act as the organization\u2019s object store or backup store location. 
This aids in increasing reliability and resiliency while also reducing the demand on in-cluster resources.<\/p>\n<h3><em>Multi-AZ deployments<\/em><\/h3>\n<p>Also starting at 11.2, ArcGIS Enterprise on Kubernetes deployments can deploy their Kubernetes cluster across <a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/administer\/multi-availability-zone-kubernetes-cluster-administration.htm\">multiple availability zones<\/a>. By using <a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/deploy\/kubernetes-concepts.htm#ESRI_SECTION1_3141E8EB891A4426952DE7702A343A4C\">topology spread constraints<\/a>, administrators can control how scheduling occurs across the cluster. For Chris and Bill\u2019s deployment, each availability zone becomes the separation boundary between replicas of each workload.<\/p>\n<h3><em>Enhanced availability architecture profile<\/em><\/h3>\n<p>The enhanced availability <a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/deploy\/architecture-profiles.htm\">architecture profile<\/a> is designed for use in business or mission-critical production environments and provides the highest level of availability, as it includes increased and expanded redundancy across critical pods. If an organization is configured to use multiple availability zones, the enhanced availability profile is the only profile that guarantees adequate coverage for all stateful workloads in the case of an availability zone failure.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Testing reliability when faced with failure<\/strong><\/h2>\n<p>Chris begins his demonstration by showing the current state of his organization. Due to the interplay between the multiple availability zone deployment, enhanced availability profile, and cloud native object store, the organization is set up to be able to withstand a significant outage.<\/p>\n<p>To put this to the test, Chris stops one of the availability zones, disrupting the cluster. 
As a result, machines have terminated, and numerous pods are shifting from a running to pending state.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282402,"id":2282402,"title":"pods after failure","filename":"pods-after-failure.png","filesize":197376,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/pods-after-failure","alt":"","author":"15561","description":"","caption":"A view of the pods within Chris and Bill's Kubernetes cluster, immediately after the availability zone has been stopped.","name":"pods-after-failure","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:46:56","modified":"2024-03-12 16:57:44","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":779,"height":433,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","medium-width":464,"medium-height":258,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","medium_large-width":768,"medium_large-height":427,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","large-width":779,"large-height":433,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","1536x1536-width":779,"1536x1536-height":433,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","2048x2048-width":779,"2048x2048-height":433,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","card_imag
e-width":779,"card_image-height":433,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-after-failure.png","wide_image-width":779,"wide_image-height":433}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2>Observability and monitoring after failure<\/h2>\n<p>With Chris\u2019s chaos test implemented, Bill now needs to evaluate the impact of the zone\u2019s outage. From the number of pods in a pending state, we can see that Kubernetes is attempting to rebalance after the loss in capacity.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282392,"id":2282392,"title":"pods rebuilding","filename":"pods-rebuilding.png","filesize":177830,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/pods-rebuilding","alt":"","author":"15561","description":"","caption":"A view of the pods in Chris and Bill's Kubernetes cluster, showing that Kubernetes is attempting to restart, schedule, and rebalance after the outage.","name":"pods-rebuilding","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:46:51","modified":"2024-03-12 
16:57:52","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":778,"height":423,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","medium-width":464,"medium-height":252,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","medium_large-width":768,"medium_large-height":418,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","large-width":778,"large-height":423,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","1536x1536-width":778,"1536x1536-height":423,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","2048x2048-width":778,"2048x2048-height":423,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","card_image-width":778,"card_image-height":423,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/pods-rebuilding.png","wide_image-width":778,"wide_image-height":423}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Next, Bill opens ArcGIS Enterprise Manager to check whether there are any critical logs. 
The lack of critical logs shows that the enhanced availability architecture profile is providing the expected resiliency.<\/p>\n<p>Bill then performs a basic health check on the core framework services, which shows that the relational store health check has failed.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282472,"id":2282472,"title":"basic health check","filename":"basic-health-check.png","filesize":184935,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/basic-health-check","alt":"","author":"15561","description":"","caption":"The results of the basic health check, showing that the relational store health check has failed.","name":"basic-health-check","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:56:54","modified":"2024-03-12 16:58:01","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1331,"height":599,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check.png","medium-width":464,"medium-height":209,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check.png","medium_large-width":768,"medium_large-height":346,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check.png","large-width":1331,"large-height":599,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check.png","1536x1536-width":1331,"1536x1536-height":599,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check
.png","2048x2048-width":1331,"2048x2048-height":599,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check-826x372.png","card_image-width":826,"card_image-height":372,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/basic-health-check.png","wide_image-width":1331,"wide_image-height":599}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>When Bill validates the relational data store, it returns a warning status. This means that the primary relational store is healthy, but the standby is not.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282382,"id":2282382,"title":"data store warnings","filename":"data-store-warnings.png","filesize":133689,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/data-store-warnings","alt":"","author":"15561","description":"","caption":"The results after validating the relational store in ArcGIS Enterprise Manager.","name":"data-store-warnings","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:46:44","modified":"2024-03-12 
16:58:12","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1429,"height":390,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","medium-width":464,"medium-height":127,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","medium_large-width":768,"medium_large-height":210,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","large-width":1429,"large-height":390,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","1536x1536-width":1429,"1536x1536-height":390,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","2048x2048-width":1429,"2048x2048-height":390,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings-826x225.png","card_image-width":826,"card_image-height":225,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/data-store-warnings.png","wide_image-width":1429,"wide_image-height":390}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Though not highly available, the organization remains available in this degraded state. Bill then turns to his Grafana dashboard to track the organization\u2019s current performance.<\/p>\n<p>The first chart in the dashboard shows the successful HTTP requests over time. 
This chart shows that while there is an impact to throughput while Kubernetes is rebalancing, all services are continuing to respond and are returning to their expected performance.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282432,"id":2282432,"title":"requests over time","filename":"requests-over-time.png","filesize":482508,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/requests-over-time","alt":"","author":"15561","description":"","caption":"A chart monitoring the HTTP requests occurring over time.","name":"requests-over-time","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:48:27","modified":"2024-03-12 16:58:19","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1429,"height":453,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","medium-width":464,"medium-height":147,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","medium_large-width":768,"medium_large-height":243,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","large-width":1429,"large-height":453,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","1536x1536-width":1429,"1536x1536-height":453,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","2048x2048-width":1429,"2048x2048-height":453,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-
time-826x262.png","card_image-width":826,"card_image-height":262,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/requests-over-time.png","wide_image-width":1429,"wide_image-height":453}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>The second chart shows request failures. While some errors have occurred, this chart also shows that critical services remain operational even while Kubernetes is working to rebalance after the outage.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2282372,"id":2282372,"title":"failed requests over time","filename":"failed-requests-over-time.png","filesize":89778,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\/failed-requests-over-time","alt":"","author":"15561","description":"","caption":"A chart showing the error count of all requests over time.","name":"failed-requests-over-time","status":"inherit","uploaded_to":2282342,"date":"2024-03-12 16:46:14","modified":"2024-03-12 
16:58:27","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1429,"height":182,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time-213x182.png","thumbnail-width":213,"thumbnail-height":182,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","medium-width":464,"medium-height":59,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","medium_large-width":768,"medium_large-height":98,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","large-width":1429,"large-height":182,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","1536x1536-width":1429,"1536x1536-height":182,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","2048x2048-width":1429,"2048x2048-height":182,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time-826x105.png","card_image-width":826,"card_image-height":105,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/failed-requests-over-time.png","wide_image-width":1429,"wide_image-height":182}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2><strong>Conclusion<\/strong><\/h2>\n<p>During their presentation, Chris and Bill showed how ArcGIS Enterprise on Kubernetes remains reliable, resilient, and operational, even when faced with critical failures.<\/p>\n<p>For more information on ArcGIS Enterprise on Kubernetes, and to see if this deployment solution is the right one for your organization, use the links below to reference our documentation and other blogs:<\/p>\n<ul>\n<li><a 
href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/announcements\/determine-whether-arcgis-enterprise-on-kubernetes-is-right-for-you\/\">Determine whether Kubernetes is right for you<\/a><\/li>\n<li><a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/administer\/minimize-data-loss-and-downtime.htm\">Data loss and downtime minimization<\/a><\/li>\n<li><a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/administer\/multi-availability-zone-kubernetes-cluster-administration.htm\">Multi-availability zone Kubernetes cluster administration<\/a><\/li>\n<li><a href=\"https:\/\/enterprise-k8s.arcgis.com\/en\/latest\/deploy\/cloud-integrations.htm\">Cloud integrations<\/a><\/li>\n<\/ul>\n"}],"related_articles":[{"ID":1192872,"post_author":"216642","post_date":"2025-06-30 06:00:25","post_date_gmt":"2025-06-30 13:00:25","post_content":"","post_title":"ArcGIS Enterprise on Kubernetes: Is it for me?","post_excerpt":"","post_status":"publish","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"determine-whether-arcgis-enterprise-on-kubernetes-is-right-for-you","to_ping":"","pinged":"","post_modified":"2026-01-07 13:01:06","post_modified_gmt":"2026-01-07 21:01:06","post_content_filtered":"","post_parent":0,"guid":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=1192872","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"},{"ID":1868612,"post_author":"15561","post_date":"2023-03-08 10:15:51","post_date_gmt":"2023-03-08 18:15:51","post_content":"","post_title":"Dev Summit 2023: Introducing disorder to an ArcGIS Enterprise on Kubernetes deployment","post_excerpt":"","post_status":"publish","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"dev-summit-2023-introducing-disorder-to-an-arcgis-enterprise-on-kubernetes-deployment","to_ping":"","pinged":"","post_modified":"2023-03-10 12:51:18","post_modified_gmt":"2023-03-10 
20:51:18","post_content_filtered":"","post_parent":0,"guid":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=1868612","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"}],"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/03\/observability-card.png","wide_image":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Observability and reliability in ArcGIS Enterprise on Kubernetes<\/title>\n<meta name=\"description\" content=\"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and operational.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on Kubernetes\" \/>\n<meta property=\"og:description\" content=\"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and operational.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-15T15:54:40+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta 
name=\"twitter:site\" content=\"@ESRI\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\"},\"author\":{\"name\":\"Tori O'Hara\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/6aa133818e488bcdeed452b4f04e1090\"},\"headline\":\"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on Kubernetes\",\"datePublished\":\"2024-03-13T17:00:46+00:00\",\"dateModified\":\"2024-03-15T15:54:40+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\"},\"wordCount\":10,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"ArcGIS Enterprise on Kubernetes\",\"business operations\",\"high availability\",\"performance monitoring\"],\"articleSection\":[\"Administration\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\",\"name\":\"Observability and reliability in ArcGIS Enterprise on 
Kubernetes\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2024-03-13T17:00:46+00:00\",\"dateModified\":\"2024-03-15T15:54:40+00:00\",\"description\":\"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and operational.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on Kubernetes\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product 
teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/6aa133818e488bcdeed452b4f04e1090\",\"name\":\"Tori O'Hara\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ffe337945f9ef5fd76f657da94c70dbad1901aff84ab86f0c094ffaf63bb737e?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ffe337945f9ef5fd76f657da94c70dbad1901aff84ab86f0c094ffaf63bb737e?s=96&d=blank&r=g\",\"caption\":\"Tori O'Hara\"},\"description\":\"Tori is a technical writer on the ArcGIS Enterprise team.\",\"url\":\"\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Observability and reliability in ArcGIS Enterprise on Kubernetes","description":"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and operational.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","og_locale":"en_US","og_type":"article","og_title":"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on Kubernetes","og_description":"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and operational.","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2024-03-15T15:54:40+00:00","twitter_card":"summary_large_image","twitter_site":"@ESRI","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes"},"author":{"name":"Tori O'Hara","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/6aa133818e488bcdeed452b4f04e1090"},"headline":"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on 
Kubernetes","datePublished":"2024-03-13T17:00:46+00:00","dateModified":"2024-03-15T15:54:40+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes"},"wordCount":10,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["ArcGIS Enterprise on Kubernetes","business operations","high availability","performance monitoring"],"articleSection":["Administration"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","name":"Observability and reliability in ArcGIS Enterprise on Kubernetes","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2024-03-13T17:00:46+00:00","dateModified":"2024-03-15T15:54:40+00:00","description":"ArcGIS Enterprise on Kubernetes has a number of features that can keep production environments reliable, resilient, and 
operational.","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Dev Summit 2024: Observability and reliability in ArcGIS Enterprise on Kubernetes"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/6aa133818e488bcdeed452b4f04e1090","name":"Tori O'Hara","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ffe337945f9ef5fd76f657da94c70dbad1901aff84ab86f0c094ffaf63bb737e?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ffe337945f9ef5fd76f657da94c70dbad1901aff84ab86f0c094ffaf63bb737e?s=96&d=blank&r=g","caption":"Tori O'Hara"},"description":"Tori is a technical writer on the ArcGIS Enterprise team.","url":""}]}},"text_date":"March 13, 2024","author_name":"Multiple 
Authors","author_page":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-enterprise\/administration\/dev-summit-2024-observability-and-reliability-in-arcgis-enterprise-on-kubernetes","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Newsroom-Keyart-Wide-1920-x-1080.jpg","primary_product":"ArcGIS Enterprise","tag_data":[{"term_id":759712,"name":"ArcGIS Enterprise on Kubernetes","slug":"arcgis-enterprise-on-kubernetes","term_group":0,"term_taxonomy_id":759712,"taxonomy":"post_tag","description":"","parent":0,"count":12,"filter":"raw"},{"term_id":771782,"name":"business operations","slug":"business-operations","term_group":0,"term_taxonomy_id":771782,"taxonomy":"post_tag","description":"","parent":0,"count":2,"filter":"raw"},{"term_id":376702,"name":"high availability","slug":"high-availability","term_group":0,"term_taxonomy_id":376702,"taxonomy":"post_tag","description":"","parent":0,"count":3,"filter":"raw"},{"term_id":770992,"name":"performance monitoring","slug":"performance-monitoring","term_group":0,"term_taxonomy_id":770992,"taxonomy":"post_tag","description":"","parent":0,"count":12,"filter":"raw"}],"category_data":[{"term_id":37501,"name":"Administration","slug":"administration","term_group":0,"term_taxonomy_id":37501,"taxonomy":"category","description":"","parent":0,"count":422,"filter":"raw"}],"product_data":[{"term_id":36571,"name":"ArcGIS 
Enterprise","slug":"arcgis-enterprise","term_group":0,"term_taxonomy_id":36571,"taxonomy":"product","description":"","parent":0,"count":972,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=arcgis-enterprise","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2282342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/15561"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=2282342"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2282342\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=2282342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=2282342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=2282342"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=2282342"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=2282342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}