Thank you for your patience on Tuesday, 3/20/18, as we worked through the ArcGIS Online disruptions that affected access to your services, maps and apps. We understand the importance of continuing to provide a resilient, redundant and well-architected system, and we are confident that everything is back to normal and that no data was lost.
What happened:
Between 9:00 AM and 12:15 PM EST on Tuesday, March 20, 2018, ArcGIS Online experienced periods of disruption.
Why did it happen:
The ArcGIS Online system, which runs on web servers hosted within Amazon Web Services (AWS), experienced timeouts and failures in accessing AWS services, including S3. These failures were the result of S3 network connectivity errors reported and described by AWS on its status page. These connectivity issues in AWS affected ArcGIS Online availability even though the ArcGIS Online system runs on redundant web servers and across multiple data centers.
The ArcGIS Online operations team worked with Amazon during the event to diagnose the problem, and then began reconfiguring the web servers to access the AWS services via alternate network routes. In the meantime, the original network connectivity problem was resolved by AWS.
Based on this incident, we are working on an automated approach to detect network connectivity failures and to reroute network connections to the relevant AWS services where applicable.
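For readers curious what detecting a connectivity failure and rerouting can look like in general, here is a minimal sketch in Python. It is an illustration only and does not represent the ArcGIS Online implementation: the function names `tcp_probe` and `pick_endpoint` and the host names in the usage note are hypothetical, and the health check is a plain TCP connection attempt, which is an assumption chosen for simplicity.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_probe(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def pick_endpoint(
    candidates: Iterable[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> Optional[Endpoint]:
    """Return the first candidate that passes the probe, or None if all fail.

    Candidates are tried in order, so the preferred route should be listed first
    and alternate routes after it.
    """
    for endpoint in candidates:
        if probe(endpoint):
            return endpoint
    return None
```

A caller might list a primary route followed by alternates, for example `pick_endpoint([("primary.example.invalid", 443), ("alternate.example.invalid", 443)])`, and send traffic to whichever endpoint is returned. A real system would also need repeated checks over time and safeguards against switching back and forth too quickly.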
Additional things we will be working on:
– ArcGIS Online already takes advantage of redundancy and failover across data centers. We will investigate and consider additional improvements, including network endpoint failover where applicable, as well as other mitigation strategies.
– We will continue working with AWS on further details of the root cause of the incident, and will examine engineering and operational improvements.