When issues arise in ArcGIS Enterprise, analyzing its log messages is vital to understanding and resolving them. ArcGIS Enterprise includes multiple log types that can help administrators understand issues related to performance, access, user sign-in, and more. In this blog, we will dive into the logging mechanisms found in ArcGIS Enterprise on Windows and Linux, and show how they can be used to gain a deeper understanding of unexpected behavior as it arises.
Each ArcGIS Enterprise component includes logging that administrators can access when issues arise. ArcGIS Enterprise logs are divided into levels that denote the severity of each message. To learn more about these levels, please consult our documentation. Here is a quick summary of how logs are accessed in ArcGIS Enterprise:
- ArcGIS Server: Through the ArcGIS Server Manager, under the Logs tab
- Portal for ArcGIS: Through the Portal Administrator Directory, under the Logs subsection
- ArcGIS Data Store: Logs are located on the ArcGIS Data Store machine in Program Files. These logs are also displayed in ArcGIS Server Manager’s logging utility
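Because each level denotes severity, a common first step when reading logs is to keep only messages at or above a chosen threshold. The following Python sketch assumes the level names SEVERE, WARNING, INFO, FINE, VERBOSE, and DEBUG, ordered from most to least severe, and uses hypothetical log records shaped as dictionaries with `type` and `message` keys. It is an illustration of the idea, not an Esri API.

```python
# Severity order for ArcGIS Enterprise log levels, most severe first.
# Assumption: this ordering matches the documented levels for your release.
LEVELS = ["SEVERE", "WARNING", "INFO", "FINE", "VERBOSE", "DEBUG"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def at_or_above(messages, threshold):
    """Keep messages whose level is as severe as, or more severe than, threshold."""
    limit = RANK[threshold.upper()]
    # Unknown level names are ranked below DEBUG so they are never kept by mistake.
    return [m for m in messages if RANK.get(m["type"].upper(), len(LEVELS)) <= limit]


if __name__ == "__main__":
    # Hypothetical sample records; real log entries carry more fields.
    sample = [
        {"type": "SEVERE", "message": "Certificate validation failed"},
        {"type": "INFO", "message": "Service started"},
        {"type": "WARNING", "message": "Request took longer than expected"},
    ]
    for m in at_or_above(sample, "WARNING"):
        print(m["type"], m["message"])
```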
To illustrate, consider the following scenario: Lisa, an ArcGIS Enterprise administrator for a county government, is responsible for maintaining the county’s ArcGIS Enterprise base deployment. The county uses ArcGIS Enterprise to host authoritative data and to support its field operations. ArcGIS Enterprise is also part of a distributed collaboration with the county’s ArcGIS Online organization, which shares public content with the county’s residents.
Below are three scenarios Lisa could encounter, along with ways she can use logging to gather information and determine the next steps to resolve each one.
Scenario 1: ArcGIS Enterprise portal
Lisa’s colleague, a content creator, is attempting to share a new feature service to ArcGIS Enterprise. When they go through the sharing process in ArcGIS Pro, publishing fails with the following error message: “Error: Failed to publish web layer”. Undeterred, they attempt to publish the content directly to the ArcGIS Enterprise portal but receive another error during publishing. All previously published content still seems to be accessible in the organization, so the content creator reaches out to Lisa to begin a proper investigation.
Lisa begins her investigation by observing the problem alongside the content creator. After noting how the failure occurs, Lisa opens already published services to see whether the issue affects existing services. Thankfully, no existing services have been affected; however, publishing any type of service fails, both from ArcGIS Pro and in the ArcGIS Enterprise portal. At this point, Lisa decides to review the Enterprise portal logs to better understand the specific error.
The Enterprise portal collects logs on administrative events, organization management changes, content creation events, and security issues. These logs have discrete logging levels that capture different degrees of event severity. To access these logs, an ArcGIS Enterprise administrator signs in to the Portal Administrator Directory and goes to Logs.
Lisa starts by opening the current logs with the Query operation, which returns log messages at the default logging level. To make the error easier to find, Lisa notes the time the test starts and asks the content creator to reproduce the error. As the content creator retries the workflow, the following SEVERE error appears in the logs: “Error. PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.”
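Narrowing the query to the window in which the error was reproduced keeps the results short. The Python sketch below only builds a request URL for the portal’s logs/query operation; the host name is hypothetical, an administrator token (omitted here) would also be required, and it assumes that, as in the ArcGIS Server Administrator API, `startTime` is the most recent bound and `endTime` the oldest. Check the Portal Administrator API reference for your release before relying on it.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical portal admin URL; substitute your own host and web adaptor name.
PORTAL_ADMIN = "https://portal.example.com/portal/portaladmin"


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to milliseconds since the UNIX epoch."""
    return int(dt.timestamp() * 1000)


def portal_log_query_url(test_started, test_ended, level="SEVERE"):
    """Build a logs/query URL covering the window in which the error was reproduced.

    Assumption: startTime is the MOST RECENT bound and endTime the OLDEST,
    mirroring the ArcGIS Server Administrator API; confirm for your release.
    """
    params = {
        "level": level,
        "startTime": to_epoch_ms(test_ended),
        "endTime": to_epoch_ms(test_started),
        "pageSize": 1000,
        "f": "json",
    }
    return f"{PORTAL_ADMIN}/logs/query?{urlencode(params)}"


if __name__ == "__main__":
    start = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 15, 14, 10, tzinfo=timezone.utc)
    print(portal_log_query_url(start, end))
```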
This message is logged at the SEVERE level, indicating a critical error that needs to be fixed immediately. Lisa starts by reviewing the ArcGIS Enterprise documentation on certificates and how they are applied. Knowing that the error relates to a certificate, she finds the topic Configure the portal to trust certificates from your certifying authority, which describes how Portal for ArcGIS checks the certificates of all federated ArcGIS Servers to ensure that they are secure. To verify her findings, Lisa searches Esri Community for the error and confirms that it occurs when the Enterprise portal does not have the intermediate certificate imported into its trust store.
Armed with this knowledge, Lisa opens the Enterprise portal’s security settings in the Portal Administrator Directory. Reviewing the imported certificates there, she notices that one of them has expired and needs to be replaced. After uploading and applying the correct certificate, Lisa resolves the publishing issue for the content creator. Using the Enterprise portal logs, Lisa was able to explore, diagnose, and resolve the certificate issue that was preventing the content creator from publishing their work.
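One way to catch an expiring certificate before it breaks publishing is to check expiry dates on a schedule. This sketch uses Python’s `ssl.cert_time_to_seconds`, which parses `notAfter` strings in the format returned by `getpeercert()`. The certificate aliases and the idea of collecting dates into a dictionary are assumptions for illustration; the sketch does not read the portal’s certificate store itself.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter date (negative if expired).

    not_after uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def flag_expired(certs, now=None):
    """Return the aliases of certificates whose notAfter date has passed.

    certs maps a hypothetical alias to its notAfter string; how you export
    these values from your environment is up to you.
    """
    return sorted(alias for alias, not_after in certs.items()
                  if days_until_expiry(not_after, now) < 0)


if __name__ == "__main__":
    inventory = {  # hypothetical aliases and dates
        "intermediate-ca": "Jan  5 09:34:43 2018 GMT",
        "root-ca": "Dec 31 23:59:59 2037 GMT",
    }
    print(flag_expired(inventory))
```

Running a check like this regularly and alerting when `days_until_expiry` drops below a few weeks leaves time to obtain and import a replacement.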
Scenario 2: ArcGIS Server
The county hosts authoritative layers that let users display different types of data, such as the county’s infrastructure, boundaries, and certain assets. The county uses the weatherTracker feature service to display how much road treatment material it has available, as well as the number of available snowplows. The layer receives the most traffic during winter.
Over the years, the county has added picture attachments of its warehouses to the layer, and as the feature service has grown, its performance has declined. On an especially busy day in January, Lisa finds that the service does not load in the web map during peak usage.
Feature service tuning and resilience under load are major factors in how organizations deploy and serve their data. In the scenario above, Lisa discovers that a service becomes completely inaccessible during peak hours. During troubleshooting, she finds the following:
- The weatherTracker service is the only service that exhibits access issues during peak hours
- The service is almost 1 GB in size due to the picture attachments
- At peak times, the service becomes completely unreachable, and requires a manual restart of the feature service to work.
Based on these findings, Lisa opens ArcGIS Server Manager to investigate the service. She starts by querying the ArcGIS Server logs at the WARNING level and sees the following messages, logged at the same time the service fails to load:
- “Instance of service ‘System/dynamicMappingHost.MapServer’ failed to process a request”
- Initialization failed
- Error performing query operation Error handling service request… no layer or table was initialized
- Processing request took longer than the usage timeout for service
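The same query can be run through the ArcGIS Server Administrator API’s logs/query operation, and the returned messages can be tallied to see which failure dominates. The sketch below is a minimal illustration under several assumptions: the host is hypothetical, a token is omitted, the `filter` JSON shape follows our reading of the Administrator API documentation, and the sample records are hypothetical entries shaped like a `logMessages` array. It lists the shared-instance host alongside the service because the first message above was logged under `System/dynamicMappingHost.MapServer`.

```python
import json
from collections import Counter
from urllib.parse import urlencode

# Hypothetical ArcGIS Server admin URL; substitute your own site.
SERVER_ADMIN = "https://gis.example.com:6443/arcgis/admin"

# Lowercase fragments of the messages seen in the logs; matching on fragments
# tolerates the variable parts of real messages.
PATTERNS = {
    "instance_failed": "failed to process a request",
    "init_failed": "initialization failed",
    "no_layer": "no layer or table was initialized",
    "usage_timeout": "longer than the usage timeout",
}


def service_log_query(services, level="WARNING", page_size=1000):
    """Build a logs/query URL limited to the given services.

    Assumption: "filter" is a JSON object whose "services" key lists
    folder/name.Type entries; verify against your Admin API version.
    """
    log_filter = {"services": list(services), "machines": "*", "server": "*"}
    params = {
        "level": level,
        "filter": json.dumps(log_filter),
        "pageSize": page_size,
        "f": "json",
    }
    return f"{SERVER_ADMIN}/logs/query?{urlencode(params)}"


def classify(log_messages):
    """Count log entries matching each known failure pattern (case-insensitive)."""
    counts = Counter()
    for entry in log_messages:
        text = entry.get("message", "").lower()
        for label, fragment in PATTERNS.items():
            if fragment in text:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    print(service_log_query(["weatherTracker.MapServer",
                             "System/dynamicMappingHost.MapServer"]))
    sample = [  # hypothetical records
        {"type": "WARNING", "message": "Instance of service 'System/dynamicMappingHost.MapServer' failed to process a request"},
        {"type": "SEVERE", "message": "Processing request took longer than the usage timeout for service"},
    ]
    print(classify(sample))
```

A high `usage_timeout` count during peak hours is consistent with requests waiting on too few available instances, which is what the tuning step below addresses.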
After noting the processing errors, Lisa checks the service itself, paying close attention to its tuning settings.
The weatherTracker service uses the shared instance type. Shared instances are well suited to services that are small, infrequently used, or cached map services. Because the weatherTracker service was not used frequently at the time of publishing, it was published to the shared instance pool to save resources. Consulting the documentation on Configuring service instance settings, Lisa switches the service to use dedicated instances, ensuring that it has enough resources available on demand to keep working. Throughout the day, she confirms that the service remains accessible and does not crash.
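The same change can also be scripted by editing the service’s JSON definition and posting it back to the service’s edit operation in the Administrator API. The sketch below only transforms a definition in memory. It assumes, based on our understanding of the Administrator API, that `"provider": "DMaps"` marks a shared-instance service and `"ArcObjects11"` a dedicated one, and that `minInstancesPerNode`, `maxInstancesPerNode`, and `maxUsageTime` hold the pool size and usage timeout. The instance counts are placeholders, not recommendations.

```python
import copy
import json


def to_dedicated(service_json, min_instances=1, max_instances=4, max_usage_time=None):
    """Return a copy of a service definition switched to dedicated instances.

    Assumption: "DMaps" is the shared-instance provider and "ArcObjects11" the
    dedicated one for ArcGIS Pro-based services; confirm for your release.
    """
    if min_instances < 0 or max_instances < 1 or min_instances > max_instances:
        raise ValueError("require 0 <= min_instances <= max_instances and max_instances >= 1")
    updated = copy.deepcopy(service_json)  # leave the caller's definition untouched
    updated["provider"] = "ArcObjects11"
    updated["minInstancesPerNode"] = min_instances
    updated["maxInstancesPerNode"] = max_instances
    if max_usage_time is not None:
        # Assumed: maximum seconds a request may use an instance before the
        # "usage timeout" message is logged.
        updated["maxUsageTime"] = max_usage_time
    return updated


if __name__ == "__main__":
    # Hypothetical, heavily trimmed definition of a shared-instance service.
    shared = {"serviceName": "weatherTracker", "type": "MapServer", "provider": "DMaps"}
    print(json.dumps(to_dedicated(shared, 1, 4, max_usage_time=600), indent=2))
```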
By switching the service to dedicated instances and restarting it, Lisa resolves the service crash and makes the service run more consistently, even during peak season. To ensure the settings continue to meet the service’s requirements, Lisa sets a reminder to review the service in the off-season.
Scenario 3: Collaboration sync issue
Lisa’s county uses its ArcGIS Online organization to share its internal work publicly with residents. The county edits the data in its ArcGIS Enterprise organization, and the data is then shared to the ArcGIS Online organization as a reference through distributed collaboration. Over the last two weeks, Lisa has received reports through customer-submitted surveys that a web map showing the status of road work in the county is inaccurate and missing several important road closures.
After receiving these reports, Lisa opens the web map to confirm them. The web map and its underlying feature layers load quickly and fully in the ArcGIS Online organization. Lisa then opens the corresponding map in the county’s ArcGIS Enterprise deployment and notices discrepancies in the construction zones: zones identified in ArcGIS Enterprise are not visible in ArcGIS Online. To gauge the scale of the issue, Lisa checks an unrelated web map that also relies on feature services shared through distributed collaboration; its layers are not affected. From these observations, Lisa determines that only one feature service in the distributed collaboration is experiencing sync issues.
Her distributed collaboration is configured to sync feature services once a day. Because the layer’s source is in ArcGIS Enterprise, Lisa begins her investigation there. The status of a distributed collaboration sync can be found in the portal’s sharing/rest directory, at the /portal/sharing/rest/portals/0123456789ABCDEF/collaborations endpoint (0123456789ABCDEF stands in for the portal ID). This endpoint holds information on the collaborations the organization is part of, collaboration logs, and other important information.
Navigating to this endpoint, Lisa queries the collaboration logs and finds that the last successful sync of the road closure feature service was more than 48 hours ago. Because the feature service is set to sync every day, this points to a problem.
Besides logs, the collaboration endpoint also displays information on active sync tasks through /sharing/rest/portals/<portalID>/collaborations/<collaborationID>/workspaces/<workspaceID>/syncStatus. Sync tasks can occasionally stall, and any task that has reported an in_progress status for longer than 24 hours should be reinitiated. Lisa confirms that the road construction service’s sync status has been in_progress for more than 48 hours, indicating a stall.
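Both checks Lisa performs, a last successful sync older than the daily schedule and a task stuck in in_progress for more than 24 hours, are simple time comparisons. The sketch below builds the syncStatus URL from the path pattern above and applies both checks. The host name is hypothetical, and the task record’s `status` and `startTime` (epoch milliseconds) fields are illustrative assumptions, not the documented response schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical portal URL; the IDs come from your own collaboration.
PORTAL_REST = "https://portal.example.com/portal/sharing/rest"


def sync_status_url(portal_id, collaboration_id, workspace_id):
    """URL of a workspace's syncStatus resource, following the documented path."""
    return (f"{PORTAL_REST}/portals/{portal_id}/collaborations/"
            f"{collaboration_id}/workspaces/{workspace_id}/syncStatus?f=json")


def is_stalled(task, now, limit=timedelta(hours=24)):
    """Flag a task that has reported in_progress for longer than the limit.

    Assumption: the record has a "status" string and a "startTime" in epoch
    milliseconds; these field names are illustrative, not the API's.
    """
    if task["status"] != "in_progress":
        return False
    started = datetime.fromtimestamp(task["startTime"] / 1000, tz=timezone.utc)
    return now - started > limit


def sync_overdue(last_success, now, interval=timedelta(days=1)):
    """True if the last successful sync is older than the scheduled interval."""
    return now - last_success > interval


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    task = {"status": "in_progress",
            "startTime": int((now - timedelta(hours=49)).timestamp() * 1000)}
    print(sync_status_url("0123456789ABCDEF", "<collaborationID>", "<workspaceID>"))
    print("stalled:", is_stalled(task, now))
```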
To fix this issue, Lisa selects “Status ID” for the road construction feature service. In the Supported Operations section of that page, she selects ‘Purge status message’ to clear the stalled status. After clearing it, she manually schedules a sync from the portal organization home page. The sync takes some time to complete, so Lisa returns a few hours later to check the sync status of the road construction feature service and confirms that the sync has succeeded. She verifies this by opening the feature service in both ArcGIS Online and ArcGIS Enterprise and finds no differences.
Lisa troubleshot this issue by observing the sync problem, isolating it to a single feature service, and identifying the hung sync task in the collaboration logs within ArcGIS Enterprise.
Extracting meaningful information from logs often involves pulling in multiple data sources, repeated testing, and coordinating activity to piece together a story. In the examples above, Lisa used common troubleshooting steps while monitoring logs to understand, diagnose, and resolve her organization’s issues. It is common to combine several logging sources to fully understand a behavior or an issue.
This blog shows three common scenarios in which logging can help troubleshoot issues, but logging can also be used to collect information about how ArcGIS Enterprise is being used and accessed. For more information on logging within ArcGIS Enterprise, please read our documentation on logging!