Design your ArcGIS Enterprise backup strategy

To err is human – to really mess it up takes a computer. We think of that adage whenever a machine fails and we’re left scrambling to recover lost data and return operations to normal. For any organization, a disaster event – whether it’s a power loss, a machine crash, or a flood – can be catastrophic. That’s why it’s important to design and implement a customized, carefully considered backup strategy for your ArcGIS Enterprise deployment. Being confident and disciplined in your disaster recovery abilities saves you money and time, and reduces risk to your employees and customers in the event of a disaster.

About the WebGIS DR tool

The webgisdr tool runs a complete backup operation for your ArcGIS Enterprise deployment. The tool takes a snapshot of your ArcGIS Enterprise portal content, your ArcGIS Server services, and the 2D and 3D data hosted in your ArcGIS Data Store, as well as the current settings of all three software components. Be aware of what webgisdr won’t back up: the data in your file geodatabases and enterprise geodatabases, as well as your tile caches.

It’s run via your command prompt – meaning you can use Windows Task Scheduler or a Linux cron job to automate backups. The output file is compressed and saved to the backup location you specify.
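As a minimal sketch of what a scheduled job might call, the Python wrapper below builds and runs an export command. It assumes the tool's export and properties-file options (`--export`, `--file`); the install path and properties file location are hypothetical placeholders, and on Linux the script would be `webgisdr.sh` rather than `webgisdr.bat`.

```python
import subprocess
from pathlib import Path

# Hypothetical locations: adjust both to match your own deployment.
WEBGISDR = Path(r"C:\Program Files\ArcGIS\Portal\tools\webgisdr\webgisdr.bat")
PROPERTIES = Path(r"C:\backups\webgisdr.properties")


def build_export_command(tool: Path, properties: Path) -> list[str]:
    """Return the argument list for a backup (export) run."""
    return [str(tool), "--export", "--file", str(properties)]


def run_backup(tool: Path = WEBGISDR, properties: Path = PROPERTIES) -> int:
    """Run the export and return the exit code so the scheduler can see failures."""
    completed = subprocess.run(build_export_command(tool, properties), check=False)
    return completed.returncode
```

A Windows Task Scheduler task or a cron entry would then run a small script that calls `run_backup()` and exits with the returned code.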

Craft your strategy

The first step in creating your ideal backup strategy is defining all aspects of your system and your disaster recovery needs. Because ArcGIS Enterprise customers differ so widely in their organization size and usage of the software, it’s impossible to prescribe a perfect backup plan for every customer. What’s best for you depends on a mix of several factors. How much content do you have, and how much content do your users create each week? Do you have service-level agreements defining your system’s allowed downtime frequency? How much space is available in your infrastructure for backup content?

Disaster recovery is all about minimizing data loss and downtime. To agree on tolerable limits for these, you should quantify your organization’s recovery point objective (RPO) and recovery time objective (RTO). Your RPO, as a policy, is the furthest point back in time you’ll allow a restore – in other words, how much data loss you’re willing to tolerate. If you set your RPO as one week, for example, you’ll need to take backups at least once a week. Your RTO is the longest you can have your system (or a part of your system) be unavailable – in other words, how much downtime you’re willing to tolerate. Your RTO affects your backup policy through the size of your backups and how long a restore from them might take.
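To make the two objectives concrete, here is a small sketch that checks a proposed plan against them. The function names and the sample values are illustrative assumptions; the estimated restore time is something you would measure by testing a restore in your own environment.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """The gap between backups is the most data you can lose, so it must fit the RPO."""
    return backup_interval <= rpo


def meets_rto(estimated_restore: timedelta, rto: timedelta) -> bool:
    """The time a restore takes must fit inside the allowed downtime."""
    return estimated_restore <= rto


rpo = timedelta(weeks=1)
rto = timedelta(hours=8)

print(meets_rpo(timedelta(days=1), rpo))    # True: daily backups satisfy a one-week RPO
print(meets_rpo(timedelta(days=10), rpo))   # False: a ten-day gap exceeds it
print(meets_rto(timedelta(hours=12), rto))  # False: the restore takes longer than allowed
```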

Once you have determined your organization’s disaster-recovery needs and objectives, it’s time to figure out your backup strategy. There are two decision points when setting up your webgisdr schedule:

  1. Backup scope. Do you want to only take full backups, or also use incremental backups?
  2. Backup frequency. How frequently do you want to make backups? If you are using a combination of multiple backup tools, how does that affect your backup frequency?

A general tradeoff you’ll find is that when you choose to back up more data more frequently, you’ll gain greater control and minimize data loss, but do so at greater resource cost.

Backup scope

A key feature of the webgisdr tool is the ability to choose between full and incremental backups. Using both kinds of backups lets you shorten your RPO without dramatically increasing your storage usage.

Full backups take a complete snapshot of everything they cover, regardless of creation date. A full backup of a certain directory would clone the entire contents of that directory.

Incremental backups start from a full backup, and only grab the content created since that full backup. If all that’s been created since the last full backup was a single new file, the incremental backup would just clone that file and add it to the backup location. Incremental backups are intended to fill the gaps between full backups, letting you restore to a more recent point in time while taking up a lot less storage space.
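The directory analogy can be sketched in a few lines of Python. This is a generic illustration of the full versus incremental idea using file modification times, not a description of how webgisdr selects content internally; the function names are hypothetical.

```python
from pathlib import Path


def files_for_full_backup(root: Path) -> list[Path]:
    """A full backup covers every file under the directory."""
    return sorted(p for p in root.rglob("*") if p.is_file())


def files_for_incremental_backup(root: Path, last_full_time: float) -> list[Path]:
    """An incremental backup covers only files created or changed after the last full backup.

    last_full_time is a POSIX timestamp (seconds since the epoch).
    """
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_full_time
    )
```

If only one file has appeared since the last full backup, the second function returns just that file, which is why incremental backups stay small.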

Let’s visualize the two options you have here.

Take full system backups only.

With this option, if you back up your deployment once a week, you put your organization at risk of losing a good deal of data in the event of a failure. Say that, in the scenario below, your system goes down on day five of a backup cycle – that’s a lot of content gone! Alternatively, you could take full backups daily, but the downside there is storage costs; full backups take a lot of disk space and time to create.

In this scenario, full backups are scheduled every six days.

Take full system backups and more frequent incremental backups.

Here, you’d schedule incremental webgisdr backups between full backups. This option gives you greater control, as you can choose from several recovery points to restore. With that flexibility from the incremental backups, you can minimize the amount of work lost in the restore. And you’re doing so without taking full backups every day, which can get prohibitively space- and compute-intensive.

In this scenario, full backups are still taken every six days, but incremental backups are also taken each day in between. Each incremental backup contains all changes to content and settings since the last full backup, meaning each day's incremental backup is larger than the previous day's.
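A rough back-of-the-envelope comparison shows why this matters for storage. The sketch below assumes a hypothetical 100 GB full backup and a steady 2 GB of changes per day, and it ignores growth of the full backup itself; your real numbers will differ.

```python
def incremental_sizes_gb(daily_change_gb: int, cycle_days: int = 6) -> list[int]:
    """Size of each incremental in one cycle; each holds all changes since the full backup."""
    return [daily_change_gb * day for day in range(1, cycle_days)]


def mixed_cycle_storage_gb(full_gb: int, daily_change_gb: int, cycle_days: int = 6) -> int:
    """One full backup plus an incremental on every other day of the cycle."""
    return full_gb + sum(incremental_sizes_gb(daily_change_gb, cycle_days))


def daily_full_storage_gb(full_gb: int, cycle_days: int = 6) -> int:
    """A full backup on every day of the cycle."""
    return full_gb * cycle_days


print(incremental_sizes_gb(2))         # [2, 4, 6, 8, 10]
print(mixed_cycle_storage_gb(100, 2))  # 130
print(daily_full_storage_gb(100))      # 600
```

Under these assumptions, both schedules give you a daily recovery point, but the mixed schedule stores 130 GB per cycle instead of 600 GB.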

Disk space

A key consideration that’s often overlooked is the need for adequate disk space in your temp folder. When you use the webgisdr tool, your backup content is first copied to the temp folder on your machine (or each of your machines), before being moved to and consolidated in your backup location. This means each machine will briefly host up to double its content – and if there isn’t enough space on the machine, the backup will fail. Remember this when you design your backup strategy.
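One way to guard against this is a quick free-space check before each backup run. The sketch below assumes the backup stages its content in the operating system's default temp folder, which may not match your configuration, and the 10 percent headroom factor is an arbitrary illustrative choice.

```python
import shutil
import tempfile
from typing import Optional


def has_room_for_backup(
    content_bytes: int,
    temp_dir: Optional[str] = None,
    headroom: float = 1.1,
) -> bool:
    """Return True if the temp folder's drive can hold a staged copy of the content.

    content_bytes is your own estimate of how much content this machine holds.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(temp_dir).free
    return free_bytes >= content_bytes * headroom
```

Running a check like this on each machine before the scheduled backup lets you raise an alert instead of discovering a failed backup later.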

Backup frequency

How often you take or schedule backups of your system or its components is the most tailored aspect of your backup strategy. It depends on your choice of backup scope – for example, if you are using incremental backups, you wouldn’t have to take full backups as often to guarantee a certain RPO.
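As a sketch of how scope and frequency combine, a daily scheduled job could decide which kind of backup to take based on where the day falls in the cycle. The function name, the six-day cycle, and the start date are illustrative assumptions.

```python
from datetime import date


def backup_kind(day: date, first_full: date, full_every_days: int = 6) -> str:
    """Return "full" on cycle boundaries and "incremental" on the days in between."""
    offset = (day - first_full).days
    if offset < 0:
        raise ValueError("day is before the first full backup")
    return "full" if offset % full_every_days == 0 else "incremental"


start = date(2024, 1, 1)
print(backup_kind(date(2024, 1, 1), start))  # full
print(backup_kind(date(2024, 1, 4), start))  # incremental
print(backup_kind(date(2024, 1, 7), start))  # full
```

With a schedule like this, a recovery point exists for every day even though a full backup is only taken every sixth day.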

Putting it all together

Now that you know your options, it’s time to sort out what combination is best for your organization. We want to repeat: No one backup strategy is right for every organization, even at a broad level. You know your needs and storage capacities best, and taking them into account for a proactive, careful backup strategy will help protect your organization if disaster ever strikes.

About

Scott is a product engineer on the ArcGIS Enterprise team. Follow him on Twitter: @macd_sm.

About Jon Quinn

Jon is a product engineer on the ArcGIS Enterprise team, focusing on high availability and disaster recovery.
