Distributed Network Backups Provide an Alternative to Tape
By Christopher Eykamp
Those backup tapes--made daily, changed weekly, and moved off-site regularly--turn up blank or otherwise unusable in your moment of need. The machine wasn't writing properly or the tapes were defective or, more likely, someone just goofed up. The tapes weren't changed. Someone wasn't paying attention.
Something went terribly wrong and now your data is gone.
Someone has to pay.
It'll probably be you...
It could be copy from an ad in a computer magazine, or it could be your worst nightmare. Scenes like this don't play out every day, but they do happen. In five years, I have seen it happen twice in two different agencies. Luckily, in both cases, the problem was detected before data was irreparably lost. Making tape backups is a tedious, dull task that requires regular attention and supervision and rarely yields dividends. That's why it's so easy to stop paying attention--and so easy to put off making backups for another week or two.
Maybe There Is a Better Way
If your operation, like many others, has converted from UNIX to Windows NT, you probably have a whole network of workstations loaded with ArcInfo and/or ArcView GIS that read data from a central server. If your workstations are newer models, they probably have fairly large hard drives. Since the bulk of your data is stored centrally, it's likely that the workstation disks have considerable available space. Why not use that space to make backups of the critical data from your central server? The concept is simple and implementation is straightforward.
Four Reasons to Use This Strategy
- It's easy. The process can be fully automated by using Microsoft's Task Scheduler or another software program that allows you to schedule the running of batch files or programs. Because there are no tapes to change, the process requires little maintenance once it's in place.
- It's convenient. If you need to recover data, it is far easier to retrieve it from a machine on a LAN rather than to recover it from a tape.
- It's cheap. If you already have unused disk capacity, the cost is near zero. Even if you decide to purchase an extra hard drive or two, drives are cheaper than tape units and can be converted to other uses should your needs change. And, of course, there are no tapes to purchase.
- It's safe. There are fewer points of failure. If critical data is backed up in five different locations, it is unlikely that all five locations will fail simultaneously. Hard drives are much less temperamental and require less human intervention than tape drives. This system can provide added protection if some of the machines on your LAN are in physical locations removed from the central server. The data backed up on them will be safe in case of a fire or other catastrophic event.
Of course, if you are currently using tape as a backup medium, it would probably be wise to continue. Networked backups can still be used to provide that extra bit of insurance against a tape system failure that will occur sooner or later. As a bonus, retrieving data from a recent backup on a hard drive is easier than recovering the same information from a tape even if your tape backups are running smoothly.
Implementing a Networked Backup System
First, look at the amount of data to be backed up and the available disk resources. Ideally, you want to build in redundancy by storing each piece of data in several places. Compressing data helps reduce the space consumed by backups; vector data can typically be squeezed to half its size. Imagery files are large and, depending on format, may not compress well. However, image data is generally static, which makes it a good candidate for a one-time backup onto CD-ROM; imagery rarely needs to be part of a regular backup cycle.
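As a rough illustration of the sizing step, the arithmetic can be sketched in a few lines of Python. All of the figures below are hypothetical, not from the article; only the 2:1 vector compression ratio comes from the text.

```python
# Back-of-the-envelope estimate of the disk space networked backups
# will consume. All inputs are illustrative assumptions.
vector_gb = 2.0          # uncompressed vector data on the central server
compression_ratio = 0.5  # vector data typically squeezes to half its size
copies = 3               # workstations each holding a copy
generations = 2          # older versions retained per workstation

per_copy_gb = vector_gb * compression_ratio
total_gb = per_copy_gb * copies * generations
print(f"Each copy: {per_copy_gb:.1f} GB, total footprint: {total_gb:.1f} GB")
# Prints: Each copy: 1.0 GB, total footprint: 6.0 GB
```

Comparing the total against the free space scattered across your workstations tells you quickly whether the strategy is feasible.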
Second, decide on a schedule to back up files. It may seem tempting to run backups daily, but this is not always the best practice. If your data rarely changes, it might make more sense to run backups weekly and keep several older versions of the data. Additional backups can be run manually if a large amount of data has changed in a short time.
Different schedules can be established for each workstation. One can back up the central server daily, while another workstation can perform a weekly backup and keep data for a month before dumping it. Alternatively, rapidly changing data can be backed up on three machines on a daily basis and relatively static data backed up on a weekly or monthly basis onto a fourth machine. Your particular mix of data and resources will determine the optimum schedule.
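A mixed schedule like the one just described can be written down as a simple table before it is wired into any scheduler. The sketch below is Python; every machine name, interval, and retention count is invented for illustration.

```python
# Hypothetical per-workstation backup schedule (all names invented).
schedules = {
    "ws-mapping": {"data": "server vector data", "interval": "daily",   "keep": 3},
    "ws-admin":   {"data": "server vector data", "interval": "weekly",  "keep": 4},
    "ws-archive": {"data": "static datasets",    "interval": "monthly", "keep": 1},
}

for host, job in schedules.items():
    print(f"{host}: {job['data']} every {job['interval']}, keep {job['keep']} copies")
```

Writing the plan out this way makes gaps obvious: every critical dataset should appear on more than one machine, at more than one frequency.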
Third, implement the plan. To make this step easier, I have developed a script written in Microsoft's Visual Basic Scripting language (VBScript) that automates the process. Download the script.
Running the Script
The backup script also requires a copy of PKZip Command Line, available from the PKWARE Web site. PKZip must be on your system path so that it can be run from a DOS prompt. Alternatively, the script could easily be modified to work with the command line interface of WinZip or any other command-line-driven compression utility.
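Because the script simply shells out to the compressor, a missing or misplaced executable is the most common setup problem. A preflight check along these lines catches it early; this is a Python sketch (the original script is VBScript), and the candidate executable names are assumptions, not taken from the article.

```python
# Verify that a command-line compression utility is reachable on the
# system path before any backup job runs. Candidate names are guesses.
import shutil

def find_compressor(candidates=("pkzipc", "wzzip", "zip")):
    """Return the first candidate found on the system path, or None."""
    for exe in candidates:
        if shutil.which(exe):
            return exe
    return None

tool = find_compressor()
if tool is None:
    print("No command-line compressor found on the system path")
else:
    print(f"Using {tool}")
```

Running a check like this at the top of the backup job turns a silent nightly failure into an immediate, visible error.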
The backup script executes a series of calls to the makeBackup routine, which takes two parameters: the source directory to back up and the destination zip file that will store the data. If the backups variable specifies that multiple generations should be retained, existing backup files are renamed and, if necessary, the oldest copy is deleted. Next, PKZip runs and creates the backup archive. All activity is logged to the file named by the backupLog variable. If errors are detected, a pop-up message is displayed; otherwise, the script runs silently.
Customizing the script is simple. Change the name and location of the log file, set the number of old copies of data to keep, and change the series of makeBackup calls to fit your situation. The script can copy local data to a remote machine as well as back up remote data locally. The configuration assumes that each machine will have its own customized copy of the script, pushing local data to one or more machines on the network and pulling remote data from one or more sources. Because the script runs locally, a customized schedule can be established for each machine using the NT AT command.
Suggestions for Modifying the Script
- For sensitive data, encrypt the zip files before storing them on a remote drive.
- The script can be modified to check whether a blank CD-ROM is loaded in a particular drive and, if so, write the data to it.
- An incremental backup routine can be implemented by storing a full copy of the data weekly or monthly and writing only the changes on a daily basis.
- The script can be changed so that it adjusts the number of backup copies dynamically based on the disk space available.
The backup strategies outlined above will not work in every instance. Large organizations may have too much data to implement an effective networked backup system. But smaller GIS shops, especially those with newer workstations and fat hard disks, may find that making networked backups is a good complement or even a substitute for traditional backups on tape.
About the Author
Christopher Eykamp is a GIS consultant working for BTG, Inc., in Okinawa, Japan. He is currently implementing a comprehensive geodatabase for the United States Air Force. His recent projects have included developing a wartime runway management and repair system using Visual Basic and MapObjects. He can be reached by e-mail or via his Web site. His article describing how Perl can be used with ARC Macro Language (AML) scripts ran in the July–September 1999 issue of ArcUser magazine.
Learn more about VBScript at the DevGuru Web site.
Visit the author's Web site for other scripts, tips, and tricks.