Back Up and Restore a Disk Store

Backup and restore procedures differ for online and offline distributed systems.

Online Backup

The vFabric GemFire online backup operation creates a backup of disk stores for all members running in the distributed system when the backup command is invoked.
Note: Do not try to create backup files from a running system using your operating system's file copy commands. You will get incomplete and unusable copies.

The backup works by passing commands to the running system members. Each member with persistent data creates a backup of its own configuration and disk stores. The backup does not block any activities in the distributed system, but it does use resources.

Preparing for Backup

  1. Consider compacting your disk stores before running the backup. If auto-compaction is turned off, run a manual compaction to reduce the amount of data the backup copies over your network. For more information on configuring manual compaction, see Manual Compaction.
  2. Run the backup during a period of low activity in your system. The backup does not block system activities, but it uses file system resources on all hosts in your distributed system and can affect performance.
  3. Configure each member’s cache.xml with any files or directories you want backed up in addition to the disk store files. We recommend that you back up:
    • cache.xml
    • application jar files
    • any other files that the application needs when starting (a file that sets the classpath would be one example)
    Any directory you specify is copied recursively; any disk stores found inside it are excluded from this user-specified backup.
  4. Back up to a SAN (recommended) or to a directory that all members can access. Make sure the directory exists and has the proper permissions for your members to write to it and create subdirectories.

    You can reuse the same backup directory for multiple backups. Each backup first creates a top-level directory under the directory you specify, named with a timestamp accurate to the minute (for example, 2010-04-10-11-35).

    You can use one of two methods:
    • Use a single physical location, such as a network file server.
    • Use a directory path that exists locally on every host machine in the system.
  5. Make sure the directory where you run the gemfire command contains the properties file for the distributed system. The backup command needs this file to connect to the distributed system and instruct its members to back up their disk stores. Make sure that locators or mcast-port are set correctly in the file for the distributed system you want to back up.
  6. Make sure all members with persistent data are running in the system. Offline members cannot back up their disk stores. The tool reports any offline members with a message like this:
    The backup may be incomplete. The following disk stores are not online:
        DiskStore at /home/dsmith/dir3
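Some of these checks can be scripted before you start. The following Python sketch is illustrative only: it checks that a backup directory exists and that new subdirectories can be created in it (step 4), and predicts the name of the timestamped directory, assuming the year-month-day-hour-minute format of the example path 2010-04-10-11-35. The function names are hypothetical and not part of GemFire.

```python
import os
import tempfile
from datetime import datetime


def check_backup_dir(path):
    """Return a list of problems that would stop a member from writing to path.

    This checks only the local host's view of the directory; run it on every
    host that takes part in the backup.
    """
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # Members create subdirectories, so probe by creating and removing one.
    try:
        probe = tempfile.mkdtemp(prefix=".backup-probe-", dir=path)
        os.rmdir(probe)
    except OSError as exc:
        problems.append(f"cannot create subdirectories in {path}: {exc}")
    return problems


def expected_backup_subdir(when):
    """Timestamped directory name, assuming the year-month-day-hour-minute
    format of the example path 2010-04-10-11-35."""
    return when.strftime("%Y-%m-%d-%H-%M")
```

For example, check_backup_dir("/export/fileServerDirectory/gemfireBackupLocation") returns an empty list when the location looks usable from the host running the check.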

Running the Backup Command

  1. If you have disabled auto-compaction, run manual compaction:
    gemfire compact-all-disk-stores
  2. Run the backup command, providing your backup directory location. Example:
    gemfire backup /export/fileServerDirectory/gemfireBackupLocation
  3. The tool reports whether the operation succeeded. If it did, you see a message like this:
    Connecting to distributed system:[26340]
    The following disk stores were backed up:
    	DiskStore at /home/dsmith/dir1
    	DiskStore at /home/dsmith/dir2
    Backup successful.
    If the operation fails to back up all known members, you see a message like this:
    Connecting to distributed system:[26357]
    The following disk stores were backed up:
    	DiskStore at /home/dsmith/dir1
    	DiskStore at /home/dsmith/dir2
    The backup may be incomplete. The following disk stores are not online:
    	DiskStore at /home/dsmith/dir3

    A member that fails to complete its backup is listed in this final status message and leaves a file named INCOMPLETE_BACKUP in its top-level backup directory. Offline members leave nothing behind, so this message from the backup operation is your only record of them.

  4. Validate the backup. To make sure the backup can be recovered, run the validate-disk-store command on the backed-up files for each disk store. For example:
    cd 2010-04-10-11-35/straw_14871_53406_34322/diskstores/ds1
    gemfire validate-disk-store ds1 dir0 dir1 [... dirN]
    Repeat for all disk stores of all members.
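If you run the backup from a script, you may want to detect an incomplete backup automatically. This Python sketch splits the backup command's report into backed-up and offline disk stores. It assumes the report looks exactly like the sample messages above; the parsing rules and the function name are assumptions, not part of the gemfire tool.

```python
def parse_backup_report(text):
    """Split the backup command's report into two lists of disk store paths:
    (backed_up, offline). Assumes the layout of the sample messages."""
    prefix = "DiskStore at "
    backed_up, offline = [], []
    current = None  # the list that 'DiskStore at' lines are added to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("The following disk stores were backed up"):
            current = backed_up
        elif "The following disk stores are not online" in line:
            current = offline
        elif line.startswith(prefix) and current is not None:
            current.append(line[len(prefix):])
        elif line:
            current = None  # any other text ends the current section
    return backed_up, offline
```

A non-empty offline list means the backup may be incomplete and those members need separate handling.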
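The INCOMPLETE_BACKUP check and the per-store validation commands can be prepared with a short Python sketch. It assumes the layout suggested by the example path, <timestamp>/<member>/diskstores/<store>/<disk dirs>, that each directory under diskstores is named after its disk store (as ds1 is in the example), and that the INCOMPLETE_BACKUP marker sits directly in the member's directory. It only builds the validate-disk-store command lines; it does not run them.

```python
import os


def find_incomplete_members(backup_root):
    """Member directories under one timestamped backup that contain an
    INCOMPLETE_BACKUP marker (assumed to sit directly in the member dir)."""
    incomplete = []
    for member in sorted(os.listdir(backup_root)):
        marker = os.path.join(backup_root, member, "INCOMPLETE_BACKUP")
        if os.path.isfile(marker):
            incomplete.append(member)
    return incomplete


def validate_commands(backup_root):
    """Build (working_dir, argv) pairs for 'gemfire validate-disk-store',
    one per backed-up disk store, assuming the layout
    <backup_root>/<member>/diskstores/<store>/<disk dirs...>."""
    commands = []
    for member in sorted(os.listdir(backup_root)):
        stores_dir = os.path.join(backup_root, member, "diskstores")
        if not os.path.isdir(stores_dir):
            continue  # member with no disk stores in this backup
        for store in sorted(os.listdir(stores_dir)):
            store_dir = os.path.join(stores_dir, store)
            if not os.path.isdir(store_dir):
                continue
            disk_dirs = sorted(
                d for d in os.listdir(store_dir)
                if os.path.isdir(os.path.join(store_dir, d))
            )
            argv = ["gemfire", "validate-disk-store", store, *disk_dirs]
            commands.append((store_dir, argv))
    return commands
```

Each pair can then be passed to subprocess.run(argv, cwd=working_dir), which mirrors the cd-then-validate sequence in step 4.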

What the Online Backup Saves

For each member with persistent data, the backup includes the following:
  1. Disk store files for all stores containing persistent region data
  2. Any files or directories you have configured to be backed up in cache.xml <backup> elements.
  3. Configuration files from the member startup.
    • The distributed system properties file, with the properties the member was started with
    • cache.xml, if used
    These configuration files are not restored automatically, so that they do not interfere with more recent configurations. In particular, if these files are extracted from a master jar file, copying the separate files into your working area could override the files in the jar. If you want to back up and restore these files, add them as custom <backup> elements.
  4. A restore script, written for the member’s operating system, that copies the files back to their original locations. For example, in Windows, the file is restore.bat and in Linux, it is
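If you generate cache.xml files, a sketch like the following adds <backup> elements using Python's standard xml.etree.ElementTree module. It assumes each <backup> element is a direct child of <cache> holding a single file or directory path as its text, and it places the new elements before the existing children; confirm both points against the cache.xml DTD or schema for your GemFire version. ElementTree does not write out a DOCTYPE declaration, so re-add one if your file uses it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_backup_entries(cache_xml, paths):
    """Return cache.xml text with one <backup> element per path.

    Assumes <backup> elements are direct children of <cache>, each holding a
    single file or directory path, and places them before existing children.
    """
    root = ET.fromstring(cache_xml)
    for index, path in enumerate(paths):
        element = ET.Element("backup")
        element.text = path
        root.insert(index, element)  # keep the paths in the given order
    return ET.tostring(root, encoding="unicode")
```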

Disk Store Backup Directory Structure and Contents
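To see the structure of a backup on your own system, you can print it with a small Python sketch like this one. It lists only what is on disk and makes no assumptions about how GemFire names the directories; the function name and the depth limit are illustrative choices.

```python
import os


def backup_tree(root, max_depth=4):
    """Return an indented listing of a backup directory, directories first,
    down to max_depth levels."""
    lines = []

    def walk(path, depth):
        entries = sorted(os.listdir(path))
        dirs = [e for e in entries if os.path.isdir(os.path.join(path, e))]
        files = [e for e in entries if e not in dirs]
        for name in dirs:
            lines.append("  " * depth + name + "/")
            if depth + 1 < max_depth:
                walk(os.path.join(path, name), depth + 1)
        for name in files:
            lines.append("  " * depth + name)

    walk(root, 0)
    return lines
```

For example, print("\n".join(backup_tree("2010-04-10-11-35"))) shows each member directory and its contents.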

Offline Members: Manual Catch-Up to an Online Backup

If you must have a member offline during an online backup, you can manually back up its disk stores. Do one of the following:
  • Keep this member’s backup and restore separate from the online backup, using the offline manual backup and restore procedures if needed.
  • Bring this member’s files into the online backup framework manually and create a restore script by hand, from a copy of another member’s script:
    1. Duplicate the directory structure of a backed-up member for this member.
    2. Rename directories as needed to reflect this member’s particular backup, including disk store names.
    3. Clear out all files but the restore script.
    4. Copy in this member’s files.
    5. Modify the restore script to work for this member.
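Steps 1 and 3 can be combined in a script. This Python sketch copies the directory tree of an already backed-up member while skipping every file except the names you pass in keep_names (for example, the restore script). The function name is hypothetical; renaming directories, copying in the offline member's files, and editing the restore script remain manual.

```python
import os
import shutil


def clone_member_skeleton(template_member_dir, new_member_dir, keep_names):
    """Copy a backed-up member's directory tree, keeping only files whose
    names are in keep_names. Fails if new_member_dir already exists."""

    def ignore(dirpath, names):
        # Skip every regular file except the ones we were asked to keep.
        return [
            n for n in names
            if os.path.isfile(os.path.join(dirpath, n)) and n not in keep_names
        ]

    return shutil.copytree(template_member_dir, new_member_dir, ignore=ignore)
```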

Restore an Online Backup

The restore script copies files back to their original locations. You can also copy the files manually.
  1. Restore your disk stores when your members are offline and the system is down.
  2. Read the restore scripts to see where they will place the files, and make sure the destination locations are ready. The restore scripts will not copy over existing files that have the same names.
  3. Run the restore scripts. Run each script on the host where the backup originated.
The restore script copies the following back to their original locations:
  1. Disk store files for all stores containing persistent region data
  2. Any files or directories you have configured to be backed up in the cache.xml <backup> elements
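Because the restore scripts do not copy over existing files, you can look for conflicts before running them. This Python sketch takes a hypothetical copy plan of (backup path, original path) pairs, which you would assemble yourself by reading the restore script, and lists the destination files that already exist.

```python
import os


def restore_conflicts(copy_plan):
    """Return destination files that already exist for the given
    (backup_path, original_path) pairs."""
    conflicts = []
    for src, dest in copy_plan:
        if os.path.isdir(src):
            # Check every file the directory copy would create.
            for dirpath, _dirnames, filenames in os.walk(src):
                rel = os.path.relpath(dirpath, src)
                for name in filenames:
                    target = os.path.normpath(os.path.join(dest, rel, name))
                    if os.path.exists(target):
                        conflicts.append(target)
        elif os.path.exists(dest):
            conflicts.append(dest)
    return sorted(conflicts)
```

An empty result means the destinations are clear for the files in the plan.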

Offline File Backup and Restore

With the system offline, you back up and restore your files using your operating system's file copy commands.

To back up your offline system:
  1. Validate your disk stores, and consider compacting them, before backing them up.
  2. Copy all disk store files, and any other files you want to save, to your backup locations.
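Step 2 can be sketched in Python as follows. The sketch assumes all members using the source directories are shut down, and it names the backup directory with a minute-resolution timestamp like the example online backup directory. That naming scheme and the function name are choices made for illustration, not GemFire requirements.

```python
import os
import shutil
from datetime import datetime


def offline_backup(sources, backup_root, when=None):
    """Copy each source file or directory into a new timestamped directory
    under backup_root and return that directory. Run only while the system is
    offline."""
    when = when or datetime.now()
    target = os.path.join(backup_root, when.strftime("%Y-%m-%d-%H-%M"))
    os.makedirs(target)  # raises FileExistsError if this minute's backup exists
    for src in sources:
        dest = os.path.join(target, os.path.basename(os.path.normpath(src)))
        if os.path.exists(dest):
            # Two sources with the same base name would collide.
            raise FileExistsError(dest)
        if os.path.isdir(src):
            shutil.copytree(src, dest)
        else:
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return target
```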
To restore a backup of an offline system:
  1. Make sure the system is either down or not using the directories you will use for the restored files.
  2. Reverse your backup file copy procedure, copying all the backed up files into the directories you want to use.
  3. Make sure your members are configured to use the directories where you put the files.
  4. Start the system members.
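Steps 2 and 3 of the offline restore can also be sketched in Python. This sketch assumes a hypothetical mapping from directory names inside the backup to the directories your members are configured to use, and it refuses to restore into a directory that already contains files. It requires Python 3.8 or later for the dirs_exist_ok argument of shutil.copytree.

```python
import os
import shutil


def offline_restore(backup_dir, restore_map):
    """Copy backed-up directories back into place while the system is down.

    restore_map maps a directory name inside backup_dir to the directory the
    members are configured to use. Every destination must be missing or
    empty; all destinations are checked before anything is copied.
    """
    for dest in restore_map.values():
        if os.path.exists(dest) and not os.path.isdir(dest):
            raise FileExistsError(f"{dest} exists and is not a directory")
        if os.path.isdir(dest) and os.listdir(dest):
            raise FileExistsError(f"{dest} is not empty")
    for name, dest in restore_map.items():
        # dirs_exist_ok (Python 3.8+) allows an existing, empty destination.
        shutil.copytree(os.path.join(backup_dir, name), dest, dirs_exist_ok=True)
```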