Monday, October 21, 2013

Partitioned Backup Retention

I manage some servers on Amazon EC2, and recently I've been rewriting our backup scripts. Previously we had shell scripts that made a tarball, encrypted it via openssl, and dumped it in a directory. Old backups were cleaned up with a find command, and then the whole directory was synced to s3 with s3sync. Our retention policy was simply to keep the last month's daily backups, mostly because this was easy to do with find. S3 storage is cheap but definitely not free, so we'd like to do a bit better.

Requirements

I set out to find a new rotation scheme based on the following assumptions:
  1. The density of backup points as a function of age should be exponential, with many more recent backups than older backups.
  2. Backups will be triggered regularly, but might be skipped due to downtime, script failure, etc. As such, our rotation scheme should simply take a set of dates and return a list of those to delete. No special marking of backups to retain longer should be required when they are made.
  3. No incremental backups.
  4. Keep as few backups as possible while still covering a month.

Existing Options

Taking a quick look at wikipedia's backup rotation scheme page, we find a few existing options:

First In, First Out is what we were using previously, and fails on point 1 above.

Grandfather-father-son accomplishes most of what I'm looking for, but the most obvious way to use it and keep a month's backups would require more storage than I'd like - 24 hourly backups, 7 daily, 4 weekly. I tried to figure how to use the Python GrandFatherSon module to keep a more sparse set, but couldn't quite figure out what it was doing.

Tower of Hanoi is formulated for use rotating removable media. I'm looking for something similar, but that fits better with point 2 above.

A New Scheme

I make backups every hour. Just after I make a backup, I gather the datestamps of all the backups I've got stored on s3 and pass them to the rotation function. It returns a set of datestamps of backups that should be deleted.

As configuration for the rotation function, select a set of retention targets, or approximate ages around which backups should be kept. I selected 1 day, 7 days and 28 days.

The rotation function partitions the set of existing backup dates it is passed into groups between retention targets. In my case, there is a group made between now and 1 day ago, a group between 1 and 7 days old, and a group between 7 and 28 days old. Then delete all but the oldest and newest backup within each such group. As a special case, delete all but the youngest backup older than the last retention target.

Results


The plot above shows the resulting distribution of backups over time, with the retention targets shown as the blue lines.

We cover a month with 8 backups.

Code

A script with the rotation function (using my retention targets) and a program to visualize the results with matplotlib are on github. CC0 license.

Gotchas

Initially I kept only the oldest backup in each group. This resulted in big gaps when a backup aged into a new group where it was deleted as an older one was still present, while the younger group now had no backup.

For similar reasons, it is important to keep one backup older than the oldest retention target.

No comments:

Post a Comment