A couple of weeks ago, I wrote a tutorial on how to use duplicity to make encrypted, incremental backups over various available protocols. The process was a manual one, though, and would always require you to type the actual command and wait for the run to finish. It also had you manually keep track of what you did back up and what you didn’t back up. It was basically a nice and easy way to back up several folders, more or less archiving them in case you ever need them again.
This week I’m going to share a script with you that I’ve used for quite some time. It is based on a script I found over at the Linode forums, which was written to back up to Amazon S3. All I’ve done is strip some of the unnecessary stuff intended for S3 and change some of the includes/excludes. The credit for the script thus goes to the original author. I’m going to explain the script to you, though, and let you know how you can tweak it to suit your needs.
There are several things you need for the script to work:
- A GnuPG key (see my previous tutorial on how to generate one)
- The password of your GnuPG key (you should know this)
- The public key ID of your GnuPG key (see my previous tutorial)
- A backup server accessible over SSH/SFTP
- Duplicity installed (see my previous tutorial)
I would really recommend using a dedicated GnuPG key for this script, meaning you don’t use that key for anything else. So when you’ve gathered all that, let’s get started with the script!
The Script
This is the full script. It may look long or complicated right now, but I’m going to go through it line by line. The script will work on various Linux distributions. I’ve tested it on Ubuntu and CentOS.
#!/bin/bash

trace () {
    stamp=`date +%Y-%m-%d_%H:%M:%S`
    echo "$stamp: $*" >> /var/log/backup.log
}

# Export your GnuPG passphrase to an ENV variable so you don't have to type it every time
export PASSPHRASE=<GnuPG passphrase>

# Identifier for your GnuPG key
GPG_KEY=<GnuPG key identifier>

# Backups older than this will be removed
OLDER_THAN="6M"

# The source of your backup, often a local directory
SOURCE=/

# The destination (relative to the home directory of the user you're logging in as)
DEST="sftp://<path>"

# Check if a full backup is necessary
FULL=
if [ $(date +%d) -eq 1 ]; then
    FULL=full
fi

trace "Backup for local filesystem started"
trace "... removing old backups"
# Comment this line (and the one above to keep the log clean) to disable backup removal
duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1

trace "... backing up filesystem"
# Full backup run
duplicity \
    ${FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --include=/etc \
    --include=/home \
    --include=/root \
    --exclude=/var/tmp \
    --include=/var \
    --exclude=/** \
    ${SOURCE} ${DEST} >> /var/log/backup.log 2>&1

# And we're done
trace "Backup for local filesystem complete"
trace "------------------------------------"

# Reset the ENV variable
export PASSPHRASE=
It’s a bash script, meaning there are few requirements beyond a reasonably recent shell. Unless you’re running a distribution release that’s more than five years old, you should be fine. The first line indicates that it’s a bash script:
#!/bin/bash
The next line creates a function called ‘trace’. This function enables easy logging to a log file. It prepends every log line with a timestamp. The log file is in /var/log and it’s called ‘backup.log’. This function is used several times in the script, as you will notice.
trace () {
    stamp=`date +%Y-%m-%d_%H:%M:%S`
    echo "$stamp: $*" >> /var/log/backup.log
}
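If you want to try the logging function without writing to /var/log, a variant along these lines might help. This is only a sketch: the LOG_FILE variable is my own assumption and is not part of the script above, which always logs to /var/log/backup.log.

```shell
# LOG_FILE is a hypothetical override; the script in this article hard-codes /var/log/backup.log.
LOG_FILE="${LOG_FILE:-/var/log/backup.log}"

trace () {
    local stamp
    # $(...) does the same job as the backticks used in the article
    stamp=$(date +%Y-%m-%d_%H:%M:%S)
    # "$*" joins all arguments into one string, separated by spaces
    echo "$stamp: $*" >> "$LOG_FILE"
}
```

Setting LOG_FILE=/tmp/backup-test.log before calling trace "hello world" would append a timestamped line to that file instead of the real log.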
The next line exports your GnuPG passphrase as an environment variable, meaning it’s available to the script and to every program the script starts, such as duplicity. We do this to make sure duplicity doesn’t ask for a password when it runs, enabling you to run this with cron (I’ll show you that in a bit).
# Export your GnuPG passphrase to an ENV variable so you don’t have to type it every time
export PASSPHRASE=<GnuPG passphrase>
Not to worry, we’ll reset the environment variable when the script is done, so the passphrase doesn’t linger in the environment.
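To see what exporting actually does, here is a small experiment you could run on its own. It assumes nothing beyond a POSIX shell; the passphrase value is a placeholder I made up. It shows that a child process inherits the exported variable, and that unset removes it completely, whereas `export PASSPHRASE=` leaves it defined but empty.

```shell
# Placeholder value for illustration only; never use a real passphrase here.
export PASSPHRASE="example-passphrase"

# A child process started from this shell inherits the exported variable.
child_before=$(sh -c 'printf "%s" "${PASSPHRASE:-(not set)}"')

# unset removes the variable entirely instead of leaving it set to an empty string.
unset PASSPHRASE
child_after=$(sh -c 'printf "%s" "${PASSPHRASE:-(not set)}"')

echo "child saw before unset: $child_before"
echo "child saw after unset:  $child_after"
```

Either way of clearing the variable works for this script, since the variable only ever lives in the script’s own process and the programs it starts.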
Next, we’ll assign your GnuPG key ID to a variable (this is the key duplicity will use for signing and encryption):
# Identifier for your GnuPG key
GPG_KEY=<GnuPG key identifier>
We set the threshold for backup removal (I’ll get back to this a couple of lines down) to 6 months. Feel free to change it to suit your needs:
# Backups older than this will be removed
OLDER_THAN="6M"
Set the source of the backup, in this case the root directory of your server:
# The source of your backup, often a local directory
SOURCE=/
And the destination, which is a path relative to the home directory of the user you’re logging in as on the remote server:
# The destination (relative to the home directory of the user you’re logging in as)
DEST="sftp://backups.example.net/server1backups"
And with the final variable we’ll determine whether we need a full backup or not:
# Check if a full backup is necessary
FULL=
if [ $(date +%d) -eq 1 ]; then
FULL=full
fi;
What the above does is set the FULL variable to ‘full’ on the first day of the month. This variable is used when creating the actual backup. If it’s set to ‘full’, duplicity makes a complete backup. Otherwise, it does an incremental backup. Leaving it like this means you get a full backup every first day of the month and incremental backups on top of that one on all the other days. You could also leave out the three lines that set FULL to ‘full’; duplicity then makes a full backup only on the very first run and incremental backups from that point on.
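If you want to check the day-of-month logic without waiting for the first of the month, you could wrap it in a small function. The name backup_mode is hypothetical and not part of the script; the test itself is the same `-eq 1` comparison the script uses.

```shell
# backup_mode is a hypothetical helper: it prints "full" for day 1 and nothing otherwise.
backup_mode () {
    local day="$1"
    # [ ... -eq ... ] compares decimal integers, so the zero-padded "01" from date +%d equals 1
    if [ "$day" -eq 1 ]; then
        echo "full"
    fi
}

# Same result as the FULL= / if / fi lines in the script
FULL=$(backup_mode "$(date +%d)")
echo "today's mode: ${FULL:-incremental}"
```

Calling backup_mode 01 prints full, while backup_mode 15 prints nothing, which leaves FULL empty just like in the original.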
After having set all variables, the script starts by adding information to the log file:
trace "Backup for local filesystem started"
trace "... removing old backups"
This calls the trace function with the quoted text as an argument. That text is then appended to the log.
Next, it’s removal time:
# Comment this line (and the one above to keep the log clean) to disable backup removal
duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1
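Remember the OLDER_THAN variable we set above? This is where it is used. The command uses standard duplicity functionality to remove backups older than a given age at the destination you pass to it. The age can be expressed in days (D), weeks (W), months (M) or years (Y), so 6M means six months. Note that, according to the duplicity man page, remove-older-than only lists the matching backup sets unless you also pass --force, so add that flag if you want old backups to actually be deleted. If you do not want old backups to be removed at all, comment this line like this: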
# Comment this line (and the one above to keep the log clean) to disable backup removal
# duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1
And it will skip the removal of backups.
Now it’s time for the actual backup process:
trace "... backing up filesystem"
# Full backup run
duplicity \
    ${FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --include=/etc \
    --include=/home \
    --include=/root \
    --exclude=/var/tmp \
    --include=/var \
    --exclude=/** \
    ${SOURCE} ${DEST} >> /var/log/backup.log 2>&1
It starts with a trace to indicate the actual backup has started. Next, it’s on to the duplicity command. This is basically a very standard duplicity command but with a few additional parameters to include and exclude certain paths. The following line adds the value of the FULL variable to the command, which can be empty or ‘full’. In the case of ‘full’, duplicity does a full backup. Otherwise, it does an incremental one.
${FULL} \
The encrypt-key and sign-key parameters use the GPG_KEY you’ve set above. These are used to encrypt and sign the data.
The following couple of lines include and exclude several directories. The backslash (\) at the end is used to indicate the command continues on a new line.
--include=/etc \
--include=/home \
--include=/root \
--exclude=/var/tmp \
--include=/var \
--exclude=/** \
What we do here is include the /etc, /home and /root directories. We then exclude /var/tmp but include the rest of /var. With --exclude=/** we exclude everything else.
A couple of things: the order matters. If you want to exclude a subdirectory of a directory you are including, you have to do it before including the parent directory. Otherwise it will be backed up anyway. The same logic applies the other way around: if you want to include a subdirectory of a directory you are excluding, you have to do it before excluding the parent directory. You also have to include everything you do want to back up before you exclude everything else (--exclude=/**). A different approach is to exclude a couple of directories like /tmp and /var/tmp (or even /var/cache) and then just include everything else (--include=/**). That would look like this:
--include=/tmp/secretfiles \
--exclude=/tmp \
--exclude=/var/cache \
--exclude=/var/tmp \
--include=/** \
The above includes everything except /tmp (it does include /tmp/secretfiles), /var/cache and /var/tmp.
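The ordering rule is easier to see with a toy model. The sketch below is not duplicity’s actual matcher; it is a simplified stand-in I wrote under the assumption that the first rule matching a path decides its fate. The function names is_backed_up and article_rules and the "+"/"-" rule notation are all made up for this illustration.

```shell
# Simplified model: rules are checked in order and the first match wins.
# "+/path" stands for --include=/path, "-/path" for --exclude=/path.
# Usage: is_backed_up PATH RULE...
is_backed_up () {
    local path rule pattern verdict
    path="$1"
    shift
    for rule in "$@"; do
        case "$rule" in
            +*) verdict=0 ;;   # include: report success
            *)  verdict=1 ;;   # exclude: report failure
        esac
        pattern="${rule#?}"    # strip the leading + or -
        # "/**" matches every path
        if [ "$pattern" = "/**" ]; then
            return "$verdict"
        fi
        # Otherwise match the directory itself or anything below it
        case "$path" in
            "$pattern"|"$pattern"/*) return "$verdict" ;;
        esac
    done
    return 0   # no rule matched: treated as included (an assumption of this sketch)
}

# The rule order used in the script
article_rules () {
    is_backed_up "$1" "+/etc" "+/home" "+/root" "-/var/tmp" "+/var" "-/**"
}

for p in /etc/passwd /var/log/syslog /var/tmp/scratch /usr/bin/env; do
    if article_rules "$p"; then echo "included: $p"; else echo "excluded: $p"; fi
done
```

In this model /etc/passwd and /var/log/syslog come out as included, while /var/tmp/scratch and /usr/bin/env come out as excluded. Swapping "-/var/tmp" and "+/var" makes /var/tmp/scratch included, which is the mistake the ordering advice is meant to prevent.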
Feel free to tweak the includes and excludes to suit your needs and be sure to check the result of the backup before you go to sleep ;-) The log file at /var/log/backup.log can help you with that.
The backup command ends with:
${SOURCE} ${DEST} >> /var/log/backup.log 2>&1
Which uses the SOURCE and DEST variables defined before as the source and destination for the duplicity command. It then outputs everything that comes from the command to the log file.
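The `>> file 2>&1` part appends standard output to the log and then points standard error at the same place. The script itself does not look at duplicity’s exit status, so a failed run only shows up if you read the log. One possible way to make failures stand out is sketched below; run_logged and LOG_FILE are hypothetical names of mine, not part of the original script.

```shell
# LOG_FILE is an assumed override; the article's script writes to /var/log/backup.log.
LOG_FILE="${LOG_FILE:-/var/log/backup.log}"

# run_logged COMMAND [ARGS...]: run a command, append its output to the log,
# and add an extra log line if it exits with a non-zero status.
run_logged () {
    local status
    # stdout is appended to the log first, then stderr is sent to the same destination
    "$@" >> "$LOG_FILE" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "$(date +%Y-%m-%d_%H:%M:%S): command failed with exit status $status: $*" >> "$LOG_FILE"
    fi
    return "$status"
}
```

You could then write something like run_logged duplicity ${FULL} ... ${SOURCE} ${DEST} in place of the explicit redirect, and the log would contain a clear failure line whenever the command does not succeed.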
Finally, we end the backup script by writing something to the log and resetting the ENV variable with your GnuPG passphrase:
# And we’re done
trace "Backup for local filesystem complete"
trace "------------------------------------"
# Reset the ENV variable
export PASSPHRASE=
After having done this, your GnuPG passphrase is no longer “out there”.
So, that was the script! Not all that hard if you think about it. But let’s get it to run automatically!
Getting it up and running
Now we’ve got the script all finished, let’s add it to cron. Cron is a time-based job scheduler. You can let it run whatever command you want at whatever time (or time interval) you want. We’re going to make this backup run every night at 01.00 (1 AM).
First, make sure the file is on the server. I always prefer to put these in /opt, but in this case /root is fine as well. Since the script contains the password for the GnuPG key, it’s better to not have it visible to other users. /root is protected from outsiders by default, so I’m going to use that in my example. Upload the script to /root and name it ‘duplicity-backup’ (without an extension).
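Cron can only run the script directly if it is executable, and since the file holds your passphrase it makes sense to restrict it to its owner at the same time. A minimal sketch, assuming the /root/duplicity-backup path from this tutorial; lock_down is just a hypothetical helper name:

```shell
# lock_down FILE: make FILE readable, writable and executable by its owner only.
lock_down () {
    # 700 = rwx for the owner, no access for group and others
    chmod 700 "$1"
}

# Example (run as root): lock_down /root/duplicity-backup
```

Running ls -l on the file afterwards should show -rwx------ as its permissions.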
Then, on either CentOS or Ubuntu, run (only use sudo if you’re not logged in as root):
(sudo) crontab -e
This will open up a text editor with the crontab for root. Crontab contains the lines of the jobs cron has to execute. There are other ways to add cron jobs, but for now this is easiest. Add the following line at the end of the file:
0 1 * * * /root/duplicity-backup
The first five fields indicate when the job should run. The first is the minute (0), the second the hour (1), the third the day of the month (* for any), the fourth the month (* for any) and the fifth the day of the week (* for any). The line then ends with the command to run. This job will thus run every day of every month at 01.00 (1 AM). Now save the file and you’re all set!
From now on, your backups should work. It would be good to check them the first couple of days to see if everything runs well before relying on them.
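The five scheduling fields of a crontab line are minute, hour, day of month, month and day of week, followed by the command. If it helps to see them labelled, this small sketch splits the line from this tutorial into its parts; the variable names are my own.

```shell
line='0 1 * * * /root/duplicity-backup'

# read splits on whitespace; the last variable receives the rest of the line
read -r minute hour day_of_month month day_of_week command <<EOF
$line
EOF

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
printf 'command=%s\n' "$command"
```

Changing the second field from 1 to 3, for example, would move the run to 03.00.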
Final notes
There are other ways to handle backups. For instance, you could write a Python script that does the above and more (like dumping databases or packaging certain files). Using duplicity, though, is something that I can really recommend, whatever language you decide to write your backup script in. I can also recommend backing up the GnuPG key to a secure location (a USB stick in a vault or something), as that key is the key to your backups.
Happy back-upping!
It’s a nice tutorial but seriously, why not just use duply as the frontend.
I am absolutely on your side. why reinvent the wheel if duply does `just that`. duply is just one shell script which can be dropped in your PATH somewhere. after configuration, deploy a `duply backup` in /etc/cron.daily and `duply backup_verify_purge` in /etc/cron.weekly and you are done.
Well, it’s always good to see alternatives so kudos for that. People need options and some people like tinkering.
Duply is indeed a good alternative. I personally wanted to show this approach to people since it’s quite a simple way to have full control over things.
Linking to previous threads needed for this would help imo
You can see all of mpkossen’s articles now, by clicking tutorials at the top or http://www.lowendbox.com/tag/tutorials-2/
There is a link in the first sentence :-)
Keep in mind that with restoring and verifying the backups, when you have used the same GPG key to create the backup, the actual signing key used is different from the encryption key. Based on the previous article, you end up with an error like this:
because:
The backup is still valid, but this will throw you off a bit, as you’ll have to specify the sign key separately on restore/verify.
Side notes:
– There’s a parameter ‘--encrypt-sign-key’ which is simply ‘--encrypt-key’ and ‘--sign-key’ rolled in one. Save those bytes! ;)
– If you specify an incremental backup (‘inc’) *and* you specify ‘--full-if-older-than’, then Duplicity will perform a full backup if there isn’t one yet (and subsequently, delete full backups older than the specified age [unless incremental backups that were made after that date rely upon that full backup]). This is handy for scripting too, as you don’t need to have a full backup in place before you add incremental backups to the cron.
Why not just use a normal backup solution like http://www.backupthat.com instead of all that backend nonsense?
cause backupthat sucks and is totally unrelated for what this article is trying to accomplish