Lately I’ve been looking for a better backup solution, and I settled on rsync. If you want a good tutorial, you can find one easily enough by googling. But I had a lot of trouble finding good documentation about the --exclude-from option. What I found was either too cursory or just plain wrong. So I did some tests, and here are my results. I’m pleased to say that rsync’s --exclude-from option is very capable!
This option names a file, as in --exclude-from=my-excludes.txt. That file contains a list of patterns, one per line, and if one of the patterns matches a given file or directory, rsync leaves that file/directory out of the backup.
These patterns may be full filenames, as in log, or they may use wildcards, as in *.bak. If a directory is excluded, then all the files within it are excluded, too. (This sounds obvious, but I’ve seen other programs with exclude files that just leave out the directory!)
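One way to check these two kinds of pattern for yourself is to run rsync against a throwaway tree. This is only a sketch: it assumes a POSIX shell with rsync and mktemp available, and every file and directory name in it (src, dest, docs, notes.txt and so on) is made up for the test.

```shell
#!/bin/sh
# Scratch area; all names below are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src/log" "$work/src/docs"
echo keep  > "$work/src/docs/notes.txt"
echo old   > "$work/src/docs/notes.txt.bak"
echo noise > "$work/src/log/app.log"

# One pattern per line: a plain name and a wildcard.
printf '%s\n' 'log' '*.bak' > "$work/excludes.txt"

# Trailing slash on the source: copy the contents of src into dest.
rsync -a --exclude-from="$work/excludes.txt" "$work/src/" "$work/dest/"

# List what actually got copied.
(cd "$work/dest" && find . | sort)
```

If the patterns behave as described, the listing should contain docs and docs/notes.txt, but neither the .bak file nor anything from log.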
You can even include bits of a path in your exclude file to further limit what gets excluded. For instance, if you say photos/thumb, then the thumb directory will be excluded whenever it appears inside a folder called photos. It will not be excluded if it appears in, say, body-parts. Also, rsync will still include your photos folders; just their thumb folders will be left out.
Finally (and this is the part I was really interested in and found wrongly documented elsewhere), you can exclude specific paths. You do this by starting the path with a slash: /root/tmp. This leading slash does not indicate a file at the root of your filesystem, as it would in a normal Unix path. Rather, it tells rsync to anchor this path to the base of the top-level source folder.
Note that if you’re using this feature, you must use it consistently with how you specify your source folder. Remember that in rsync, if you name your source folder without a trailing slash, rsync copies that folder and everything in it, whereas if you give it a trailing slash, rsync copies just the folder’s contents. Well, anchored paths in the --exclude-from file must take account of this. Suppose your source folder is root. Then your excludes file would have a line like /root/tmp. If your source folder is root/, then your excludes file needs a line like /tmp.
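The interaction between the anchored pattern and the source's trailing slash can be tried out side by side. The sketch below assumes rsync and mktemp are installed; backup-a, backup-b and the two excludes file names are hypothetical, and a second tmp folder under www is included only to show that the anchored pattern leaves it alone.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/root/tmp" "$work/root/www/tmp"
echo scratch > "$work/root/tmp/note.txt"
echo draft   > "$work/root/www/tmp/index-ver2.html"
cd "$work" || exit 1

# Source named without a trailing slash: anchored patterns start at /root.
printf '%s\n' '/root/tmp' > excludes-a.txt
rsync -a --exclude-from=excludes-a.txt root backup-a

# Source named with a trailing slash: anchored patterns start inside root.
printf '%s\n' '/tmp' > excludes-b.txt
rsync -a --exclude-from=excludes-b.txt root/ backup-b

# In both copies only the top-level tmp should be missing.
find backup-a backup-b | sort
```

Swapping the two excludes files between the two commands is a quick way to watch the anchored pattern stop matching.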
To put this all into an example, suppose you have this file structure and you want to exclude some of it:
root/
    important.txt
    important.txt.bak
    mysql/
        ecom.db
        log/mysql.log
        photos/thumb/
    tmp/note.txt
    www/
        index.html
        log/www.log
        photos/thumb/
        tmp/index-ver2.html
In other words, you want to exclude:
* All *.bak files.
* All log directories.
* All photos/thumb directories.
* The top-level tmp directory.
Then you could either call rsync like this:
rsync -avz --exclude-from=excludes.txt root backups
with excludes.txt as follows:
*.bak
log
photos/thumb
/root/tmp
Or you could call rsync like this:
rsync -avz --exclude-from=excludes.txt root/ backups
with an excludes.txt like this:
*.bak
log
photos/thumb
/tmp
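To confirm that the two forms really are equivalent, you can rebuild the example tree and compare the results. This is a sketch under a few assumptions: rsync and mktemp are installed, tmp/note.txt sits directly under root, the files are empty placeholders, and the names backups-1, backups-2, excludes-1.txt and excludes-2.txt are made up so the two runs don't collide. The -v and -z flags are dropped because they don't affect what gets excluded.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Recreate the example tree (file contents are placeholders).
mkdir -p root/mysql/log root/mysql/photos/thumb root/tmp \
         root/www/log root/www/photos/thumb root/www/tmp
touch root/important.txt root/important.txt.bak root/mysql/ecom.db \
      root/mysql/log/mysql.log root/tmp/note.txt root/www/index.html \
      root/www/log/www.log root/www/tmp/index-ver2.html

# First form: source without a trailing slash.
printf '%s\n' '*.bak' 'log' 'photos/thumb' '/root/tmp' > excludes-1.txt
rsync -a --exclude-from=excludes-1.txt root backups-1

# Second form: source with a trailing slash.
printf '%s\n' '*.bak' 'log' 'photos/thumb' '/tmp' > excludes-2.txt
rsync -a --exclude-from=excludes-2.txt root/ backups-2

# The two results should hold the same files, one directory level apart.
(cd backups-1/root && find . | sort) > list-1.txt
(cd backups-2 && find . | sort) > list-2.txt
diff list-1.txt list-2.txt && cat list-1.txt
```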
UPDATE: One more note: You might think that ending a line in the excludes file with a trailing slash would work analogously to naming a source directory with a trailing slash: “include this directory but not its contents.” That way if you were backing up your whole filesystem, you could have a line like /mnt/ that would create a /mnt directory but ignore all its contents. But that is not what happens. A trailing slash only restricts the pattern to directories, so /mnt/ still excludes the whole /mnt directory, just as /mnt would. If you want to keep an empty /mnt directory and exclude everything in it, you need a line like this instead: /mnt/*.
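The difference between /mnt/ and /mnt/* is easy to see on a scratch tree. As before this is a sketch that assumes rsync and mktemp are installed; the src tree, the usb and etc entries, and the dest-dir/dest-contents names are hypothetical stand-ins for a real filesystem.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/mnt/usb" "$work/src/etc"
echo data > "$work/src/mnt/usb/file.txt"
echo conf > "$work/src/etc/hosts"
cd "$work" || exit 1

# '/mnt/' excludes the directory itself, so no mnt is created at all.
printf '%s\n' '/mnt/' > excludes-dir.txt
rsync -a --exclude-from=excludes-dir.txt src/ dest-dir

# '/mnt/*' excludes only what is inside, so an empty mnt is still created.
printf '%s\n' '/mnt/*' > excludes-contents.txt
rsync -a --exclude-from=excludes-contents.txt src/ dest-contents

find dest-dir dest-contents | sort
```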