rsync and --exclude-from

2010-03-20

Lately I’ve been looking for a better backup solution, and I settled on rsync. If you want a good tutorial, you can find one easily enough by googling. But I had a lot of trouble finding good documentation about the --exclude-from option. What I found was either too cursory or just plain wrong. So I did some tests, and here are my results. I’m pleased to say that rsync’s --exclude-from option has very full functionality!

This option names a file, as in --exclude-from=my-excludes.txt. That file contains a list of patterns, and if one of the patterns matches a given file or directory, rsync leaves that file/directory out of the backup.

These patterns may be full filenames, as in log, or they may use wildcards, as in *.bak. If a directory is excluded, then all the files within it are excluded, too. (This sounds obvious, but I’ve seen other programs with exclude files that just leave out the directory!)

You can even include bits of a path in your exclude file to further limit what gets excluded. For instance, if you say photos/thumb, then the thumb directory will be excluded whenever it appears inside a folder called photos. It will not be excluded if it appears in, say, body-parts. Also, rsync will still include your photos folders; just their thumb folders will be left out.

Finally (and this is the part I was really interested in and found wrongly documented elsewhere), you can exclude specific paths. You do this by starting the path with a slash: /root/tmp. This leading slash does not indicate a file at the root of your filesystem, as in a normal Unix path. Rather, it tells rsync to anchor this path at the top of the tree being transferred.

Note that if you’re using this feature, you must use it consistently with how you specify your source folder. Remember that in rsync, if you name your source folder without a trailing slash, rsync copies that folder and everything in it, whereas if you give it a trailing slash, rsync copies just the folder’s contents. Anchored paths in the --exclude-from file must take account of this. Suppose your source folder is root. Then your excludes file would have a line like /root/tmp. But if your source folder is root/, then your excludes file needs a line like /tmp.

To put this all into an example, suppose you have this file structure and you want to exclude some of its files:

root/
     important.txt
     important.txt.bak
     mysql/
           ecom.db
           log/mysql.log
     photos/thumb/
     tmp/note.txt
     www/
         index.html
         log/www.log
         photos/thumb/
         tmp/index-ver2.html

In other words, you want to exclude:

* All *.bak files.
* All log directories.
* All photos/thumb directories.
* The top-level tmp directory.

Then you could either call rsync like this:

rsync -avz --exclude-from=excludes.txt root backups

with excludes.txt as follows:

*.bak
log
photos/thumb
/root/tmp

Or you could call rsync like this:

rsync -avz --exclude-from=excludes.txt root/ backups

with an excludes.txt like this:

*.bak
log
photos/thumb
/tmp

UPDATE: One more note: You might think that ending a line in the excludes file with a trailing slash would work analogously to naming a source directory with a trailing slash: “include this directory but not its contents.” That way, if you were backing up your whole filesystem, you could have a line like /mnt/ that would create a /mnt directory but ignore all its contents. But that is not what happens. According to the rsync man page, a trailing slash only restricts the pattern to matching directories; a matching directory is still excluded outright, contents and all. If you want to keep the /mnt directory but exclude everything in it, you need a line like this instead: /mnt/*.
