Archive for the ‘linux’ Category

rsync and --exclude-from

Saturday, March 20th, 2010

Lately I’ve been looking for a better backup solution, and I settled on rsync. If you want a good tutorial, you can find one easily enough by googling. But I had a lot of trouble finding good documentation about the --exclude-from option. What I found was either too cursory or just plain wrong. So I did some tests, and here are my results. I’m pleased to say that rsync’s --exclude-from option has very full functionality!

This option names a file, as in --exclude-from=my-excludes.txt. That file contains a list of patterns, and if one of the patterns matches a given file or directory, rsync leaves that file/directory out of the backup.

These patterns may be full filenames, as in log, or they may use wildcards, as in *.bak. If a directory is excluded, then all the files within it are excluded, too. (This sounds obvious, but I’ve seen other programs with exclude files that just leave out the directory!)

You can even include bits of a path in your exclude file to further limit what gets excluded. For instance, if you say photos/thumb, then the thumb directory will be excluded whenever it appears inside a folder called photos. It will not be excluded if it appears in, say, body-parts. Also, rsync will still include your photos folders; just their thumb folders will be left out.
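To make these matching rules concrete, here is a toy model in Ruby. It is only a sketch of the behavior described above, not rsync's actual matcher; the helper names tail_match? and excluded? and the sample paths are made up for illustration.

```ruby
# Toy model of unanchored rsync exclude patterns (not rsync's real matcher).
# tail_match? and excluded? are hypothetical helper names.

# True if the pattern's components match the last components of the path.
# A pattern with no slash therefore matches on the final name only.
def tail_match?(pattern, path)
  pat   = pattern.split("/")
  parts = path.split("/")
  return false if pat.length > parts.length
  pat.zip(parts.last(pat.length)).all? do |p, name|
    File.fnmatch(p, name, File::FNM_DOTMATCH)
  end
end

# A path is left out if it, or any directory above it, matches a pattern,
# because everything inside an excluded directory is excluded too.
def excluded?(patterns, path)
  parts = path.split("/")
  (1..parts.length).any? do |n|
    prefix = parts.first(n).join("/")
    patterns.any? { |pattern| tail_match?(pattern, prefix) }
  end
end

patterns = ["*.bak", "log", "photos/thumb"]
%w[
  site/important.txt
  site/important.txt.bak
  site/mysql/log/mysql.log
  site/photos
  site/photos/thumb
  site/body-parts/thumb
].each do |path|
  puts "#{excluded?(patterns, path) ? 'exclude' : 'keep   '} #{path}"
end
```

In this model the .bak file, everything under a log directory, and photos/thumb are excluded, while photos itself and body-parts/thumb are kept.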

Finally (and this is the part I was really interested in and found wrongly documented elsewhere), you can exclude one specific path. You do this by starting the path with a slash: /root/tmp. Unlike in a normal Unix path, this leading slash does not mean the root of your filesystem. Rather it tells rsync to anchor the pattern to the base of the top-level source folder.

Note that if you’re using this feature, you must use it consistently with how you specify your source folder. Remember that in rsync, if you name your source folder without a trailing slash, rsync copies that folder and everything in it, whereas if you give it a trailing slash, rsync copies just the folder’s contents. Well, anchored paths in the --exclude-from file must take account of this. Suppose your source folder is root. Then your excludes file would have a line like /root/tmp. But if your source folder is root/, then your excludes file needs a line like /tmp.
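The interaction between the trailing slash and anchored patterns can be sketched in a few lines of Ruby. Again this is just a toy model under the assumptions described above, not how rsync is implemented; transfer_path and anchored_match? are hypothetical names.

```ruby
# Toy model of anchored exclude patterns (not rsync's real matcher).
# transfer_path and anchored_match? are hypothetical helper names.

# Path of a file as it appears inside the transfer. With a trailing slash
# on the source only the contents are sent, so the source folder's own
# name is not part of the path.
def transfer_path(source, path_inside_source)
  if source.end_with?("/")
    path_inside_source
  else
    File.join(File.basename(source), path_inside_source)
  end
end

# An anchored pattern (leading slash) must match from the top of the transfer.
def anchored_match?(pattern, path)
  pattern == "/" + path
end

[["root", "/root/tmp"], ["root/", "/tmp"], ["root", "/tmp"]].each do |source, pattern|
  top    = anchored_match?(pattern, transfer_path(source, "tmp"))
  nested = anchored_match?(pattern, transfer_path(source, "www/tmp"))
  puts "source=#{source} pattern=#{pattern}: tmp=#{top} www/tmp=#{nested}"
end
```

The first two combinations match only the top-level tmp, never www/tmp. The third, mismatched combination (source root with pattern /tmp) matches nothing at all, which is the mistake to watch out for.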

To put this all into an example, suppose you have this file structure and you want to exclude some of the files in it:


root/
     important.txt
     important.txt.bak
     mysql/
           ecom.db
           log/mysql.log
     photos/thumb/
     tmp/note.txt
     www/
         index.html
         log/www.log
         photos/thumb/
         tmp/index-ver2.html

In other words, you want to exclude:

  • All *.bak files.
  • All log directories.
  • All photos/thumb directories.
  • The top-level tmp directory.

Then you could either call rsync like this:


rsync -avz --exclude-from=excludes.txt root backups

with excludes.txt as follows:


*.bak
log
photos/thumb
/root/tmp

Or you could call rsync like this:


rsync -avz --exclude-from=excludes.txt root/ backups

with an excludes.txt like this:


*.bak
log
photos/thumb
/tmp

UPDATE: One more note: You might think that ending a line in the excludes file with a trailing slash would work analogously to naming a source directory with a trailing slash: “include this directory but not its contents.” That way if you were backing up your whole filesystem, you could have a line like /mnt/ that would create a /mnt directory but ignore all its contents. But in fact it doesn’t work that way. According to the rsync man page, a trailing slash only restricts the pattern to matching directories, so /mnt/ still excludes the directory itself. If you want to keep the /mnt directory but exclude everything in it, you need a line like this instead: /mnt/*.

Sort by File Size with du

Tuesday, March 2nd, 2010

Here is a handy Perl script I wrote a while back to sort the output from du -sh by file size. The standard sort command can’t do this because it doesn’t know how to compare values like “488M” and “5.0K.” My code will sort any lines where these values appear in the first field. I’m sure the Perl could be more compressed, but keeping it easy to read like this is more my style:


#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @lines;

while (<>) {
        chomp;
        push @lines, [unabbrev($_), $_];
        # print "$_: ", unabbrev($_), "\n";
}

# print Dumper \@lines;

for my $line (reverse sort { return $a->[0] <=> $b->[0] } @lines) {
        print $line->[1], "\n";
}

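# Convert the leading size field of a du -h line (e.g. "488M", "5.0K")
# to a plain number of bytes so lines can be compared numerically.
sub unabbrev {
        my $line = shift;
        # du -h uses powers of 1024, so "1.0M" must rank above "1020K"
        my %mult = (K => 1024, M => 1024**2, G => 1024**3, T => 1024**4);
        if ($line =~ m/^\s*(\d+(?:\.\d+)?)([KMGTB]?)/) {
                return $1 * ($mult{$2} || 1);   # B or no suffix: already bytes
        }
        return 0;       # no leading size: sort the line last
}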

It’d be fun to re-write this in ruby—or even better, add it as a feature to GNU sort.
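For what it’s worth, here is a rough sketch of what the Ruby version might look like. The names to_bytes and sort_by_size are made up for illustration, the multipliers are powers of 1024 because that is how du -h abbreviates sizes, and the sample lines stand in for real du output.

```ruby
# Sketch of the du size sort in Ruby; to_bytes and sort_by_size are
# hypothetical names. du -h abbreviates in powers of 1024.
MULTIPLIERS = { "K" => 1024, "M" => 1024**2, "G" => 1024**3, "T" => 1024**4 }.freeze

# Turn the leading size field of a line ("488M", "5.0K", "12") into bytes.
# Lines without a leading number count as 0.
def to_bytes(line)
  match = line.match(/\A\s*(\d+(?:\.\d+)?)([KMGTB]?)/)
  return 0 unless match
  match[1].to_f * MULTIPLIERS.fetch(match[2], 1)
end

# Largest first, like the Perl version.
def sort_by_size(lines)
  lines.sort_by { |line| -to_bytes(line) }
end

# Made-up sample lines in place of real du output.
sample = ["5.0K\tnotes", "488M\tphotos", "1.2G\tvideo", "12\tempty"]
puts sort_by_size(sample)
```

To use it as a filter, as with the Perl script, the sample array could be replaced with ARGF.readlines.map(&:chomp).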

UPDATE: Reading the documentation for GNU coreutils (which contains sort), I see that sort does have an -h option (for --human-numeric-sort). Strangely, this option is not documented in the man page on Ubuntu 9.10 and is unrecognized by /usr/bin/sort. I guess I’ve got an old version.

If anyone is looking for a small open source project, sort’s implementation of this option could still be improved. Right now, according to the online docs, “values with different precisions like 6000K and 5M will be sorted incorrectly.” It’d be great if it fully implemented the rules for block size used by other coreutils programs.

Another ruby rtouch

Tuesday, March 17th, 2009

Well, I decided it wasn’t really fair to criticize ruby for lacking a variable-time touch command when Perl doesn’t have one either. So I wrote a ruby version that uses utime just as the Perl version does. Here it is:

#!/usr/bin/env ruby

# == Synopsis
#
# rtouch: recursively touch files.
#
# == Usage
#
# rtouch [OPTIONS] file [files...]
#
# -h, --help
#   show help
#
# -t, --time [[CC]YY]MMDDhhmm[.SS]
#   use the given time instead of the current time
#
# files: the files to create or touch.
# If directories, rtouch will update their time and everything within them.

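require 'find'
require 'fileutils'
require 'getoptlong'

# Print the usage text and exit with the given status.
# (rdoc/usage was removed in Ruby 1.9, so the text is spelled out here.)
def usage(status)
  puts "Usage: rtouch [-h] [-t [[CC]YY]MMDDhhmm[.SS]] file [files...]"
  exit status
end

# Parse a touch(1)-style timestamp, [[CC]YY]MMDDhhmm[.SS], into a Time.
def parse_time(tstr)
  if tstr =~ /^(\d\d\d\d|\d\d)?(\d\d)(\d\d)(\d\d)(\d\d)(\.(\d\d))?$/
    if $1
      if $1.length == 2
        # two-digit year: assume the current century
        year = $1.to_i + (Time.now.year / 100) * 100
      else
        year = $1.to_i
      end
    else
      year = Time.now.year
    end
    secs = $7 ? $7.to_i : 0
    return Time.local(year, $2.to_i, $3.to_i, $4.to_i, $5.to_i, secs)
  else
    raise ArgumentError, "bad time parameter"
  end
end

opts = GetoptLong.new(
            [ '--help', '-h', GetoptLong::NO_ARGUMENT ],
            [ '--time', '-t', GetoptLong::REQUIRED_ARGUMENT ]
           )

time = Time.now
begin
  opts.each do |opt, arg|
    case opt
    when '--help'
      usage 0
    when '--time'
      time = parse_time arg
    end
  end
rescue GetoptLong::Error, ArgumentError => e
  # rescue only option and time errors, so the exit in usage is not caught
  puts e.message
  usage 1
end

usage 1 if ARGV.empty?

ARGV.each do |dir|
  if File.exist? dir
    # existing file or directory: set the time on it and everything below it
    Find.find(dir) do |path|
      File.utime time, time, path
    end
  else
    # nonexistent file: create it, then set its time
    FileUtils.touch dir
    File.utime time, time, dir
  end
end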

Fancy rtouch

Monday, March 9th, 2009

Well, it turns out Ruby’s library functions for touch don’t let you specify a modification time; you can set the file to the current time only. I could just call out to the touch binary, but that wouldn’t be very portable. So I’m back to Perl. Here is the program with a -t [[CC]YY]MMDDhhmm[.SS] option, just like touch(1):

#!/usr/bin/perl -w
use strict;
use File::Find;
use Getopt::Std;
use Time::Local;

my $mtime;
my %opts;
getopts('ht:', \%opts);

if ($opts{h}) {
  usage();
  exit 0;
}

if ($opts{t}) {
  if ($opts{t} =~ m/^(\d\d\d\d|\d\d)?(\d\d)(\d\d)(\d\d)(\d\d)(\.(\d\d))?$/) {
    my @now = localtime;
    my $cent = $now[5] + 1900;
    my $secs = 0;	# touch(1) uses 0 seconds when .SS is omitted
    if ($1) {
      if (length $1 > 2) {
        $cent = $1;
      } else {
        $cent = 100 * int($cent / 100) + $1;
      }
    }
    if ($7) {
      $secs = $7;
    }
    @now = ();
    $now[0] = $secs;		# seconds
    $now[1] = $5;		# minutes
    $now[2] = $4;		# hours
    $now[3] = $3;		# day of the month
    $now[4] = $2 - 1;		# month (0..11)
    $now[5] = $cent - 1900;	# years since 1900

    $mtime = timelocal(@now);
  } else {
    usage();
    exit 1;
  }
} else {
  $mtime = time;
}

for my $dir (@ARGV ? @ARGV : ('.')) {
  if (-e $dir) {
    find sub {
      utime $mtime, $mtime, $_;
    }, $dir;
  } else {
    # create the missing file, as touch(1) would
    open my $fh, '>', $dir or die "Cannot create $dir: $!";
    close $fh;
    utime $mtime, $mtime, $dir;
  }
}

sub usage {
  print "USAGE: $0 [-t [[CC]YY]MMDDhhmm[.SS]] [files...]\n";
}

I debated whether rtouch should create nonexistent files. The regular touch command creates any files that don’t exist. But since rtouch is recursive, I’m not sure creating files makes sense. But I figured it could still be convenient, so you could give it a bunch of arguments with the intent, “Touch all these files and everything in them, creating empty files whenever one doesn’t exist.”

(In case you haven’t guessed, this week is spring break!)

First Draft of rtouch in Ruby

Sunday, March 8th, 2009

Okay, here is the same thing, but in Ruby. Still no option-passing:

#!/usr/bin/ruby

require 'find'
require 'fileutils'

dirs = (ARGV.length > 0 ? ARGV : ["."])

dirs.each do |dir|
  Find.find(dir) do |path|
    FileUtils.touch path
  end
end

rtouch

Sunday, March 8th, 2009

By the way . . .

Upgrading WordPress is annoying! There was lots of “delete this folder, except for file x and folder y.” Because of how I organize things, at one point I found it useful to write a recursive touch script. Here it is:

#!/usr/bin/perl -w
use strict;
use File::Find;

my @dirs = @ARGV ? @ARGV : ('.');

find sub {
    system("touch", $_);
}, @dirs;

It’s pretty simple: for instance, it doesn’t pass along any options to the touch program. But I thought I’d put that off until I can rewrite it in ruby. This Perl version was just because I needed it done quick.