Sort by File Size with du

2010-03-02

Here is a handy Perl script I wrote a while back to sort the output of du -sh by file size. The standard sort command can’t do this because it doesn’t know how to compare values like “488M” and “5.0K.” My code will sort any lines where these values appear in the first field. I’m sure the Perl could be more compressed, but keeping it easy to read like this is more my style:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @lines;

while (<>) {
  chomp;
  push @lines, [unabbrev($_), $_];
  # print "$_: ", unabbrev($_), "\n";
}

# print Dumper \@lines;

for my $line (reverse sort { return $a->[0] <=> $b->[0] } @lines) {
  print $line->[1], "\n";
}

sub unabbrev {
  my $val = shift;
  # du -h uses powers of 1024, so 1023K must still sort below 1.0M
  if ($val =~ m/^\s*(\d+(\.\d+)?)([KMGTB]?)/) {
    if ($3 eq 'K') {
      $val = $1 * 1024;
    } elsif ($3 eq 'M') {
      $val = $1 * 1024 ** 2;
    } elsif ($3 eq 'G') {
      $val = $1 * 1024 ** 3;
    } elsif ($3 eq 'T') {
      $val = $1 * 1024 ** 4;
    } else { # B or nothing
      $val = $1;
    }
  }
  return $val;
}
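
To see why the conversion step matters, here is a minimal sketch of what happens without it. A plain numeric comparison in Perl reads only the leading number and ignores the unit suffix, much like sort -n does. The naive_sort helper and the sample sizes are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort sizes with a plain numeric comparison.
# Perl numifies "488M" as 488, so the unit suffix is ignored.
sub naive_sort {
  my @sizes = @_;
  no warnings 'numeric';  # "488M" is not purely numeric; silence the warning
  return sort { $a <=> $b } @sizes;
}

print join(' ', naive_sort('488M', '5.0K', '12K', '1.5G')), "\n";
# prints: 1.5G 5.0K 12K 488M
```

The largest entry, 1.5G, ends up first because only its leading 1.5 is compared.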

It’d be fun to rewrite this in Ruby or, even better, to add it as a feature to GNU sort.

UPDATE: Reading the documentation for GNU coreutils (which contains sort), I see that sort does have an -h option (for --human-numeric-sort). Strangely, this option is not documented in the man page on Ubuntu 9.10 and is unrecognized by /usr/bin/sort. I guess I’ve got an old version.

If anyone is looking for a small open source project, sort’s implementation of this option could still be improved. Right now, according to the online docs, “values with different precisions like 6000K and 5M will be sorted incorrectly.” It’d be great if it fully implemented the rules for block size used by other coreutils programs.
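
For comparison, here is a rough sketch of how converting every value to bytes handles the mixed-precision case from that quote. It assumes power-of-1024 units as printed by du -h. The to_bytes helper and %power table are hypothetical names for illustration, not anything from coreutils.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Exponent of 1024 for each unit suffix (assumption: du -h style units).
my %power = ('' => 0, B => 0, K => 1, M => 2, G => 3, T => 4);

# Hypothetical helper: convert a human-readable size to a byte count.
sub to_bytes {
  my ($size) = @_;
  my ($num, $unit) = $size =~ m/^\s*(\d+(?:\.\d+)?)([BKMGT]?)/
    or return 0;  # unparseable input sorts first
  return $num * 1024 ** $power{$unit};
}

my @sorted = sort { to_bytes($a) <=> to_bytes($b) } ('6000K', '5M', '4.9M');
print "@sorted\n";
# prints: 4.9M 5M 6000K
```

Because every value is reduced to the same unit before comparing, 6000K (6,144,000 bytes) correctly sorts above 5M (5,242,880 bytes).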
