Password Dictionary Variations

2010-01-31

I’ve been playing around with WPA this weekend, which (unlike WEP) requires a brute force attack against the passphrase. WPA actually supports passphrases up to 63 characters long, so if you use a random string of them sufficiently long, your network is essentially uncrackable (at least that way). But most people don’t bother with this, so brute force attacks are still feasible.

There are lots of dictionaries out there, but I couldn’t find any that includes variants, such as with 1337 spellings. It seems worthwhile to test these, too, so I wrote a quick perl script to generate such a list, given a dictionary as input. It also takes a simple text file defining which substitutes to make for each letter. This is a quick example I whipped up (leet.txt):

a A @ 4
b B 8 6
c C (
d D
e E 3
f F
g G
h H #
i I 1 ! | l
j J i
k K
l L 1 I ! |
m M
n N
o O 0 *
p P
q Q
r R
s S 5 $
t T 7 +
u U v V
v V \/ ^ u U
w W
x X %
y Y
z Z 2

Note that it supports multi-character substitutions, as under v.

The script is really pretty naive. A single eight-character word is going to generate at best 2^8 variations (256) or at worst 6^8 (1679616). Even my /usr/share/dict/words has 98569 lines, which would produce 165,558,069,504 separate combinations. A better dictionary would produce even more. I haven’t tried how long this would run, but it sounds bad. I’ve read that aircrack can test 50 to 300 words per second, so that many combinations would require at best 551,860,232 seconds, or 6388 days. And that’s just the eight-char passwords. You better go make a sandwich.

So a real implementation would discriminate about the possibilities and only emit likely variants, such as the original, the original lowercased, the original uppercased or camel-cased, the original with full leet spellings, with the most common leet substitutions, etc. Emitting everything just isn’t practical.

In addition, there are a few scripts you’d want to run ahead of mine. You’d want something that reads a dictionary file and combines shorter words to make two- and three-word strings. The naive version of this is simple permutation code, but again you’d want something smarter in practice. I’m not sure how to write something like that without AI.

You’d also want a script upstream that throws out everything too short or too long. WPA’s shortest passphrase requires eight characters, so you might as well not waste time on anything shorter. And if you’re planning on testing spelling variants, then anything beyond a certain length (say twelve? fifteen?) is going to get silly. With my script, eight is already silly!

But with those caveats, here is the script:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $variants_file = "leet.txt";

my %variants = ();

open IN, "<$variants_file"	or die "Can't open $variants_file for reading: $!\n";
while (<IN>) {
	chomp;
	$_ =~ s/^\s+//;
	$_ =~ s/\s+$//;
	next unless $_;
	my @fields = split /\s+/, $_;
	my $letter = $fields[0];
	$variants{$letter} = \@fields;
}
close IN;

# print Dumper \%variants;

while (<>) {
	chomp;
	my $orig = lc $_;
	my $vs = get_variants($orig);
	for my $v (@$vs) {
		print "$v\n";
	}
}

sub get_variants {
	my $orig = shift;
	my $ret = [];
	my $len = length $orig;
	return $ret unless $len > 0;

	my $first_letter = substr($orig, 0, 1);
	my $vs = $variants{$first_letter} || [$first_letter,];

	if ($len > 1) {
		my $the_rest = substr($orig, 1);

		my $sub_variants = get_variants($the_rest);
		for my $letter (@$vs) {
			for my $stem (@$sub_variants) {
				push @$ret, "$letter$stem";
			}
		}
	} else {
		return $vs;
	}

	return $ret;
}

UPDATE: Since my worst-case scenario isn’t that realistic (1,679,616 variants per 8-char word), I thought I’d run my script on the whole dictionary and see how many variants I really got. I learned that some words in there are so long, they crash my script by soaking up all the memory (e.g. “electroencephalograph’s”). Oops. But if I limit the script to 8-10-char words (41598 total), I get this result:

$ egrep '^.{8,10}$' /usr/share/dict/words | ./pw-vary.pl | wc -l
3238454512

3,238,454,512: that’s not as bad as my worst-case estimate, even with the nine- and eight-char words, but it still seems pretty bad. To check them all, a fast computer would need 124 days.

UPDATE: I’ve reimplemented this password dictionary variator in C and removed the out-of-memory error condition on long words.

blog comments powered by Disqus Prev: C Version of Password Dictionary Variator Next: DLL Inspector

Illuminated Computing

Password Dictionary Variations

Paul A. Jungwirth

Code

Writing

Talks