Decline in Inverse and Leveraged ETFs

February 5th, 2010

So this post isn’t about programming, but it has a lot of math, and it’s about understanding how something works. Lately my investing has used some ETFs, but I’ve read that if your ETF is inverse or leverage, then it gradually loses money over time—simply because of the mathematics. I wanted to investigate just how this worked.

First, a bit of background: an ETF is an Exchange-Traded Fund, a pretty new thing. It’s not a stock, but a type of derivative used to track something else, such as a sector of stocks. In this regard it’s a bit like an index, but more flexible. So you could have an ETF tracking the performance of pharmaceutical stocks, or financial stocks, or the S&P 500. They are a little different from index-based mutual funds. You can trade them any time during the day, and they don’t have the high minimum purchases required by some mutual funds. On the other hand, you pay a commission, unlike a non-load mutual fund. (In some cases this may not be true.)

But the really interesting thing about them is that they can be leveraged or inverse. A leveraged fund aims to achieve some multiple of the daily change of the underlying index. So while the SPY fund tracks the S&P 500, you can get 2x the S&P 500 (both up and down) with the SSO fund. This greater volatility gives you the potential for more profit with each trade, diminishing the relative size of the commission. It’s a bit like you invested twice as many dollars as you actually have.

An inverse fund is a leveraged fund with a negative multiplier. You can get funds at -1x, -2x, or I think even -3x. This essentially lets you short the market, something difficult for retail investors to manage. Also, unlike with shorting, you don’t risk an unlimited amount of money. You only risk the money you use to buy the ETF.

The fly in the ointment is that ETFs, when leveraged or inverse, gradually decline in value. Let’s say we have an index that start at 100, declines to 80 (-20%), then goes back up to 100 (+25%). Now let’s say we were tracking it with four ETFs, a 1x, a -1x, a 2x, and a -2x. They all start at 40. Here are the results:

Fund t0 t1 t2
Index 100 80 100
1x ETF 40 32 40
-1x ETF 40 48 36
2x ETF 40 24 36
-2x ETF 40 56 28

As you can see, all our ETFs lost money, except the 1x version. Conspicuously, the -1x and 2x ETF lost the same amount, but the -2x ETF lost more. If we had reversed the order of changes (100 to 125 to 100), we would see the same results.

So I set out to analyze the mathematics behind all this. Let’s start with some equations describing our targeted fund, the index. I’ll call its initial value T. Our intermediate value will be T’. Our final value will be T”—but of course this is equal to T. Let’s call the first percent change d, and the second percent change d’. That gives us these equations:

T’ = T + dT
T = T” = T’ + d’T’

If we substitute T for T’ and solve for d’, we get:

T = (T + dT) + d’(T + dT)
0 = dT + d’(T + dt)
-dT / (T + dT) = d’
d’ = -d / (1 + d)

Now let’s write out the equations for our derived fund, D (the ETF). We’ll use D, D’, D” to correspond with T, T’, T”. The multiplier of our ETF will be f. So in a 1x ETF, f = 1, but in a -2x ETF, f = -2. That gives us these equations:

D’ = D + fdD
D” = D’ + fd’D’

Substituting, we get:

D” = (D + fdD) + f * (-d/(1+d)) * (D + fdD)

So far so good, but I’m really interested in the percent lost for each up-and-down cycle. The absolute loss is D – D”, so the percent loss is (D – D”) / D. Let’s call this L. That gives us:

L = (D – D”) / D
L = (D – ((D + fdD) + f * (-d/(1+d)) * (D + fdD))) / D
L = (D – D – fdD – f * (-d/(1+d)) * (D + fdD)) / D
L = -fd – f * (-d/(1+d) * (1 + fd)
L = fd/(1+d) * (1 + fd) – fd
L = fd(1+fd) / (1+d) – fd(1+d) / (1+d)
L = (fd(1+fd) – fd(1+d)) / (1+d)
L = fd(1+fd-1-d) / (1+d)
L = fd(fd-d) / (1+d)
L = fd2(f-1) / (1+d)
L = f(f-1) * d2 / (1+d)

That final equation tells us that the loss is proportional to f and d. Well, d is kind of obvious: the more our underlying index changes, the more the ETF will change. But f is quite interesting. The more leveraged we are, the more we lose on each cycle. Although neither factor is as simple to write out as we might like, consider this. f(f-1) is almost f*f, or f2:

f*(f-1) < f*f
f*(f-1) < f2

The effect of f is almost a square. We could imagine this as f1.9, although that isn’t quite right.

At first glance, it seems we could make a similar simplification with d, since d2/(d+1) is a little less than d2/d:

d2/(d+1) < d2/d
d2/(d+1) < d

So d’s effect would be a little less than d: sort of like d0.9. But this is misleading, because d2/(d+1) only approximates d for large values, and our d will (probably) never go above 1. In fact it’s only interesting for values like 0.01 or 0.1 or maybe (gulp) 0.5. At that range, we get values like:

d d2/(d+1)
0.01 0.00009
0.02 0.00039
0.03 0.00087
0.04 0.00153
0.05 0.00238
0.10 0.00909
0.20 0.03333
0.30 0.06923
0.40 0.11428
0.50 0.16666

The graph looks like this:

Or at close range, like this:

All this means that comparatively speaking, f has a greater effect than d.

We could chart f’s effect for different levels of leverage:

f f(f-1)
1 0
2 2
3 6
4 12
5 20
6 30
-1 2
-2 6
-3 12
-4 20
-5 30

In other words, each “step” up in leverage increases your loss, and going inverse counts as one additional “step.”

By multiplying these values with those from d’s table, you can see your loss for each cycle. For a 2x or -1x ETF, you lose about 2% of your investment for each 10% cycle, or 0.02% for each 1% cycle. For a -2x ETF, you lose about 6% for a 10% cycle or 0.06% for a 1% cycle. Best not to stay in these investments for too long!

The next question is: how long is right? I guess you could look at the last few years to count the number of cycles, and try to figure out an ETF’s theoretical decline if the market had no net change. But the conventional wisdom answer seems to be about a week, and this sounds right to me. You’re really betting on the direction of the next move, and if you get that wrong, it’s going to cost you. So to make money with these ETFs, you need to get not just the direction right, but the timing, too. It’s hard enough just to get the direction! This need to be right about timing makes them risky for the same reason options are risky: your bet only has so long to play out.

Now here is another thought. Take a look at a graph of f(f-1):

As you can see, between 0 and 1, this graph dips below 0. That means that if you had a x0.5 ETF, say, you would actually make money over time. I wonder if there are practical reasons to prevent this, or if the return is just too small, so you’d do better putting your money in a bank. Anyway, it’s an interesting thought. . . .

Variation on C Password Dictionary Variator

February 3rd, 2010

So I was thinking, since my password app generates too many possibilities, what if we limited its output so that any given letter was translated into the same variant each time it appears in the word? In other words, if your word was “aa” and your leet.txt file had “a A @,” then you’d get just “aa,” “AA,” and “@@,” not “aA,” “a@,” “Aa,” etc. So I made that change, and it seems so reasonable, I made it the default. To get all combinations, use the “-a” option. Here is the code:


/**
 * pw-vary.c - implements pw-vary, a program to obfuscate passwords by replacing letters with leet.
 *
 * Makes a lot of assumptions about limits. Also assumes ASCII strings. Pretty lazy!
 *
 * Copyright (c) 2009 by Paul A Jungwirth
 */
#include
#include
#include

#define LINE_LEN		1024
#define MAX_VARIANTS	20
#define LONGEST_RESULT	4092

#define is_whitespace(c) (c == ' ' || c == '\t' || c == '\n' || c == '\r')
#define set_or_die(v, s) if (!v) { perror(s); exit(EXIT_FAILURE); }
#define unset_or_die(v, s) if (v) { perror(s); exit(EXIT_FAILURE); }

#ifdef DEBUG
#define DEBUGGING(s) s;
#else
#define DEBUGGING(s)
#endif

typedef struct t_variant {
	char letter;
	char *variants[MAX_VARIANTS];
	int count;
	char line[LINE_LEN];
	int maxlen;
} t_variant;

static t_variant *init_variant(char letter) {
	t_variant* ret;

	ret = (t_variant*)malloc(sizeof(t_variant));
	set_or_die(ret, NULL);

	ret->letter = letter;
	ret->count = 1;
	ret->maxlen = 1;
	ret->variants[0] = ret->line;

	return ret;
}

static char *find_line_start(char *line) {
	int i = 0;
	char c;

	c = line[i];
	while (c) {
		if (!is_whitespace(c)) return &line[i];
		c = line[i];
		i++;
	}

	return NULL;
}

static void read_leet(t_variant **variants, FILE *f) {
	char line[LINE_LEN];
	char *line_start;
	int len;
	int i, j;
	char first_letter, c;
	int in_whitespace;
	t_variant *vs;

	while (fgets(line, LINE_LEN, f)) {
		// strip leading white space and check for contents
		line_start = find_line_start(line);
		if (!line_start) continue;

		// Create a new entry in variants for the letter.
		first_letter = line_start[0];
		variants[first_letter] = init_variant(first_letter);
		vs = variants[first_letter];

		// List all the possible variants.
		strncpy(vs->line, line_start, LINE_LEN);
		line_start = vs->line;
		len = strlen(line_start);
		i = j = 1;
		in_whitespace = 1;
		while (i < len && j variants[j] = &line_start[i];
					vs->count++;
					in_whitespace = 0;
					j++;
				}
			}
			i++;
		}

		// find the longest variant for this letter:
		for (i = 0; vs->variants[i]; i++) {
			len = strlen(vs->variants[i]);
			if (len > vs->maxlen) vs->maxlen = len;
		}
	}
	unset_or_die(ferror(f), "reading leet file");
}

/*@unused@*/
static void print_variants(t_variant **variants) {
	int i, j;

	for (i = 0; i variants[j]) {
				puts(variants[i]->variants[j]);
				j++;
			}
			puts("====");
		}
	}
}

/**
 * do_word_all and do_word_selective:
 * Each iterates through all the variant combinations for the given word.
 *
 * do_word_all does this with two arrays, each the same length as the
 * original string:
 *   - maxes holds the number of variants for each letter in the string.
 *   - indices holds the current variant number to be printed. We keep
 *     incrementing this array, "carrying" when necessary, just like
 *     counting.
 * This algorithm tests all possible combinations.
 *
 * do_word_selective causes a slightly more complex variation:
 * instead of testing all combinations, we assume that all instances of a
 * given letter use written with the same variant, cutting down on the
 * number of combinations to print. In this version, we rely on three arrays:
 *   - maxes & indices are only as long as the number of *unique* letters.
 *     We use them for "counting" just as before.
 *   - poses has one entry for each char in the string, and it stores
 *     an int for indexing into the two other arrays.
 */

static void do_word_selective(char *buffer, t_variant **variants, char *orig) {
	char c;
	char *end;
	int i, j, k;
	int len, newlen, len2, maxes_len;
	int *maxes;
	int *indices;
	int *poses;
	t_variant *vs;

	orig = find_line_start(orig);
	if (orig) {
		end = strpbrk(orig, "\n\r");
		if (end) end[0] = '';
		// puts(orig);

		// initialize maxes & indices
		// TODO: pass these as buffers to speed things up
		len = strlen(orig);
		maxes = (int*)malloc(sizeof(int) * len);
		set_or_die(maxes, NULL);
		indices = (int*)calloc(len, sizeof(int));	// start with all zeroes.
		set_or_die(indices, NULL);
		poses = (int*)calloc(len, sizeof(int));
		set_or_die(poses, NULL);

		newlen = 1; // start with 1 for the .
		j = 0;
		for (i = 0; i count;
				} else {
					maxes[j] = 1;
				}
				poses[i] = j;
				j++;
			} else {
				// point wherever the first letter points
				poses[i] = poses[k];
			}
			newlen += maxes[poses[i]];
		}
		maxes_len = j;

		// don't proceed if newlen is greater than LONGEST_RESULT (improbable)
		if (newlen >= LONGEST_RESULT) {
			fprintf(stderr, "Result for \"%s\" too long: %d characters\n", orig, newlen);
		} else {
			j = 0;
			while (j >= 0) {
				// Construct and print the string using our current position, represented by indices.
				buffer[0] = '';
				for (i = 0; i variants[indices[poses[i]]]);
					} else {
						len2 = strlen(buffer);
						buffer[len2] = orig[i];
						buffer[len2 + 1] = '';
					}
				}
				puts(buffer);

				// Now add one to our position.
				j = maxes_len - 1;
				do {
					indices[j] = (indices[j] + 1) % maxes[j];
				} while (indices[j] == 0 && j-- > 0); // short-circuit: j only decremented when we carry

				// when j is -1, we've overflowed; hence we've printed all combinations.
			}
		}

		free(maxes);
		free(indices);
		free(poses);
	}
}

static void do_word_all(char *buffer, t_variant **variants, char *orig) {
	char *end;
	int i, j;
	int len, newlen, len2;
	int *maxes;
	int *indices;
	t_variant *vs;

	orig = find_line_start(orig);
	if (orig) {
		end = strpbrk(orig, "\n\r");
		if (end) end[0] = '';
		// puts(orig);

		// initialize maxes & indices
		// TODO: pass these as buffers to speed things up
		len = strlen(orig);
		maxes = (int*)malloc(sizeof(int) * len);
		set_or_die(maxes, NULL);
		indices = (int*)calloc(len, sizeof(int));	// start with all zeroes.
		set_or_die(indices, NULL);

		newlen = 1; // start with 1 for the .
		for (i = 0; i count;
				newlen += vs->maxlen;
			} else {
				maxes[i] = 1;
				newlen++;
			}
		}

		// don't proceed if newlen is greater than LONGEST_RESULT (improbable)
		if (newlen >= LONGEST_RESULT) {
			fprintf(stderr, "Result for \"%s\" too long: %d characters\n", orig, newlen);
		} else {
			j = 0;
			while (j >= 0) {
				// Construct and print the string using our current position, represented by indices.
				buffer[0] = '';
				for (i = 0; i variants[indices[i]]);
					} else {
						len2 = strlen(buffer);
						buffer[len2] = orig[i];
						buffer[len2 + 1] = '';
					}
				}
				puts(buffer);

				// Now add one to our position.
				j = len - 1;
				do {
					indices[j] = (indices[j] + 1) % maxes[j];
				} while (indices[j] == 0 && j-- > 0); // short-circuit: j only decremented when we carry

				// when j is -1, we've overflowed; hence we've printed all combinations.
			}
		}

		free(maxes);
		free(indices);
	}
}

static void do_all_words(t_variant **variants, FILE *f, int do_all) {
	char line[1024];
	char buffer[LONGEST_RESULT];

	while (fgets(line, 1024, f)) {
		if (do_all) do_word_all(buffer, variants, line);
		else do_word_selective(buffer, variants, line);
	}
	unset_or_die(ferror(f), "reading file");
}

int main(int argc, char **argv) {
	t_variant *variants[255];		/* 255 pointers, each to an array of 20 pointers to strings. */
	FILE *f;
	int i;
	int arg_start, do_all;

	// Argument processing:
	if (argc > 1 && (strcmp(argv[1], "-a") == 0)) {
		do_all = 1;
		arg_start = 2;
	} else {
		do_all = 0;
		arg_start = 1;
	}

	bzero(variants, sizeof(variants));

	f = fopen("leet.txt", "r");
	set_or_die(f, "opening leet file");
	read_leet(variants, f);
	unset_or_die(fclose(f), "closing leet file");
	DEBUGGING(print_variants(variants));

	// read words from argv or, if no args, from stdin
	if (argc > arg_start) {
		for (i = arg_start; i < argc; i++) {
			f = fopen(argv[i], "r");
			set_or_die(f, argv[i]);
			do_all_words(variants, f, do_all);
			unset_or_die(fclose(f), argv[i]);
		}
	} else {
		do_all_words(variants, stdin, do_all);
	}

	return EXIT_SUCCESS;
}

C Version of Password Dictionary Variator

February 3rd, 2010

I wrote a program to expand password dictionaries the other day, but due to its recursive algorithm it choked on words greater than 11 characters long, such as “electroencephalograph’s,” the longest word in my /usr/share/dict/words. So now I’ve tweaked the algorithm and also reimplemented the whole thing in C. It’s not real pretty C, but I tried to aim more for performance. It doesn’t crash on “electroencephalograph’s” anymore, but generating just the combinations for that one word (all 61,917,364,224 of them) took 43,499 seconds of CPU time (42653.61 user, 846.95 system). And that’s not even the time-consuming part! Just imagine running that many combinations through aircrack, all to test a single word.

Anyway, here is the code:


/**
 * pw-vary.c - implements pw-vary, a program to obfuscate passwords by replacing letters with leet.
 *
 * Makes a lot of assumptions about limits. Also assumes ASCII strings. Pretty lazy!
 *
 * Copyright (c) 2009 by Paul A Jungwirth
 */
#include
#include
#include

#define LINE_LEN	1024
#define MAX_VARIANTS	20
#define LONGEST_RESULT	4092

#define is_whitespace(c) (c == ' ' || c == '\t' || c == '\n' || c == '\r')
#define set_or_die(v, s) if (!v) { perror(s); exit(EXIT_FAILURE); }
#define unset_or_die(v, s) if (v) { perror(s); exit(EXIT_FAILURE); }

#ifdef DEBUG
#define DEBUGGING(s) s;
#else
#define DEBUGGING(s)
#endif

typedef struct t_variant {
	char letter;
	char *variants[MAX_VARIANTS];
	int count;
	char line[LINE_LEN];
	int maxlen;
} t_variant;

static t_variant *init_variant(char letter) {
	t_variant* ret;

	ret = (t_variant*)malloc(sizeof(t_variant));
	set_or_die(ret, NULL);

	ret->letter = letter;
	ret->count = 1;
	ret->maxlen = 1;
	ret->variants[0] = ret->line;

	return ret;
}

static char *find_line_start(char *line) {
	int i = 0;
	char c;

	c = line[i];
	while (c) {
		if (!is_whitespace(c)) return &line[i];
		c = line[i];
		i++;
	}

	return NULL;
}

static void read_leet(t_variant **variants, FILE *f) {
	char line[LINE_LEN];
	char *line_start;
	int len;
	int i, j;
	char first_letter, c;
	int in_whitespace;
	t_variant *vs;

	while (fgets(line, LINE_LEN, f)) {
		// strip leading white space and check for contents
		line_start = find_line_start(line);
		if (!line_start) continue;

		// Create a new entry in variants for the letter.
		first_letter = line_start[0];
		variants[first_letter] = init_variant(first_letter);
		vs = variants[first_letter];

		// List all the possible variants.
		strncpy(vs->line, line_start, LINE_LEN);
		line_start = vs->line;
		len = strlen(line_start);
		i = j = 1;
		in_whitespace = 1;
		while (i < len && j variants[j] = &line_start[i];
					vs->count++;
					in_whitespace = 0;
					j++;
				}
			}
			i++;
		}

		// find the longest variant for this letter:
		for (i = 0; vs->variants[i]; i++) {
			len = strlen(vs->variants[i]);
			if (len > vs->maxlen) vs->maxlen = len;
		}
	}
	unset_or_die(ferror(f), "reading leet file");
}

/*@unused@*/
static void print_variants(t_variant **variants) {
	int i, j;

	for (i = 0; i variants[j]) {
				puts(variants[i]->variants[j]);
				j++;
			}
			puts("====");
		}
	}
}

static void do_word(char *buffer, t_variant **variants, char *orig) {
	char *end;
	int i, j;
	int len, newlen, len2;
	int *maxes;
	int *indices;
	t_variant *vs;

	orig = find_line_start(orig);
	if (orig) {
		end = strpbrk(orig, "\n\r");
		if (end) end[0] = '';
		// puts(orig);

		// initialize maxes & indices
		// TODO: pass these as buffers to speed things up
		len = strlen(orig);
		maxes = (int*)malloc(sizeof(int) * len);
		set_or_die(maxes, NULL);
		indices = (int*)calloc(len, sizeof(int));	// start with all zeroes.
		set_or_die(indices, NULL);

		newlen = 1; // start with 1 for the .
		for (i = 0; i count;
				newlen += vs->maxlen;
			} else {
				maxes[i] = 1;
				newlen++;
			}
		}

		// don't proceed if newlen is greater than LONGEST_RESULT (improbable)
		if (newlen >= LONGEST_RESULT) {
			fprintf(stderr, "Result for \"%s\" too long: %d characters\n", orig, newlen);
		} else {
			j = 0;
			while (j >= 0) {
				// Construct and print the string using our current position, represented by indices.
				buffer[0] = '';
				for (i = 0; i variants[indices[i]]);
					} else {
						len2 = strlen(buffer);
						buffer[len2] = orig[i];
						buffer[len2 + 1] = '';
					}
				}
				puts(buffer);

				// Now add one to our position.
				j = len - 1;
				do {
					indices[j] = (indices[j] + 1) % maxes[j];
				} while (indices[j] == 0 && j-- > 0); // short-circuit: j only decremented when we carry

				// when j is -1, we've overflowed; hence we've printed all combinations.
			}
		}

		free(maxes);
		free(indices);
	}
}

static void do_all_words(t_variant **variants, FILE *f) {
	char line[1024];
	char buffer[LONGEST_RESULT];

	while (fgets(line, 1024, f)) {
		do_word(buffer, variants, line);
	}
	unset_or_die(ferror(f), "reading file");
}

int main(int argc, char **argv) {
	t_variant *variants[255];		/* 255 pointers, each to an array of 20 pointers to strings. */
	FILE *f;
	int i;

	bzero(variants, sizeof(variants));

	f = fopen("leet.txt", "r");
	set_or_die(f, "opening leet file");
	read_leet(variants, f);
	unset_or_die(fclose(f), "closing leet file");
	DEBUGGING(print_variants(variants));

	// read words from argv or, if no args, from stdin
	if (argc > 1) {
		for (i = 1; i < argc; i++) {
			f = fopen(argv[i], "r");
			set_or_die(f, argv[i]);
			do_all_words(variants, f);
			unset_or_die(fclose(f), argv[i]);
		}
	} else {
		do_all_words(variants, stdin);
	}

	return EXIT_SUCCESS;
}

Password Dictionary Variations

January 31st, 2010

I’ve been playing around with WPA this weekend, which (unlike WEP) requires a brute force attack against the passphrase. WPA actually supports passphrases up to 63 characters long, so if you use a random string of them sufficiently long, your network is essentially uncrackable (at least that way). But most people don’t bother with this, so brute force attacks are still feasible.

There are lots of dictionaries out there, but I couldn’t find any that includes variants, such as with 1337 spellings. It seems worthwhile to test these, too, so I wrote a quick perl script to generate such a list, given a dictionary as input. It also takes a simple text file defining which substitutes to make for each letter. This is a quick example I whipped up (leet.txt):


a A @ 4
b B 8 6
c C (
d D
e E 3
f F
g G
h H #
i I 1 ! | l
j J i
k K
l L 1 I ! |
m M
n N
o O 0 *
p P
q Q
r R
s S 5 $
t T 7 +
u U v V
v V \/ ^ u U
w W
x X %
y Y
z Z 2

Note that it supports multi-character substitutions, as under v.

The script is really pretty naive. A single eight-character word is going to generate at best 2^8 variations (256) or at worst 6^8 (1679616). Even my /usr/share/dict/words has 98569 lines, which would produce 165,558,069,504 separate combinations. A better dictionary would produce even more. I haven’t tried how long this would run, but it sounds bad. I’ve read that aircrack can test 50 to 300 words per second, so that many combinations would require at best 551,860,232 seconds, or 6388 days. And that’s just the eight-char passwords. You better go make a sandwich.

So a real implementation would discriminate about the possibilities and only emit likely variants, such as the original, the original lowercased, the original uppercased or camel-cased, the original with full leet spellings, with the most common leet substitutions, etc. Emitting everything just isn’t practical.

In addition, there are a few scripts you’d want to run ahead of mine. You’d want something that reads a dictionary file and combines shorter words to make two- and three-word strings. The naive version of this is simple permutation code, but again you’d want something smarter in practice. I’m not sure how to write something like that without AI.

You’d also want a script upstream that throws out everything too short or too long. WPA’s shortest passphrase requires eight characters, so you might as well not waste time on anything shorter. And if you’re planning on testing spelling variants, then anything beyond a certain length (say twelve? fifteen?) is going to get silly. With my script, eight is already silly!

But with those caveats, here is the script:


#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $variants_file = "leet.txt";

my %variants = ();

open IN, "<$variants_file"	or die "Can't open $variants_file for reading: $!\n";
while (<IN>) {
	chomp;
	$_ =~ s/^\s+//;
	$_ =~ s/\s+$//;
	next unless $_;
	my @fields = split /\s+/, $_;
	my $letter = $fields[0];
	$variants{$letter} = \@fields;
}
close IN;

# print Dumper \%variants;

while (<>) {
	chomp;
	my $orig = lc $_;
	my $vs = get_variants($orig);
	for my $v (@$vs) {
		print "$v\n";
	}
}

sub get_variants {
	my $orig = shift;
	my $ret = [];
	my $len = length $orig;
	return $ret unless $len > 0;

	my $first_letter = substr($orig, 0, 1);
	my $vs = $variants{$first_letter} || [$first_letter,];

	if ($len > 1) {
		my $the_rest = substr($orig, 1);

		my $sub_variants = get_variants($the_rest);
		for my $letter (@$vs) {
			for my $stem (@$sub_variants) {
				push @$ret, "$letter$stem";
			}
		}
	} else {
		return $vs;
	}

	return $ret;
}

UPDATE: Since my worst-case scenario isn’t that realistic (1,679,616 variants per 8-char word), I thought I’d run my script on the whole dictionary and see how many variants I really got. I learned that some words in there are so long, they crash my script by soaking up all the memory (e.g. “electroencephalograph’s”). Oops. But if I limit the script to 8-10-char words (41598 total), I get this result:


$ egrep '^.{8,10}$' /usr/share/dict/words | ./pw-vary.pl | wc -l
3238454512

3,238,454,512: that’s not as bad as my worst-case estimate, even with the nine- and eight-char words, but it still seems pretty bad. To check them all, a fast computer would need 124 days.

UPDATE: I’ve reimplemented this password dictionary variator in C and removed the out-of-memory error condition on long words.

DLL Inspector

January 31st, 2010

Since I’ve been experimenting with 32- and 64-bit DLLs, along with the different /clr compiler options, I decided to write a quick utility to help me see how they differ. It has a small GUI:

I used this utility to build a bunch of dlls with various compiler options. I’ll post about that shortly. Here is the code for DllInfo, if anyone is interested. There is still some work to do, but what I’ve got so far served my purpose well enough:


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Reflection;
using System.Runtime.InteropServices;
using System.IO;

// TODO: About DllInfo...
// better display in currentFileBox

// Based on Kris Stanton's code at http://blogs.msdn.com/kstanton/archive/2004/03/31/105060.aspx

namespace DllInfo
{
    public enum MachineType
    {
        Native = 0,
        I386 = 0x014c,
        Itanium = 0x0200,
        x64 = 0x8664
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct IMAGE_DOS_HEADER
    {
        [FieldOffset(60)]
        public int e_lfanew;    // byte offset to the NT header
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct IMAGE_NT_HEADERS32
    {
        [FieldOffset(0)]
        public uint Signature;
        [FieldOffset(4)]
        public IMAGE_FILE_HEADER FileHeader;
        [FieldOffset(24)]
        public IMAGE_OPTIONAL_HEADER32 OptionalHeader;
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct IMAGE_NT_HEADERS64
    {
        [FieldOffset(0)]
        public uint Signature;
        [FieldOffset(4)]
        public IMAGE_FILE_HEADER FileHeader;
        [FieldOffset(24)]
        public IMAGE_OPTIONAL_HEADER64 OptionalHeader;
    }

    public struct IMAGE_FILE_HEADER {
        public ushort Machine;
        public ushort NumberOfSections;
        public ulong TimeDateStamp;
        public ulong PointerToSymbolTable;
        public ulong NumberOfSymbols;
        public ushort SizeOfOptionalHeader;
        public ushort Characteristics;
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct IMAGE_OPTIONAL_HEADER32
    {
        [FieldOffset(0)]
        public ushort Magic;
        [FieldOffset(208)]
        public IMAGE_DATA_DIRECTORY DataDirectory;
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct IMAGE_OPTIONAL_HEADER64
    {
        [FieldOffset(0)]
        public ushort Magic;
        [FieldOffset(224)]
        public IMAGE_DATA_DIRECTORY DataDirectory;
    }

    public struct IMAGE_DATA_DIRECTORY
    {
        public uint VirtualAddress;
        public uint Size;
    }

    public partial class DllInfoForm : Form
    {
        String currentDll = null;
        bool validDll = false;
        String machineType = null;
        bool isManaged = false;
        int moduleCount;
        bool hasAssembly;

        public DllInfoForm()
        {
            InitializeComponent();
            infoPanel.Paint += new PaintEventHandler(panelOnPaint);
        }

        void panelOnPaint(object obj, PaintEventArgs pea)
        {
            Graphics g = pea.Graphics;
            Brush brush = new SolidBrush(ForeColor);

            g.DrawString("valid?: ", Font, brush, 0, 0);
            g.DrawString(currentDll == null ? "" : validDll.ToString(), Font, brush, 100, 0);
            g.DrawString("machine: ", Font, brush, 0, 20);
            g.DrawString(machineType, Font, brush, 100, 20);
            g.DrawString("managed?: ", Font, brush, 0, 40);
            g.DrawString(validDll ? isManaged.ToString() : "", Font, brush, 100, 40);
            g.DrawString("has assembly?: ", Font, brush, 0, 60);
            g.DrawString(validDll ? hasAssembly.ToString() : "", Font, brush, 100, 60);
            g.DrawString("module count: ", Font, brush, 0, 80);
            g.DrawString(validDll ? moduleCount.ToString() : "", Font, brush, 100, 80);
            // TODO: is it IL or binary?
        }

        void updateDisplay() {
            if (currentDll == null)
            {
                currentFileBox.Text = "";
            }
            else
            {
                currentFileBox.Text = currentDll;
            }
            infoPanel.Invalidate();
        }

        private void chooseFile(object sender, EventArgs ea)
        {
            OpenFileDialog d = new OpenFileDialog();
            d.Filter = "dll and exe files (*.dll, *.exe)|*.dll;*.exe|All files (*.*)|*.*";
            d.Title = "Select a dll";
            if (d.ShowDialog() == DialogResult.OK)
            {
                currentDll = d.FileName;
                try
                {
                    loadDll(currentDll);
                    updateDisplay();
                }
                catch (BadImageFormatException)
                {
                    MessageBox.Show("bad image format exception");
                }
            }
        }

        public void loadDll(String dll)
        {
            byte[] data = new byte[4098];
            FileInfo fi = new FileInfo(dll);
            FileStream f;
            int n;
            ushort m;

            try
            {
                f = fi.Open(FileMode.Open, FileAccess.Read);
                // TODO: better to catch file-opening exceptions and show the user the problem.
            }
            catch
            {
                validDll = false;
                return;
            }

            n = f.Read(data, 0, 4096);
            f.Flush();
            f.Close();
            f.Dispose();

            // We must have read at least far enough to read dos_header->e_lfanew
            // Marshal.OffsetOf warns that the managed struct offsets may differ from the unmanaged,
            // but this is only true if the data types are not blittable.
            // Since C# lets me obtain points to these structs, I know that they are blittable,
            // so I can disregard the warning about Marshal.OffsetOf.
            if (n < (int)Marshal.OffsetOf(typeof(IMAGE_DOS_HEADER), "e_lfanew") + sizeof(int))
            {
                validDll = false;
                return;
            }

            unsafe
            {
                fixed (byte* d = data)
                {
                    IMAGE_DOS_HEADER *dos_header = (IMAGE_DOS_HEADER*)d;

                    // Make sure we read enough bytes to follow e_lfanew and read everything else:
                    // Use the 64-bit values since they are a few bytes larger.
                    // If we're running on 32 bits, there will still be other bits in the dll after the header,
                    // so we won't reject any false positives.
                    if (n e_lfanew
                        + (int)Marshal.OffsetOf(typeof(IMAGE_NT_HEADERS64), "OptionalHeader") // +24
                        + (int)Marshal.OffsetOf(typeof(IMAGE_OPTIONAL_HEADER64), "DataDirectory") // +224
                        + (int)Marshal.OffsetOf(typeof(IMAGE_DATA_DIRECTORY), "Size")  // +4
                        + sizeof(uint)) // +4
                    {
                        validDll = false;
                        return;
                    }

                    IMAGE_NT_HEADERS32* nt32_header = (IMAGE_NT_HEADERS32*)(d + dos_header->e_lfanew);

                    m = nt32_header->FileHeader.Machine;

                    validDll = Enum.IsDefined(typeof(MachineType), (Int32)m);
                    if (validDll)
                    {
                        machineType = ((MachineType)m).ToString();

                        // 0x10b indicates a PE32 assembly
                        // 0x20b indicates a PE32+ assembly (i.e. 64-bit, hence either x64 or Itanium)
                        if (nt32_header->OptionalHeader.Magic == 0x20b)
                        {
                            isManaged = ((IMAGE_NT_HEADERS64*)nt32_header)->OptionalHeader.DataDirectory.Size > 0;
                        }
                        else
                        {
                            isManaged = nt32_header->OptionalHeader.DataDirectory.Size > 0;
                        }
                    }
                }
            }

            if (validDll)
            {
                Assembly a;
                Module mod;
                PortableExecutableKinds peKind;
                ImageFileMachine machine;

                try
                {
                    a = Assembly.LoadFile(dll);
                    hasAssembly = true;
                    moduleCount = a.GetModules().Length;
                    // TODO: sanity check that PEKind matches what we inferred from the binary?
                    /*
                    m = a.GetModules()[0]; // TODO: multi-module assemblies?
                    m.GetPEKind(out peKind, out machine);
                    dllType = peKind.ToString();
                    machineType = machine.ToString();
                      */
                }
                catch (BadImageFormatException)
                {
                    // This is not quite right:
                    // The file may have an assembly but be opened on the wrong platform (32 vs 64 bits).
                    hasAssembly = false;
                    moduleCount = 0;
                }
                catch (FileLoadException)
                {
                    hasAssembly = true;
                    moduleCount = -1;
                }

            }
        }
    }
}

Enabling 64-bit Support in Visual Studio 2008

January 23rd, 2010

With all my confusion about 32- vs. 64-bits in C# and C++, I wanted to build some C++ exes and dlls in 64 bits to work out how everything fit together. But it took me a while to figure out how to do this in Visual Studio. For a C# project, there is an easy-to-find option under Build called Platform Target, with choices called Any CPU, x86, and x64. But there is nothing quite so evident among the properties for a C++ project.

To build a 64-bit version for your C++ app, you should open the Properties dialog and then click the Configuration Manager button in the upper-right corner. Then select the Active Solution Platform dropdown menu, and choose <New...>. Select x64, and request to copy over the old Win32 properties. Now you can build your project for either 32 or 64 bits. Visual Studio maintains separate properties for each. You use the Configuration Manager to set the “Active” platform, which is what actually gets built, but you can adjust the properties of either platform using the Platform dropdown at the top of the properties dialog.

Once you’ve enabled both platforms for your project, make sure x64 is set as the active target, and take at look at the x64 properties under Linker:Advanced. You should see one called Target Machine, set to MachineX64 (/MACHINE:X64). Under the Win32 version, this read MachineX86 (/MACHINE:X86), but Visual Studio should have flipped it automatically when you created the x64 target. If not, set it correctly now.

Note that if you have a mismatch between the Platform and the linker’s setting, you’ll get a build error like this:

fatal error LNK1112: module machine type ‘X86′ conflicts with target machine type ‘x64′

So you need MachineX86 with a Win32 platform, and MachineX64 with the x64 platform.

Now you should be able to build both 32- and 64-bit versions of your app!

Referencing a non-CLR DLL

January 18th, 2010

In my last post, I mentioned that I configured the C++ DLL to compile with the /clr option. I found this to be necessary in order to use the Visual Studio Add Reference command, so that the C# app could find the DLL at runtime. For some reason, Add Reference failed unless I used /clr. But strangely, I found that if I compiled without /clr and copied the DLL into my bin/Debug folder, everything still worked. And that isn’t too surprising: P/Invoke lets you call plain old Win32 DLLs.

But why wasn’t Add Reference working? Well, it turns out that Add Reference only lets you reference assemblies, and a win32 dll is not an assembly; you need CLR support for that. I had assumed Add Reference was some kind of generic build dependency, but I guess it’s something more specific. This is kind of disappointing. Who ever heard of an IDE without some way of managing project dependencies?

At first I thought my only option was to use Pre- and Post-Build commands. If you go to the Build Events tab of the C# properties window, you can see spaces for defining DOS commands that run before and after your build. VS supplies a bunch of macros you can use for important paths of other bits of information. So one way of solving my problem is to copy the DLL by hand. In the pre-build box, I entered this command:


copy "$(SolutionDir)$(Configuration)/MathDll.dll" "$(TargetDir)"

But that seems sort of like a hack. It leaves Visual Studio completely innocent of the project dependency. Later I found a slightly better (but still not so great) way of doing it. I got rid of this pre-build step, and under the C# project I used the Add Existing Item command. Then I navigated to the dll and added it to the C# project. I set its Copy to Output Directory property to Copy if newer. I left the Build Action property as Content, which seems to mean that Visual Studio just leaves it untouched. So now, as long as I’ve set up the dll portion to run first, the dll gets copied properly. This seems a little better than the DOS way, because at least Visual Studio is somewhat aware of that the C# project uses the dll, and that fact is also less hidden from developers (i.e. me, six months later).

UPDATE: On second thought, my first approach seems better. Because of the $(Configuration) macro, it works whether I’m building a Debug or Release version. But the Add Existing Item approach forces me to point to a dll in either the Debug or Release directory.

Perhaps the best solution is simply to compile with /clr. But I wanted to get both ways working, because I want to try both versions to see if either will resolve another error I’m getting, to be explained later. . . .

UPDATE 2: One problem I’ve found with the Pre-Build shell code is that even with $(Configuration), one line doesn’t work for all configurations. This is because if you set up, say, an “x64″ configuration in addition to the default “Any CPU” one, then the folder structure isn’t parallel. The default configuration has a bin/Debug folder, whereas the x64 configuration has a bin/x64/Debug folder. This difference is not captured in the $(Configuration) macro. You can work around this because Visual Studio maintains separate property values for each configuration, so you can set up different pre-build shell commands for different configurations. But this is sort of annoying.

Calling a C++ DLL from C#

January 18th, 2010

If I’m going to borrow some C++ code for a folder-picker dialog box, then first I have to figure out how to write a C++ DLL that is useable from C#. Microsoft has a nice tutorial about creating and using DLLs, but it’s in pure C++. “No problem,” I thought. “I know how to use P/Invoke.” Well, it turns out that it’s not so easy.

To keep things simple, I figured I’d write an app to do math, as in that MS tutorial. So I used Visual Studio to create a C# console application, and I wrote this code:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;

namespace MixedDll
{
    class Program
    {
        [DllImport("MathDll.dll")]
        static extern int Add(int a, int b);

        static void Main(string[] args)
        {
            Console.WriteLine(Add(21, 21));
            Console.ReadLine();
        }
    }
}

As you can see, I’m planning to write a dll called MathDll.dll, which will export one function: int Add(int a, int b). This requires adding a second project to our solution. In Visual Studio, each “solution” can have multiple “projects,” where each project builds some artifact like a dll or exe. As far as I can tell, you may only use one language within a given project.

Following the MS tutorial, I chose to create an empty C++ project rather than following any of the templates. Then in the settings (under Configuration Properties:General), I changed Configuration Type from .exe to .dll. I also added support for the Common Language Runtime (/clr). Then I added two files. First was MathDll.h:


extern "C" {
    __declspec(dllexport) int Add(int a, int b);
}

The second file was MathDll.cpp:


#include "MathDll.h"

extern int Add(int a, int b) {
    return a + b;
}

Finally, I added a reference from the C# project to the C++ project. In the Solution Explorer on the right, I right-clicked on References under MixedDll. Then I went to the Projects tab and selected MathDll. Oops! Visual Studio wouldn’t let me add the reference! Everything built, but I couldn’t get the IDE to supply access to the dll at runtime. Well, I figured I could do that myself. So after building everything, I manually copied MathDll.dll to MixedDll/bin/Debug. That’s where MixedDll.exe gets built, so I figured it should be able to pick up the dll.

Unfortunately, my little hack didn’t work. When I ran the program, I got a BadImageFormatException. It turns out this is what happens when you mix 32-bit and 64-bit binaries. As far as I can tell, a 32-bit exe can only use 32-bit dlls, and a 64-bit exe can only use 64-bit dlls. My C++ project was building as 32-bit, but my C# app was building as “Any CPU” (under Properties:Build:Platform Target), which I guess results in a 64-bit product.

The right solution would have been to build a 64-bit dll, but for some reason Visual Studio wasn’t giving me that option. So I switched the C# app to 32-bit and tried again. This time I was able to add the reference (no more kludgy manual copy)! When I rebuilt everything and ran the program, I saw my output: 42!

Choose Folder Dialog Box

January 18th, 2010

I know that whenever you start learning a new platform, you discover surprising omissions, but still I was shocked at something I learned recently about C#. As one of my first C# programs, I decided to write a little app that sets a random desktop background based on the files within a given folder. To my surprise, C# doesn’t give you a decent-looking dialog box to choose a folder. There is SaveFileDialog and OpenFileDialog, and even FontDialog and ColorDialog. These all seem pretty good. But then there is FolderBrowserDialog. It’s a tiny box with a tree-view that lets you navigate around the filesystem. Clicking those little plusses and minusses feels like setting DIP switches. You can’t start the user in whatever folder you want; he’s got to begin in one of a few pre-defined locations like My Documents or Desktop. It doesn’t seem to be very smart about shortcuts. You can’t see the path. There is no list along the left-side of commonly-visited places, as we have come to expect in every filesystem dialog these days. This is such a nasty, unprofessional bit of UI, I’m amazed it is still around and Microsoft hasn’t fixed it just to relieve the embarrassment. Certainly I’d be embarrassed to use it in one of my programs. Others must find it pretty bad also, because several replacements exist for it, including at least two commercial ones. I suspect Microsoft is pretty embarrassed about it, too, because they use something else for Word and Visual Studio–something that looks a lot like OpenFileDialog, but that lets you select a folder. I wonder how many times Microsoft has implemented this thing? But that just makes the ugly little FolderBrowserDialog more a mystery. If Microsoft keeps implementing a nice version of this dialog, one that looks like it actually belongs and wasn’t written by a kid for a school project, why don’t they push it into the API?

So anyway, I set out to find a better folder dialog. At first I thought about writing my own, but soon I decided it’d be easier to use an existing implementation. I liked the look of XFolderDialog, written in C++, so I figured I’d just compile it as a DLL and then call it with C#’s DLLImport. I figured I could simplify things by exporting just a single function, something like this:

int OpenDialog(LPCTSTR initialFolder, LPCTSTR title, CWnd* parentWindow, LPTSTR selectedPath) {
    XFolderDialog d(initialFolder, 0, parentWindow);
    d.SetOsVersion(CXFolderDialog::XFILEDIALOG_AUTO_DETECT_OS_VERSION);
    d.SetTitle(title);
    int result = d.DoModal();
    if (result == IDOK) {
        wcscpy(selectedPath, (LPCTSTR)d.GetPath());
    }
    return result;
}

Unfortunately it has all proved not so easy! I still haven’t quite worked everything out yet, but I have learned a ton about C# Interop, Windows DLLs, this new thing called “Managed C++” (now C++/CLI), and Visual Studio. There are so many lessons I’ve had to learn, I thought I’d better start this blog so I could remember them and sort them all out. The next few posts will go through them one-by-one. I can’t promise a happy ending, though. If all my efforts fail, I may have to fall back on a 100% C# implementation like this one here.

Hello world!

January 18th, 2010

I’ve been tackling a lot of new languages lately–ruby, Objective C, C#, a tiny taste of lisp–as well as learning the Windows API (C++). My experience is really in the open source Unix world: Java, Perl, Python. There has been so much to figure out, I thought I’d start a notebook to keep track of it all. And why not put it online so other people can benefit? Lately my projects have been in C#/C++, so that will probably dominate the posts for a while.