EVAL in Redis

2012-10-11

I had a chance to play with Redis’s new EVAL command recently. This command is only available in Redis 2.6 (currently at Release Candidate 8), and it lets you run Lua code on the Redis server. The nice part about EVAL is that your Lua script runs atomically: no other command executes on the server while it is running, so you can read a value and act on it as a single step.

Redis already had the MULTI command, which let you issue several commands as an atomic unit, but the problem there is that none of the commands give you a return value until the entire block is complete, so you can’t make decisions inside the MULTI command. Here is an example of when that’s important:

I’ve been working on a web crawler lately, and it needs to crawl a site with millions of pages. I wanted to store the pages to be crawled in a queue called seeds, and the crawler thread(s) would grab a page off the queue, crawl it, and then push newly-discovered pages back onto the queue. But I also needed to keep track of which pages I’d seen, because I didn’t want to crawl them twice. So I kept those pages in a set called crawled. You only queue a page on seeds if it isn’t in crawled. Simple enough. I decided to store both seeds and crawled in Redis, so that if the crawler crashed it could pick up again where it left off.

The trick is that if you’re running multiple crawlers in parallel, then the queue-a-seed-unless-already-crawled part needs to be atomic, but it requires two steps: check in crawled, then push to seeds. And you can’t use MULTI, because whether to queue depends on crawled.
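To make the problem concrete, here is a small Ruby sketch that plays out one bad interleaving by hand. It is only an illustration: a Set and an Array stand in for the crawled and seeds keys (no Redis involved), and the URL and variable names are made up.

```ruby
require 'set'

# In-memory stand-ins for the two Redis keys (assumption: no real Redis here).
crawled = Set.new
seeds   = []
url     = 'http://example.com/a'

# Crawler B, step 1: check whether the page has been crawled yet.
b_saw_crawled = crawled.include?(url) # false at this moment

# Crawler A gets in between B's two steps and marks the page as crawled.
crawled.add(url)

# Crawler B, step 2: push, acting on its now-stale answer.
seeds.push(url) unless b_saw_crawled

stale_push = crawled.include?(url) && seeds.include?(url)
puts "queued an already-crawled page: #{stale_push}"
```

When the check and the push run as one atomic step, nothing can slip in between them like this.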

Now you could use external synchronization to handle this, which may be easier if you’re just running several threads within a single process. But if you’re running several processes, perhaps even distributed across multiple machines, then Redis is a convenient synchronization point, since you’re going there already.

In the Redis EVAL command, you pass several arguments. The first is a string with the Lua code you want run. The next is an integer telling how many more arguments to treat as Redis keys, available in Lua from the KEYS array as KEYS[1], KEYS[2], etc. Any further arguments are available from the ARGV array. (Yes, Lua array indexes start at 1.)
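The argument layout described above can be sketched in Ruby. The eval_command helper below is hypothetical (it is not part of any client library) and never talks to a server; it only builds the list of arguments in the order EVAL expects, so you can see where the key count sits.

```ruby
# Hypothetical helper: lays out the arguments in the order EVAL expects.
#   EVAL <script> <number of keys> <key ...> <arg ...>
def eval_command(script, keys, argv)
  ['EVAL', script, keys.length, *keys, *argv]
end

cmd = eval_command('return {KEYS[1], ARGV[1]}',
                   ['crawled'],
                   ['http://example.com/a'])
p cmd
```

Here the count is 1, so Redis treats 'crawled' as KEYS[1] and the URL as ARGV[1].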

The Lua script can run Redis commands via redis.call() and redis.pcall(). The former passes errors back to you; the latter lets you trap errors in Lua.

Here was my EVAL command (called via a Ruby client):

redis.eval("if redis.call('SISMEMBER', KEYS[1], ARGV[1]) == 0 " +
           "then redis.call('RPUSH', KEYS[2], ARGV[1]) " +
           "end",
           ['crawled', 'seeds'], [seed])
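
If you call this from several places, one option is to wrap it in a small helper. The sketch below is a variation, not the code I actually ran: the helper name, the RecordingClient class, and the script’s 1/0 return values are all additions for illustration. The stand-in client only records the call so the example runs without a server; it does not execute any Lua.

```ruby
# Variation on the script above: it returns 1 if it pushed, 0 otherwise.
# (The return values are an addition; the original script returns nothing.)
QUEUE_UNLESS_CRAWLED = <<~LUA
  if redis.call('SISMEMBER', KEYS[1], ARGV[1]) == 0 then
    redis.call('RPUSH', KEYS[2], ARGV[1])
    return 1
  end
  return 0
LUA

# Hypothetical helper. `redis` is anything that responds to
# eval(script, keys, argv), the call shape used above.
def queue_unless_crawled(redis, seed)
  redis.eval(QUEUE_UNLESS_CRAWLED, ['crawled', 'seeds'], [seed]) == 1
end

# Stand-in client for trying the helper without a server.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def eval(script, keys, argv)
    @calls << [script, keys, argv]
    1 # pretend the script queued the seed
  end
end

client = RecordingClient.new
queued = queue_unless_crawled(client, 'http://example.com/a')
puts "queued: #{queued}"
```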

A couple of things to note: First, we are passing our key names in the KEYS array rather than hard-coding them into the Lua script. This is per the Redis documentation, which asks that every key a script touches be passed explicitly, so that Redis can tell which keys a command operates on before running it. The documentation singles out Redis Cluster, which needs the key names to forward a request to the right node.

So this looks more like a convention for correctness than a speed optimization. Since we use the same key names every time, I’d bet it would be just as fast on a single server to hard-code them in Lua rather than pass them in KEYS. But I haven’t measured it, and following the documented convention is the safer choice.

Second, we are saying if redis.call(...) == 0 rather than simply if not redis.call(...). This is because SISMEMBER returns 0 or 1, and to Lua both of those values are true: only nil and false count as false.
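
Ruby happens to follow the same rule as Lua here (only nil and false count as false), so the pitfall can be sketched in Ruby without a server. The reply variable is a stand-in for the integer that SISMEMBER hands back inside the script, an assumption for the sake of the illustration.

```ruby
reply = 0 # stand-in for SISMEMBER's "not a member" reply

# The tempting form: negating 0 gives false, because 0 is truthy,
# so a push guarded this way would never run.
naive_would_push = !reply

# The form used in the script: compare against 0 explicitly.
explicit_would_push = (reply == 0)

puts "naive: #{naive_would_push}, explicit: #{explicit_would_push}"
```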

I think this is a very nice example of why you might use Redis’s new EVAL command and how it works. I hope it was helpful to you!
