Scaling Sidekiq

2017-04-12

Sidekiq is a great option for handling background jobs in Ruby projects. Here I’ll show you how to get the best utilization out of a box dedicated to running Sidekiq jobs. Whether you have one machine or ten, the goal is to work off as many jobs as possible from each machine.

CPU Utilization

To do that, we want to keep every core busy. You can monitor your CPU activity with a tool like vmstat(8). If you say vmstat 10, you’ll get a new row every ten seconds, like so:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  0  92056 553620 174128 6268208    0    0     1    25 1323 1443 23 77  0  0  0
 8  0  92056 552628 174128 6268492    0    0    14    38 1307 1447 22 78  0  0  0
 8  0  92056 551760 174132 6268672    0    0     3    49 1317 1461 23 77  0  0  0
 8  0  92056 550544 174140 6268832    0    0     1    28 1353 1490 23 77  0  0  0
 8  0  92056 550296 174148 6268940    0    0     1    35 1240 1360 22 78  0  0  0

The last few columns show the percentage of CPU time spent on user work (us), on system work (sy, which more or less means executing kernel system calls), sitting idle (id), and blocked on I/O (wa, “waiting”). Add the first two together to get how much your CPU is working.

There is also st, which means “stolen time”! If you are running in a VM, this is the time your hypervisor gave to someone else. You will probably see all zeros in this column, so I’ll ignore it going forward. We want to get our idle and wait time as close to zero as possible, without drowning the machine in too much work. Above we have a machine that is doing a good job keeping busy. You can see that its user and system time add up to 100, and the other columns are zero.
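If you want to track that number over time rather than eyeball it, here is a minimal sketch that parses one row of vmstat output and adds user and system time together. The helper names (parse_vmstat_row, busy_percent) are made up for illustration, and the column list assumes the same vmstat layout as the sample above.

```ruby
# Column names as printed in the second header line of the vmstat output above.
VMSTAT_COLUMNS = %w[r b swpd free buff cache si so bi bo in cs us sy id wa st].freeze

# Turn one data row of vmstat output into a hash of column name => integer.
def parse_vmstat_row(line)
  VMSTAT_COLUMNS.zip(line.split.map(&:to_i)).to_h
end

# Percentage of CPU time spent working: user plus system.
def busy_percent(row)
  row['us'] + row['sy']
end

# First data row from the sample output above.
row = parse_vmstat_row(' 9  0  92056 553620 174128 6268208    0    0     1    25 1323 1443 23 77  0  0  0')
puts "busy: #{busy_percent(row)}%, idle: #{row['id']}%, waiting: #{row['wa']}%"
```

For the first sample row this reports 100% busy with no idle or wait time.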

So how do we do this? Usually it means running lots of jobs at the same time, not one-after-another. Even a one-core machine can juggle many jobs. That’s because jobs typically have to block on I/O, for example when they talk to the network or save things to disk. While that job is waiting for an answer, your CPU can work on another one.

(This is not really relevant here, but disk I/O is a little different from other I/O. Technically, reading/writing with a regular file can’t “block” but only “sleep”. That completely messes up non-blocking I/O for regular files, and even the newer aio functions have many limitations and in fact are implemented by threads in userspace. Fortunately we are not talking about single-threaded non-blocking I/O; we are talking about multiple processes/threads. Whether you call it blocking or sleeping, the CPU will still schedule different work if something is stuck on a regular file read. And if this paragraph doesn’t make sense, feel free to dismiss it as a pedantic footnote. :-) )

Threads

Sidekiq is great here because it supports multi-threading. Multiple threads let you do concurrent work in one Ruby process (at least as long as you are not still stuck on Ruby 1.8). Without threads you’d need a separate process for each concurrent job, and that can use up memory quickly, especially with something like Rails. It is always sad to have more CPU available that you can’t use because you’re out of RAM.

In practice, threads help most if you are using a concurrent Ruby implementation like JRuby. MRI Ruby has a Global Interpreter Lock (GIL), which prevents two threads from executing Ruby code at the same time. Even so, you will see some benefit in MRI, because when one thread blocks on I/O, MRI can make progress on another. So despite the GIL, MRI can usually keep some thread running.
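You can see this effect in a few lines of plain Ruby. The sketch below uses sleep as a stand-in for a job that is blocked on I/O (an assumption for illustration; a real job would be waiting on a socket or a database), and fake_io_job is a made-up name. It runs five such jobs one after another, then in five threads.

```ruby
require 'benchmark'

# Stand-in for a job that spends its time blocked, for example on a network call.
# While a thread sleeps, MRI is free to run other threads, much as with blocking I/O.
def fake_io_job
  sleep 0.2
end

# Run five jobs one after another.
serial = Benchmark.realtime { 5.times { fake_io_job } }

# Run five jobs in five threads and wait for all of them.
threaded = Benchmark.realtime do
  5.times.map { Thread.new { fake_io_job } }.each(&:join)
end

puts format('serial: %.2fs, threaded: %.2fs', serial, threaded)
```

The threaded run should take roughly as long as a single job, even on MRI, because the waits overlap. Jobs that spend their time computing in Ruby would not speed up this way.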

With Sidekiq, you can say how many threads to run with the concurrency setting. Normally you’d set this in your sidekiq.yml file. Note that each thread needs its own database connection! That means if your concurrency is 10, then in your database.yml you must have a pool of 10 also (or more). Otherwise the threads will block each other waiting to check out a database connection, and what is the point of that? They might even get timeout errors.
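Here is a rough sketch of that rule as a check you could run by hand. The YAML snippets are made-up examples of what the two files might contain (a real app would read config/sidekiq.yml and config/database.yml instead), and pool_covers_concurrency? is a hypothetical helper, not part of Sidekiq or Rails.

```ruby
require 'yaml'

# Hypothetical contents of sidekiq.yml.
sidekiq_yml = <<~YML
  :concurrency: 10
YML

# Hypothetical contents of database.yml.
database_yml = <<~YML
  production:
    adapter: postgresql
    pool: 10
YML

# True when every Sidekiq thread can hold its own database connection.
def pool_covers_concurrency?(sidekiq_config, database_config, env)
  concurrency = sidekiq_config.fetch(:concurrency)
  pool = database_config.fetch(env).fetch('pool')
  pool >= concurrency
end

sidekiq_config = YAML.load(sidekiq_yml)
database_config = YAML.load(database_yml)
puts pool_covers_concurrency?(sidekiq_config, database_config, 'production')
```

With a concurrency of 10 and a pool of 10 the check passes; drop the pool to 5 and it fails.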

Processes

But wait, there’s more! We can push the concurrency up and up, and still see something like this from vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  92048 585840 174148 6269120    0    0     0    16  956  990  7 18 75  0  0
 1  0  92048 585212 174160 6269256    0    0     1    23 1012 1037  8 17 75  0  0
 1  0  92048 580456 174164 6269312    0    0     1    33  896  884  8 17 75  0  0

Why is the idle CPU time stuck at 75?

It turns out this is a four-core machine, and even with multiple threads, a single MRI process can only use one core, because of the GIL. That’s not what we want at all!
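The arithmetic behind that 75 is simple enough to write down. This is only a back-of-the-envelope sketch with a made-up helper name (idle_floor): it assumes each busy process saturates exactly one core and nothing else is running.

```ruby
# Idle percentage left over when only some of the cores have a process to run.
def idle_floor(total_cores, busy_cores)
  (total_cores - busy_cores) * 100 / total_cores
end

puts idle_floor(4, 1) # one MRI process on a four-core machine
puts idle_floor(4, 4) # every core busy
```

One saturated process out of four cores leaves 75 percent idle, which matches the vmstat output above.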

So the answer is to run more processes—at least one per core. You can see how many cores you’ve got by saying cat /proc/cpuinfo. If you are using something like god, it is easy to put several processes into a group and control them together, like so:

app_root = '/var/www/myapp/current'
4.times do |i|
  God.watch do |w|
    w.name     = "myapp-sidekiq-#{i}"
    w.group    = "myapp-sidekiq"
    w.log      = File.join(app_root, 'log', "#{w.name}.log")
    w.pid_file = File.join(app_root, 'tmp', 'pids', "#{w.name}.pid")
    w.start    = <<-EOS.gsub("\n", " ")
      cd '#{app_root}' &&
        bundle exec ./bin/sidekiq --environment production
                                  --pidfile '#{w.pid_file}'
                                  --logfile '#{w.log}'
                                  --daemon
    EOS
    # ...
  end
end
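The 4 in that config is hard-coded for a four-core machine. As a sketch of how you might derive it instead, Ruby’s standard library has Etc.nprocessors, and you can also count the processor entries in /proc/cpuinfo yourself. The helper name count_cpuinfo_processors is made up for illustration.

```ruby
require 'etc'

# Count the "processor" entries in /proc/cpuinfo-style text (Linux only).
def count_cpuinfo_processors(cpuinfo_text)
  cpuinfo_text.lines.count { |line| line =~ /\Aprocessor\s*:/ }
end

# Processor count as reported by Ruby's standard library.
cores = Etc.nprocessors
puts "cores: #{cores}"

# Cross-check against /proc/cpuinfo where it exists.
if File.readable?('/proc/cpuinfo')
  puts "from /proc/cpuinfo: #{count_cpuinfo_processors(File.read('/proc/cpuinfo'))}"
end
```

In the god config you could then write Etc.nprocessors.times do |i| in place of 4.times do |i|.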

If you aren’t using god and don’t know how to do this with your own process manager, then I think Sidekiq Enterprise has a similar feature called Swarms.

Scheduling cores

That gets us almost there, but there is still a problem. You might still see idle time stuck at some value, such as 25 on a four-core system. The kernel does its best to use your whole CPU, but it can still wind up putting two Sidekiq processes on the same core. Since they are such long-lived processes, they are stuck that way, competing for their shared core while another core sits idle. The kernel doesn’t know ahead of time that they are going to run for days and keep so busy.

Fortunately we can still force each process onto its own core. For that we use the taskset(1) command. When you say taskset -c 2 date, you are telling Linux to run date on core 2. (Core numbers start from zero, as you can see in /proc/cpuinfo.) So our god config would become:

w.start    = <<-EOS.gsub("\n", " ")
  cd '#{app_root}' &&
    bundle exec taskset -c #{i} ./bin/sidekiq --environment production
                                              --pidfile '#{w.pid_file}'
                                              --logfile '#{w.log}'
                                              --daemon
EOS
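To check that the pinning took effect, Linux lists the cores a process may run on in the Cpus_allowed_list field of /proc/<pid>/status. Here is a small sketch that pulls that field out; cpus_allowed_list is a made-up helper name, and the example inspects the current Ruby process rather than a real Sidekiq worker.

```ruby
# Extract the allowed-CPU list from the text of /proc/<pid>/status (Linux only).
# Returns a string such as "2" or "0-3", or nil if the field is missing.
def cpus_allowed_list(status_text)
  status_text[/^Cpus_allowed_list:\s*(\S+)/, 1]
end

# Inspect this process; swap "self" for a pid to inspect another one.
status_path = '/proc/self/status'
if File.readable?(status_path)
  puts "this process may run on cores: #{cpus_allowed_list(File.read(status_path))}"
end
```

For a worker started with taskset -c 2 you would expect to see just 2 here. To inspect a worker, read its pid from the pid file and use that in place of self.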

After that, we’ll have one process on each core. At this point, you should start experimenting with your concurrency setting, to make each core fully utilized. The right setting there will depend on how much blocking a job does, but I have seen useful numbers up to 20. Just try some things out, and watch vmstat.

Note that you don’t have to increase the connection pool size in database.yml as you add cores (just concurrency). That’s because each process is a separate thing, each with its own pool. But you do have to increase the max connections on your database server. For instance with Postgres you’d want to set max_connections in postgresql.conf. Here you need to allow enough connections for concurrency times cores (times servers), plus some more for your actual Rails app serving web requests, plus some more for anything else you’ve got going on. That can be a lot of connections! Don’t be surprised if improving your job throughput exposes a bottleneck elsewhere in your system.
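To make that arithmetic concrete, here is a quick sketch. Every number in it is a made-up example, and required_connections is a hypothetical helper; plug in your own values.

```ruby
# Rough lower bound for the database's max_connections setting.
# Sidekiq needs one connection per thread, per process, per server.
def required_connections(concurrency:, processes_per_server:, servers:, web_connections:, other:)
  concurrency * processes_per_server * servers + web_connections + other
end

total = required_connections(
  concurrency: 10,          # threads per Sidekiq process
  processes_per_server: 4,  # one process per core
  servers: 2,               # Sidekiq boxes
  web_connections: 30,      # Rails app serving web requests
  other: 10                 # consoles, cron jobs, and so on
)
puts total
```

With these example numbers you would need at least 120 connections, of which 80 are for Sidekiq alone.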

Conclusion

Tuning Sidekiq can be complicated, because of the double layers of threads plus processes. You need a confident understanding of how Ruby and your operating system handle concurrency, and it helps to use tools like vmstat to measure what’s going on and verify your understanding. If you have a box dedicated to just Sidekiq jobs, my recommendation is to run one process per core, using taskset to keep them separate, and then tune concurrency from there. Hopefully this will help with your own projects!


Illuminated Computing

Paul A. Jungwirth
