Hammer All the Cores

My current workstation has 8 CPU cores (each core can handle a stream of instructions independently, so it’s more-or-less like having 8 CPUs – 8 different “brains” that can each be running its own thing at the same time). My last computer had 2, so I’m guessing my next one will have 32. Chip makers seem to be hitting a wall on how fast a single core can be, so the next best thing is to stack more and more of them together.
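
If you’re curious how many cores your own machine has, the OS will tell you. On FreeBSD it’s a sysctl:

sysctl hw.ncpu

and on Linux the nproc command prints the same number.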

The only problem is that most programs can only use a single core. It’s a lot more complicated to write a program to spread its work across multiple cores, and some programs couldn’t take advantage of that anyway. So there are many times when I’m running a program that’s working one core as hard as it can, while the other seven are mostly idle. The nice thing about that is that one heavy program doesn’t bog down the system, since other programs can sail along on the other cores. Most of the time, that’s great. But if you have a task that you want to complete quickly, it would be nice to spread it across more of them.

For instance, I recently needed to increase the volume on a series of 15 podcast files. You can do that with ffmpeg, using a command like this:

ffmpeg -i file.mp3 -vn -sn -dn -af volume=10dB new/file.mp3

That reads file.mp3, ignores any video, subtitle, and data streams (that’s the -vn, -sn, and -dn), boosts the volume by 10 decibels, and writes the result to the same filename in a ’new’ subdirectory.
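
As an aside, if you’re not sure how big a boost a file needs, ffmpeg’s volumedetect filter will report the mean and peak levels without writing an output file (the numbers show up in the stderr output), something like:

ffmpeg -i file.mp3 -af volumedetect -f null /dev/null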

But it takes a minute or two per file, and I have 15 of these files, so I don’t want to sit there retyping that command every couple of minutes. The simple thing to do is to wrap a loop around it:

time (for i in *.mp3; do  ffmpeg -i "$i" -vn -sn -dn -af volume=10dB new/"$i"  2>/dev/null; done)

A couple of new things here. First, I wrapped the whole thing in a ’time’ call so it would tell me how long the job took. I also sent ffmpeg’s output to /dev/null (it writes its progress chatter to stderr, hence the 2>), so it’s not filling up the screen. The loop runs ffmpeg once for each MP3 file, substituting the filename for $i in the command.
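
If you’d rather not throw ffmpeg’s output away entirely, a small variation keeps a per-file log you can check for errors afterward. This is just a sketch, with a ’logs’ directory of my own invention next to the ’new’ one:

mkdir -p logs new
time (for i in *.mp3; do  ffmpeg -i "$i" -vn -sn -dn -af volume=10dB new/"$i"  2>logs/"$i".log; done)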

But here’s where I run into the problem I started this post with: the loop runs one command at a time, and the whole thing took 29 minutes. How could I run them in parallel? Well, an easy way is to run them in the background, so the for loop won’t wait for each one to finish before starting the next. Like this:

for i in *.mp3; do (</dev/null ffmpeg -i "$i" -vn -sn -dn -af volume=10dB new/"$i" 2>/dev/null) & done

The new thing here is the &, which puts each ffmpeg command in the background. I give ffmpeg its input from /dev/null because otherwise, once it’s in the background, it stalls and complains while watching for input on standard input (the keyboard, usually). I also had to remove the time call: since the commands run in the background, the loop itself finishes immediately. So I timed this one manually, and it took a little over five minutes.
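
One way to keep the time call, for what it’s worth, would be to add a wait at the end, which makes the shell hold on until all the background jobs have finished. Something along these lines should time the whole batch; I haven’t benchmarked it, so treat it as a sketch:

time (for i in *.mp3; do (</dev/null ffmpeg -i "$i" -vn -sn -dn -af volume=10dB new/"$i" 2>/dev/null) & done; wait)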

That’s a big improvement, but now there’s a new problem: I’m running 15 processes that can each use one CPU core to its limit, but I only have 8 cores, so they have to share. That’s not a problem at this scale, because FreeBSD multitasks very well, and I didn’t have anything else important going on. But what if I had a hundred files, or a thousand? Running that many ffmpeg processes in parallel could bring a machine to its knees.
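
If you want to watch what a batch like that does to the machine while it runs, a couple of stock tools will do it: uptime prints the load averages, and FreeBSD’s top has a -P flag that breaks CPU usage out per core.

uptime
top -P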

So I’d really like to limit how many run at once, putting them into a queue so that a certain number run in parallel and a new one starts as each one finishes. Now, there are programs designed to do just that, and I could install one of them and learn how to use it. But one thing I like about Unix is that, if you know the basic tools, you can put them together to handle complicated, unexpected tasks as they come along. It’s like when you’re working on a car and the shop manual says, “You will need a door handle clasp removal tool to remove the door handle clasp.” Yeah, right. I’m not buying a tool that I’ll only use once. I have pliers and screwdrivers; I’ll get it off just fine, probably after some swearing and bloody knuckles.

So my inclination is to look first to the standard tools, and there’s one that fits the bill here: xargs. Xargs is a great program that takes a stream of text input and passes it to a program as arguments. I use it in combination with find every day in commands like this one that searches through a tree of files for a phrase:

find . -type f -print0 | xargs -0 grep phrase
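
The -print0 and -0 flags there make find and xargs pass the filenames separated by null bytes instead of whitespace, so names with spaces in them don’t get mangled. You can also cap how many filenames xargs hands to each grep with -n; for instance, this variation would run grep on batches of 50 files at a time:

find . -type f -print0 | xargs -0 -n 50 grep phrase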

But xargs also has a queuing ability, because you can tell it how many copies of its command to run at once. So I dropped the for loop (since xargs effectively does its own looping) and rewrote my command:

time (echo *.mp3 | xargs -n1 -I %% -P6  ffmpeg -i %% -vn -sn -dn -af volume=10dB new/%%  2>/dev/null)

I was able to bring back time, since this doesn’t background anything. The arguments to xargs tell it to take one argument from the pipe at a time (-n1), replace %% in the ffmpeg command with that argument (-I %%), and run up to 6 processes at a time (-P6). This took just over 7 minutes, and it never pushed the load average over about 6.5, which means I still had a CPU or two available for doing other work without getting slowed down. If I let it run 8 at a time, it might shave another minute off.
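
One caveat before anyone copies that: it worked with the xargs on my FreeBSD box, but as far as I can tell GNU xargs treats -n and -I as mutually exclusive (it ignores the -n1), and with -I it takes each whole input line as the replacement, which doesn’t mix well with echo putting every filename on a single line. Feeding it null-separated names should behave the same everywhere; I haven’t timed this version, so consider it a sketch:

time (printf '%s\0' *.mp3 | xargs -0 -I %% -P6  ffmpeg -i %% -vn -sn -dn -af volume=10dB new/%%  2>/dev/null)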

So in the final analysis, I got a 4-times speedup on the task, using very basic tools available on any *nix system, without any complicated programming effort. And I learned a little something about xargs in the process. Good deal.