unix Archive

Concatenating Videos with FFmpeg

I capture screencasts with OBS and then process them with ffmpeg before uploading. Surprisingly, OBS doesn’t have a pause button, so if I have to stop for anything, I stop the recording and start a new one when I come back. That leaves me with multiple videos to concatenate into a single one later.

The ffmpeg docs say you can concatenate like this:

ffmpeg -i "concat:video1.mp4|video2.mp4|video3.mp4" [other arguments]

That doesn’t always work, though. As I understand it, the concat: protocol simply joins the files byte-for-byte, which only suits a few container formats, and MP4 isn’t really one of them. Sometimes it stops after the first file and ignores the rest of them. So it’s more reliable to use the concat demuxer, where you put the filenames of your videos in a separate file like this:

file video1.mp4
file video2.mp4
file video3.mp4

And then you include that file like this:

ffmpeg -f concat -i filelist.txt [other arguments]

That always works, but it’s a hassle to create the separate file each time. So I finally wrote the shell script below to automate the process.

#!/bin/sh
#
# myff - script to take a list of files from stdin,
#   save the list to a file for ffmpeg to concat from

FFTEMP=$(mktemp)
if [ ! -f "$FFTEMP" ]; then
    echo "Unable to create temp file $FFTEMP"
    exit 1
fi

# write one "file /full/path" line per name read from stdin
for i in `cat -`; do
    printf 'file '
    readlink -f "$i"
done >"$FFTEMP"

ffmpeg -safe 0 -f concat -i "$FFTEMP" "$@"

rm "$FFTEMP"

Now I can just pass the video files to that on standard input, and it’ll make the temporary file for me, give it to ffmpeg, and delete it afterwards. I use it like this:

echo video?.mp4 | myff [other arguments]

A couple of technical notes. ffmpeg is picky about filenames and paths. It expects the video files and the list file to be in the same directory by default, and doesn’t allow full pathnames in the list unless you pass `-safe 0` before the `-i` option. Also, this was written on FreeBSD, but I think it should be portable to other *nix systems, as long as they have `mktemp` and a `readlink` that supports `-f`.

One flaw is that it doesn’t handle spaces in filenames. That’s okay with me, because I don’t allow them out of old habit. It would have to be more complicated to tell the difference between spaces in filenames and spaces between filenames. By treating them all the same, I can pass the filenames in from `echo`, `ls`, or whatever other tool is handy.
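
If I ever did need spaces to work, the fix would be to read one filename per line instead of splitting on all whitespace, and to use the concat demuxer’s quoted syntax in the list file. Here’s a minimal sketch of that variant (the script name is made up; names containing single quotes would still need more escaping):

#!/bin/sh
# myff-nl - like myff, but expects exactly one filename per line on
#   stdin, so names with spaces survive intact
FFTEMP=$(mktemp) || exit 1
while IFS= read -r f; do
    # quote each name the way the concat demuxer expects
    printf "file '%s'\n" "$(readlink -f "$f")"
done >"$FFTEMP"
ffmpeg -safe 0 -f concat -i "$FFTEMP" "$@"
rm "$FFTEMP"

You’d feed it from `ls` or `find`, which print one name per line when piped, rather than from `echo`.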

Hope this comes in handy for someone who ran into the same frustration with “-i concat:” that I did.

Fixing the Internet

I’ve been experimenting lately with IPFS, the InterPlanetary File System, and learning more about distributed information systems like it. I think I mentioned this kind of thing in passing in a podcast a year or so ago, so I thought I’d do more of an explanation of it. First I demonstrate the client-server model which most Internet applications use, and why it’s increasingly fragile now that a handful of corporations control so much of our access to and ability to share information. Then I describe the distributed model that I expect will replace it, using IPFS as an example.

This stuff is in use, but very much under development, so I expect to do more videos and articles on it in the future, as more uses are found for it and applications built on top of it. Right now, it’s kind of like the early 1990s again, where you had to be a hobbyist, if not an expert, to really use the Internet well. It’ll take some time before these distributed systems can be used as handily as we can currently browse the web. But in the long run, we’ll have an information system that’s about as anti-fragile as possible.

IPFS key for this video: QmcMjiKBG7yoLdqXQJADBd7SqGh4Tmr5wzTF7g5rQ4cYZR

Spinning Up a New Digital Ocean Droplet (VPS)

I have a $5/month virtual server at Digital Ocean, which I use for some light work and for an extra location outside my usual networks from which to test connectivity. I noticed recently that they’d increased the RAM and disk space included for that price. It turns out I could have just clicked a button to expand it, but I decided to make a new droplet and move everything to it, since that’s really how you’re supposed to handle the cloud – lean toward spinning up new systems rather than getting attached to the ones you have.

So I recorded myself going through the process of starting and configuring the droplet, setting it up far enough to start installing packages. If you’re inspired to get a droplet for yourself, feel free to use my referral link, which I think gives me $5 when someone signs up with it. I’m also available if you need a sysadmin for your servers.

Fixing BBDB in Emacs with bbdb-migrate

I recently upgraded Emacs and BBDB, and auto-completing addresses in Gnus stopped working. The error turned out to be that it was trying to run bbdb-migrate to update the database, and I wasn’t loading that. So I just needed to add this to my .emacs:

(require 'bbdb-migrate)

And do a C-x C-e at the end of that line to execute it. Then the next time I tried to use BBDB by auto-completing an address, it took a few moments to migrate the database, then worked fine.

I’m not sure the require needs to stay once the database is migrated, so I’ve commented the line out for now, and I’ll see if auto-completion keeps working after my next restart of Emacs, whenever that might be.

Getting bwn driver working on a Dell Latitude D520

I run FreeBSD on a Dell Latitude D520 laptop. One issue in installing it is that the wireless doesn’t work out of the box, so you have to install firmware for it. In this machine’s case, the needed firmware is in the net/bwn-firmware-kmod port. So you have to connect with the Ethernet port long enough to get that installed, or pull it in some other way, like a flash drive.

After installing the port, though, it still wasn’t working. It installs three files in /boot/modules:

  • bwn_v4_lp_ucode.ko
  • bwn_v4_n_ucode.ko
  • bwn_v4_ucode.ko

But looking at the dmesg errors, it was searching for a file called bwn_v4_ucode5.ko. So I had to do this to get it working:

cp /boot/modules/bwn_v4_ucode.ko /boot/modules/bwn_v4_ucode5.ko

I don’t know why the discrepancy, but this gets mine running, so I haven’t dug into it further. It seems like laptops vary greatly, even within the same model number sometimes, so you never know. I figured I’d put it on public record, in case anyone else gets the same error message and goes hunting for a solution.

Hammer All the Cores

My current workstation has 8 CPU cores (each core can handle a stream of instructions independently, so it’s more-or-less like having 8 CPUs – 8 different “brains” that can each be running its own thing at the same time). My last computer had 2, so I’m guessing my next one will have 32. They seem to be hitting a wall on how fast a single CPU can be, so the next best thing is to stack more and more of them together.
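
(If you want to check the count on a given box, FreeBSD exposes it through sysctl; other systems have equivalent one-liners:)

sysctl -n hw.ncpu    # prints the number of cores the kernel sees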

The only problem is that most programs can only use a single core. It’s a lot more complicated to write a program to spread its work across multiple cores, and some programs couldn’t take advantage of that anyway. So there are many times when I’m running a program that’s working one core as hard as it can, while the other seven are mostly idle. The nice thing about that is that one heavy program doesn’t bog down the system, since other programs can sail along on the other cores. Most of the time, that’s great. But if you have a task that you want to complete quickly, it would be nice to spread it across more of them.

For instance, I recently needed to increase the volume on a series of 15 podcast files. You can do that with ffmpeg, using a command like this:

ffmpeg -i file.mp3 -vn -sn -dn -af volume=10dB new/file.mp3

That reads file.mp3, ignoring the video and subtitle streams, and writes it into the same filename in a ‘new’ subdirectory, bumping up the volume.

But it takes a minute or two per file, and I have 15 of these files, so I don’t want to type that command for each file every couple minutes. So the simple thing to do is to wrap a loop around it:

time (for i in *.mp3; do ffmpeg -i "$i" -vn -sn -dn -af volume=10dB "new/$i" 2>/dev/null; done)

A couple of new things here. First, I wrapped the whole thing in a ‘time’ call so it would tell me how long it took. I also sent the output of ffmpeg to /dev/null, so it’s not filling up the screen. The loop runs ffmpeg for each MP3 file, substituting the filename for $i in the command.

But here’s where I run into the problem I started this post about, because it runs one command at a time, and the whole thing took 29 minutes. How could I run them in parallel? Well, an easy way is to run them in the background, so the for loop won’t wait for each one to finish before starting the next. Like this:

for i in *.mp3; do (</dev/null ffmpeg -i "$i" -vn -sn -dn -af volume=10dB "new/$i" 2>/dev/null) & done

The new thing here is that the & puts each ffmpeg command in the background. I give ffmpeg its input from /dev/null, because otherwise it stalls and complains when backgrounded, since it watches standard input (usually the keyboard). I had to remove the time call, because with everything in the background, the loop itself finishes immediately. So I timed this manually, and it took a little over five minutes.
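
As an aside, newer versions of ffmpeg have a flag that does the same job as the /dev/null redirect, if yours supports it:

for i in *.mp3; do ffmpeg -nostdin -i "$i" -vn -sn -dn -af volume=10dB "new/$i" 2>/dev/null & done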

That’s a big improvement, but now there’s a new problem: I’m running 15 processes that can each use one CPU to its limit, but I only have 8 cores, so they’re having to share. That’s not a problem at this scale, because FreeBSD multitasks very well, and I didn’t have anything else important going. But what if I had a hundred, or a thousand files? Running that many ffmpeg processes in parallel could bring a machine to its knees.

So I’d really like to limit how many run at once, queuing them up so that a new one starts whenever another finishes, while keeping a fixed number running. Now, there are programs that are designed to do just that, and I could install one of them and learn how to use it. But one thing I like about Unix is that, if you know the basic tools, you can put them together to handle complicated, unexpected tasks as they come along. It’s like when you’re working on a car and the shop manual says, “You will need a door handle clasp removal tool to remove the door handle clasp.” Yeah, right. I’m not buying a tool that I’ll only use once. I have pliers and screwdrivers; I’ll get it off just fine, probably after some swearing and bloody knuckles.

So my inclination is to look first to the standard tools, and there’s one that fits the bill here: xargs. Xargs is a great program that takes a stream of text input and passes it to a program as arguments. I use it in combination with find every day in commands like this one that searches through a tree of files for a phrase:

find . -type f -print0 | xargs -0 grep phrase

But xargs also has a queuing ability, because you can tell it how many copies of its command to run at once. So I dropped the for loop (since xargs effectively does its own loop), and rewrote my command:

time (echo *.mp3 | xargs -n1 -I %% -P6 ffmpeg -i %% -vn -sn -dn -af volume=10dB new/%% 2>/dev/null)

I was able to bring back time, since this doesn’t background anything. The arguments to xargs tell it to take one argument from the pipe at a time (-n1), replace %% in the ffmpeg command with that argument (-I %%), and run up to 6 processes at a time (-P6). This took just over 7 minutes, and it never pushed the load average over about 6.5, which means I still had a CPU or two available for doing other work without getting slowed down. If I let it run 8 at a time, it might shave another minute off.
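
One portability note: GNU xargs treats -I as line-oriented and ignores -n1 alongside it, so on Linux the echo version may hand it all the names as one line. Feeding one name per line should behave the same everywhere; a sketch, assuming a reasonably recent ffmpeg for -nostdin:

time (printf '%s\n' *.mp3 | xargs -I %% -P6 ffmpeg -nostdin -i %% -vn -sn -dn -af volume=10dB new/%% 2>/dev/null)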

So in the final analysis, I got a 4-times speedup on the task, using very basic tools available on any *nix system, without any complicated programming effort. And I learned a little something about xargs in the process. Good deal.

Using Jails with ZFS on FreeBSD - Part 1

For FreeBSD administrators, ZFS and jails combine to make virtualization easy, fast, and secure. A FreeBSD jail is a lightweight virtual environment, sharing the host’s kernel, which can only access the resources assigned to it when it was created, so its processes have no access to the rest of the machine. ZFS is an advanced filesystem that makes it very easy to create and destroy filesystems whenever they are needed. Together, they make it a matter of moments to create a new virtual system for testing, walling off network services, or other projects.

This article, Part 1, will walk you through setting up a host FreeBSD system to be ready for jails. Part 2 will cover creating a jail to run a network service. In the terminology I’ll use here, the “host” system is the main OS, which can control and look inside its jails. The “jailed” or “guest” system can only see what resources the host has assigned to it, and cannot see outside itself.

Create a ZFS pool (if necessary)

If you already have a ZFS pool on your system and want to put your jails in it, you can skip this step. This sets up a ZFS pool named zf on one or more hard drives, in which you will then create your ZFS filesystems. Depending on how many free drives you have available, use one of these commands, substituting in the device names for your drives:

zpool create zf /dev/ada2                             # one drive, no redundancy
zpool create zf mirror /dev/ada2 /dev/ada3            # two drives, mirrored
zpool create zf raidz  /dev/ada2 /dev/ada3 /dev/ada4  # 3+ drives, striped with parity (RAID-Z)

Create and mount a filesystem for your jails

We will create one filesystem called zf/jail, mount it on /usr/jail, and give it the options we want all our jails to have. Then those options will be inherited by all filesystems created beneath it:

zfs create zf/jail
zfs set mountpoint=/usr/jail zf/jail
zfs set compression=on zf/jail

You probably want to turn on compression, unless you know you’re going to be storing mostly already-compressed files in the jail. You can also turn that on and off per-jail later, so use whatever you want as the default here.
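
For example, to override the default later on one jail’s filesystem (the jail name here is made up):

zfs set compression=off zf/jail/www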

If your pool has a single drive, you may also want to use what I call “poor man’s RAID,” by telling ZFS to store two copies of every file. If the drive fails entirely, you will still lose everything, so it’s not as good as multiple drives or a replacement for regular backups. But if individual sectors fail or there are occasional bit errors, ZFS will be able to repair a damaged file from the remaining good copy, so you might be able to get by until you’re ready to replace the drive. To turn on two copies:

zfs set copies=2 zf/jail

Now create a filesystem in which to build a fresh FreeBSD install. Give it a dotfile name, because you won’t actually be using this one as a live system, so that’s an easy way to keep it separate from the live jails in scripts:

zfs create zf/jail/.freebsd-10x64

Unpack FreeBSD into the new jail

Go to your favorite FreeBSD mirror site and fetch the distribution files matching your architecture and the release you want to use. You can get your architecture with uname -p, and see your release with uname -r (dropping any -pX patchlevel from the end). In my case, my architecture is amd64 and my release is 10.2-RELEASE, so I fetched from ftp://ftp5.us.freebsd.org/pub/FreeBSD/releases/amd64/10.2-RELEASE.

You don’t need the kernel, ports, or doc archives, so grab the other four. (You probably don’t need games either, but it’s small.) Download them into somewhere handy. I put them in /root.
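
If you’d rather script the download, something like this should work with FreeBSD’s stock fetch, using the mirror and release from my example above (adjust for yours):

cd /root
for f in base lib32 src games; do
    fetch "ftp://ftp5.us.freebsd.org/pub/FreeBSD/releases/amd64/10.2-RELEASE/$f.txz"
done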

Unpack them into your new jail:

cd /usr/jail/.freebsd-10x64
tar -xJvf /root/base.txz
tar -xJvf /root/lib32.txz
tar -xJvf /root/src.txz
tar -xJvf /root/games.txz

Set up the fresh install

You need to copy a few files into your new FreeBSD install and set up a few things to make it a bootable OS of its own:

cp /etc/resolv.conf /usr/jail/.freebsd-10x64/etc/  # so the jail can do DNS

Edit /root/.profile on the host and add this line, if you aren’t already defining and exporting ENV. The reason for this will appear later:

ENV=$HOME/.shrc ; export ENV

Now we’ll chroot into the filesystem, so that the following commands will treat the jailed filesystem as if it is the root filesystem. These are setup details that would normally be handled by the installer. The last line updates the guest OS with any available updates.

chroot /usr/jail/.freebsd-10x64

passwd               # (set the password for root in the jail)
mkdir /usr/ports
mkdir /usr/home
ln -s /usr/home /home
cd /etc/mail
make aliases
freebsd-update fetch install

Now edit /root/.shrc (still chrooted into the jailed filesystem) and add the following line, plus any other environment variables or aliases that you want to set when you run a shell within the jail. This will put JAIL:{hostname} in your command prompt later whenever you enter a jail as root, so you won’t get confused about whether you’re in the host or the guest. You don’t want to do a rm -rf * at some point, thinking you’re in the jail, and then realize you already exited and are wiping something out on the host.

PS1='JAIL:{\h} \$ '

Edit /etc/rc.conf and add a few lines to keep the jail from running things it doesn’t need to:

sendmail_enable="NONE"
syslogd_flags="-ss"
rpcbind_enable="NO"

Edit /etc/make.conf and add these lines. The important thing here is that we’re going to have each jail mount /usr/ports read-only from the host system, so all your jails don’t have to download and maintain their own copies of the ports tree. But since they won’t be able to write in /usr/ports, they need to download distfiles, build ports, and store packages somewhere local. If you don’t want them in /var, choose somewhere else, just not under /usr/ports.

WITH_PKGNG=yes
WRKDIRPREFIX=/var/ports
DISTDIR=/var/ports/distfiles
PACKAGES=/var/ports/packages
INDEXDIR=/usr/ports

Now run pkg once to set up the pkg directories, and ignore the error it spits out.

If there’s anything else you can think of that you want all your jails to have, go ahead and put it in place now. For instance, if you want a particular user account in every jail, create it now. When you’re ready, exit to get out of chroot and back to the full host.

Create a snapshot of this prepared FreeBSD image

Now that you have this fresh install of FreeBSD configured to your satisfaction and have exited back to the host, take a ZFS snapshot of its filesystem. You will clone this snapshot later to create individual jails. I name it “ready” to show that it is ready for cloning:

zfs snapshot zf/jail/.freebsd-10x64@ready
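
Part 2 will cover this properly, but to give you the idea, creating a new jail’s filesystem from that snapshot is a single clone command (the jail name here is made up):

zfs clone zf/jail/.freebsd-10x64@ready zf/jail/www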

Set up the host to support jails

First enable jails:

echo jail_enable="YES" >>/etc/rc.conf

Now create /etc/jail.conf and add the following lines.

# file: /etc/jail.conf
# Defaults
exec.prestart = "/sbin/mount -t nullfs -o ro /usr/ports/ /usr/jail/$name/usr/ports";
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.poststop = "/sbin/umount -f /usr/jail/$name/usr/ports";
exec.clean;
mount.devfs;
mount.fstab = "/etc/fstab.blank";
host.hostname = "$name.domain.com";   # replace 'domain.com' with your own
allow.nomount;
path = "/usr/jail/$name";

You’ll add a few lines here later when you create your first jail (see the hypothetical entry below), but this sets up defaults for all your jails. To explain some of these lines: the jail commands replace $name in these settings with the name of a jail. The exec.prestart line runs before the jail starts and mounts /usr/ports read-only so the jail can see it. The exec.poststop line likewise unmounts it when you stop the jail. The mount.fstab line points at a blank fstab so the jailed OS won’t complain on boot. Set the hostname domain to whatever you like; the $name will match the jail name, which makes things easy.
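
To give you a taste of those per-jail lines, a hypothetical entry would look something like this (name and address made up; Part 2 covers the details):

# appended to /etc/jail.conf
www {
    ip4.addr = "192.168.1.50";
    interface = "em0";
}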

Now create that empty fstab:

touch /etc/fstab.blank

Make jails!

Now your host system is ready to create all the jails you like! The first time you do all this, it may take a few hours, as you get things just the way you want them. With some experience, it can all be done in 30 minutes or so.

Coming up in Part 2: creating a jail to support a single network service.

(Hat-tip to Savagedlight, whose article on FreeBSD jails and ZFS clones was a major source of the procedure I’ve outlined here.)

My Public Key

For those who know what it is, here’s my public key. I’m going to start signing my email with it, so you can use it to verify me, and feel free to encrypt email to me with it. Contact me via any other channel you like to get my fingerprint to verify that it matches this, to make sure someone hasn’t compromised my web site and changed it.

It’s a bit longer than usual because it has a JPEG of my smilin' mug embedded in it, for another possible way to verify it.

Reaching for the Unix Toolchain

The first time I used the Unix shell, I was hooked. The idea of having all these little programs, each of which did one thing, and being able to chain them together to do more complicated things, made perfect sense. Coming from an 8-bit background, where you were always up against the limits of the machine and waiting for programs to load, keeping everything small and focused was great.

I still reach for the toolchain on my own systems on a daily basis. There are many times when I could reach for a language like Perl or C and write a complete application, but so many times that isn’t necessary – I just want to do one thing, one time, and do it right now with as little work as possible.

For instance: today I was going to listen to music on my MP3 player, but the plug is getting loose, so it wouldn’t play right. So I plugged it into the computer, mounted it, and figured I’d play the songs there. But I wanted to use the playlist from the unit, and shuffle the songs like it does, and the songs are in multiple directories.

Now, I’m sure that I could install a program like XMMS, and it would handle all that, but installing and learning it would take time, and it would mean running a full app to get at 1% of its features. That’s not very Unix-y. So here’s what I did. It’s an example of starting with one function and then adding tools to the chain until you’re finished.

First of all, after changing directory to the player’s MUSIC directory, the playlist is in “music.m3u”. The format of this file has one comment line at the top, and then each song is on its own line, with blank lines between each song. So the first thing I needed to do was just get the actual song lines. Since all the songs have an mp3 suffix, that was easy:

cat music.m3u | grep mp3

Okay, that gives me the list, but they’re in order. (I know, useless use of cat, but I reckon it makes it clearer what I’m doing.) To shuffle the list, I reach for the random utility:

cat music.m3u | grep mp3 | random -f - 1

Now I have a shuffled list of songs, so I need to loop through them and pass them to a program that will play them. First, I usually just loop and echo the lines to make sure that works:

for i in `cat music.m3u | grep mp3 | random -f - 1`; do echo $i; done

The backquotes there pass the output of my chain to the for loop, so it can loop through them. Uh oh, that shows a problem: by default, for breaks on all whitespace, and my songs have spaces in their names, so every word is being echoed on a separate line. That won’t work, so I need to tell for to break only on newlines. I do that by setting the IFS environment variable:

IFS="
"; for i in `cat music.m3u | grep mp3 | random -f - 1`; do echo $i; done

Great, now it’s seeing one song each time through the loop and putting it in $i. So now I add my MP3 playing program:

IFS="
"; for i in `cat music.m3u | grep mp3 | random -f - 1`; do mpg321 "$i"; done

I put quotes around $i so that mpg321 will see the filename as one argument as well. But now I’ve discovered one more problem that I didn’t notice before: the filenames in the playlist use backslashes between directory and filenames, rather than the forward slashes that my system uses. So I need to insert a command to change those:

IFS="
"; for i in `cat music.m3u | grep mp3 | random -f - 1 | sed 's/\\\\/\\//g'`; do mpg321 "$i"; done

Here’s the deal with that sed command. The first four backslashes end up being seen by sed as a single literal backslash to be replaced, because the shell evaluates each pair as a single escaped backslash, then sed does the same. The second pair of backslashes are turned into a single literal backslash by the shell, and then sed uses that to escape the forward slash that follows. The result is to replace each backslash in the line with a forward slash.
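
You can check the substitution in isolation. Run directly at the prompt, outside the backquotes, only one level of escaping applies, so half the backslashes do the same job (the path here is made up):

printf '%s\n' 'Artist\Album\Song.mp3' | sed 's/\\/\//g'
# prints: Artist/Album/Song.mp3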

Now it works, but there’s one small issue: it’s hard to kill out of it. Killing the current mpg321 process allows the loop to continue and start the next one. If I keep pressing Control-C, it just keeps killing the mpg321 processes, not the for loop itself. So let’s add a one-second sleep after each song, so I can Control-C twice: once to kill the song, then again to kill the loop while it’s sleeping:

IFS="
"; for i in `cat music.m3u | grep mp3 | random -f - 1 | sed 's/\\\\/\\//g'` ; do mpg321 "$i"; sleep 1; done

And that’s it! It took me maybe 3 minutes to hack it together; I didn’t even sit down. Now, if I were doing this for pay, or as a program I expected to use on a regular basis, there are a lot of things I’d add, and I’d probably redo it as a single program. I’d want a cleaner exit method. I’d want it to handle non-MP3 files. I’d want it to deal more gracefully with unusual filenames – this will choke on a file that actually has a backslash in the name, for instance (which is very unlikely, but possible). If others were using it, I might write my own randomizer, in case their system doesn’t have ‘random’ installed, and I’d want it to ask them what audio player to use. There would be lots of ways to nice it up.

But in this case, just wanting to get the music started so I could get back to what I was doing, this was the best solution. And that’s often the case with sysadmin work: someone says, “Can you tell me what email came in at 7:45:03 last night?” I could write a program to let the client enter a time and see a report of emails from that time – or I could just toss together a pipeline of a few commands and answer the question. You have to know when it makes more sense to build something more complete and lasting, but many times the best solution is the quickest one using the tools at hand.

FreeBSD Administration

I’ve been doing FreeBSD sysadmin work and using it on my own systems since about 1998. I like its no-nonsense, professional attitude and the simplicity and openness of its licensing. I can build the kernel and OS from source (though that’s not necessary as often as it used to be). Other skills:

  • System and security updates
  • Security auditing
  • Installing and configuring ports
  • Networks (including wireless) and firewalls
  • Installing and administering services (web, email, etc.)
  • Backup and restore
  • Scripting admin tasks
  • Data mining of log files
  • Kernel tuning (sysctl)
  • RAID (using arrays of hard drives for redundancy)

I am also the maintainer for the games/xlogical port (a port of the classic 8-bit game), and I’m looking for other opportunities to contribute to ports.

If you need a regular FreeBSD sysadmin or help with any emergency problem, please contact me at aaron.baugher@gmail.com.

Avoiding robots.txt in wget

I occasionally use the wget utility with the -m option to download a mirror of an entire website. This is very handy, but wget respects the robots.txt file, so it won’t mirror a site if robots.txt disallows it.

Obviously, you should respect the downloading restrictions of other sites, but there are times when you have a valid reason to ignore them (when it’s your site, for instance, but you don’t want to change robots.txt on a live site). In that case, here’s what you do: first run wget with the -m option; it will download the robots.txt file and then quit. Edit that robots.txt, changing Disallow to Allow where necessary, and save it. Then change the permissions on the file to 444, and run your wget -m command again.

On the second run, the permissions change will prevent wget from overwriting the robots.txt file with the version that disallows it, and it will go on happily mirroring the rest of the site.

Here is the sequence of commands (replace vi with your editor of choice):

wget -m http://www.mysite.com/
vi www.mysite.com/robots.txt (edit and save)
chmod 444 www.mysite.com/robots.txt
wget -m http://www.mysite.com/
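
One more option worth knowing: GNU wget can also be told to ignore robots.txt outright, which skips the whole dance above if your version supports it (the same caveat applies about having a legitimate reason):

wget -m -e robots=off http://www.mysite.com/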