The Perils of Pipes

In my last post, I mentioned this in regards to making Wallace Jr. and my hacked version of Mednafen:

It’s not insurmountable, but having programs talk back and forth to each other is always a bit tricky, especially to avoid deadlocks: A is waiting for B to say something, and B is waiting for A to say something. Especially when B was never intended to talk to A to begin with.

My prediction of running into difficulties proved to be all too accurate, even despite expecting problems to arise and being exceedingly careful to avoid them. Sadly, this hardly qualifies to win the JREF prize, any more than predicting that the sun will rise tomorrow would.

Recall what I’m trying to do here: have Wallace Jr. generate a sequence of controller inputs, have Mednafen execute them while playing a game, and have Wallace Jr. evaluate what happens in the game as a result. This means there has to be bidirectional communication: one channel from Wallace Jr. to Mednafen, and a second from Mednafen back to Wallace Jr.

In the original design, the path into Mednafen was pretty simple, since the input sequence was predetermined. Wallace Jr. generated an MCM file (a Mednafen movie file, consisting of an initial state and, well, a sequence of controller input), saved it to disk, and told Mednafen where to find it when it was launched. Mednafen, in turn, printed out status information, which Wallace Jr. read at its leisure.

For those of you not familiar with programming, allow me to elaborate on that last part a bit. Most languages provide three standard input/output streams to programs: one for input (stdin), one for normal output (stdout), and one for outputing error messages (stderr). By default, when you run a program from the command line, stdin comes from the keyboard, and stdout and stderr get printed out to the terminal window. To the program, these three streams look just like any other file; in fact, in C they’re even represented as FILE *, the same as you’d get if you called fopen to open a file.

Since I said by default the streams are connected to the keyboard and the terminal window, that obviously implies this isn’t always the case. When you create a new process, you’re free to connect its standard streams to whatever you want. That’s what Wallace Jr. did: when launching Mednafen, it attached its stdout stream to a pipe, so that whatever Mednafen writes to it, Wallace Jr. can read it.

The current version of Wallace Jr. goes one further, attaching pipes both to Mednafen’s stdout and stdin streams:

child = subprocess.Popen ([self.executable_path,
                                "-video", self.debug and "1" or "0",
                                "-sound", self.debug and "1" or "0",
                                "-nothrottle", "1",
                                "-movie", "/dev/stdin",
                                "-metrics", self.project_file,

But wait, you object, the arguments I’m passing to Mednafen still tell it to read its movie from a file — particularly, a file named /dev/stdin. What gives? /dev/stdin is just a symlink to /proc/self/fd/0, which is in turn a symlink to whatever the current process’s stdin stream is. So really, giving Medafen a file name of /dev/stdin is just telling it to read from its stdin.

(If that sounds like some kind of voodoo to you, keep in mind that on Unix-based systems, everything is a file. Files in the conventional sense of “a bunch of bytes with a name and stored on a disk” is just one type of file — files can also be pipes or devices or almost anything else. Everything in /proc is some type of information about the processes running on the system, exposed as a set of files. The underlying data is stored not on disk, but in the kernel‘s internal data structures.)

Anyway, the ultimate goal in sending Mednafen controller inputs via a pipe instead of a file is so that, in the future, Wallace Jr. will be able to generate programs that decide the next controller input based on the current state of the game. To do that, obviously, it needs to see the game state at time t before the controller input at time t+1 can be sent. Writing all the controller input to a file ahead of time is right out.

If everything’s working, what should happen is that Wallace Jr. does a little processing, sends controller input to Mednafen, waits for Mednafen to respond with the game state, and repeats. Meanwhile, Mednafen waits for Wallace Jr. to send it controller input, emulates the next frame of the game, sends the updated game state to Wallace Jr., and repeats. If these ever get out of sync — namely, if both Mednafen and Wallace Jr. are waiting for the other to send something, you hit deadlock and nothing happens.

It’s easy to get wrong. Here’s two examples of how things went wrong.

As a proof-of-concept, I initially tried having Wallace Jr. send everything at once through the pipe. Deadlock. However, I found that if I closed Wallace Jr.’s side of the pipe going to Mednafen, it worked! That was weird, since I was being careful to flush the pipe after writing, to make sure the data was actually getting sent instead of sitting around in a buffer.

After banging my head against it for a while, I ultimately figured out that gzip was the culprit. Normally, Mednafen movie files, which is ultimately what’s being sent to it via the pipe, are compressed with gzip, and I initially had Wallace Jr. do so as well. Apparently, however, the code on the Mednafen side of things that handles decompression was waiting to reach the end-of-file before actually decompressing the data. That’s why Mednafen did nothing until Wallace Jr. closed its side of the pipe, which would cause Mednafen to finally detect end-of-file.

Naturally, closing the pipe isn’t an option when sending controller input one frame at a time, since there’s no way to re-open a pipe once it’s closed. Luckily, however, Mednafen is happy with getting uncompressed movie data sent to it; taking gzip out of the equation entirely fixed the problem.

The second deadlock I ran into took even longer to diagnose. Here’s a simplified version of the code that triggered it. What it’s supposed to do is generate the next input, send it to Mednafen’s stdin, and then read from Mednafen’s stdout until it sees a line that starts with “metrics”. When it does, it processes it and decides whether to loop or not. What it actually does is deadlock immediately when it tries to read from Mednafen. Do you see why?

while should_continue:
    controller_input = compute_next_input ()
    child.stdin.write (controller_input)
    child.stdin.flush ()
    should_continue = False
    for line in child.stdout:
        if line.startswith ("metrics"):
            should_continue = process_metrics (line)

Your first guess is that Mednafen isn’t actually writing any lines that start with “metrics”. That wasn’t the problem. Your second guess is that Mednafen isn’t flushing its output buffer, so the line isn’t getting sent through the pipe. Wrong again.

Give up? I’m not surprised — it’s very non-obvious.

The “for line in file” construct in Python iterates through the contents of a file, one line at a time. Internally, this happens by Python invoking each time through the loop to get the next line. Here’s what the Python manual says about the next() method for file objects (emphasis added):

A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.

A hidden read-ahead buffer! next() is trying to be clever and trying to read more than just the next line, so it can pad out its buffer. However, Mednafen stops and waits for more input from Wallace Jr. after outputting a line, so next() waits for input that isn’t going to come until Wallace Jr. gets back to work. But that won’t happen until next() returns, which it won’t until it sees more input from Mednafen, even though it already has the line Wallace Jr. is asking for! Deadlock.

The fix is simple enough: use readline() to get the next line of the file instead of the nice syntactic sugar of the for loop, in order to avoid next()‘s read-ahead buffer:

while should_continue:
    controller_input = compute_next_input ()
    child.stdin.write (controller_input)
    child.stdin.flush ()
    should_continue = False
    line = child.stdout.readline ()
    while line != "":
        if line.startswith ("metrics"):
            should_continue = process_metrics (line)
        line = child.stdout.readline ()

I would actually call next()‘s behavior here a bug. It’s perfectly reasonable for it to read past the end-of-line and buffer the excess, if there’s data after the newline. What’s happening internally is that the data next() got from the pipe ended with a newline, and it’s going ahead and trying to read from the pipe again just for the sake of filling its buffer. This actually decreases efficiency, since if next() isn’t going to be called again, it’s doing a read unnecessarily, at the cost of another system call. And if next() is going to be called again immediately, well, waiting to do the read until then doesn’t cost anything.

Of course, you wouldn’t notice the difference in practice, unless there’s no more data after the newline, but the file/pipe/whatever isn’t closed yet either. Which is exactly what happens in Wallace Jr.’s case.

The moral is: even if you know how tricky bidirectional interprocess communication is, it’s still trickier than you expect.

Cop versus Bus (D- edition)

[Disclaimer: Despite what you may have assumed from the title, this post does not discuss the obscure rock, paper, scissors variant “cop, bus, driver” (where driver controls bus, bus runs over cop, cop shoots driver). We apologize for any confusion.]

It has been remarked that the great thing about standards is that there’s so many to choose from. While working on adding support for Amarok to the next version of Music Applet, I ran into this first-hand.

GNOME and KDE, the two main desktop environments for Linux, each have a preferred IPC mechanism to provide a relatively easy way for programs to talk to one another. GNOME uses D-Bus, and several of the players currently supported by Music Applet use it. KDE, however, uses DCOP, and since Amarok is a KDE-based application, it provides a DCOP interface.

D-Bus and DCOP are, at a high level at least, pretty similar: they’re based on making method calls on objects exported by applications. They’re mutually incompatible with each other, of course, but they’re targeting the same niche. So, since Music Applet 2.3.0 will be using both of them to some extent, how do they shape up against each other?

I’m not going to go into the technical merits of each, or any of their implementation details. I’m only looking from the perspective of someone trying to write some Python code that interacts with other programs via D-Bus or DCOP.

Both D-Bus and DCOP provide a command-line interface for issuing individual method calls. While you wouldn’t normally use this interface in a program (unless you’re writing a shell script, of course), it is handy for quick testing, making sure the method you’re looking at does what you think it does.

Let’s say we want to ask our music player what song it’s currently playing. For Rhythmbox, which uses D-Bus, we need to do this:

$ dbus-send --print-reply --dest=org.gnome.Rhythmbox /org/gnome/Rhythmbox/Player  \
method return sender=:1.13 -> dest=:1.23 reply_serial=2
   string "file:///home/paul/music/Jonathon%20Coulton/Chiron%20Beta%20Prime.mp3"

You’ll note that I had to break the command across two lines to get it to fit in this column, and a lot of that command looks pretty redundant. Yes, yes, it avoids namespace pollution in case an application or an object supports multiple interfaces defined by different parties, but it’s a pain to type out. Also note that I had to include --print-reply so that the command would actually say what got returned. Also also, not pictured here, if you’re passing arguments to the method you’re calling, you have to explicitly say what the type of each argument is. Ick.

Now let’s see what the equivalent use of the DCOP command-line tool for calling the equivalent method on Amarok is:

$ dcop amarok player encodedURL

Even though it’s saying the same sort of thing (listing an application, an object owned by that application, and a method on that object), this is much easier to type. And even better, if we leave out some of the arguments, the command will list what’s available. For example, if we instead did:

$ dcop amarok player

It would list all the methods available on the player object exported by amarok. Pretty nifty. Sure, there’s a D-Bus way to do that, but you need to remember what method to call on the org.freedesktop.whatever interface of the object — or is it a separate object that handles introspection? I’d have to look up the D-Bus documentation. But the DCOP command-line tool doesn’t make you do that.

When we turn to the Python interface, things improve a bit for D-Bus, even though we’re still stuck with those very long fully qualified names for everything. Here’s a quick Python program that does the same thing as the command-line example:

import dbus
bus = dbus.SessionBus ()
player_proxy = bus.get_object ("org.gnome.Rhythmbox", "/org/gnome/Rhythmbox/Player")
player = dbus.Interface (player_proxy, "org.gnome.Rhythmbox.Player")
url = player.getPlayingUri ()
print url

Of course, if we’re going to make multiple calls to the object, we only have to set things up once, and then reuse the object as many times as we want.

The DCOP equivalent once again looks simpler (at least for now):

import pcop
import pydcop
# Magic goes here, to be discussed later...
amarok = pydcop.anyAppCalled ("amarok")
url = amarok.player.encodedURL ()
print url

While this might seem like another win for DCOP, things don’t look so good once we move away from these toy examples. For starters, the Python DCOP bindings don’t provide any support for asynchronous calls. Instead, all calls are synchronous: once we make the call, the program sits there and does nothing until it receives a response. If that other program you’re talking to takes a long time to respond, that spells trouble for our program’s responsiveness to the user, especially in a GUI, where now our program won’t even be able to redraw itself.

The Python D-Bus bindings do provide a way to do asynchronous calls, by passing a function that should be invoked once a response comes back from the other program. In the meantime, our program can go on to do other things. This also lets our program issue multiple IPC calls all at once more efficiently: in the ideal case, the other program can send its responses to all our calls during a single timeslice. If we’re stuck with synchronous calls, each one gets serviced in separate timeslices, which will end up taking longer.

Technically, this lack of asynchronous calls isn’t a fault with DCOP per se, but rather with its Python bindings. The underlying C++ library supports asynchronous calls. Adding support for this for Python programs is listed in the TODO file.

However, there’s a more obnoxious problem with DCOP when compared to D-Bus. Although D-Bus is used by GNOME, it’s not actually part of GNOME; it’s a project with minimal dependencies. So, if D-Bus isn’t already running (which in GNOME is pretty unlikely anyway), there isn’t that much the library has to do to start it up.

DCOP, on the other hand, is a part of KDE itself, and to assure that it’s running, it’s necessary to initialize at least the base set of KDE services. Remember that code in the Python DCOP example I didn’t show you before? Here’s what it is:

import kdecore
import sys
app = kdecore.KApplication (sys.argv, "program name")

That’s right, our program has to initialize a KDE application object. It doesn’t look like much code-wise, but that initialization will go ahead and start up DCOP and a handful of other KDE services before it returns, if they’re not already running. And if you’re running a GNOME-based desktop — and if you’re running Music Applet, you certainly are — you probably don’t have all the KDE services running yet.

On my laptop at least, this initialization takes several seconds. As a result, if you have Music Applet’s Amarok plugin enabled, this adds several seconds to start-up time before the applet appears in the panel.

To avoid this delay, it’ll be necessary to create a thread in the Amarok plugin just to do that initialization. Ick. Note that deferring initializing the plugin until the applet appears in the panel doesn’t really help, since the applet would then just become unresponsive while that initialization is taking place. And no, launching dcopserver via a call to os.system instead of initializing KDE doesn’t quite work. I mean, it does work just for DCOP, but when a proper KDE application starts, it will get confused about how only some of the KDE services are running and will start spitting out error messages and will behave a bit strangely.

Even worse, if the Python DCOP bindings try to connect to the DCOP server and fail, they will never try again. So you have to make sure the DCOP server is running before you try using it at all.

So, while DCOP does have simplicity going for it, D-Bus ends up providing a more robust Python interface and lower initialization overhead, which are both big plusses when you want your program to be small and responsive.

Fun fact: KDE4 will be ditching DCOP in favor of D-Bus. Unfortunately, I expect this will require having both DCOP and D-Bus in Music Applet’s Amarok plugin, so that it works with Amarok-for-KDE3 and Amarok-for-KDE4 seamlessly. Sort of like how, back in Music Applet’s C days, there was Rhythmbox-via-D-Bus and Rhythmbox-via-Bonobo. Thankfully no Music Applet users seem to care about compatibility with semi-ancient Rhythmbox pre-0.9 compatibility these days.

Quick PyGTK Tip

If you ever find yourself writing a custom widget using PyGTK, and can’t figure out why none of your do_something methods are ever getting called, make sure you have this line of code after your class’s definition:

gobject.type_register (YourCustomWidgetClass)

Believe me, remembering this will save you at least an hour of very confused debugging. PyGTK will never complain that you forgot to do this. Instead, you’ll just end up with a widget that doesn’t actually do anything.

Comments Off