The Perils of Pipes

In my last post, I mentioned this in regards to making Wallace Jr. and my hacked version of Mednafen:

It’s not insurmountable, but having programs talk back and forth to each other is always a bit tricky, especially to avoid deadlocks: A is waiting for B to say something, and B is waiting for A to say something. Especially when B was never intended to talk to A to begin with.

My prediction of running into difficulties proved to be all too accurate, even despite expecting problems to arise and being exceedingly careful to avoid them. Sadly, this hardly qualifies to win the JREF prize, any more than predicting that the sun will rise tomorrow would.

Recall what I’m trying to do here: have Wallace Jr. generate a sequence of controller inputs, have Mednafen execute them while playing a game, and have Wallace Jr. evaluate what happens in the game as a result. This means there has to be bidirectional communication: one channel from Wallace Jr. to Mednafen, and a second from Mednafen back to Wallace Jr.

In the original design, the path into Mednafen was pretty simple, since the input sequence was predetermined. Wallace Jr. generated an MCM file (a Mednafen movie file, consisting of an initial state and, well, a sequence of controller input), saved it to disk, and told Mednafen where to find it when it was launched. Mednafen, in turn, printed out status information, which Wallace Jr. read at its leisure.

For those of you not familiar with programming, allow me to elaborate on that last part a bit. Most languages provide three standard input/output streams to programs: one for input (stdin), one for normal output (stdout), and one for outputing error messages (stderr). By default, when you run a program from the command line, stdin comes from the keyboard, and stdout and stderr get printed out to the terminal window. To the program, these three streams look just like any other file; in fact, in C they’re even represented as FILE *, the same as you’d get if you called fopen to open a file.

Since I said by default the streams are connected to the keyboard and the terminal window, that obviously implies this isn’t always the case. When you create a new process, you’re free to connect its standard streams to whatever you want. That’s what Wallace Jr. did: when launching Mednafen, it attached its stdout stream to a pipe, so that whatever Mednafen writes to it, Wallace Jr. can read it.

The current version of Wallace Jr. goes one further, attaching pipes both to Mednafen’s stdout and stdin streams:

child = subprocess.Popen ([self.executable_path,
                                "-video", self.debug and "1" or "0",
                                "-sound", self.debug and "1" or "0",
                                "-nothrottle", "1",
                                "-movie", "/dev/stdin",
                                "-metrics", self.project_file,
                                self.rom_file],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            close_fds=1)

But wait, you object, the arguments I’m passing to Mednafen still tell it to read its movie from a file — particularly, a file named /dev/stdin. What gives? /dev/stdin is just a symlink to /proc/self/fd/0, which is in turn a symlink to whatever the current process’s stdin stream is. So really, giving Medafen a file name of /dev/stdin is just telling it to read from its stdin.

(If that sounds like some kind of voodoo to you, keep in mind that on Unix-based systems, everything is a file. Files in the conventional sense of “a bunch of bytes with a name and stored on a disk” is just one type of file — files can also be pipes or devices or almost anything else. Everything in /proc is some type of information about the processes running on the system, exposed as a set of files. The underlying data is stored not on disk, but in the kernel‘s internal data structures.)

Anyway, the ultimate goal in sending Mednafen controller inputs via a pipe instead of a file is so that, in the future, Wallace Jr. will be able to generate programs that decide the next controller input based on the current state of the game. To do that, obviously, it needs to see the game state at time t before the controller input at time t+1 can be sent. Writing all the controller input to a file ahead of time is right out.

If everything’s working, what should happen is that Wallace Jr. does a little processing, sends controller input to Mednafen, waits for Mednafen to respond with the game state, and repeats. Meanwhile, Mednafen waits for Wallace Jr. to send it controller input, emulates the next frame of the game, sends the updated game state to Wallace Jr., and repeats. If these ever get out of sync — namely, if both Mednafen and Wallace Jr. are waiting for the other to send something, you hit deadlock and nothing happens.

It’s easy to get wrong. Here’s two examples of how things went wrong.

As a proof-of-concept, I initially tried having Wallace Jr. send everything at once through the pipe. Deadlock. However, I found that if I closed Wallace Jr.’s side of the pipe going to Mednafen, it worked! That was weird, since I was being careful to flush the pipe after writing, to make sure the data was actually getting sent instead of sitting around in a buffer.

After banging my head against it for a while, I ultimately figured out that gzip was the culprit. Normally, Mednafen movie files, which is ultimately what’s being sent to it via the pipe, are compressed with gzip, and I initially had Wallace Jr. do so as well. Apparently, however, the code on the Mednafen side of things that handles decompression was waiting to reach the end-of-file before actually decompressing the data. That’s why Mednafen did nothing until Wallace Jr. closed its side of the pipe, which would cause Mednafen to finally detect end-of-file.

Naturally, closing the pipe isn’t an option when sending controller input one frame at a time, since there’s no way to re-open a pipe once it’s closed. Luckily, however, Mednafen is happy with getting uncompressed movie data sent to it; taking gzip out of the equation entirely fixed the problem.

The second deadlock I ran into took even longer to diagnose. Here’s a simplified version of the code that triggered it. What it’s supposed to do is generate the next input, send it to Mednafen’s stdin, and then read from Mednafen’s stdout until it sees a line that starts with “metrics”. When it does, it processes it and decides whether to loop or not. What it actually does is deadlock immediately when it tries to read from Mednafen. Do you see why?

while should_continue:
    controller_input = compute_next_input ()
 
    child.stdin.write (controller_input)
    child.stdin.flush ()
 
    should_continue = False
    for line in child.stdout:
        if line.startswith ("metrics"):
            should_continue = process_metrics (line)
            break

Your first guess is that Mednafen isn’t actually writing any lines that start with “metrics”. That wasn’t the problem. Your second guess is that Mednafen isn’t flushing its output buffer, so the line isn’t getting sent through the pipe. Wrong again.

Give up? I’m not surprised — it’s very non-obvious.

The “for line in file” construct in Python iterates through the contents of a file, one line at a time. Internally, this happens by Python invoking file.next() each time through the loop to get the next line. Here’s what the Python manual says about the next() method for file objects (emphasis added):

A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.

A hidden read-ahead buffer! next() is trying to be clever and trying to read more than just the next line, so it can pad out its buffer. However, Mednafen stops and waits for more input from Wallace Jr. after outputting a line, so next() waits for input that isn’t going to come until Wallace Jr. gets back to work. But that won’t happen until next() returns, which it won’t until it sees more input from Mednafen, even though it already has the line Wallace Jr. is asking for! Deadlock.

The fix is simple enough: use readline() to get the next line of the file instead of the nice syntactic sugar of the for loop, in order to avoid next()‘s read-ahead buffer:

while should_continue:
    controller_input = compute_next_input ()
 
    child.stdin.write (controller_input)
    child.stdin.flush ()
 
    should_continue = False
    line = child.stdout.readline ()
    while line != "":
        if line.startswith ("metrics"):
            should_continue = process_metrics (line)
            break
        line = child.stdout.readline ()

I would actually call next()‘s behavior here a bug. It’s perfectly reasonable for it to read past the end-of-line and buffer the excess, if there’s data after the newline. What’s happening internally is that the data next() got from the pipe ended with a newline, and it’s going ahead and trying to read from the pipe again just for the sake of filling its buffer. This actually decreases efficiency, since if next() isn’t going to be called again, it’s doing a read unnecessarily, at the cost of another system call. And if next() is going to be called again immediately, well, waiting to do the read until then doesn’t cost anything.

Of course, you wouldn’t notice the difference in practice, unless there’s no more data after the newline, but the file/pipe/whatever isn’t closed yet either. Which is exactly what happens in Wallace Jr.’s case.

The moral is: even if you know how tricky bidirectional interprocess communication is, it’s still trickier than you expect.