Wallace: Behind the Scenes

Out of curiousity, how interested are people in reading about the technical details behind the Wallace rewrite? I could write up a couple posts about some of the technical challenges encountered and/or achievements made since the last update, but there’s not all that much in the way of stuff that can be shown off.

For example, does anyone want to hear about how user input is collected? Or maintaining audio/video synchronization and minimizing audio lag? Or performing caps negotiation with downstream elements in the processing pipeline? (I’ve certainly learned quite a bit on that last one over the past couple days.) There’s interesting stuff there for anyone who wants to know more about writing GStreamer elements or who is just curious about what kind of processing is needed to make Wallace work.

I’ll probably write about some of this stuff anyway, if only to have it out there and Googleable for anyone that’s interested. But if there’s not a whole lot of interest in that sort of thing, that’ll set the bar higher for deciding what’s blogworthy about Wallace development, especially given that there’s a non-trivial investment in time preparing a post about something like that.

GStreamer audio sinks are picky

If you ever get the urge to write your own GStreamer element, and one of the things your element will output is an audio stream, I have a word of advice that might save you a fair amount of frustration.

All of the typical sink elements that output an audio stream to a sound card (such as alsasink, esdsink, osssink, or even gconfaudiosink, which after all is probably just going to wrap one of the first three) require that your buffer offsets be measured according to the stream time (GST_FORMAT_TIME), and not any other way. It doesn’t matter if the caps you negotiated for your source pad clearly specified the sample size and rate, or even that you’re setting the timestamp for each of your buffers. If your offsets don’t use GST_FORMAT_TIME, it won’t work.

Specifically, you’ll wind up with this critical error message spit out by one of the base classes most of the audio sinks shipped with GStreamer inherit from:

GStreamer-CRITICAL **: gst_segment_clip: assertion `segment->format == format' failed

And that’s because that base class hard-codes GST_FORMAT_TIME as the segment format in its call to gst_segment_clip.

Note that the error message you get doesn’t clearly say that this is the reason it fails, unless you happen to install the debugging symbols for all the GStreamer libraries and plugins, and then go into a debugger to see what’s calling that function (after remembering to set the G_DEBUG environment variable to fatal_criticals so you get a core dump at the site of the problem), and then look up that part of the GStreamer source code to find out about the hard-coded use of GST_FORMAT_TIME.

Now, for those of you who notice what category this was posted under, yes, this means I’ve been tinkering with Wallace again. In particular, I’m completely rewriting how it uses GStreamer to output audio and video streams from the emulator into something useful (i.e., either playing the streams or encoding them to a video file). I’ve come to the conclusion that my old methodology was fundamentally completely wrong, and it’s a wonder I was able to make it work at all, kind of sort of.

This time, I’m having GStreamer drive execution of the emulator itself, instead of handling that on my own and trying to shovel its output into the pipeline, which makes scheduling and timing really really brittle. Now, each time GStreamer thinks the emulator should produce some more output, it will call (through the nesemulator element I’m writing) the emulator more or less directly. This ends up simplifying a good chunk of code, since GStreamer is (naturally) pretty good at figuring out when it needs more output data. At least, GStreamer is going to do a much better job than my abusing g_idle_add and nanosleep.

Also, instead of treating nesemulator as a source element, it will act as a decoder element. So what input stream does it take? Why, a sequence of button inputs, of course! GStreamer’s fakesrc can be coaxed into providing a null input for testing nesemulator quite nicely. Of course, I’ll need to write a new element to poll user input and convert that into the right button presses, but I was going to have to do that in one way or another anyway. Even better, this approach provides a cleaner way to implement playback of FCEU movie files: write another element that decodes a .fcm file into the corresponding input sequence, and stream that into the nesemulator element.

In fact, this approach can also be used to write an NSF decoder element, by wrapping the nesemulator element, discarding its video output, and controlling selection of which song to play. If you were ever longing to listen to 8-bit music in Rhythmbox, that’d let you do it.

For the moment, though, I just have a rough but functional implementation of nesemulator, which lets you do simple stuff like this:

gst-launch-0.10 fakesrc sizetype=2 sizemin=4 sizemax=4 filltype=2 ! nesemulator name=emu location=Mega\ Man\ 2\ \(U\).nes emu.video_src ! queue ! ffmpegcolorspace ! xvimagesink emu.audio_src ! queue ! audioconvert ! alsasink

You know, in case you get the urge to watch the intro to one of the greatest video games of all time.

What happened to Wallace anyway?

So it’s been several months since the last Wallace update. What’s been going on since then. Well, not much.

It may surprise you, but the main reason I sort of wandered away from it has been frustration with getting GStreamer to do what I wanted it to. I wanted to use GStreamer for the audio/video output since then I effectively get video encoding for free: it’s just a matter of building a pipeline that sends the output to a video file instead of the screen and speakers.

However, my skills at writing GStreamer elements are wanting at best, and the fact that the design I was going for — a real-time source that has separate audio and video sources — was a bit more advanced than what’s covered in the beginners’ documentation. Everything suddenly got to a point where sending the A/V to the speakers and screen would either hang the program or cause a very noticable amount of desync. I think I also managed to track part of it to a bug in GStreamer at the time, which presumably has since been fixed, but I never really went back to the code after throwing my hands up.

On the actual what-the-program-is-supposed-to-be-doing front, I’ve come to the conclusion that to do things “right” would require having some degree of debugging/disassembly features in Wallace, to allow you to pick apart the game and figure out what’s going on inside it. There’s some things — in particular, determining the player’s position within a level — that just can’t be done without getting at whatever internal state the game uses to keep track of it.

Of course, doing that would entail treating the emulator code (which I took from FCE Ultra) less like a black box. While the emulation itself is pretty solid, the source code to it, while not quite being WTF-worthy, is hardly the easiest thing to understand, especially if you aren’t intimitely familiar with the NES’s internals.

Reverse engineering a game is certainly possible. After all, ROM hacks aren’t hard to find — for example, look one post down — and can certainly require a good understanding of the game’s code and data to pull off. It’s just that that’s more effort that would be required to have some degree of reverse engineering support in Wallace.

And looking farther out, assuming everything else gets worked out, it all comes down to how much processing power you can throw at “growing” your “solution”. And unless someone suddenly donates me a 500 YHz processor, that’ll involve distributed computing, which to implement well would require (among other things) splitting the emulator core and the genetic algorithm into separate libraries, to minimize the dependencies on the distributed client code. Which wouldn’t be all that difficult in principle, as FCE Ultra already had a quasi-library-like interface, but it was geared towards code that got directly linked in, and the function calls went both ways. So, it’s just one more thing to clean up.

Plus, having gotten distracted by a shiny new thing to work on is only going to delay things further. I’d at least like to get its AI to the point where it can’t be beaten by whatever’s been growing in my old college dorm refridgerator for the past five years or so before running off to something else.

So, to answer your question Ryan, in theory, yes, Wallace could play it. In theory. In theory, communism works. In theory.

Comments Off

Wallace 0.99.1 Released

I finally got around to tinkering with Wallace, the tool for using genetic algorithms to play Nintendo games, again. The latest version of the software has several new features, a couple of which I’ll mention here.

The biggest is that your solutions can now take the form of a button input sequence, instead of being a program that runs in a VM and generates button input. This has the advantage of producing interesting results a lot faster, at the cost of your solutions being tied to your initial conditions. So, while you might be able to generate something that plays a particular Dr. Mario screen very well, it will crash and burn on any different starting condition — it’s just a dumb sequence of moves, after all.

There are many other, more subtle, improvements to the evolver as well. The selection phase now uses tournament selection to choose who lives on to the next generation, instead of just taking the top n solutions so far. This lets you vary selection pressure from brutal (top n) to nonexistent (creationists’ “evolution is entirely random” strawman) and anywhere in between. You can also now set the rate of mutation to a much wider range, in particular setting it to rates far lower than the (ridiculously high) minimum available in the previous version.

Also, the evolver prompts you to save your progress before it closes. I can’t tell you how many times the lack of that feature has caused me to lose a lot of data.

So, how do things shape up now? Here’s the statistics from a relatively quick sample run, on the same Dr. Mario screen as last time. Now, however, I’m using button input sequences with Select and Start filtered out (you can do that now too), with a population ranging from 200 to 2000 solutions, with fairly mild selection pressure (15%ish?), with a maximum mutation rate of 20 mutations per organism. The fitness function now measures final score, with penalties for the time intervals between each time you earn points. So, if you score 300 points at time 5 and 900 points at time 15, your fitness is 300 - 5 + 900 - (15 - 5) = 1185. (The ability to express that kind of fitness function is another new feature.)

So, here’s the stats (did I mention collecting these statistics is another new feature?):

Generation Maximum Minimum Mean Median
1 2369 -1 39 0
2 2369 -1 348 287
3 2969 -1 985 893
4 3272 -1 1405 1781
5 4170 -1 1518 2369
6 4181 -1 1947 2370

A few observations

  • A 100% randomly generated solution still managed to score 2400 points in the first generation.
  • By the sixth generation, at least half the solutions are better than the best solution in the first generation.
  • Each generation, the mean and median fitness increases.
  • It should be impossible for the fitness to be -1 with the fitness function I defined. There’s a bug somewhere.
  • Even so, the minimum shows no improvement from generation to generation; mutations can still render a good solution bad.

Of course, you want to see that top solution in the sixth generation, don’t you? Well, unlike the VM-based ones from last time, which ended up doing the same thing over and over, these solutions look like they’re being played by a monkey on crack who occasionally makes a really good move. Of course, since my lease doesn’t cover monkeys and none of my crack dealers are returning my calls, the monkey-on-crack theory is only a hypothesis.

Anyway, here’s the video of the top solution of generation six:

If you’re feeling adventurous, you can try your hand at Wallace yourself:

Download Wallace 0.99.1 (717 KB)

About Smegging Time (or, Wallace 0.99.0 is out!)

I’ll let the following link speak for itself:

Download Wallace 0.99.0 (source tarball, 713 KB)

OK, maybe I can say a little more.

I’ve got Wallace, everyone’s favorite NES emulator for Linux with a genetic programming tool bolted on, worked into a sufficiently functional state to let the outside world play with it. Thet there link will get you a copy of the source code, which you can do the usual “./configure, make, make install” thing on to get a working copy of it.

A couple caveats to keep in mind before diving in, though. First, consider this a tech demo, or pre-alpha software at best, not a polished, finished product. It’ll probably work OK if you don’t poke or prod it too hard, though if you want to crash it or send it into undefined behavior I can think of three ways off the top of my head to do so. Also, I make no guarantees that this release will be compatible with future ones, so don’t come crying to me if the evolver you’ve been running for weeks stops working when 0.99.1 (or whatever) comes out.

And of course, I’ve only tested this on my laptop running Debian unstable. Getting it to run on some other Linux system shouldn’t be too hard. With work you might even be able to get it working on some other platform GNOME has been ported to, but I wouldn’t bet too much on it.

Now that all that’s out of the way, what’s new since the last status update? Plenty! The VM that the candidate solutions run on is now register-based (loosely based on the MIPS architecture), so solutions only crash inside the VM if they try to divide by zero nowadays. You can also save your progress in the evolver tool, it executes as quickly as possible (instead of at normal emulation speed), and you can watch and encode candidate solutions at your leisure.

So what does it take to get a little genetic programming action started? Let’s say you wanted to generate something to play Dr. Mario (since score is much more closely correlated with progress there than in Super Mario Bros.). First, you tell the Evolver what ROM image to use and what the initial state will be — here, we pick a state slot that begins on Level 10 on High speed:

Game tab on the Evolver

Next, we define our fitness function by specifying a set of metrics to measure and how to modify them. Here, our fitness will be the score plus the number of seconds before losing. Note the number of conditions we look for for when to stop evaluating a solution: the virus count dropping to zero (win!), the Game Over screen (lose!), the Pause screen (programs don’t need breaks!), and the Main Menu screen (ever try hitting A+B+Select+Start?):

Fitness tab on the Evolver

Next, we specify how we’ll be breeding our solutions. The Evolver has several knobs you can play with to control population size and how to generate new solutions from old ones. Lots more are certainly possible. For example, it’d be nice to control how important the fitness score is to surviving to the next generation — currently, the top n solutions make it and the rest are culled, instead of doing some random-by-weight sort of selection. Here’s the somewhat arbitrary settings I chose:

Breeding tab on the Evolver

Now all that’s left to do is switch over to the last tab, hit that nice green Run button, and watch the magic happen.

Population tab on the Evolver

So, how well does this work in practice? I haven’t done much intensive experimentation, but early results are encouraging. In the first generation (which are all purely randomly generated), most of the solutions either crash before getting very far or don’t do anything, letting the pills just stack up in the middle until the reach the top. A couple of solutions do exhibit slightly more interesting behavior, such as this one, which rotates like mad and shoves pills to the left as fast as it can:

Solution from Generation 1

The top performer from Generation 1, with a fitness of 16. (660 KB, 16 seconds)

Over the first few generations, the solutions that crash quickly die off, and the ones that just let the pills fall in the middle flourish. Then the ones that shove the pills to one side or another, buying a few more precious seconds and thus increased fitness, dominate. And then, in Generation 6, an exciting development: the first solution to kill a virus!

Solution from Generation 6

The top performer from Generation 6, with a fitness of 309. (383 KB, 9 seconds)

To be fair, that screenshot makes the solution look like it’s playing far more shrewdly than it really is. The video tells the whole story: it’s just shoving pills to the right, but does it at just the right rate to line up the red sections of the first three pills with the red virus.

And if you think that’s neat, check out what happens in Generation 7:

Solution from Generation 7

The top performer from Generation 7, with a fitness of 916. (700 KB, 16 seconds)

That solution pushes pills to the right and rotates them at just the right speed to knock out two virus with the first three pills. It is interesting how the end result after three pills is the same as what I’d do if I were behind the controller. I can’t help but wonder how an Intelligent Design proponent (that is to say, a creationist) would distinguish whether I or a seventh-generation evolved computer program were playing, given that screenshot.

Anyway, the next couple generations don’t turn up anything with a higher fitness than that, and that’s the point where I stopped it so I could work on pushing out this release. Nevertheless, Generation 9 did produce the longest-lived solution so far, clocking in at almost double the time any of its competitors does, and still managing to kill a virus along the way.

Solution from Generation 9

The longest-lived performer from Generation 9, with a fitness of 329. (1.2 MB, 29 seconds)

Not too shabby for a couple hours of evolution on a 1.7 GHz Pentium M, eh?

After all that I bet you’re dying to try this out for yourself, so I’ll reiterate:

Download Wallace 0.99.0 (source tarball, 713 KB)

Attack of the Underflow

In principle, the last major piece of Wallace has landed: randomly generating solutions, evaluating them, throwing away bad ones, and generating new solutions through asexual (cloning and mutating) and sexual (using crossover) reproduction.

However, there’s two small problems. First, the stack-based VM is exceptionally prone to crashing. Second, the stack-based VM is exceptionally prone to crashing. Now I know technically speaking that’s only one problem, but I thought it was such a big one, it was worth mentioning twice.[0]

The implementation of the VM, of course, is fine. (At least, there aren’t any known bugs in it.) Rather, the programs that are running within the VM have a tendancy to perform an illegal operation and crash. There are four ways in which a program can crash inside the VM, most common first:

  • Stack underflow: It tries to pop a value off the stack, but the stack is empty.
  • Reserved opcode: It tries to execute an opcode that has no operation associated with it.
  • Stack overflow: It tries to push a new value onto the stack, but the stack is already full.
  • Divide by zero: Because you can’t.

Reserved opcodes could be eliminated by assigning them to something, but the stack underflows are quite problematic. Out of 89 operations the VM supports, only 6 of those don’t try to pop anything off the stack. The vast majority of operations not only need to pop their operands, but they also leave the stack with fewer values on it than it originally had. So naturally, when you’re randomly generating programs to run in the VM’s instruction set, most of the time they’ll trigger a stack underflow and crash.

In fact, the vast majority of programs crash on their very first instruction. Furthermore, in a trial using a population size between 100 and 500 candidate solutions, measuring fitness solely by how many instructions got executed before a crash, even after 50 generations the typical candidate solution still only managed to successfully execute 2 instructions before crashing.

Granted, that’s up from a median of 0, but it’s still not very encouraging. And to keep this in perspective, you’ll obviously want these programs executing more than one instruction per emulated frame, and there’s slightly over 60 such frames per second. Programs that die after only a handful of instructions aren’t going to get anywhere.

(As a side note, however, one candidate solution — one of the initial randomly-generated ones — actually managed to hit the 6000-instruction timeout I arbitrarily set! I wasn’t able to peek into it and see what it was doing, but I strongly suspect it managed to jump into the “uninitialized” portion of memory, which would be filled with 0 bytes, all of which map to nop. Whatever it did, that “gene” never got passed on to anyone else in the population.)

I’m forced to conclude that, despite the stack-based VM being attractive from an implementing-the-VM perspective, it’s quite unsuitable when throwing random code at it. A register-based VM would be infinitely more robust, since the registers the instructions operate on will always be there; just the values stored inside them would change. Alas, this change involves replacing just about everything in the VM except for the mutation engine, which is a large chunk of tedious (but relatively straightforward) coding. But unfortunately, it must be done.

Footnotes

[0] To steal a line from my laptop’s namesake.

LIFO the Party

Time for your irregularly scheduled Wallace update.

As you’ll recall, there are two key things you need for genetic programming (or, more generally, any evolution-like process): chunks of information that gets replicated imperfectly, and a metric for evaluating each chunk’s fitness; the less-fit chunks are discarded, and the more-fit chunks go through the process again.

Well, we now have a data representation for our “solutions” to the “playing Super Mario Bros.” problem. While a simple sequence of button presses could’ve worked, it feels like we’re working with recordings rather than organisms. (Actually, we would be; an FCEU movie file is little more than a compressed sequence of button presses.) So if we want better verisimilitude with biological evolution, we want our candidate solutions to be more “alive” than that.

So, why not make our solutions computer programs themselves?

Before going into too great of detail how I’ve gone about doing this, here’s an example of a very simple program playing Super Mario Bros., and doing surprisingly well considering it was my first attempt. (There’s no genetic programming going on here; the program playing in the video below was hand-coded.)

Mario v. a quick-and-dirty program to try to play it

Ogg Theora video, 938 KB, 37 seconds. Ogg codecs for Windows might be found here.

Over the weekend I’ve implemented a quick-and-dirty virtual machine in which you can write programs that try to play Nintendo games. The VM provides some key services:

  • A simple instruction set similar to machine code to build the programs out of.
  • A means for sending button presses to the emulator.
  • A means for reading the game’s memory.
  • A sandbox to prevent a misbehaving program from doing any damage to the actual system.

Unlike the instruction set probably used by the computer you’re sitting in front of right now, the VM is a stack-based (as opposed to a register-based) machine. There’s a stack that just about every instruction uses; each instruction pops its arguments off the stack, does its work, and pushes the results onto the stack.

This kind of architecture saves us from having to worry about what addressing modes we need for each instruction; everything simply uses the stack. Granted, since the VM I’ve written can work with both 8-bit and 16-bit values (the NES uses an 8-bit processor, but we need 16 bits to easily address all memory locations), the instructions do worry a little about the types of data on the stack. Of course, it would be possible to store type information in the stack as well, but then the instructions all need to be clever about how they handle each possible data type, which makes implementing the VM more error-prone.

Anyway, I’m sure you’re dying to see what program generated that video that manages to get all the way to the halfway point of World 1-1. Well, here it is, converted to an assembly language-style format:

start:     push_i8u 0x82       ; RIGHT + B buttons
           pop_r               ; write to output register
mainloop:  push_i8u 20         ; will loop for 20 vblank periods
waitloop:  pause               ; wait for next vblank period
           push_i8u 1          ; decrement our counter …
           sub_i8u             ;   … by one
           dup                 ; push a copy of the result
           push_i16 waitloop   ; if counter isn’t zero …
           cja                 ;   … go back to waitloop
           pop                 ; throw away counter
           push_r              ; get our current output
           push_i8u 0x01       ; A button …
           xor_8               ;   … gets toggled
           pop_r               ; write to output register
           push_i16 mainloop   ; and go back to top …
           ja                  ;   … of mainloop

OK, so I lied; all instructions except the push immediate instructions get their arguments from the stack.

Anyway, as you can see from the comments, all this program does is hold the Right and B buttons constantly, and every 20 emulation frames (about every 1/3 of a second) it either presses or releases the A button. As a result, Mario constantly runs to the right and jumps periodically. It’s surprising how far this 16-line dumb-as-a-post program gets on its second life!

So now all that’s left is a way to generate new programs for the VM, making them reproduce and mutate, and pitting them against each other in a battle-to-the-death.

Crossing the Streams

The latest update to Wallace, despite the delay since the last installment, doesn’t actually add any features directly related to its ultimate goal of using genetic algorithms to evolve something that can play Super Mario Bros.

Instead, it ports the audio/video code from SDL and drawing-pixmaps-on-the-window, respectively, to the GStreamer framework.

One of the neat things about GStreamer’s design is that it uses pipelines of elements to process multimedia streams, sort of like how you’d put together a bunch of simple programs on a Unix command line. Under this model, doing different things with the data just involves using different elements to process it. For example, taking the emulator’s output and sending it to the screen and speakers involves this pipeline:

Output pipeline

Walemulatorsrc is a custom-made element that acts as a source for the emulator’s output audio and video streams. The two queues act as buffers. Gconfaudiosink and gconfvideosink output the audio and video streams, respectively, using the user’s preferred output methods. This, for example, could let us render to video memory directly. But since the emulator has a very specific format for its audio and video streams, which may not match what those two sink elements accept, the streams are first passed through filters that can convert between different audio (audioconvert) and video (ffmpegcolorspace) formats.

The advantages of using a pipelined series of elements like this is what happens if, say, instead of drawing to the screen and playing to the speakers, you want to encode a video file instead? Well:

Encoder pipeline

Now after converting the streams to a convenient format, we run the audio stream through a Vorbis encoder, run the video stream through a Theora encoder, multiplex the two streams into an Ogg media file, and write the result to a file. Of course, there’s no reason you couldn’t generate a video file in whatever your favorite format is, either; it all depends on what elements you plug into the pipeline.

And as proof that this works, you can watch me beat up Metal Man from Mega Man 2:

Mega Man v. Metal Man
Ogg Theora video + Vorbis audio, 2.8 MB, 40 seconds. Windows users might be able to find the necessary codecs here if they don’t have them already, I guess.

Also, using GStreamer elements to render video on-screen scales the video to fill the window for free, which is a nice side benefit.

As nifty as all this is, and as simple as it seems in principle, implementing it turned out to be trickier than I expected. The main source of headaches was the need to implement that walemulatorsrc element myself. No prior experience writing elements + non-trivial behavior (two source pads, fed asynchronously by the emulation core) = lots of aggravation. I’m sure anyone who groks GStreamer would shudder at the code inside that element, which, let’s be honest, has a fair amount of the proverbial duct tape and baling wire.

Also, I had expected outputting the video via GStreamer to be faster than drawing a pixmap image onto the window about 60 times a second, but it turns out the GStreamer solution takes more CPU time than the “ugly and wrong” way. A little tentative profiling with Sysprof suggests that 20% of the CPU time is being spent inside that ffmpegcolorspace element, converting the 8-bit RGB bitmaps generated by the emulation core into the YUV color space that elements like gconfvideosink and theoraenc seem to want. It may be worthwhile in the future to eschew ffmpegcolorspace and do the conversion within walemulatorsrc, but you know what they say about premature optimization.

I promise, the next status update will actually further Wallace along to its ultimate goal. I can’t wait to see how what I have planned next stacks up.

[Editor's note: your homework exercise is to figure out why that last sentence is a combination clue and pun.]

Inside the Mushroom Kingdom

I spent a ridiculous amount of time this weekend hacking on Wallace. Of course, a lot of that was due to a near-total overhaul of the earlier code, refactoring thousands of lines to make the way forward easier (since I now have a decent mental picture of what I need, instead of aping the existing FCEU code.)

Of course, that doesn’t mean refactoring was the only change since the last update. As you’ll recall, I had been using screen scraping to get information about the current state (things like score, number of lives, etc.), but along the way found it was way more complicated than I had anticipated. Plus, even Super Mario Bros. has pathological cases where screen scraping is more problematic than recognizing ten digits, as anyone who’s ever gotten more than nine extra lives can tell you.

So, screen scraping is dead and buried. (Using image recognition to detect events, however, is still there, since it’s easy to implement, easy to understand, and widely applicable.) Luckily, there’s a much richer resource for getting game information: the game’s memory itself. Naturally, in-game memory has all the information we’re likely to need, assuming we can find it, and the FCEU core’s debugging hooks lets us access it easily from code.

Ah, but how to find it? The FCEU core comes to the rescue again; part of its cheating interface lets you search game memory for interesting values. (As you may know, cheating devices like the Game Genie work by replacing values in memory; an “infinite lives” cheat, for example, just returns a constant value whenever the game looks to see how many lives are left.) Since FCEU is happy to do the heavy lifting, Wallace just has to provide a graphical interface to it.

To get an idea of how this works, let’s figure out where in memory Super Mario Bros. keeps track of the number of coins we’ve collected. We start by resetting the search, which saves a snapshot of the current contents of memory:

Before

In this example, we currently have 3 coins, and we’re about to collect 19 more. After we do so, we search for all bytes in memory that were originally “3″ but are now “22″.

After

We lucked out in this example: there’s only one byte in the address space that changed in the way we were looking for. In most cases, there’ll be some false positives, but they can be screened out by doing another search on the results.

Also, we assumed that the coin total is stored as a single byte in memory. More pathological situations are also possible, of course. For example, since the score is a six-digit number, there’s no way it can fit into a single byte (which can range only from 0 to 255), so it must be spread across multiple, presumably contiguous, bytes. Additionally, the data might not even be laid out like you’d expect a multi-byte integer to be; for example, the score in Super Mario Bros. is stored in six bytes, one byte per digit, starting with the millions digit (which only appears on screen if your score gets that high) and leaving off the ones digit (which is always 0 anyway). And, of course, endianness is also potentially an issue.

Of course, with a little trial and error, we can use the Memory Browser to find where the data we want is stored, and then extract it when we need it:

Score

Since I’m sure you’re dying to know, here’s where some interesting values can be found in Super Mario Bros.’s memory:

Data Address Format Notes
Current Score 2013 6 bytes, base 10, big-endian Multiply this by 10 to get the score
High Score 2007 6 bytes, base 10, big-endian Likewise
Coins 1886 1 byte
World 1887 1 byte Zero-based (i.e., World 1 is stored as 0)
Stage 1884 1 byte Likewise
Time Remaining 2040 3 bytes, base 10, big-endian
Lives 1882 1 byte Zero-based

So, what can we do with this? We can use the Evolver window to evaluate the fitness of an input. First, we specify our fitness function on the Fitness tab. This is mostly things we’ve seen before, just with a somewhat nicer interface. However, now a weight is associated with each metric; overall fitness is a weighted sum of all metrics.

Fitness

Next, we specify the initial conditions for our run in the Game tab:

Game

The Breeding tab will eventually be where we specify how to generate new candidate solutions in our genetic algorithm, but for the time being there’s nothing there. All we can do is do a manual test of our setup on the Population tab, where the human plays and the Evolver evaluates how well he or she does:

Playing

You’ll note that our fitness function weights world and stage much more heavily than score, as can be seen in the computed fitness value. There’s also a very heavy penalty for game over:

Over

So now, the main missing piece is the genetic algorithm.

Two Steps Forward, One Step Back

This time, the screenshot comes first. Check it out:

Super Mario Bros and the Evaluator window

That’s right, Wallace can now screen scrape games while they’re being played in order to determine how well the “player” is doing. Sort of.

Here’s what the configuration for this looks like:

Screen scraping the score from Super Mario Bros

Intuitively, we need two pieces of information in order to screen scrape: the part of the screen where the data is, and how to convert the raw pixels into a number.

The first part of that is easy enough; load a screen dump into the viewer and draw a rectangle around the part of the screen you want. To make things easier, the current code assumes each character will be on an 8-pixel boundary, so the box “snaps” along the grid lines accordingly. (More on this later….)

The second part is a bit trickier, as we need to sort of do OCR on the image to figure out what characters are there. Luckily, part of NES video memory is the “pattern table”, where the currently loaded patterns are stored. A pattern is simply an 8×8 bitmap with a color depth of 2 bits. NES games use the pattern table as the building block for backgrounds and sprites. For example, here’s the pattern table for Super Mario Bros:

Pattern table from Super Mario Bros

The colors are arbitrary — the actual color you see during the game is a result of combining the pattern’s two color bits with two additional color bits associated with the particular part of the screen it’s being drawn at. If you look carefully, you can see that the first set of patterns are pieces of Mario sprites.

Of course, what we’re really interested in are the patterns for each number, as those tell us what each digit will look like (modulo palette rotation) on the screen. Once we assign each digit to its corresponding pattern (assuming a base-10 number system for the time being), converting a part of the screen to a numerical value is relatively easy.

“Relatively” being the operative word here, unfortunately. For starters, I’ve only been able to figure out how to access the current contents of the pattern table from the FCEU store. For Super Mario Bros this is good enough, since it never changes. However, games with more graphics data will swap different pattern data in and out — for example, the contents of the pattern table in Mega Man 2 change depending on which stage you’re in. I think this might be done with the different mappers in NES games, but the FCEU code here is particularly inscrutable, so I haven’t figured out any way to access the “other” pattern tables without playing to the point where they’re loaded.

Also, remember how I mentioned the code assumes that scores and other numbers you want to screen-scrape occur on 8-pixel boundaries? If you look very carefully at the screen shots above, it looks like that’s the case, doesn’t it? After all, the upper-left corner of the score display starts at pixel (24, 24) (where pixel (0, 0) is the upper-left corner), right?

Well, no. Look very carefully at the patterns for each digit. The digits themselves are only 7 pixels wide, so one column of pixels on each is “blank”. This margin is on the left side of each digit.

In other words, the score actually starts at pixel (23, 24).

The code currently hacks around this by adjusting all your screen scraping bounding boxes by one pixel to the left. Which, of course, breaks pretty much any other game you try.

Plus, if you want to scrape multiple values, you need to set up an action for each separately, which is very much a pain.

In other words, the code is functional, but the interface has got to go. While the “one action per metric adjustment” sounded nice back when I made that design decision, it’d be much nicer to take one screen dump and specify all the parts of the screen we want.

Additionally, since we need pixel-perfect precision for specifying what parts of the screen should be scraped, it’d be really nice to have the program figure out the bounding rectangles for us. As a one-time sort of thing when defining an action, it could take the digit-to-pattern mapping we specify and scan the entire screen to deduce where the digits appear, assemble them into rectangles, and let those rectangles be mapped to metrics.

Just another example of not learning design-relevant issues until actually writing code. In your face, waterfall model!

Time Trial

The genetic algorithms and Nintendo games thing makes incremental progress!

Recap: My ultimate goal, for some reason, is to see if you can use genetic algorithms to “train” a computer to play Super Mario Bros. A genetic algorithm needs two key things: a way to generate candidate solutions (relatively easy, here) and a way to measure their “fitness” (or “goodness”) so as to decide which candidates live on to the next generation and which go the way of the dodo.

Of course, being a master of computer science, I won’t be satisfied by creating something that is only capable of playing Super Mario Bros. Nay, I want something that can, in principle, be applied to any Nintendo game. Naturally, a fitness function appropriate for Super Mario Bros. (whatever that might be) would be quite different than one appropriate for, say, Mega Man 2, to say nothing of something like StarTropics or (dare I say it) Final Fantasy.

Clearly, the user of whatever program I create will need some way to specify the desired fitness function. This must be powerful enough to allow the user to make genuinely useful fitness functions, as well as providing enough flexibility to apply to a wide variety of games. And, of course, it needs to be simple enough so that the user stands a chance of using the blasted thing without giving up.

Any fitness function must have some way of measuring a candidate solution. Well, here’s a first stab at it.

Wallace's Evaluator window
(Click image for full-size)

The basic paradigm here is a combination of three simple concepts: metrics, events, and actions.

  • A metric is a named value you can modify. In principle, a metric measures one particular aspect of a candidate solution. If you know anything about programming, a metric is pretty much just a variable. A fitness function would ultimately be defined as a mathematical combination of one or more metrics.
  • An event is a condition you can associate one or more actions with. After each video frame, all events are checked to determine whether or not they fire. If they do, their associated actions are executed.
  • An action specifies how to modify one of the metrics when the action’s event occurs.

An example should make it clear how this is supposed to work. In the example above, we’re trying to measure the wall clock time that passes while playing. We’ve defined two metrics: Minutes and Seconds. Knowing that for an NTSC NES game there are about 60 video frames per second (closer to 60.1, technically), we can say that after every 60 frames, we should increment Seconds by one. Similarly, after every 3600 video frames, a minute has passed, so we increment Minutes by one and subtract 60 from Seconds.

In practice, these measurements wouldn’t be very helpful. If we wanted to measure how long a run took, we’d more likely just want to count the frames and be done with it. But since at this point I haven’t yet implemented more interesting events and actions (which will involve screen scraping in one way or another), this is about all the code can do at the moment.

Nevertheless, it does serve as a proof of concept for how this will work. I suspect this will prove to be a good balance among power, flexibility, and ease of use.

And since you’re all dying to know, here’s how long it took me to play through Air Man’s stage. (Yes, on Difficult; I’m not a wimp.)

Beating Air Man's stage in Mega Man 2
(Click image for full-size)

Remember that that’s me playing, not any fancy randomly-generated, natually-selected solution. My code isn’t capable of doing any of that.

Yet.

Emulate Good Times, Come On

The crazy genetic-algorithms-v.-Super-Mario-Bros. project isn’t dead.

Wallace Preview

As you can see, I’ve successfully done a basic port of the FCE Ultra NES emulator to GTK/GNOME. It’s lacking lots of the crazy advanced features of FCEU, but the important ones are there:

  • (Unscaled) video output.
  • Audio output.
  • Keyboard input.
  • Joystick input.
  • Save states.
  • Movie recording and playback.
  • And a few other things.

It’s nothing fancy, but it’s definitely usable. It does have one advantage over the Linux version of FCEU, however:

Wallace Preview - Input Dialog

Look, it’s a way to configure your input settings without running the emulator with a special command-line switch that takes you through a series of text-based prompts for your input settings! You can’t see it in the above screenshot, but all you need to do is edit the cell with the input binding you want, and then press a key or move the joystick.

Although the source code for FCEU can be pretty ugly, it does have a pretty nice separation of the emulation code itself from the user interface, which helped a lot. Actually running the emulator is quite simple: you make a loop that calls a function that advances the emulation one frame, and gives you back the new screen contents and audio samples. (OK, it’s a bit more complicated than that, but you get the idea.)

So now “all” that’s left is the interesting part of actually using genetic algorithms to play a game. As you may recall, the two main things we need are a representation of a candidate solution (i.e., the series of controller inputs), and a fitness function to decide which candidate solutions are better than the others.

As mentioned in the above feature list, FCEU supports recording and playing back movies. These aren’t ordinary multimedia files, however — they’re a dump of the initial emulator state (essentially no different from a save state), followed by a series of controller inputs and timings. But that’s exactly what we need to represent candidate solutions! So, for generating solutions, we just need to take some initial state, generate a random series of controller inputs, write that out to a movie file, and feed that into the emulator.

The movie file format also looks to be quite amenable to the various mutations, cross-overs, and other genetic reassortments that will be necessary. The basic unit of information in a movie file is a single button change followed by a time (possibly 0) between this input change and the next one. (Or the time might preceed the input instead; I’m not entirely sure which, but it really doesn’t matter for our purposes.) With this representation, we can easily mutate fields and shuffle things around both within and between movies. If a movie file is DNA, then these little chunks are the individual nucleotides.

Of course, the tricky part will still be specifying a fitness function to use. I’ve got some ideas for how to do that, but I’ll save it for a later post.

FCEU Redux

Well, I’ve decided to give that crazy idea about using genetic algorithms to “teach” a computer to play Super Mario Bros. We’ll see how well this turns out.

The first step, of course, is to retrofit a NES emulator to be able to do the input and output manipulation I need. And being a Linux user, the best NES emulator around is FCE Ultra, even despite not having been actively maintained for quite some time. Plus, FCE Ultra has built-in support for making and playing movies (really, recording and replaying input), which should be very helpful. In fact, FCE Ultra movies are one of the preferred formats for people who do that sort of thing.

One of the few obnoxious things about FCE Ultra, however, is the utter lack of a GUI under Linux — everything is passed on the command line, and you have to remember the keys to activate the various features. That’s not quite going to cut it for my purposes.

So, the first step is to port FCE Ultra to GTK (and, to a lesser extent, GNOME). That way there’ll be a usable GUI on which to add an interface for working with the genetic algorithms (and, of course, making it easier to play around in general).

Now, I’ve had a little experience before hacking on FCE Ultra — the SDL joystick code is derived from a patch I wrote a few years back. And I remember that the FCE Ultra source code isn’t pretty to look at. The emulator’s an impressive piece of work at run-time, but the main developer really could’ve used that space bar a lot more, and one-space tabs really don’t cut it. It’s not like he used the saved space for comments or anything.

However, despite its flaws, the FCE Ultra source code does have one thing going for it: it already has a pretty good separation between the emulation code itself and the platform-specific interfaces. Sure, the “drivers” for different platforms are fairly tangled together at first glance, but it does mean that porting things to GTK is essentially just a matter of implementing a new driver and plugging it into the rest of the program. It also means that going further to shovel input in and analyze the output of the emulator should be quite doable.

So, at this point, I’m working on implementing basic GTK support into the last official release of FCE Ultra, thus familiarizing me with the ins and outs of the code and providing a friendlier interface for when I get to start working on the fun stuff.

Mario Meiosis

Here’s a (possibly) crazy idea: is it possible to use a genetic algorithm to “teach” a computer to play Super Mario Bros.?

For those of you wondering what that last sentence said, here’s the basic idea. A genetic algorithm is a technique based on evolution for finding a solution to a problem. You start off with a bunch of randomly-generated candidate solutions, which are probably all pretty bad, but that doesn’t matter. You apply a fitness function to decide how good each candidate solution is; even though they’re all bad, some are going to be less bad than others. Then take the best candidate solutions and use them to generate new ones, such as by mutation (changing parts of them randomly) or crossover (swap parts of two different solutions). Repeat until you get a solution that’s good enough.

So, can you use this technique to have a computer discover a way to win Super Mario Bros.? It seems like the answer should be “yes.” We just need to define two things: what candiate solutions look like, and what our fitness function is.

Candidate solutions are easy: a series of button inputs. The NES controller has eight buttons: Up, Down, Left, Right, Select, Start, A, and B. Each button can either be pressed or released at any given instant, and we can easily represent the state of the controller in one byte. (Naturally, not all 256 possibilities are valid, since you can’t press Up and Down simultaneously, and games wouldn’t quite know what to do if you could.)

But how do we define fitness? Intuitively, we want the fitness function to measure how far the computer gets into the game; a solution that gets to World 4-3 is better than one that only gets as far as World 1-2. However, this is much too coarsely grained, as we have no way of measuring progress within a single level. For example, consider when we’re just starting the algorithm, and we’re comparing purely random solutions. It’s very unlikely that any of them will get all the way to the end of World 1-1, but some of them will get farther than the others, and we want to select for them.

Ah, but how do we measure “progress” within a level? We intuitively know what we mean, but how to define it in terms the computer can understand? A naive approach might be to somehow watch the background to determine when the screen scrolls to the right; the more it scrolls, the more progress has been made. Unfortunately, this doesn’t handle warp pipes, since they let you shortcut part of the stage. We want a solution that takes the shortcut in World 1-1 and gets to the staircase to be “fitter” than one that skips the shortcut but dies before reaching the stairs, even though the latter solution scrolled through more of the stage.

One solution could be to encode knowledge of each stage’s layout into the fitness function, so it can determine exactly how far we get each time. This would work, but means that we need to feed a lot of information into the fitness function. Not only is this time consuming, but it makes it very difficult to apply the same approach to another side scroller, even one that has essentially identical mechanics, such as The Lost Levels.

So, even though “progress though the game” is the most intuitive choice for a fitness function, it’s problematic and difficult to actually use. We may be better off with some other metric that’s easier to obtain and that is correlated with progress.

One candidate is the score at the upper-left corner of the screen. In most cases, your score goes up as you go through more of the game, so higher scores should generally indicate more progress. And since the score is displayed on screen, it’s very easy to access. This approach might work, though it could get caught in a “local optimum” that never leads to winning the game. For example, if there’s a 1-up in the stage, you can collect it, earn some points, die deliberately, and then go back to the stage to repeat the process indefinitely. The score keeps going up, but you never get any farther.

Having the fitness function also consider time might help prevent the search from getting trapped in these dead ends. All else being equal, we want a solution that takes less time than the alternatives. A solution that rushes to the flagpole to get points from the time remaining should be better than one that keeps replaying the same stage to get more points. Of course, favoring time too much could lead to the “best” solution being the one that runs right into the goomba at the beginning of World 1-1 repeatedly — fast, but no progress. The fitness function would need to find a good balance between the two measures.

So, it seems as though a good fitness function for Super Mario Bros. considers multiple factors in hopes of indirectly measuring progress through the game.

What happens if we try this approach with different games? Our solutions still look the same — a series of controller states — but the fitness function will depend on which game we want to play. Pac-Man is obvious — fitness equals the score you end with. Not only is score closely correlated with progress (you can’t replay a level to get more points), but we also select for “better” play for each level (eating all four ghosts with each power pellet is better than always avoiding the ghosts).

Galaga is similar, but we might also want to consider the hit percentage as well as the final score; having 95% of your shots hit is better than having only 47% of them hit.

Of course, there are also games where finding a good fitness function is more difficult than for Super Mario Bros. For example, Mega Man 2 is also a side-scroller like Super Mario Bros., but there’s no score whatsoever. On the other hand, since there aren’t any shortcuts, measuring how far the screen scrolls might work (as long as we distinguish the different directions, so as not to reward pacing back and forth).

And if we want to really get crazy, what kind of fitness function would you use for Final Fantasy, without encoding detailed information about plot points and each of the dungeons? Measuring experience points would probably be a factor in the fitness function, but some other indicator of how close you are to beating Chaos would be needed so you don’t just keep walking around Corneria indefinitely fighting Imps. And I’m sure there are other games where finding a simple, effective fitness function is even harder.

Remember, we want the fitness function to be as smooth as possible. A fitness function made only of a few sudden steep steps will increase the time before a candidate solution crosses the magic threshold, as there’s no way to tell ahead of time which candidates are closer. (Compare the refutation of the creationist claim that the eye is “too complex” to have evolved: you don’t need to jump from “no eye” to “eye” in one step, as there’s lots of small, incremental improvements forming a continuum between the two extremes.)

Considering the challenge of choosing a good fitness function for some games, the technical issues of actually implementing a genetic algorithm pale by comparison. All you really need is a way to feed input into the game and screen-scrape the output to get information for the fitness function. Both of these should be fairly straightforward to implement in an emulator. The trick is to give the user a flexible way to specify the fitness function that should be used, which is more a UI issue than anything.