Time Trial

The genetic algorithms and Nintendo games thing makes incremental progress!

Recap: My ultimate goal, for some reason, is to see if you can use genetic algorithms to “train” a computer to play Super Mario Bros. A genetic algorithm needs two key things: a way to generate candidate solutions (relatively easy, here) and a way to measure their “fitness” (or “goodness”) so as to decide which candidates live on to the next generation and which go the way of the dodo.

Of course, being a master of computer science, I won’t be satisfied by creating something that is only capable of playing Super Mario Bros. Nay, I want something that can, in principle, be applied to any Nintendo game. Naturally, a fitness function appropriate for Super Mario Bros. (whatever that might be) would be quite different than one appropriate for, say, Mega Man 2, to say nothing of something like StarTropics or (dare I say it) Final Fantasy.

Clearly, the user of whatever program I create will need some way to specify the desired fitness function. This must be powerful enough to allow the user to make genuinely useful fitness functions, as well as providing enough flexibility to apply to a wide variety of games. And, of course, it needs to be simple enough so that the user stands a chance of using the blasted thing without giving up.

Any fitness function must have some way of measuring a candidate solution. Well, here’s a first stab at it.

Wallace's Evaluator window
(Click image for full-size)

The basic paradigm here is a combination of three simple concepts: metrics, events, and actions.

  • A metric is a named value you can modify. In principle, a metric measures one particular aspect of a candidate solution. If you know anything about programming, a metric is pretty much just a variable. A fitness function would ultimately be defined as a mathematical combination of one or more metrics.
  • An event is a condition you can associate one or more actions with. After each video frame, all events are checked to determine whether or not they fire. If they do, their associated actions are executed.
  • An action specifies how to modify one of the metrics when the action’s event occurs.

An example should make it clear how this is supposed to work. In the example above, we’re trying to measure the wall clock time that passes while playing. We’ve defined two metrics: Minutes and Seconds. Knowing that for an NTSC NES game there are about 60 video frames per second (closer to 60.1, technically), we can say that after every 60 frames, we should increment Seconds by one. Similarly, after every 3600 video frames, a minute has passed, so we increment Minutes by one and subtract 60 from Seconds.

In practice, these measurements wouldn’t be very helpful. If we wanted to measure how long a run took, we’d more likely just want to count the frames and be done with it. But since at this point I haven’t yet implemented more interesting events and actions (which will involve screen scraping in one way or another), this is about all the code can do at the moment.

Nevertheless, it does serve as a proof of concept for how this will work. I suspect this will prove to be a good balance among power, flexibility, and ease of use.

And since you’re all dying to know, here’s how long it took me to play through Air Man’s stage. (Yes, on Difficult; I’m not a wimp.)

Beating Air Man's stage in Mega Man 2
(Click image for full-size)

Remember that that’s me playing, not any fancy randomly-generated, natually-selected solution. My code isn’t capable of doing any of that.

Yet.

Emulate Good Times, Come On

The crazy genetic-algorithms-v.-Super-Mario-Bros. project isn’t dead.

Wallace Preview

As you can see, I’ve successfully done a basic port of the FCE Ultra NES emulator to GTK/GNOME. It’s lacking lots of the crazy advanced features of FCEU, but the important ones are there:

  • (Unscaled) video output.
  • Audio output.
  • Keyboard input.
  • Joystick input.
  • Save states.
  • Movie recording and playback.
  • And a few other things.

It’s nothing fancy, but it’s definitely usable. It does have one advantage over the Linux version of FCEU, however:

Wallace Preview - Input Dialog

Look, it’s a way to configure your input settings without running the emulator with a special command-line switch that takes you through a series of text-based prompts for your input settings! You can’t see it in the above screenshot, but all you need to do is edit the cell with the input binding you want, and then press a key or move the joystick.

Although the source code for FCEU can be pretty ugly, it does have a pretty nice separation of the emulation code itself from the user interface, which helped a lot. Actually running the emulator is quite simple: you make a loop that calls a function that advances the emulation one frame, and gives you back the new screen contents and audio samples. (OK, it’s a bit more complicated than that, but you get the idea.)

So now “all” that’s left is the interesting part of actually using genetic algorithms to play a game. As you may recall, the two main things we need are a representation of a candidate solution (i.e., the series of controller inputs), and a fitness function to decide which candidate solutions are better than the others.

As mentioned in the above feature list, FCEU supports recording and playing back movies. These aren’t ordinary multimedia files, however — they’re a dump of the initial emulator state (essentially no different from a save state), followed by a series of controller inputs and timings. But that’s exactly what we need to represent candidate solutions! So, for generating solutions, we just need to take some initial state, generate a random series of controller inputs, write that out to a movie file, and feed that into the emulator.

The movie file format also looks to be quite amenable to the various mutations, cross-overs, and other genetic reassortments that will be necessary. The basic unit of information in a movie file is a single button change followed by a time (possibly 0) between this input change and the next one. (Or the time might preceed the input instead; I’m not entirely sure which, but it really doesn’t matter for our purposes.) With this representation, we can easily mutate fields and shuffle things around both within and between movies. If a movie file is DNA, then these little chunks are the individual nucleotides.

Of course, the tricky part will still be specifying a fitness function to use. I’ve got some ideas for how to do that, but I’ll save it for a later post.

FCEU Redux

Well, I’ve decided to give that crazy idea about using genetic algorithms to “teach” a computer to play Super Mario Bros. We’ll see how well this turns out.

The first step, of course, is to retrofit a NES emulator to be able to do the input and output manipulation I need. And being a Linux user, the best NES emulator around is FCE Ultra, even despite not having been actively maintained for quite some time. Plus, FCE Ultra has built-in support for making and playing movies (really, recording and replaying input), which should be very helpful. In fact, FCE Ultra movies are one of the preferred formats for people who do that sort of thing.

One of the few obnoxious things about FCE Ultra, however, is the utter lack of a GUI under Linux — everything is passed on the command line, and you have to remember the keys to activate the various features. That’s not quite going to cut it for my purposes.

So, the first step is to port FCE Ultra to GTK (and, to a lesser extent, GNOME). That way there’ll be a usable GUI on which to add an interface for working with the genetic algorithms (and, of course, making it easier to play around in general).

Now, I’ve had a little experience before hacking on FCE Ultra — the SDL joystick code is derived from a patch I wrote a few years back. And I remember that the FCE Ultra source code isn’t pretty to look at. The emulator’s an impressive piece of work at run-time, but the main developer really could’ve used that space bar a lot more, and one-space tabs really don’t cut it. It’s not like he used the saved space for comments or anything.

However, despite its flaws, the FCE Ultra source code does have one thing going for it: it already has a pretty good separation between the emulation code itself and the platform-specific interfaces. Sure, the “drivers” for different platforms are fairly tangled together at first glance, but it does mean that porting things to GTK is essentially just a matter of implementing a new driver and plugging it into the rest of the program. It also means that going further to shovel input in and analyze the output of the emulator should be quite doable.

So, at this point, I’m working on implementing basic GTK support into the last official release of FCE Ultra, thus familiarizing me with the ins and outs of the code and providing a friendlier interface for when I get to start working on the fun stuff.

Mario Meiosis

Here’s a (possibly) crazy idea: is it possible to use a genetic algorithm to “teach” a computer to play Super Mario Bros.?

For those of you wondering what that last sentence said, here’s the basic idea. A genetic algorithm is a technique based on evolution for finding a solution to a problem. You start off with a bunch of randomly-generated candidate solutions, which are probably all pretty bad, but that doesn’t matter. You apply a fitness function to decide how good each candidate solution is; even though they’re all bad, some are going to be less bad than others. Then take the best candidate solutions and use them to generate new ones, such as by mutation (changing parts of them randomly) or crossover (swap parts of two different solutions). Repeat until you get a solution that’s good enough.

So, can you use this technique to have a computer discover a way to win Super Mario Bros.? It seems like the answer should be “yes.” We just need to define two things: what candiate solutions look like, and what our fitness function is.

Candidate solutions are easy: a series of button inputs. The NES controller has eight buttons: Up, Down, Left, Right, Select, Start, A, and B. Each button can either be pressed or released at any given instant, and we can easily represent the state of the controller in one byte. (Naturally, not all 256 possibilities are valid, since you can’t press Up and Down simultaneously, and games wouldn’t quite know what to do if you could.)

But how do we define fitness? Intuitively, we want the fitness function to measure how far the computer gets into the game; a solution that gets to World 4-3 is better than one that only gets as far as World 1-2. However, this is much too coarsely grained, as we have no way of measuring progress within a single level. For example, consider when we’re just starting the algorithm, and we’re comparing purely random solutions. It’s very unlikely that any of them will get all the way to the end of World 1-1, but some of them will get farther than the others, and we want to select for them.

Ah, but how do we measure “progress” within a level? We intuitively know what we mean, but how to define it in terms the computer can understand? A naive approach might be to somehow watch the background to determine when the screen scrolls to the right; the more it scrolls, the more progress has been made. Unfortunately, this doesn’t handle warp pipes, since they let you shortcut part of the stage. We want a solution that takes the shortcut in World 1-1 and gets to the staircase to be “fitter” than one that skips the shortcut but dies before reaching the stairs, even though the latter solution scrolled through more of the stage.

One solution could be to encode knowledge of each stage’s layout into the fitness function, so it can determine exactly how far we get each time. This would work, but means that we need to feed a lot of information into the fitness function. Not only is this time consuming, but it makes it very difficult to apply the same approach to another side scroller, even one that has essentially identical mechanics, such as The Lost Levels.

So, even though “progress though the game” is the most intuitive choice for a fitness function, it’s problematic and difficult to actually use. We may be better off with some other metric that’s easier to obtain and that is correlated with progress.

One candidate is the score at the upper-left corner of the screen. In most cases, your score goes up as you go through more of the game, so higher scores should generally indicate more progress. And since the score is displayed on screen, it’s very easy to access. This approach might work, though it could get caught in a “local optimum” that never leads to winning the game. For example, if there’s a 1-up in the stage, you can collect it, earn some points, die deliberately, and then go back to the stage to repeat the process indefinitely. The score keeps going up, but you never get any farther.

Having the fitness function also consider time might help prevent the search from getting trapped in these dead ends. All else being equal, we want a solution that takes less time than the alternatives. A solution that rushes to the flagpole to get points from the time remaining should be better than one that keeps replaying the same stage to get more points. Of course, favoring time too much could lead to the “best” solution being the one that runs right into the goomba at the beginning of World 1-1 repeatedly — fast, but no progress. The fitness function would need to find a good balance between the two measures.

So, it seems as though a good fitness function for Super Mario Bros. considers multiple factors in hopes of indirectly measuring progress through the game.

What happens if we try this approach with different games? Our solutions still look the same — a series of controller states — but the fitness function will depend on which game we want to play. Pac-Man is obvious — fitness equals the score you end with. Not only is score closely correlated with progress (you can’t replay a level to get more points), but we also select for “better” play for each level (eating all four ghosts with each power pellet is better than always avoiding the ghosts).

Galaga is similar, but we might also want to consider the hit percentage as well as the final score; having 95% of your shots hit is better than having only 47% of them hit.

Of course, there are also games where finding a good fitness function is more difficult than for Super Mario Bros. For example, Mega Man 2 is also a side-scroller like Super Mario Bros., but there’s no score whatsoever. On the other hand, since there aren’t any shortcuts, measuring how far the screen scrolls might work (as long as we distinguish the different directions, so as not to reward pacing back and forth).

And if we want to really get crazy, what kind of fitness function would you use for Final Fantasy, without encoding detailed information about plot points and each of the dungeons? Measuring experience points would probably be a factor in the fitness function, but some other indicator of how close you are to beating Chaos would be needed so you don’t just keep walking around Corneria indefinitely fighting Imps. And I’m sure there are other games where finding a simple, effective fitness function is even harder.

Remember, we want the fitness function to be as smooth as possible. A fitness function made only of a few sudden steep steps will increase the time before a candidate solution crosses the magic threshold, as there’s no way to tell ahead of time which candidates are closer. (Compare the refutation of the creationist claim that the eye is “too complex” to have evolved: you don’t need to jump from “no eye” to “eye” in one step, as there’s lots of small, incremental improvements forming a continuum between the two extremes.)

Considering the challenge of choosing a good fitness function for some games, the technical issues of actually implementing a genetic algorithm pale by comparison. All you really need is a way to feed input into the game and screen-scrape the output to get information for the fitness function. Both of these should be fairly straightforward to implement in an emulator. The trick is to give the user a flexible way to specify the fitness function that should be used, which is more a UI issue than anything.