Mario Meiosis

Here’s a (possibly) crazy idea: is it possible to use a genetic algorithm to “teach” a computer to play Super Mario Bros.?

For those of you wondering what that last sentence said, here’s the basic idea. A genetic algorithm is a technique based on evolution for finding a solution to a problem. You start off with a bunch of randomly-generated candidate solutions, which are probably all pretty bad, but that doesn’t matter. You apply a fitness function to decide how good each candidate solution is; even though they’re all bad, some are going to be less bad than others. Then take the best candidate solutions and use them to generate new ones, such as by mutation (changing parts of them randomly) or crossover (swapping parts of two different solutions). Repeat until you get a solution that’s good enough.
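
To make that loop a little more concrete, here’s a rough Python sketch of the whole thing. Everything problem-specific (building a random candidate, scoring it, mutating it, crossing two of them) gets passed in as a function, and the population size, generation count, and number of survivors are arbitrary numbers picked for illustration, not recommendations.

    import random

    def evolve(random_solution, fitness, mutate, crossover,
               population_size=100, generations=500, elite=10):
        # Start with a batch of random (probably terrible) candidates.
        population = [random_solution() for _ in range(population_size)]
        for _ in range(generations):
            # Rank everyone, best first, according to the fitness function.
            population.sort(key=fitness, reverse=True)
            # Keep the best candidates and breed the rest of the next
            # generation from them via crossover and mutation.
            parents = population[:elite]
            children = []
            while len(parents) + len(children) < population_size:
                mom, dad = random.sample(parents, 2)
                children.append(mutate(crossover(mom, dad)))
            population = parents + children
        return max(population, key=fitness)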

So, can you use this technique to have a computer discover a way to win Super Mario Bros.? It seems like the answer should be “yes.” We just need to define two things: what candidate solutions look like, and what our fitness function is.

Candidate solutions are easy: a series of button inputs. The NES controller has eight buttons: Up, Down, Left, Right, Select, Start, A, and B. At any given instant each button is either pressed or released, so we can easily represent the state of the controller in one byte. (Naturally, not all 256 possibilities are valid, since you can’t press Up and Down simultaneously, and games wouldn’t quite know what to do if you could.)
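
Just to show how little machinery that takes, here’s one way the representation might look in Python. The particular bit layout and the 2,000-frame length are arbitrary choices for the sketch, not anything the NES itself dictates.

    import random

    # One controller state per frame, packed into a byte. This particular bit
    # layout is just a convention for the sketch; any consistent one would do.
    A, B, SELECT, START, UP, DOWN, LEFT, RIGHT = (1 << i for i in range(8))

    def is_valid(state):
        # Rule out impossible combinations like Up+Down or Left+Right.
        return (not (state & UP and state & DOWN)
                and not (state & LEFT and state & RIGHT))

    VALID_STATES = [s for s in range(256) if is_valid(s)]

    def random_solution(frames=2000):
        # A candidate is just a sequence of controller bytes, one per frame;
        # 2,000 frames (about half a minute) is an arbitrary starting length.
        return [random.choice(VALID_STATES) for _ in range(frames)]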

But how do we define fitness? Intuitively, we want the fitness function to measure how far the computer gets into the game; a solution that gets to World 4-3 is better than one that only gets as far as World 1-2. However, this is much too coarse-grained, as we have no way of measuring progress within a single level. For example, consider when we’re just starting the algorithm, and we’re comparing purely random solutions. It’s very unlikely that any of them will get all the way to the end of World 1-1, but some of them will get farther than the others, and we want to select for them.

Ah, but how do we measure “progress” within a level? We intuitively know what we mean, but how do we define it in terms the computer can understand? A naive approach might be to watch the background to determine when the screen scrolls to the right; the more it scrolls, the more progress has been made. Unfortunately, this doesn’t handle warp pipes, since they let you shortcut part of the stage. We want a solution that takes the shortcut in World 1-1 and gets to the staircase to be “fitter” than one that skips the shortcut but dies before reaching the stairs, even though the latter solution scrolled through more of the stage.

One solution could be to encode knowledge of each stage’s layout into the fitness function, so it can determine exactly how far we get each time. This would work, but it means feeding a lot of information into the fitness function. Not only is this time-consuming, but it also makes it very difficult to apply the same approach to another side-scroller, even one that has essentially identical mechanics, such as The Lost Levels.

So, even though “progress through the game” is the most intuitive choice for a fitness function, it’s problematic and difficult to actually use. We may be better off with some other metric that’s easier to obtain and correlated with progress.

One candidate is the score at the upper-left corner of the screen. In most cases, your score goes up as you go through more of the game, so higher scores should generally indicate more progress. And since the score is displayed on screen, it’s very easy to access. This approach might work, though it could get caught in a “local optimum” that never leads to winning the game. For example, if there’s a 1-up in the stage, you can grab it, rack up some points along the way, die deliberately, and then replay the stage to repeat the process indefinitely without ever running out of lives. The score keeps going up, but you never get any farther.

Having the fitness function also consider time might help prevent the search from getting trapped in these dead ends. All else being equal, we want a solution that takes less time than the alternatives. A solution that rushes to the flagpole to get points from the time remaining should be better than one that keeps replaying the same stage to get more points. Of course, favoring time too much could lead to the “best” solution being the one that runs right into the goomba at the beginning of World 1-1 repeatedly — fast, but no progress. The fitness function would need to find a good balance between the two measures.
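
As a sketch of what that balance might look like, the fitness function could weigh the final score against the number of frames the attempt burned. Here `play` stands in for a hypothetical hook that runs the inputs through an emulator and reports back, and the weights are pure guesses that would need tuning.

    def smb_fitness(inputs, play, score_weight=1.0, frame_penalty=0.1):
        # `play` is a hypothetical hook: it feeds the button inputs to an
        # emulator and returns the final score and the number of frames that
        # elapsed before the run ended (death, game over, or inputs exhausted).
        score, frames = play(inputs)
        # Reward points but penalize dawdling; finding weights that don't
        # collapse into either degenerate strategy is the hard part.
        return score_weight * score - frame_penalty * frames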

So, it seems as though a good fitness function for Super Mario Bros. considers multiple factors in hopes of indirectly measuring progress through the game.

What happens if we try this approach with different games? Our solutions still look the same — a series of controller states — but the fitness function will depend on which game we want to play. Pac-Man is obvious — fitness equals the score you end with. Not only is score closely correlated with progress (you can’t replay a level to get more points), but we also select for “better” play within each level (eating all four ghosts with each power pellet is better than always avoiding the ghosts).

Galaga is similar, but we might want to consider the hit percentage as well as the final score; having 95% of your shots hit is better than having only 47% of them hit.
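
One hedged way to fold that in: scale the final score by accuracy, so a candidate that sprays shots everywhere loses out to one that actually aims. The particular blend below is a guess, and the score and shot counts are assumed to be scraped from the game at the end of a run.

    def galaga_fitness(score, shots_fired, shots_hit):
        # Score and shot counts assumed to be screen-scraped at game over.
        accuracy = shots_hit / shots_fired if shots_fired else 0.0
        # Blend the two so accuracy matters but can't outweigh actual progress;
        # the 0.5 floor is an arbitrary choice for the sketch.
        return score * (0.5 + accuracy)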

Of course, there are also games where finding a good fitness function is more difficult than for Super Mario Bros. For example, Mega Man 2 is a side-scroller like Super Mario Bros., but there’s no score whatsoever. On the other hand, since there aren’t any shortcuts, measuring how far the screen scrolls might work (as long as we distinguish the different directions, so as not to reward pacing back and forth).
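
Here’s a sketch of that idea, assuming we can scrape a per-frame horizontal scroll delta out of the emulator: track the farthest point reached rather than the total amount of scrolling, so pacing back and forth over the same ground counts for nothing.

    def megaman2_fitness(scroll_deltas):
        # `scroll_deltas` is a hypothetical per-frame list of how far the
        # screen scrolled that frame (positive = forward, negative = back).
        position, farthest = 0, 0
        for delta in scroll_deltas:
            position += delta
            # Only reaching new ground improves fitness; retreading doesn't.
            farthest = max(farthest, position)
        return farthest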

And if we want to really get crazy, what kind of fitness function would you use for Final Fantasy, without encoding detailed information about plot points and each of the dungeons? Experience points would probably be one factor in the fitness function, but we’d need some other indicator of how close you are to beating Chaos so you don’t just keep walking around Corneria indefinitely fighting Imps. And I’m sure there are other games where finding a simple, effective fitness function is even harder.

Remember, we want the fitness function to be as smooth as possible. A fitness function made only of a few sudden, steep steps will increase the time it takes before any candidate crosses the next step, as there’s no way to tell ahead of time which candidates are closer to doing so. (Compare the refutation of the creationist claim that the eye is “too complex” to have evolved: you don’t need to jump from “no eye” to “eye” in one step, as there are lots of small, incremental improvements forming a continuum between the two extremes.)

Considering the challenge of choosing a good fitness function for some games, the technical issues of actually implementing a genetic algorithm pale by comparison. All you really need is a way to feed input into the game and screen-scrape the output to get information for the fitness function. Both of these should be fairly straightforward to implement in an emulator. The trick is to give the user a flexible way to specify the fitness function that should be used, which is more a UI issue than anything else.
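
To give a feel for how little glue is involved, here’s how the pieces from the earlier sketches might bolt onto an emulator. `Emulator` and its methods (and the ROM filename) are stand-ins for whatever scripting interface your emulator of choice exposes, not any real program’s API, and the fitness expression is just the score-minus-time guess from earlier.

    import random

    def mutate(inputs, rate=0.01):
        # Randomly replace a small fraction of frames with fresh controller states.
        return [random.choice(VALID_STATES) if random.random() < rate else s
                for s in inputs]

    def crossover(mom, dad):
        # Splice two input sequences together at a random frame.
        cut = random.randrange(len(mom))
        return mom[:cut] + dad[cut:]

    def evaluate(inputs, rom_path="smb.nes"):
        # `Emulator` is a placeholder for a real emulator's scripting API.
        emu = Emulator(rom_path)
        for state in inputs:
            emu.set_controller(state)   # feed one controller byte per frame
            emu.step_frame()
            if emu.game_over():
                break
        # Screen-scrape (or peek at RAM) for whatever the fitness function needs.
        return emu.read_score() - 0.1 * emu.frames_elapsed()

    best_run = evolve(random_solution, evaluate, mutate, crossover)

Everything game-specific lives in `evaluate`, which is exactly where a flexible, user-specified fitness function would plug in.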