Mario Meiosis

Here’s a (possibly) crazy idea: is it possible to use a genetic algorithm to “teach” a computer to play Super Mario Bros.?

For those of you wondering what that last sentence said, here’s the basic idea. A genetic algorithm is a technique based on evolution for finding a solution to a problem. You start off with a bunch of randomly-generated candidate solutions, which are probably all pretty bad, but that doesn’t matter. You apply a fitness function to decide how good each candidate solution is; even though they’re all bad, some are going to be less bad than others. Then take the best candidate solutions and use them to generate new ones, such as by mutation (changing parts of them randomly) or crossover (swap parts of two different solutions). Repeat until you get a solution that’s good enough.

So, can you use this technique to have a computer discover a way to win Super Mario Bros.? It seems like the answer should be “yes.” We just need to define two things: what candiate solutions look like, and what our fitness function is.

Candidate solutions are easy: a series of button inputs. The NES controller has eight buttons: Up, Down, Left, Right, Select, Start, A, and B. Each button can either be pressed or released at any given instant, and we can easily represent the state of the controller in one byte. (Naturally, not all 256 possibilities are valid, since you can’t press Up and Down simultaneously, and games wouldn’t quite know what to do if you could.)

But how do we define fitness? Intuitively, we want the fitness function to measure how far the computer gets into the game; a solution that gets to World 4-3 is better than one that only gets as far as World 1-2. However, this is much too coarsely grained, as we have no way of measuring progress within a single level. For example, consider when we’re just starting the algorithm, and we’re comparing purely random solutions. It’s very unlikely that any of them will get all the way to the end of World 1-1, but some of them will get farther than the others, and we want to select for them.

Ah, but how do we measure “progress” within a level? We intuitively know what we mean, but how to define it in terms the computer can understand? A naive approach might be to somehow watch the background to determine when the screen scrolls to the right; the more it scrolls, the more progress has been made. Unfortunately, this doesn’t handle warp pipes, since they let you shortcut part of the stage. We want a solution that takes the shortcut in World 1-1 and gets to the staircase to be “fitter” than one that skips the shortcut but dies before reaching the stairs, even though the latter solution scrolled through more of the stage.

One solution could be to encode knowledge of each stage’s layout into the fitness function, so it can determine exactly how far we get each time. This would work, but means that we need to feed a lot of information into the fitness function. Not only is this time consuming, but it makes it very difficult to apply the same approach to another side scroller, even one that has essentially identical mechanics, such as The Lost Levels.

So, even though “progress though the game” is the most intuitive choice for a fitness function, it’s problematic and difficult to actually use. We may be better off with some other metric that’s easier to obtain and that is correlated with progress.

One candidate is the score at the upper-left corner of the screen. In most cases, your score goes up as you go through more of the game, so higher scores should generally indicate more progress. And since the score is displayed on screen, it’s very easy to access. This approach might work, though it could get caught in a “local optimum” that never leads to winning the game. For example, if there’s a 1-up in the stage, you can collect it, earn some points, die deliberately, and then go back to the stage to repeat the process indefinitely. The score keeps going up, but you never get any farther.

Having the fitness function also consider time might help prevent the search from getting trapped in these dead ends. All else being equal, we want a solution that takes less time than the alternatives. A solution that rushes to the flagpole to get points from the time remaining should be better than one that keeps replaying the same stage to get more points. Of course, favoring time too much could lead to the “best” solution being the one that runs right into the goomba at the beginning of World 1-1 repeatedly — fast, but no progress. The fitness function would need to find a good balance between the two measures.

So, it seems as though a good fitness function for Super Mario Bros. considers multiple factors in hopes of indirectly measuring progress through the game.

What happens if we try this approach with different games? Our solutions still look the same — a series of controller states — but the fitness function will depend on which game we want to play. Pac-Man is obvious — fitness equals the score you end with. Not only is score closely correlated with progress (you can’t replay a level to get more points), but we also select for “better” play for each level (eating all four ghosts with each power pellet is better than always avoiding the ghosts).

Galaga is similar, but we might also want to consider the hit percentage as well as the final score; having 95% of your shots hit is better than having only 47% of them hit.

Of course, there are also games where finding a good fitness function is more difficult than for Super Mario Bros. For example, Mega Man 2 is also a side-scroller like Super Mario Bros., but there’s no score whatsoever. On the other hand, since there aren’t any shortcuts, measuring how far the screen scrolls might work (as long as we distinguish the different directions, so as not to reward pacing back and forth).

And if we want to really get crazy, what kind of fitness function would you use for Final Fantasy, without encoding detailed information about plot points and each of the dungeons? Measuring experience points would probably be a factor in the fitness function, but some other indicator of how close you are to beating Chaos would be needed so you don’t just keep walking around Corneria indefinitely fighting Imps. And I’m sure there are other games where finding a simple, effective fitness function is even harder.

Remember, we want the fitness function to be as smooth as possible. A fitness function made only of a few sudden steep steps will increase the time before a candidate solution crosses the magic threshold, as there’s no way to tell ahead of time which candidates are closer. (Compare the refutation of the creationist claim that the eye is “too complex” to have evolved: you don’t need to jump from “no eye” to “eye” in one step, as there’s lots of small, incremental improvements forming a continuum between the two extremes.)

Considering the challenge of choosing a good fitness function for some games, the technical issues of actually implementing a genetic algorithm pale by comparison. All you really need is a way to feed input into the game and screen-scrape the output to get information for the fitness function. Both of these should be fairly straightforward to implement in an emulator. The trick is to give the user a flexible way to specify the fitness function that should be used, which is more a UI issue than anything.

9 Responses

  1. I think that dying shold be more strongly weighed as a negative factor. There are videos of people beating the game in less than half an hour, and never dying. Therefore, since dying wastes time, and the goal of the game is to defeat the boss, dying is almost always counter-productive. Even in the even you described of repeating a level to get extra lives and scores, that is based on the idea of needing extra lives later, but this idea is to generate a program that won’t die (or at least will die alot, but restarting, instead of trying over), so extra lives shouldn’t really be heavily weighted.

    I would say priorities should go:
    a) not dying (since the game can be beaten without dying, death is sub-optimal performance)
    b) level advancement (measured by what world you’re on and screen scroll rate)
    c) time (possibly tie this in with level advancement, as in screen scroll RATE)
    c) score
    d) extra lives (possibly work into a function of score, since the idea is you never need to use them)

    Also, the criteria themselves could be graded by how fast they produce the optomized code for completing a stage. Criteria would be based on how many iterations are required to produce a winning combination. Furthermore, the order of random trials could be tweaked. Say for instance that 01000000 is moving right, and 11000000 is jumping right. When running iterations, if it starts from 00000000 and moves up, most of the initial trials won’t involve much jumping. It’s quite likely (and from observing the speed trials of people who perform nearly perfectly) that the majority of an optimal iteration of winning moves is spent jumping. Therefore, in the controller input, instead of initially trying 00000000 and counting up, it would be faster to start at 11111111 and count down. This may be more applicable to some leves that are more conducive to almost continuous jumping, but it still has the potential of seriously impacting calculation time. Another excellent (and simpler example, I wish I had thought of this first), would be running right, as opposed to running left (something that is rarely, if ever, needed in Mario Brothers), and thus it would be preffered if the program always attempted running right before trying a solution that involves running left.

  2. You know, I should probably be relaxing on lunch break, but this is so much more interesting.

  3. Paul;
    This is a fascinating question, one that I myself pondered as a kid.

    Perhaps you could build a simple virtual mock up of level 1-1 and incorporate your own metrics that can be measured definitively- at least this would save you the effort of having to try to measure values obtained from the actual game.

    Of course, you could always get a *real* job and not waste your time with such problems.

  4. SO COOL!

  5. I don’t think penalizing for lives lost buys you very much, since the only two end results are “win” and “lose n + 3 lives”, where n is the number of 1-ups you get. Penalize deaths and you essentially penalize getting 1-ups for all non-winning runs.

    Similarly, there shouldn’t be a need to weight certain buttons more heavily when randomly generating candidate solutions. After all, solutions with a lot of “bad” moves should get culled from the population pretty quickly. Though I suppose there would be a lot to be said to throw Select and Start out of consideration entirely, since pausing would never be helpful and Select does nothing in-game anyway.

    One interesting metric I didn’t think of originally is penalizing every time the candidate solution presses or releases a button. Factoring that in should favor “smoother” or more “elegant” solutions — it’s better to hold Right for a while than to repeatedly press and release it.

    On the other hand, in my (admittedly limited) understanding of using genetic algorithms, it’s best for the fitness function to just consider how close a candidate solution is to some “ideal” solution, rather than trying to coax it into strategies you think are beneficial. After all, part of what makes this approach interesting is its ability to “invent” novel approaches, and perhaps some of the things we might want to rule out could turn out to be helpful in some cases.

    In any event, actually being able to play around with an implementation of this stuff would hopefully settle a lot of these questions. I just have to get my hands on the source code for a good NES emulator….

  6. I disagree about tossing out Start/Select. Let’s say the solution randomly mutates into the Contra Code (while playing Contra, or some other code applicable to the game at hand…er, at code). Would that not be a badass mutation? Moreover, what if you found a real-life example of that?

    Also, I remember at least one game (one of the Donkey Kong games, I believe) that allowed you to hit select then start to exit a level…doing so near the bottom of a chasm you carelessly jumped into proved a very beneficial strategy.

  7. Ya’ll have spent far far to much time thinking about this!
    And I didn’t know you could do that select/start move on Donkey Kong. Dang, that would have come in handy!

  8. Well, Paul, I dugg this post. I like the idea and think you should get some code working to do this. Unlike Tripod, I would say that dying is one of the lesser important issues in the game. I believe the highest metric should be completing the level, with a tie-breaker of some formula combining time and score.

    After that, as you so eloquently put in your post, it is a head-scratcher. Personally, I would do a formula mixture of “scroll-to-the-right,” score, and time. I think these all have to be taken into account, with their weights depending on your personal preference on “optimal” gameplay. I, myself, would be keenly interested in seeing the fastest possible Mario completion possible w/o a concern over warps and the such, and would thus give scroll-to-the-right and score high emphasis.

  9. That last sentence should read, “…and would thus give scroll-to-the-right and score time high emphasis.”

Comments are closed.