Time Trial

The genetic algorithms and Nintendo games thing makes incremental progress!

Recap: My ultimate goal, for some reason, is to see if you can use genetic algorithms to “train” a computer to play Super Mario Bros. A genetic algorithm needs two key things: a way to generate candidate solutions (relatively easy, here) and a way to measure their “fitness” (or “goodness”) so as to decide which candidates live on to the next generation and which go the way of the dodo.

Of course, being a master of computer science, I won’t be satisfied by creating something that is only capable of playing Super Mario Bros. Nay, I want something that can, in principle, be applied to any Nintendo game. Naturally, a fitness function appropriate for Super Mario Bros. (whatever that might be) would be quite different than one appropriate for, say, Mega Man 2, to say nothing of something like StarTropics or (dare I say it) Final Fantasy.

Clearly, the user of whatever program I create will need some way to specify the desired fitness function. This must be powerful enough to allow the user to make genuinely useful fitness functions, as well as providing enough flexibility to apply to a wide variety of games. And, of course, it needs to be simple enough so that the user stands a chance of using the blasted thing without giving up.

Any fitness function must have some way of measuring a candidate solution. Well, here’s a first stab at it.

Wallace's Evaluator window
(Click image for full-size)

The basic paradigm here is a combination of three simple concepts: metrics, events, and actions.

  • A metric is a named value you can modify. In principle, a metric measures one particular aspect of a candidate solution. If you know anything about programming, a metric is pretty much just a variable. A fitness function would ultimately be defined as a mathematical combination of one or more metrics.
  • An event is a condition you can associate one or more actions with. After each video frame, all events are checked to determine whether or not they fire. If they do, their associated actions are executed.
  • An action specifies how to modify one of the metrics when the action’s event occurs.

An example should make it clear how this is supposed to work. In the example above, we’re trying to measure the wall clock time that passes while playing. We’ve defined two metrics: Minutes and Seconds. Knowing that for an NTSC NES game there are about 60 video frames per second (closer to 60.1, technically), we can say that after every 60 frames, we should increment Seconds by one. Similarly, after every 3600 video frames, a minute has passed, so we increment Minutes by one and subtract 60 from Seconds.

In practice, these measurements wouldn’t be very helpful. If we wanted to measure how long a run took, we’d more likely just want to count the frames and be done with it. But since at this point I haven’t yet implemented more interesting events and actions (which will involve screen scraping in one way or another), this is about all the code can do at the moment.

Nevertheless, it does serve as a proof of concept for how this will work. I suspect this will prove to be a good balance among power, flexibility, and ease of use.

And since you’re all dying to know, here’s how long it took me to play through Air Man’s stage. (Yes, on Difficult; I’m not a wimp.)

Beating Air Man's stage in Mega Man 2
(Click image for full-size)

Remember that that’s me playing, not any fancy randomly-generated, natually-selected solution. My code isn’t capable of doing any of that.

Yet.

5 Responses

  1. Rather than doing it as a linear progression of keystrokes (which isn’t really that mappable to a GA in a very meaningful way), why not do something based on VRAM contents? like, the inputs could be sprite positions and CHR0 data.

  2. Short answer: This way is a lot easier to implement.

    Long answer: You’re right, my approach is more along the lines of the weasel program, although here there’s not necessarily only one “ideal” solution, and even if there is, it certainly isn’t known ahead of time.

    Or, in other words, my approach is searching for a rote solution, whereas a strategy more in the spirit of true GA would be to find an algorithm or heuristic for choosing inputs in response to changing game state — finding something that knows how to play well, rather than stumbling across a series of moves that happens to work.

    I’m really not even sure how you’d go about doing that. What would the organisms look like? Something like neural nets, I guess? I’d definitely need to do some reading to even figure out where to begin with something like that.

    I suppose the organism could also learn about the game state by screen-scraping, just like a human player does. In theory you could point it to the emulator’s internal state instead, but (a) that lets it cheat, and (b) that requires deciphering a lot more of FCEU’s guts than I care to do.

    Interestingly, though, I don’t think this has any impact on how you define a fitness function! The fitness function only really cares about the end result, not how you happen to get there. My current implementation just watches whatever’s playing, after all — it doesn’t care who’s providing the controller inputs.

    Now I have the urge to pit a population of rote-learners and a population of neural nets against each other, and see if one drives the other to extinction. Curses, you just found a way to make this little project even more interesting!

  3. So are you waiting to determine your fitness function before beginning designing the program? And in general, I get the impression that genetic algorithms are not going to be sufficient in this application given their fixed-length restrictions as opposed to genetic programming. GP allows for more versatility and open domain searching, as well as better crossover results, even compared to cut/splice asymetrical crossover with algorithms.

    There are many other things you might want to consider during implementation, that while not definatively necessary decrease the likelihood of getting stuck in local minima/maxima. As an example, you might consider the use of automatically defined functions (with the only downside being the potential complication of crossover and mutation) or reinsertion (though here you need avoid problems with what Excel terms ‘circular references’; not to mention the same crossover / mutation problems when nodes changes). Also, there’s some debate (since its completely up to the programmer) as to how to select individuals (one ranked based on fitness) for crossover / mutation / inclusion from generation to generation.

    Are you going to try to find a ‘solution’ for the entire game during a single run? I would imagine that it would be more reasonable to break it into multiple (and independent) solutions for each ‘level,’ with recombination at the end. This might also aid in the application of a fitness function.

    Admittedly, I’m not sure the best way to determine the fitness function. But could you be more specific on why you think screen-scraping (I admit I had to look up what that meant… where’s my wikipedia link?!) would be ‘cheating’? Also, while it would include getting back into the code, could you not simply include ‘markers’ in each level to use, well, as markers (i.e. milestones) and include that in the fitness function?

    Look at the bright side of this application. Depending on how you look at it, either your function set or your terminal set is already provided (*cough* exclude start&select for god’s sake *cough*), and there doesn’t appear to be much need for the other. And at least your termination decision is easy, so you don’t have to worry about generation limitations.

    Are you going to try and analytically determine a sufficient population size, or are you just going to start with a fairly large yet arbitrary number? Some ‘experts’ (or whatever you want to call them… writers at the least) think that the benefit of a large population diminishes as populations grow, though I haven’t spent the time to see if anyone empiracally support any estimation as to when this occurs… theory bastards…

    I look forward to when you write about progress with the program, even if it’s only utilizing a temporary fitness function. Get to work!

    Koza would be intrigued I’m sure…

    Addendum: both http://www.domoariga.to and http://www.mrrobo.to are available… yeah… i went there… i do have to say i’m a fan of capt. derivative… he’s not cool unless he has postulates, though…

  4. So I was thinking about it (what can I say it’s a slow day at the office), with “it” being you’re statement about it being “win” vs. “lose n + 3 lives”…

    Why not just make it “win” vs. “die”? I mean, sure, your’e restricting the project to finding a solution from a more restricted solution domain, but it would make defining the fitness function more manageable (I would think)…

    I don’t know… maybe not… just figured I’d ask “why not?”…

  5. First off, I’m probably being sloppy with using the terms “genetic algorithms” and “genetic programming,” so try not to read too much into my word choice.

    The basic approach I have in mind is to generate a bunch of random controller sequences, see which ones fare best, mutate/crossover/etc, and repeat. I know there’s a lot of parameters and strategies to choose from in there, and I haven’t yet decided just how I’ll approach it.

    Right now, my fitness functions are going to be pretty simple. From a purely implementation point of view, it wouldn’t be hard to make it a lot more sophisticated. However, I have no idea how to put a GUI in front of something like that without making it completely arcane.

    Besides, I’m curious how far I can go with even a simple approach.

    Also, I am doing primarily screen-scraping to evaluate how well a controller sequence does. What would be cheating would be giving it access to the game’s internal memory and variables and anything else you as a human sitting in front of the TV don’t have available to you. (Plus, it’d be a nightmare from a usability standpoint — just how is the user going to figure out where and how the score is stored?)

Comments are closed.