Wallace 0.99.1 Released

I finally got around to tinkering again with Wallace, my tool for using genetic algorithms to play Nintendo games. The latest version has several new features, a couple of which I’ll mention here.

The biggest is that your solutions can now take the form of a button input sequence, instead of being a program that runs in a VM and generates button input. This has the advantage of producing interesting results a lot faster, at the cost of your solutions being tied to your initial conditions. So, while you might be able to generate something that plays a particular Dr. Mario screen very well, it will crash and burn on any different starting condition — it’s just a dumb sequence of moves, after all.
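
To make that concrete, here’s a rough sketch of what a button-sequence organism might look like in C. (Illustrative only; the names and layout are not Wallace’s actual internals.)

    /* Sketch of a button-input-sequence organism: one byte per frame,
     * one bit per NES button. Not Wallace's actual data structures. */
    #include <stdint.h>
    #include <stdlib.h>

    #define BTN_A      0x01
    #define BTN_B      0x02
    #define BTN_SELECT 0x04
    #define BTN_START  0x08
    #define BTN_UP     0x10
    #define BTN_DOWN   0x20
    #define BTN_LEFT   0x40
    #define BTN_RIGHT  0x80

    typedef struct {
        size_t   length;    /* frames of input */
        uint8_t *inputs;    /* inputs[i] = buttons held on frame i */
        double   fitness;
    } organism_t;

    /* A fully random organism, with Select and Start filtered out. */
    static organism_t random_organism(size_t length)
    {
        organism_t o = { length, malloc(length), 0.0 };
        for (size_t i = 0; i < length; i++)
            o.inputs[i] = (uint8_t)(rand() & ~(BTN_SELECT | BTN_START));
        return o;
    }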

There are many other, more subtle, improvements to the evolver as well. The selection phase now uses tournament selection to choose who lives on to the next generation, instead of just taking the top n solutions so far. This lets you vary the selection pressure from brutal (top n) to nonexistent (creationists’ “evolution is entirely random” strawman) and anywhere in between. You can also now set the mutation rate across a much wider range; in particular, you can push it far lower than the (ridiculously high) minimum available in the previous version.
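
Tournament selection itself is simple enough to sketch in a few lines (again illustrative, reusing the organism_t from the sketch above; not the actual Wallace code):

    /* Pick tournament_size organisms at random; the fittest wins.
     * tournament_size is the selection-pressure knob: 1 gives pure
     * random drift, and anything near the population size approximates
     * top-n truncation. */
    static const organism_t *
    tournament_select(const organism_t *pop, size_t pop_size,
                      size_t tournament_size)
    {
        const organism_t *best = &pop[rand() % pop_size];
        for (size_t i = 1; i < tournament_size; i++) {
            const organism_t *challenger = &pop[rand() % pop_size];
            if (challenger->fitness > best->fitness)
                best = challenger;
        }
        return best;
    }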

Also, the evolver now prompts you to save your progress before it closes. I can’t tell you how many times the lack of that feature has caused me to lose a lot of data.

So, how do things shape up now? Here are the statistics from a relatively quick sample run, on the same Dr. Mario screen as last time. Now, however, I’m using button input sequences with Select and Start filtered out (you can do that now too), with a population ranging from 200 to 2000 solutions, fairly mild selection pressure (15%ish?), and a maximum mutation rate of 20 mutations per organism. The fitness function now measures final score, with penalties for the time intervals between each time you earn points. So, if you score 300 points at time 5 and 900 points at time 15, your fitness is (300 − 5) + (900 − (15 − 5)) = 1185. (The ability to express that kind of fitness function is another new feature.)
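
In code, that fitness function might look something like this sketch (the score_event_t structure is my illustration, not Wallace’s actual interface):

    /* Each scoring event contributes its points minus the time elapsed
     * since the previous scoring event (time starts at 0). */
    typedef struct {
        double time;    /* when the points were earned */
        double points;  /* how many points were earned */
    } score_event_t;

    static double time_penalized_fitness(const score_event_t *ev, size_t n)
    {
        double total = 0.0, last = 0.0;
        for (size_t i = 0; i < n; i++) {
            total += ev[i].points - (ev[i].time - last);
            last = ev[i].time;
        }
        return total;
    }

    /* Worked example from above: 300 points at t=5, 900 at t=15
     * gives (300 - 5) + (900 - 10) = 1185. */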

So, here’s the stats (did I mention collecting these statistics is another new feature?):

Generation   Maximum   Minimum   Mean   Median
         1      2369        -1     39        0
         2      2369        -1    348      287
         3      2969        -1    985      893
         4      3272        -1   1405     1781
         5      4170        -1   1518     2369
         6      4181        -1   1947     2370

A few observations

  • A 100% randomly generated solution still managed to score 2369 points in the first generation.
  • By the sixth generation, at least half the solutions are better than the best solution in the first generation.
  • Each generation, the mean and median fitness both increase.
  • It should be impossible for the fitness to be -1 with the fitness function I defined. There’s a bug somewhere.
  • Even so, the minimum shows no improvement from generation to generation; mutations can still render a good solution bad.

Of course, you want to see that top solution in the sixth generation, don’t you? Well, unlike the VM-based ones from last time, which ended up doing the same thing over and over, these solutions look like they’re being played by a monkey on crack who occasionally makes a really good move. Of course, since my lease doesn’t cover monkeys and none of my crack dealers are returning my calls, the monkey-on-crack theory is only a hypothesis.

Anyway, here’s the video of the top solution of generation six:

If you’re feeling adventurous, you can try your hand at Wallace yourself:

Download Wallace 0.99.1 (717 KB)

10 Responses

  1. Kulini… wait a second, wrong project.

    But seriously, awesome-sauce, Paul.

  2. How long does it take to get through 6 generations? Just wondering because I’d like to see the 100(000)th generation’s results. Could that be done in a day(week[month{year}])?

  3. If you compile Wallace with some fairly aggressive optimization flags (-O3 -fomit-frame-pointer -DG_DISABLE_ASSERT -DG_DISABLE_CHECKS), the emulation runs at about 5x normal speed inside the evolver (on my system, at least). If you assume a typical solution takes 30 seconds of NES time (which may be optimistic in the long run, but reasonable for the first few generations), that’s about 6 seconds of real time per solution. For 1800 new solutions per generation, that comes to about 3 hours per generation, which sounds about right from experience.

    So assuming you can crunch through 8 generations per day, 100 generations will take a little under two weeks, and 100,000 will take a bit over 34 years.

    (These numbers are reasonably valid for a 1.7 GHz Pentium M; a faster processor would do better.)

    I haven’t done any real work trying to optimize the code, but I strongly suspect most of the time is being spent in the emulator itself, so even streamlining my additions to the codebase probably wouldn’t shave off much time. (Then again, humans are notoriously bad at optimizing code through intuition.) Parallelism is another potential route to crunching through more solutions in less time, but that’s a topic for another post….
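
    In code form, the back-of-the-envelope math goes like this (same assumptions as above; just a sketch, not a benchmark):

        /* Generation-time estimate from the assumptions above. */
        #include <stdio.h>

        int main(void)
        {
            double speedup      = 5.0;    /* emulation speedup in the evolver */
            double nes_seconds  = 30.0;   /* NES time per solution */
            double per_solution = nes_seconds / speedup;          /* 6 s */
            double per_gen_h    = 1800.0 * per_solution / 3600.0; /* 3 h */
            double gens_per_day = 24.0 / per_gen_h;               /* 8   */

            printf("100 generations: %.1f days\n", 100.0 / gens_per_day);
            printf("100,000 generations: %.1f years\n",
                   100000.0 / gens_per_day / 365.0);
            return 0;
        }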

  4. Awesome awesome awesome!

  5. hmmm….

    *thinking*

    ok, i agree that the “top n solutions” selection technique may not be optimal for Dr. Mario, but for other game types i still believe it would be superior… e.g. Mario Bros…

    the next thing i have to say really depends on how your mutation subroutine works, but i assume that it simply randomly identifies a “move” and mutates it…

    given the nature of video games, do you think it would be beneficial to include more than the combinations and mutations? by that, i mean to simply take solution “L” (that has obviously already been selected to contribute to the next generation) and simply “amputate” a random length of the solution tail? in order to allow it to participate in the fitness evaluations you’d have to randomly generate (a la generation 0) a new tail, though it really shouldn’t be that hard to actually implement given that i assume you have the subroutines for these already in your combinations and as part of your initial population generation, respectively…

    just to make sure i’m making sense, it would be like if you had solution “a,b,b,u,d,l,l,l,r,d” and instead of mutating it at the fourth ‘move’ of ‘u’, “amputating” it at, say, the first ‘d’ and regenerating a new tail…

    the only reason i bring this up is that since video games, especially old school Nintendo ones, are fairly linear in their gameplay, and you may get stuck with solutions that, while ‘best’, just aren’t going to get you there… and while this is typically picked up by mutations, well… like i said, i don’t know how your mutations work so i can’t really speculate…

    all in all, though, looks good… and is pretty impressive…

    oh, and by the by, it’s about time you excluded start/select…
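
    in code, the proposed operator might look something like this (just a sketch, reusing the button masks from the organism sketch in the post):

        /* "amputation": cut the solution at a random point and regrow a
         * fresh random tail (select/start filtered out, as elsewhere). */
        static void mutate_amputate(uint8_t *inputs, size_t length)
        {
            size_t cut = rand() % length;          /* keep [0, cut) */
            for (size_t i = cut; i < length; i++)
                inputs[i] = (uint8_t)(rand() & ~(BTN_SELECT | BTN_START));
        }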

  6. Weakening the selection pressure helps to avoid getting trapped in a local maximum, by allowing poorer performers to survive for a while and hopefully mutate into something even better. Tournament selection is nifty in that it provides an easy way to parameterize the selection pressure, by controlling the number of participants in each tournament.

    There are five types of mutation. A point mutation changes a single byte in the organism. An insertion inserts one or more bytes. A deletion deletes one or more bytes. A transposition takes two contiguous chunks and swaps them. A duplication takes a chunk of bytes and inserts a copy immediately afterwards. Mutations can take place anywhere within the organism.

    Of course, as you observed, we expect most of the progress to occur at the “tail”, where the tail is the part of the solution after the last time a virus was eliminated. Changes here can’t make performance any worse, but could make it better. A mutation earlier on is more likely to throw everything behind it out of whack, since the later moves will be played when the game is in a different state. It’s possible to make a good beginning even better, but it’s less likely than improving the tail.
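
    Sketched out (illustrative, not the actual Wallace code), a few of the operators look like this; insertion and transposition follow the same memmove pattern:

        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        /* Point mutation: overwrite one byte with a random value. */
        static size_t mutate_point(uint8_t *o, size_t len)
        {
            o[rand() % len] = (uint8_t)rand();
            return len;
        }

        /* Deletion: remove n bytes at a random position (0 < n < len). */
        static size_t mutate_delete(uint8_t *o, size_t len, size_t n)
        {
            size_t pos = rand() % (len - n + 1);
            memmove(o + pos, o + pos + n, len - pos - n);
            return len - n;
        }

        /* Duplication: insert a copy of a chunk immediately after it.
         * Assumes the buffer has room for len + n bytes. */
        static size_t mutate_duplicate(uint8_t *o, size_t len, size_t n)
        {
            size_t pos = rand() % (len - n + 1);
            memmove(o + pos + 2 * n, o + pos + n, len - pos - n); /* shift tail */
            memcpy(o + pos + n, o + pos, n);                      /* copy chunk */
            return len + n;
        }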

  7. It feels *so* weird to watch the video, because I find myself cheering for it. Go! C’mon, little guy! You can do it- no- just move it to the- a little to the right!

  8. I’d like to second what Ryan just said, and point out that I have pathetically watched the video 10… make that 11 times.

  9. Have you considered adjusting the fitness function to penalize on a per-input basis? That would remove a lot of the random flipping of pills, make it appear a bit more natural, and make it more optimized, since the solutions would need to get a high score in as few inputs as possible.

  10. John, you mean penalizing for every time the input state changes? One could probably implement that without too much difficulty.
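
    Something along these lines, perhaps (just a sketch; the per-change cost is arbitrary):

        /* Subtract a penalty each time the controller state changes
         * between consecutive frames. */
        static double input_change_penalty(const uint8_t *inputs, size_t frames)
        {
            const double cost = 1.0;   /* arbitrary per-change penalty */
            double total = 0.0;
            for (size_t i = 1; i < frames; i++)
                if (inputs[i] != inputs[i - 1])
                    total += cost;
            return total;   /* subtract this from the raw fitness */
        }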
