More on Wallace Jr.

To elaborate somewhat on my previous post, this is the input file I gave to Wallace Jr. in the run used to ultimately generate that video:

mednafen        /home/paul/mednafen-eval/instdir/bin/mednafen
buttons         right left up down b a
rom             /home/paul/roms/nes/Mega Man 2 (U).nes
initial_state   /home/paul/.mednafen/mcs/Mega Man 2 (U).0527a0ee512f69e08b8db6dc97964632.nc0
organism_size   1000
granularity     5
population_size 200
generations     50
parallelism    2
metric          health    0x06c0 1
metric          enemy     0x06c1 1
tournament_size             50
homologous_crossover_rate   40
arbitrary_crossover_rate    40
point_mutation_rate         10
reproduction_rate           10
point_mutation_probability  0.01

Note that the run used populations of 200 candidate solutions per generation. Each initial solution was 1,000 inputs long, with input granularity set to 5 frames — i.e., the same button input would be used for 5 consecutive frames (about 1/12 of a second) before advancing to the next. This was used to prevent solutions from being to spastic.

The various _rate settings specify how to generate new solutions from existing ones:

  • Homologous crossover takes one solution, deletes a randomly selected range within it, and replaces it with the corresponding range from another solution.
  • Arbitrary crossover is similar, but doesn’t require the two ranges to correspond to one another.
  • Point mutation randomly mutates each input with some small probability (here, 1%).
  • Reproduction just copies a solution unchanged from the previous generation.

When picking solutions for any of these techniques, Wallace Jr. uses a basic form of tournament selection. In tournament selection, you take a random subset of the population and pick the element with the highest fitness. This way, fitter solutions will get selected more frequently than less fit ones, but there’s still some randomness in the process. After all, if you always picked the fittest organism, you’d eliminate all the diversity in the population.

The metric lines specify addresses in game memory that store interesting values. Here, it’s the health of Mega Man and of the boss of the stage. Each is just one byte, which makes things easy. These values are actually ignored by Wallace Jr. itself; instead, my hacked copy of Mednafen uses these to decide what values to output after each frame.

For the actual fitness function, I decided to just specify those in Python instead of coming up with some way to express it in the configuration file. Here’s what has in it:

#! /usr/bin/env python
def compute_fitness (current_metrics, old_metrics, context):
    Reward damage done to the enemy and health preserved.  Penalize runs that
    end before someone dies via a penalty for frames where both are still
    alive.  Also try to penalize activation of the door glitch by watching
    for increases in health, which indicate death-by-spikes.
    def increased (key):
        return key in old_metrics and current_metrics[key] > old_metrics[key]
    fitness = current_metrics["health"] - current_metrics["enemy"]
    if "battle_started" not in context:
        fitness -= 500
    if current_metrics["health"] > 0 and current_metrics["enemy"] > 0:
        fitness -= 100
    if increased ("health"):
        fitness -= 500
    return fitness
def should_terminate (current_metrics, old_metrics, context):
    Terminate once someone dies or the aforementioned sign of the door
    glitch is observed.
    def dropped_to_zero (key):
        return current_metrics[key] == 0 and key in old_metrics and old_metrics[key] > 0
    def increased (key):
        return key in old_metrics and current_metrics[key] > old_metrics[key]
    if "battle_started" not in context and increased ("enemy") and current_metrics ["enemy"] == 28:
        context["battle_started"] = True
    return dropped_to_zero ("health") or dropped_to_zero ("enemy") or increased ("health")

The interface between Wallace Jr. and the fitness code can definitely stand to be improved. The functions are passed the current values of the metrics, the previous values of the metrics, and a context value it’s free to manipulate as it chooses to save information between calls. Most of the complexity is to handle the cases I mentioned in the previous post about avoiding undesirable outcomes like running out of input or activating the door glitch.

The astute reader will notice that I ran things out to 50 generations, but the video only shows examples up through Generation 13. Why? This:

Fitness of Wallace Jr. run

Things kind of hit a wall after Generation 13, with no real improvement any way you look at things. I suspect this was due to a combination of a fairly small generation size (200) and having 80% of each generation produced by crossover of existing solutions, with only 10% undergoing a handful of point mutations. Presumably, around the time Generation 13 was reached, the population had become a sort of inbred monoculture with little diversity between organisms, meaning it would take a long time for beneficial mutations to arise.

Sadly, I don’t have any way to go back and test this hypothesis, since the only organisms I saved were the peak performers of each generation. It may be worthwhile for later versions to compute Hamming distances or something like that to get a measure of how much diversity is in the population, and see how that changes over time.

One big limitation to Wallace Jr. currently is that it evolves input sequences, which are inherently going to be very specific to the initial state of the game when the solution starts playing it. In other words, if I played up to Air Man in a different playthrough and handed control off to the peak performer, it wouldn’t do nearly as well, since the state of the game wouldn’t match up with what it “expects”.

It’s like if you memorized the directions for getting to the store as a series of turns to make at each intersection — right, left, straight, right, right, etc. You’d do fine, unless I had you start a block or two over from your expected starting point. Suddenly, the directions are no longer correct.

What would be more interesting is to evolve programs to play the game, rather than mere input sequences. Of course, this would be a fair bit more difficult to implement, since now the solutions will need a way to observe the state of the game as part of their computation of what button input to produce. One way to do this would be to let it inspect the part of PPU memory where sprite information is stored, sort of letting it “read” what sprites are on the screen and where they’re at. The tricky part is getting Wallace Jr. and Mednafen to talk to each other like that while the emulation is going on — right now Wallace Jr. generates a Mednafen movie file and hands it off to Mednafen to play.

It’s not insurmountable, but having programs talk back and forth to each other is always a bit tricky, especially to avoid deadlocks: A is waiting for B to say something, and B is waiting for A to say something. Especially when B was never intended to talk to A to begin with.

If you’re interested in learning more about the theory behind genetic algorithms and genetic programming, you may want to take a look at A Field Guide to Genetic Programming, which is a freely downloadable book.

2 Responses

  1. First and foremost very fascinating.

    I’d also like to add you explain things very clearly in regards to the program and programing. I no longer feel a compulsion to perpetually yell ifelsefor.

  2. SO COOL

    Also, I second Benji – you do a great job of explaining things!

Comments are closed.