Game changer

One of the recurring issues with Wallace, in those rare times when I’m actually working on it, is the difficulty of interfacing Wallace with the NES emulation engine. Emulators generally aren’t written with the goal of being driven by another program; as a result my most recent effort at it involved a lot of ugly hacks to Mednafen to make it spew some useful data to stdout, and to cleverly disguise controller inputs generated by Wallace as a movie file.

However, I discovered by accident while wasting time I could’ve been spending on Panflute instead of browsing the epic time sink that is TV Tropes that some modern console emulators have built-in scripting support, and in FCEUX‘s case, the scripting support is also part of the often-relatively-neglected Linux version. Said scripting support offers a relatively straightforward way to write add-on code that mucks around with the game running inside the emulator, which would be orders of magnitude more elegant and maintainable than the approaches Wallace has taken up until now.

What do I mean by “muck around”? Well, you could add a way to drag-and-drop enemies in Super Mario Bros.:

Or give Mega Man kill-anything laser eyes (maybe he defeated Mr. Flibble Man?) in Mega Man 2:

Or replace the control scheme in Super Mario Bros. 3 with the one in Kirby: Canvas Curse, drawing platforms and obstacles on the screen using the mouse instead of having direct control over Mario:

Or add head-to-head network play to Nintendo’s Tetris game. You know, the one that doesn’t actually have a multiplayer mode to begin with:

Note that all that craziness is being implemented using Lua scripts in the emulator, without doing any hacking of the ROMs themselves. In FCEUX, loading a Lua script gives it control of the emulation loop and do pretty much whatever it wants, including poking around in game memory and messing with controller inputs but also doing anything else that can be done in Lua.

For the time being I’m not working on porting Wallace to FCEUX+Lua, but that’s mostly because I need to get Panflute in a releasable state and resurrecting Wallace right now would be too great a distraction.

Your daily dose of distraction

First: in case you ever complained that Mega Man 2 didn’t have enough rap in it, here you go [thanks to Josh for alerting me to this]:

(Their Final Fantasy rap is also pretty good, if you’re into that sort of thing.)

Next: in case you ever complained that Super Mario World didn’t have enough stuff-happening-even-though-you’re-too-lazy-to-press-any-of-the-buttons, take a look at this ROM hack:

Even with the level-editing tools that are out there, it’s impressive to imagine how much work must’ve gone into the level design to pull that off.

Finally, it’s unfortunate that I hadn’t been reading MS Paint Adventures until now. It’s what you’d get if you crossed a webcomic with an old-school adventure game. Is it weird if what’s sold me on it is how the character’s inventory system in the current “game” is explicitly based on a stack implemented in a circular buffer? And how it’s suggested it’s possible to upgrade to something more featureful:

EB: it’s so frustrating.
TG: whats your modus
EB: what?
TG: how do you retrieve artifacts from it
EB: oh. like one at a time i guess. and if i put too much in, something falls out.
TG: stack?? hahahahahaha
EB: what is yours?
TG: hash map
TG: my bro taught me a few tricks he basically knows everything and is awesome
EB: what the hell is that?
TG: you should probably brush up on your data structures

While I’d probably deque anyone who made a real game with such an obnoxious inventory system, in comic form it’s awesome. It may be the character’s birthday, but he won’t be LIFO the party with just that.* Hopefully whatever upgrades are in store will let him skip lists entirely and explore the rest of the wide array of options out there. Because dude, a stack? With that limited interface he’ll be in a heap of trouble. As What’s Her Face would say, DAG, yo.

Of course, in real life we have data structure based inventory systems too. We’re typically limited to a pair of five-element finger trees, sometimes augmented with a bag.

* Yes, I know that technically having a stack would very much make him LIFO the party almost by definition, but I’m trying to make a series of data structure puns here. If you trie it yourself, you’ll find it’s harder than it looks.

Wallace Jr. Grows On Trees

I’ve updated Wallace Jr. to support actual genetic programming, instead of just genetic algorithms. That is, Wallace Jr. now generates programs that try to play video games, instead of just generating predefined controller input sequences.

Naturally, the language used by Wallace Jr. for its evolved programs is extremely limited. In fact, it’s really more of a language for writing expressions rather than what you’d typically consider a “program”: there’s no variables or subroutines or even loops. Here’s the entire (informal) specification of the language:

Expression Definition
0x00, 0x01, …, 0xff Constants in the range 0 – 255
(+ a b) Add a and b, modulo 256
(- a b) Subtract b from a, modulo 256
(* a b) Multiply a and b, modulo 256
(& a b) Bitwise AND a and b
(| a b) Bitwise OR a and b
(^ a b) Bitwise XOR a and b
(ifgt a b t f) If a > b, return t; otherwise, return f
(read a) Return the byte at address a in sprite RAM

That’s it. By combining those operations, you can read data from sprite RAM (which among other things, will tell you where each sprite on the screen is at), do computations on them, and ultimately return a one-byte value interpreted as the next controller input.

Here’s an example of what a program written in this language would look like:

(& (+ 0x1b (& (- (| 0x77 (^ 0x60 (read 0x81))) 0x23) (& 0x10 (& 0x28 (read 0xbe))))) (^ 0x56 (ifgt 0x62 (read (ifgt (| (read 0xa8) 0xfe) 0x93 0x30 0x34)) 0x23 (ifgt (+ 0xbd (read 0x70)) (^ 0xef (- (read 0x33) 0x4d)) (+ (& 0x27 (& 0x78 (ifgt (read 0x58) 0x41 0x76 0xb6))) (^ (read 0xe7) 0x9c)) (+ 0xb0 (read 0x08))))))

Strictly speaking, none of the parentheses are necessary. Since the language uses Polish notation and each operation has fixed arity, parsing an expression is unambiguous even without parentheses — an advantage not held by the more conventional (to the average person, anyway) use of infix notation for arithmetic operations.

In other words, the above mess means the same thing as the following mess, where parentheses have been removed to, um, change readability. Whether readability is improved or hindered is left as an exercise for the reader.

& + 0x1b & - | 0x77 ^ 0x60 read 0x81 0x23 & 0x10 & 0x28 read 0xbe ^ 0x56 ifgt 0x62 read ifgt | read 0xa8 0xfe 0x93 0x30 0x34 0x23 ifgt + 0xbd read 0x70 ^ 0xef - read 0x33 0x4d + & 0x27 & 0x78 ifgt read 0x58 0x41 0x76 0xb6 ^ read 0xe7 0x9c + 0xb0 read 0x08

Maybe just drawing it as an expression tree would help:

Generation 3 top performer, optimized version
(click for full version)

In the tree, each term appears as a node, with the arguments to that term shown as children of the node. Evaluation of the expression goes from the bottom up; as new subterms are evaluated, the results get passed up until you reach the root node at the top, at which time you have the value of the expression.

The above expression was one generated in a run of Wallace Jr. in the fourth generation using pretty much the same input as last time, only using trees instead of controller input sequences. This time, the configuration looked something like this:

mednafen        /home/paul/mednafen-eval/instdir/bin/mednafen
 
buttons         right left up down b a
rom             /home/paul/roms/nes/Mega Man 2 (U).nes
initial_state   /home/paul/.mednafen/mcs/Mega Man 2 (U).0527a0ee512f69e08b8db6dc97964632.nc0
organism_type   tree
organism_size   1000
organism_depth  7
subtree_depth   4
granularity     5
population_size 500
generations     30
parallelism    2
debug           no
export_tail    1200
 
metric          health    0x06c0 1
metric          enemy     0x06c1 1
functions       megaman2.py
 
tournament_size             50
homologous_crossover_rate   0
arbitrary_crossover_rate    80
point_mutation_rate         5
subtree_mutation_rate       5
reproduction_rate           10
 
point_mutation_probability  0.05
crossover_leaf_probability  0.20

Here, Wallace Jr. is starting off with 500 randomly generated expression trees of maximum depth 7 — i.e., each starting tree has one root and up to seven levels below it. Each new generation is produced through crossover (swapping one node [and its children] with a node [and its children] from another tree), point mutation (randomly changing the content of some nodes without changing its arity), subtree mutation (replacing one node and its children with a randomly generated subtree), and reproduction (copying unchanged from the previous generation).

The fitness graph over 30 generations looked like this:

Fitness over 30 generations

It’s interesting how quickly things plateaued at a maximum fitness of 12 so quickly and never improved from there. Something similar happened last time, but not nearly so quickly. I’m not entirely sure why there was no improvement past a fitness of 12, but I suspect it’s because doing better than that actually requires fairly precise play. A fitness of 12 is about what you wind up with if you consistently deal damage to Air Man but get hit by a couple tornadoes between his jumps.

A little manual play lends some support to this hypothesis: taking the controller myself, my own fitness ranged from 12-16 most of the time, with 20 being a personal best. That 20 was very hard to come by, requiring some very carefully timed jumps to avoid near-unavoidable tornadoes. In this scenario, a fitness of 12 could be the point of diminishing returns, where each incremental improvement in fitness starts being much harder to come by.

Actually, when I said the expression I showed earlier was one generated by Wallace Jr., I lied. It’s actually an optimized version of this mess, which was the top performer in the fourth generation (and the first to achieve a fitness of 12):

Top performer of generation 3
(click for full version)

It’s quite a bit larger than the optimized version, since it’s doing silly things like computing (& 0x19 0x4a) each time instead of just using a constant 0x08. That’s pretty much all the optimizer in Wallace Jr. does, mainly to make the trees it generates more readable, or at least less unreadable, by simplifying things as much as possible.

After the third generation, there wasn’t any improvement in peak fitness. However, the expressions being generated tended to get bigger and bigger. Here’s the top performer of the thirtieth generation (or really, one of the many many trees tied for top performer), optimized:

Generation 29 top performer, optimized
(click for full version)

And in its full, unoptimized glory:

Generation 29 top performer, unoptimized
(click for full version)

Interestingly, if you compare the optimized versions of the Generation 29 top performer with the Generation 3 top performer shown above, you’ll see that the top four levels of the tree are, with the exception of one node, identical. This suggests the Generation 29 winner is descended from the Generation 3 winner. It would be interesting to study the full scope of their similarities, and the similarities with the winners in the intervening generations, to what parts of the expression are so seemingly essential that they’ve been preserved from one generation to the next.

For the time being, doing so is also left as an exercise for the reader.

More on Wallace Jr.

To elaborate somewhat on my previous post, this is the input file I gave to Wallace Jr. in the run used to ultimately generate that video:

mednafen        /home/paul/mednafen-eval/instdir/bin/mednafen
 
buttons         right left up down b a
rom             /home/paul/roms/nes/Mega Man 2 (U).nes
initial_state   /home/paul/.mednafen/mcs/Mega Man 2 (U).0527a0ee512f69e08b8db6dc97964632.nc0
organism_size   1000
granularity     5
population_size 200
generations     50
parallelism    2
 
metric          health    0x06c0 1
metric          enemy     0x06c1 1
functions       megaman2.py
 
tournament_size             50
homologous_crossover_rate   40
arbitrary_crossover_rate    40
point_mutation_rate         10
reproduction_rate           10
 
point_mutation_probability  0.01

Note that the run used populations of 200 candidate solutions per generation. Each initial solution was 1,000 inputs long, with input granularity set to 5 frames — i.e., the same button input would be used for 5 consecutive frames (about 1/12 of a second) before advancing to the next. This was used to prevent solutions from being to spastic.

The various _rate settings specify how to generate new solutions from existing ones:

  • Homologous crossover takes one solution, deletes a randomly selected range within it, and replaces it with the corresponding range from another solution.
  • Arbitrary crossover is similar, but doesn’t require the two ranges to correspond to one another.
  • Point mutation randomly mutates each input with some small probability (here, 1%).
  • Reproduction just copies a solution unchanged from the previous generation.

When picking solutions for any of these techniques, Wallace Jr. uses a basic form of tournament selection. In tournament selection, you take a random subset of the population and pick the element with the highest fitness. This way, fitter solutions will get selected more frequently than less fit ones, but there’s still some randomness in the process. After all, if you always picked the fittest organism, you’d eliminate all the diversity in the population.

The metric lines specify addresses in game memory that store interesting values. Here, it’s the health of Mega Man and of the boss of the stage. Each is just one byte, which makes things easy. These values are actually ignored by Wallace Jr. itself; instead, my hacked copy of Mednafen uses these to decide what values to output after each frame.

For the actual fitness function, I decided to just specify those in Python instead of coming up with some way to express it in the configuration file. Here’s what megaman2.py has in it:

#! /usr/bin/env python
 
def compute_fitness (current_metrics, old_metrics, context):
    """
    Reward damage done to the enemy and health preserved.  Penalize runs that
    end before someone dies via a penalty for frames where both are still
    alive.  Also try to penalize activation of the door glitch by watching
    for increases in health, which indicate death-by-spikes.
    """
 
    def increased (key):
        return key in old_metrics and current_metrics[key] > old_metrics[key]
 
    fitness = current_metrics["health"] - current_metrics["enemy"]
    if "battle_started" not in context:
        fitness -= 500
    if current_metrics["health"] > 0 and current_metrics["enemy"] > 0:
        fitness -= 100
    if increased ("health"):
        fitness -= 500
    return fitness
 
def should_terminate (current_metrics, old_metrics, context):
    """
    Terminate once someone dies or the aforementioned sign of the door
    glitch is observed.
    """
 
    def dropped_to_zero (key):
        return current_metrics[key] == 0 and key in old_metrics and old_metrics[key] > 0
    def increased (key):
        return key in old_metrics and current_metrics[key] > old_metrics[key]
 
    if "battle_started" not in context and increased ("enemy") and current_metrics ["enemy"] == 28:
        context["battle_started"] = True
 
    return dropped_to_zero ("health") or dropped_to_zero ("enemy") or increased ("health")

The interface between Wallace Jr. and the fitness code can definitely stand to be improved. The functions are passed the current values of the metrics, the previous values of the metrics, and a context value it’s free to manipulate as it chooses to save information between calls. Most of the complexity is to handle the cases I mentioned in the previous post about avoiding undesirable outcomes like running out of input or activating the door glitch.

The astute reader will notice that I ran things out to 50 generations, but the video only shows examples up through Generation 13. Why? This:

Fitness of Wallace Jr. run

Things kind of hit a wall after Generation 13, with no real improvement any way you look at things. I suspect this was due to a combination of a fairly small generation size (200) and having 80% of each generation produced by crossover of existing solutions, with only 10% undergoing a handful of point mutations. Presumably, around the time Generation 13 was reached, the population had become a sort of inbred monoculture with little diversity between organisms, meaning it would take a long time for beneficial mutations to arise.

Sadly, I don’t have any way to go back and test this hypothesis, since the only organisms I saved were the peak performers of each generation. It may be worthwhile for later versions to compute Hamming distances or something like that to get a measure of how much diversity is in the population, and see how that changes over time.

One big limitation to Wallace Jr. currently is that it evolves input sequences, which are inherently going to be very specific to the initial state of the game when the solution starts playing it. In other words, if I played up to Air Man in a different playthrough and handed control off to the peak performer, it wouldn’t do nearly as well, since the state of the game wouldn’t match up with what it “expects”.

It’s like if you memorized the directions for getting to the store as a series of turns to make at each intersection — right, left, straight, right, right, etc. You’d do fine, unless I had you start a block or two over from your expected starting point. Suddenly, the directions are no longer correct.

What would be more interesting is to evolve programs to play the game, rather than mere input sequences. Of course, this would be a fair bit more difficult to implement, since now the solutions will need a way to observe the state of the game as part of their computation of what button input to produce. One way to do this would be to let it inspect the part of PPU memory where sprite information is stored, sort of letting it “read” what sprites are on the screen and where they’re at. The tricky part is getting Wallace Jr. and Mednafen to talk to each other like that while the emulation is going on — right now Wallace Jr. generates a Mednafen movie file and hands it off to Mednafen to play.

It’s not insurmountable, but having programs talk back and forth to each other is always a bit tricky, especially to avoid deadlocks: A is waiting for B to say something, and B is waiting for A to say something. Especially when B was never intended to talk to A to begin with.

If you’re interested in learning more about the theory behind genetic algorithms and genetic programming, you may want to take a look at A Field Guide to Genetic Programming, which is a freely downloadable book.

The return of Wallace!

For whatever reason, I recently got it in my head to take another stab at my long-since-abandoned effort to apply genetic algorithms to the utterly pointless task of playing old video games. Here’s the result:

The gruntwork seen above was done using Wallace Jr., which as its name suggests, is a significantly stripped-down version of Wallace, my original attempt from a couple years back. Wallace’s ultimate failing was trying to do a little bit of everything: being a graphical emulator front-end, reverse engineering memory locations, doing the genetic algorithm itself, video encoding, etc. As a result, it did none particularly well, and continuing problems trying to shoehorn GStreamer into the processing pipeline ultimately led to its abandonment.

Instead, Wallace Jr. just does the genetic algorithm stuff: generating organisms, evaluating them, and spawning new generations. That’s it. The emulation itself is passed off to a modified version of Mednafen — specifically, a version hacked to enable noninteractive use and to dump interesting memory values while executing. The actual reverse engineering bit is left to something else; it accepts a list of memory locations to use as metrics when evaluating fitness, but leaves it up to you to find a way to figure them out.

As a result, Wallace Jr. accomplishes its core mission without getting bogged down in other things. Also, by offloading emulation into a separate process, it has the nice side benefit of enabling parallelism, since for whatever reason emulator authors seem to love global variables for all important state, which pretty much eliminates any hope of emulating more than one thing at a time in the same memory space.

Wallace Jr. is still pretty basic, but it’s still complete enough to generate the results in the video, wherein a set of controller inputs capable of defeating Air Man in Mega Man 2 in just 11 generations. In that demonstration, fitness is basically defined as Mega Man’s health minus Air Man’s health, with the goal of maximizing the fitness measure.

I say “basically” because genetic algorithms, much like biological evolution, is prone to finding clever ways to solve problems, and the naive fitness measure I just described lets some of these slip through. In particular, if input ends before either Air Man or Mega Man dies, the fitness will still be the difference between their health meters. If input ends really early, like, say, before the battle starts when Air Man has nominally zero health, the outcome is the optimal fitness measure of 28, even though that solution really isn’t what we want. The real fitness measure I used checks for that, and throws in a -100 penalty if input ends before a death, and a -500 penalty on top of that if the battle hasn’t even begun yet.

More amusingly, seen in the “outtake” at the end of the video, in one of the runs I attempted while developing Wallace Jr., it managed to randomly hit upon controller input that activated the Air Man door glitch. I didn’t even know that was possible without using Item 1, but apparently it is. I had to adjust the fitness function to penalize that as well.

I hope you appreciate that video, since it took a ridiculous amount of effort to make. For starters, Mednafen doesn’t have a way to output a normal video file — the movies it makes are just recordings of controller input. Hacking in a way to dump the raw audio samples to a file was simple enough, but video proved to be more problematic; raw video data is too large to write to a file without making the emulator horrifically I/O-bound, disrupting emulation as it tries to frame-skip to recover. Integrating a true video encoder into Mednafen was more work than I wanted to do, so I took the extraordinarily hackish solution of adding a way to dump PNG screenshots every couple frames (since doing it every frame ran into the same I/O issues).

Luckily, MEncoder has a way to convert a series of PNGs into a more conventional video file, but that still left the issue of stitching them all together into a single movie. I managed to use PiTiVi for that, despite its best efforts to be a bug-ridden half-working pile of something you’d expect to find Mike Rowe knee-deep in. Sadly, the alternatives for non-linear video editing on Linux are even worse.

The interstitials were a pain to make, since PiTiVi doesn’t provide a way to add captions to video clips. For that, I wound up manually putting together a GStreamer stream via gst-launch-0.10 and the textoverlay element. The trick there was making sure to encode the resulting video stream in a format that GStreamer could then read back when used by PiTiVi. For some reason, GStreamer will happily encode to lossless MJPEG but has no idea how to decode it!

Once that was finally done, I switched back to MEncoder to add the background music. Yes, I know the last note gets cut off at the end, but I wasn’t about to wrestle with PiTiVi again to lengthen the video by a couple seconds, especially given that PiTiVi doesn’t let you save your project! The mind boggles.

Splicing in a separate audio track also nicely hides how my hack of dumping screenshots rapid-fire instead of making a proper video causes synchronization problems when trying to combine it back with the dumped audio. I could never get the framerate quite right, but without the original audio, it’s more difficult to notice in the video.

Plus, it’s an appropriate song, even if its central premise gets refuted by Generation 10.