LIFO the Party

Time for your irregularly scheduled Wallace update.

As you’ll recall, there are two key things you need for genetic programming (or, more generally, any evolution-like process): chunks of information that gets replicated imperfectly, and a metric for evaluating each chunk’s fitness; the less-fit chunks are discarded, and the more-fit chunks go through the process again.

Well, we now have a data representation for our “solutions” to the “playing Super Mario Bros.” problem. While a simple sequence of button presses could’ve worked, it feels like we’re working with recordings rather than organisms. (Actually, we would be; an FCEU movie file is little more than a compressed sequence of button presses.) So if we want better verisimilitude with biological evolution, we want our candidate solutions to be more “alive” than that.

So, why not make our solutions computer programs themselves?

Before going into too great of detail how I’ve gone about doing this, here’s an example of a very simple program playing Super Mario Bros., and doing surprisingly well considering it was my first attempt. (There’s no genetic programming going on here; the program playing in the video below was hand-coded.)

Mario v. a quick-and-dirty program to try to play it

Ogg Theora video, 938 KB, 37 seconds. Ogg codecs for Windows might be found here.

Over the weekend I’ve implemented a quick-and-dirty virtual machine in which you can write programs that try to play Nintendo games. The VM provides some key services:

  • A simple instruction set similar to machine code to build the programs out of.
  • A means for sending button presses to the emulator.
  • A means for reading the game’s memory.
  • A sandbox to prevent a misbehaving program from doing any damage to the actual system.

Unlike the instruction set probably used by the computer you’re sitting in front of right now, the VM is a stack-based (as opposed to a register-based) machine. There’s a stack that just about every instruction uses; each instruction pops its arguments off the stack, does its work, and pushes the results onto the stack.

This kind of architecture saves us from having to worry about what addressing modes we need for each instruction; everything simply uses the stack. Granted, since the VM I’ve written can work with both 8-bit and 16-bit values (the NES uses an 8-bit processor, but we need 16 bits to easily address all memory locations), the instructions do worry a little about the types of data on the stack. Of course, it would be possible to store type information in the stack as well, but then the instructions all need to be clever about how they handle each possible data type, which makes implementing the VM more error-prone.

Anyway, I’m sure you’re dying to see what program generated that video that manages to get all the way to the halfway point of World 1-1. Well, here it is, converted to an assembly language-style format:

start:     push_i8u 0x82       ; RIGHT + B buttons
           pop_r               ; write to output register
mainloop:  push_i8u 20         ; will loop for 20 vblank periods
waitloop:  pause               ; wait for next vblank period
           push_i8u 1          ; decrement our counter ...
           sub_i8u             ;   ... by one
           dup                 ; push a copy of the result
           push_i16 waitloop   ; if counter isn't zero ...
           cja                 ;   ... go back to waitloop
           pop                 ; throw away counter
           push_r              ; get our current output
           push_i8u 0x01       ; A button ...
           xor_8               ;   ... gets toggled
           pop_r               ; write to output register
           push_i16 mainloop   ; and go back to top ...
           ja                  ;   ... of mainloop

OK, so I lied; all instructions except the push immediate instructions get their arguments from the stack.

Anyway, as you can see from the comments, all this program does is hold the Right and B buttons constantly, and every 20 emulation frames (about every 1/3 of a second) it either presses or releases the A button. As a result, Mario constantly runs to the right and jumps periodically. It’s surprising how far this 16-line dumb-as-a-post program gets on its second life!

So now all that’s left is a way to generate new programs for the VM, making them reproduce and mutate, and pitting them against each other in a battle-to-the-death.