Wallace Jr. Grows On Trees

I’ve updated Wallace Jr. to support actual genetic programming, instead of just genetic algorithms. That is, Wallace Jr. now generates programs that try to play video games, instead of just generating predefined controller input sequences.

Naturally, the language used by Wallace Jr. for its evolved programs is extremely limited. In fact, it’s really more of a language for writing expressions than what you’d typically consider a “program”: there are no variables, no subroutines, not even loops. Here’s the entire (informal) specification of the language:

Expression              Definition
0x00, 0x01, …, 0xff     Constants in the range 0–255
(+ a b)                 Add a and b, modulo 256
(- a b)                 Subtract b from a, modulo 256
(* a b)                 Multiply a and b, modulo 256
(& a b)                 Bitwise AND of a and b
(| a b)                 Bitwise OR of a and b
(^ a b)                 Bitwise XOR of a and b
(ifgt a b t f)          If a > b, return t; otherwise, return f
(read a)                Return the byte at address a in sprite RAM

That’s it. By combining those operations, you can read data from sprite RAM (which, among other things, tells you where each sprite on the screen is), do computations on that data, and ultimately return a one-byte value interpreted as the next controller input.
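To make the semantics concrete, here’s a minimal sketch of an evaluator for this language. The representation is an assumption of mine, not necessarily what Wallace Jr. uses internally: trees as nested tuples like `('&', ('read', 0x81), 0x10)`, bare integers as constants, and sprite RAM as a 256-byte sequence.

```python
def evaluate(expr, sprite_ram):
    """Evaluate an expression tree against a snapshot of sprite RAM.

    A bare integer is a constant; a tuple is (operator, arg1, ...).
    All arithmetic wraps modulo 256, matching the spec above.
    """
    if isinstance(expr, int):
        return expr & 0xFF
    op, *args = expr
    if op == 'read':
        return sprite_ram[evaluate(args[0], sprite_ram)]
    if op == 'ifgt':
        a, b, t, f = args
        branch = t if evaluate(a, sprite_ram) > evaluate(b, sprite_ram) else f
        return evaluate(branch, sprite_ram)
    a = evaluate(args[0], sprite_ram)
    b = evaluate(args[1], sprite_ram)
    if op == '+':
        return (a + b) & 0xFF
    if op == '-':
        return (a - b) & 0xFF
    if op == '*':
        return (a * b) & 0xFF
    if op == '&':
        return a & b
    if op == '|':
        return a | b
    if op == '^':
        return a ^ b
    raise ValueError('unknown operator: %r' % (op,))
```

Note that `ifgt` only evaluates the branch it takes; since the language has no side effects, evaluating both branches would give the same answer, just more slowly.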

Here’s an example of what a program written in this language would look like:

(& (+ 0x1b (& (- (| 0x77 (^ 0x60 (read 0x81))) 0x23) (& 0x10 (& 0x28 (read 0xbe))))) (^ 0x56 (ifgt 0x62 (read (ifgt (| (read 0xa8) 0xfe) 0x93 0x30 0x34)) 0x23 (ifgt (+ 0xbd (read 0x70)) (^ 0xef (- (read 0x33) 0x4d)) (+ (& 0x27 (& 0x78 (ifgt (read 0x58) 0x41 0x76 0xb6))) (^ (read 0xe7) 0x9c)) (+ 0xb0 (read 0x08))))))

Strictly speaking, none of the parentheses are necessary. Since the language uses Polish notation and each operation has fixed arity, parsing an expression is unambiguous even without parentheses — an advantage not held by the more conventional (to the average person, anyway) use of infix notation for arithmetic operations.

In other words, the above mess means the same thing as the following mess, where parentheses have been removed to, um, change readability. Whether readability is improved or hindered is left as an exercise for the reader.

& + 0x1b & - | 0x77 ^ 0x60 read 0x81 0x23 & 0x10 & 0x28 read 0xbe ^ 0x56 ifgt 0x62 read ifgt | read 0xa8 0xfe 0x93 0x30 0x34 0x23 ifgt + 0xbd read 0x70 ^ 0xef - read 0x33 0x4d + & 0x27 & 0x78 ifgt read 0x58 0x41 0x76 0xb6 ^ read 0xe7 0x9c + 0xb0 read 0x08
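In case the claim that parenthesis-free prefix notation parses unambiguously seems suspicious, here’s a sketch of such a parser (the token and tree representations are my own assumptions, not Wallace Jr.’s): after reading an operator, its fixed arity tells you exactly how many sub-expressions must follow.

```python
# Each operator's fixed arity, per the language spec.
ARITY = {'+': 2, '-': 2, '*': 2, '&': 2, '|': 2, '^': 2,
         'ifgt': 4, 'read': 1}

def parse_prefix(tokens):
    """Parse one expression from a list of prefix-notation tokens.

    Returns (tree, remaining_tokens), where a tree is either a bare
    integer constant or a tuple (operator, arg1, ...).
    """
    head, rest = tokens[0], tokens[1:]
    if head in ARITY:
        args = []
        for _ in range(ARITY[head]):
            arg, rest = parse_prefix(rest)
            args.append(arg)
        return (head,) + tuple(args), rest
    return int(head, 16), rest  # a hex constant like '0x1b'

# No parentheses, yet only one possible parse:
tree, leftover = parse_prefix('& + 0x1b 0x23 ^ 0x56 read 0xbe'.split())
```

Infix notation can’t get away with this, because without precedence rules or parentheses, `a + b * c` genuinely is ambiguous.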

Maybe just drawing it as an expression tree would help:

Generation 3 top performer, optimized version
(click for full version)

In the tree, each term appears as a node, with the arguments to that term shown as children of the node. Evaluation of the expression goes from the bottom up; as new subterms are evaluated, the results get passed up until you reach the root node at the top, at which time you have the value of the expression.

The above expression was one generated in a run of Wallace Jr. in the fourth generation using pretty much the same input as last time, only using trees instead of controller input sequences. This time, the configuration looked something like this:

mednafen        /home/paul/mednafen-eval/instdir/bin/mednafen
 
buttons         right left up down b a
rom             /home/paul/roms/nes/Mega Man 2 (U).nes
initial_state   /home/paul/.mednafen/mcs/Mega Man 2 (U).0527a0ee512f69e08b8db6dc97964632.nc0
organism_type   tree
organism_size   1000
organism_depth  7
subtree_depth   4
granularity     5
population_size 500
generations     30
parallelism     2
debug           no
export_tail     1200
 
metric          health    0x06c0 1
metric          enemy     0x06c1 1
functions       megaman2.py
 
tournament_size             50
homologous_crossover_rate   0
arbitrary_crossover_rate    80
point_mutation_rate         5
subtree_mutation_rate       5
reproduction_rate           10
 
point_mutation_probability  0.05
crossover_leaf_probability  0.20

Here, Wallace Jr. is starting off with 500 randomly generated expression trees of maximum depth 7 — i.e., each starting tree has one root and up to seven levels below it. Each new generation is produced through crossover (swapping one node [and its children] with a node [and its children] from another tree), point mutation (randomly changing the content of some nodes without changing their arity), subtree mutation (replacing one node and its children with a randomly generated subtree), and reproduction (copying unchanged from the previous generation).
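The crossover operator in particular can be sketched in a few lines, assuming the same nested-tuple trees as before. This is my own illustrative version, not Wallace Jr.’s actual code: identify every node by its path from the root, then graft a random subtree of one parent onto a random point of the other.

```python
import random

def all_paths(tree, path=()):
    """Yield the path (a tuple of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get_node(tree, path):
    """Return the subtree at the given path."""
    for i in path:
        tree = tree[i]
    return tree

def set_node(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (set_node(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Arbitrary subtree crossover: graft a random subtree of parent_b
    onto a random point of parent_a, producing one child."""
    target = rng.choice(list(all_paths(parent_a)))
    donor = get_node(parent_b, rng.choice(list(all_paths(parent_b))))
    return set_node(parent_a, target, donor)
```

Point mutation and subtree mutation fall out of the same helpers: pick a path, then either tweak the node in place (keeping its arity) or replace it via `set_node` with a freshly generated random subtree.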

The fitness graph over 30 generations looked like this:

Fitness over 30 generations

It’s interesting how quickly things plateaued at a maximum fitness of 12, never improving from there. Something similar happened last time, but not nearly so quickly. I’m not entirely sure why there was no improvement past a fitness of 12, but I suspect it’s because doing better than that actually requires fairly precise play. A fitness of 12 is about what you wind up with if you consistently deal damage to Air Man but get hit by a couple tornadoes between his jumps.

A little manual play lends some support to this hypothesis: taking the controller myself, my own fitness ranged from 12 to 16 most of the time, with 20 being a personal best. That 20 was very hard to come by, requiring some very carefully timed jumps to avoid near-unavoidable tornadoes. In this scenario, a fitness of 12 could be the point of diminishing returns, where each incremental improvement in fitness starts being much harder to come by.

Actually, when I said the expression I showed earlier was one generated by Wallace Jr., I lied. It’s actually an optimized version of this mess, which was the top performer in the fourth generation (and the first to achieve a fitness of 12):

Top performer of generation 3
(click for full version)

It’s quite a bit larger than the optimized version, since it’s doing silly things like computing (& 0x19 0x4a) each time instead of just using a constant 0x08. That’s pretty much all the optimizer in Wallace Jr. does, mainly to make the trees it generates more readable, or at least less unreadable, by simplifying things as much as possible.
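That kind of simplification is classic constant folding, and since `read` is the only operation whose result depends on anything outside the expression, it’s easy to sketch. Again, this is my own illustration of the idea rather than Wallace Jr.’s actual optimizer:

```python
# Constant-foldable binary operators, wrapping modulo 256 per the spec.
MOD_OPS = {
    '+': lambda a, b: (a + b) & 0xFF,
    '-': lambda a, b: (a - b) & 0xFF,
    '*': lambda a, b: (a * b) & 0xFF,
    '&': lambda a, b: a & b,
    '|': lambda a, b: a | b,
    '^': lambda a, b: a ^ b,
}

def fold_constants(expr):
    """Replace constant subexpressions with their values.

    Anything containing (read ...) depends on sprite RAM and is left
    alone; (ifgt a b t f) with constant a and b collapses to one branch.
    """
    if isinstance(expr, int):
        return expr
    op, *args = expr
    args = [fold_constants(a) for a in args]
    if op in MOD_OPS and all(isinstance(a, int) for a in args):
        return MOD_OPS[op](*args)
    if op == 'ifgt' and isinstance(args[0], int) and isinstance(args[1], int):
        return args[2] if args[0] > args[1] else args[3]
    return (op,) + tuple(args)
```

Running this on the example from the post, `(& 0x19 0x4a)` folds to the constant `0x08`, while subtrees that touch sprite RAM survive untouched.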

After the third generation, there wasn’t any improvement in peak fitness. However, the expressions being generated tended to get bigger and bigger. Here’s the top performer of the thirtieth generation (or really, one of the many, many trees tied for top performer), optimized:

Generation 29 top performer, optimized
(click for full version)

And in its full, unoptimized glory:

Generation 29 top performer, unoptimized
(click for full version)

Interestingly, if you compare the optimized versions of the Generation 29 top performer with the Generation 3 top performer shown above, you’ll see that the top four levels of the tree are, with the exception of one node, identical. This suggests the Generation 29 winner is descended from the Generation 3 winner. It would be interesting to study the full scope of their similarities, and the similarities with the winners in the intervening generations, to see what parts of the expression are so seemingly essential that they’ve been preserved from one generation to the next.

For the time being, doing so is also left as an exercise for the reader.

8 Responses

  1. I have a feeling that if you were to incorporate some notion of state variables, it would be able to do a lot better. Like, add two operators, (write A B) to write byte B into address A, and (read A B) to do the obvious thing. Then it’d be able to record a rudimentary amount of history and perhaps learn about movement patterns and so on.

    Careful not to cause the singularity, though. I’d like my post-singularity life to involve things that aren’t Mega Man 2.

  2. Oh yeah, another thing is in traditional genetic programming, the fitness function should rate a small program higher than a large program. So, if a 10-operation program gets the same score as a 20-operation program, the 10-operation one should defeat the 20-operation one. I suspect that having a bunch of “junk DNA” around might actually be beneficial though.

  3. First, I agree with fluffy’s second comment… he just beat me to it… bastard…

    Second, it had been a while since I read anything, so I had to go back to find your fitness function, and I’m not sure I understand why you penalize runs that don’t end in a death. If I understand it correctly, you effectively penalize it to the extent that it won’t continue to the next generation (although I was admittedly too busy (read: lazy) to look and see how you select for reproduction)… But just because no one died doesn’t necessarily mean that it has no merit; it may be on the verge of an ass-kicking given a little more development. Or am I just missing something here because of the aforementioned busyness (read: laziness)?

    Whatever the case, I enjoy reading about it…

  4. I’d be okay with Mega Man Singularity

  5. It sounds like if you’re plateauing so quickly it’s time to update the fitness function. Something like losing health earlier is worse than losing health later, or winning faster is better than winning slower, or something that will differentiate the umpteen “identical” performances.

  6. If the gene pool is large enough, it shouldn’t matter that lack-of-death is penalized, since in theory the lack-of-death ones won’t be immediately wiped out. It depends on how Wallace is actually breeding things, though – the best approach I’ve seen is to randomly select two samples from the population, breed them to produce some offspring, and then have all of them Do Their Thing, and of them only the top few survive (in the case of this problem they don’t have to rerun the simulation, though, since they should always get the same result every time). This is better than simply skimming off the top, since then you avoid the local-maximum problem.

    Basically the breeding strategy itself is a very vital part that determines how well the genetic programming works out. (Of course, this is something that farmers have known for millennia.)

  7. Wow, lots of comments. Here goes:

    Keeping state: I was going to save discussion on future improvements for another post, since this one was getting long enough. Suffice it to say for now that only being able to look at the sprite RAM in the current frame has several limitations.

    Penalizing large organisms: Wallace Jr. can’t do this yet — the fitness function doesn’t have any access to the organism itself — but it’s not hard to implement. If anything, the penalty for being huge would be small relative to the scoring for actual performance. Like fluffy said, having “junk” in the “genome” can serve as a way to accumulate novel features without immediately impacting fitness, which is why the breeding only uses the unoptimized versions of the organisms.

    Requiring death: See this post for my motivation for penalizing runs that don’t end in someone dying. The short answer is, evolution gamed the fitness metric, finding ways to get a high score early on with things that looked nothing like expertly avoiding getting hit. Since running out of input is impossible with trees, it’s not as much of a concern, but I still want to avoid getting stuck in the door glitch local maximum.

    Breeding: The run I discuss in this post used 80% crossover, 5% subtree mutation, 5% point mutation, and 10% error-free cloning (aka reproduction). It’s there in the config file, but I never came out and said it. Selection is by tournament selection of size 50 (i.e., pick 50 at random and choose the fittest). There’s lots of different breeding techniques. Although Wallace Jr. only currently implements a few simple ones, it’s designed to facilitate adding new ones.

  8. And now that I’ve approved Wes’s comment (no idea why it got caught in the moderation queue): that’s also a definite possibility.

Comments are closed.