Wallace Jr. Grows On Trees

I’ve updated Wallace Jr. to support actual genetic programming, instead of just genetic algorithms. That is, Wallace Jr. now generates programs that try to play video games, instead of just generating predefined controller input sequences.

Naturally, the language used by Wallace Jr. for its evolved programs is extremely limited. In fact, it’s really more of a language for writing expressions rather than what you’d typically consider a “program”: there’s no variables or subroutines or even loops. Here’s the entire (informal) specification of the language:

Expression Definition
0x00, 0x01, …, 0xff Constants in the range 0 – 255
(+ a b) Add a and b, modulo 256
(- a b) Subtract b from a, modulo 256
(* a b) Multiply a and b, modulo 256
(& a b) Bitwise AND a and b
(| a b) Bitwise OR a and b
(^ a b) Bitwise XOR a and b
(ifgt a b t f) If a > b, return t; otherwise, return f
(read a) Return the byte at address a in sprite RAM

That’s it. By combining those operations, you can read data from sprite RAM (which among other things, will tell you where each sprite on the screen is at), do computations on them, and ultimately return a one-byte value interpreted as the next controller input.

Here’s an example of what a program written in this language would look like:

(& (+ 0x1b (& (- (| 0x77 (^ 0x60 (read 0x81))) 0x23) (& 0x10 (& 0x28 (read 0xbe))))) (^ 0x56 (ifgt 0x62 (read (ifgt (| (read 0xa8) 0xfe) 0x93 0x30 0x34)) 0x23 (ifgt (+ 0xbd (read 0x70)) (^ 0xef (- (read 0x33) 0x4d)) (+ (& 0x27 (& 0x78 (ifgt (read 0x58) 0x41 0x76 0xb6))) (^ (read 0xe7) 0x9c)) (+ 0xb0 (read 0x08))))))

Strictly speaking, none of the parentheses are necessary. Since the language uses Polish notation and each operation has fixed arity, parsing an expression is unambiguous even without parentheses — an advantage not held by the more conventional (to the average person, anyway) use of infix notation for arithmetic operations.

In other words, the above mess means the same thing as the following mess, where parentheses have been removed to, um, change readability. Whether readability is improved or hindered is left as an exercise for the reader.

& + 0x1b & - | 0x77 ^ 0x60 read 0x81 0x23 & 0x10 & 0x28 read 0xbe ^ 0x56 ifgt 0x62 read ifgt | read 0xa8 0xfe 0x93 0x30 0x34 0x23 ifgt + 0xbd read 0x70 ^ 0xef - read 0x33 0x4d + & 0x27 & 0x78 ifgt read 0x58 0x41 0x76 0xb6 ^ read 0xe7 0x9c + 0xb0 read 0x08

Maybe just drawing it as an expression tree would help:

Generation 3 top performer, optimized version
(click for full version)

In the tree, each term appears as a node, with the arguments to that term shown as children of the node. Evaluation of the expression goes from the bottom up; as new subterms are evaluated, the results get passed up until you reach the root node at the top, at which time you have the value of the expression.

The above expression was one generated in a run of Wallace Jr. in the fourth generation using pretty much the same input as last time, only using trees instead of controller input sequences. This time, the configuration looked something like this:

mednafen        /home/paul/mednafen-eval/instdir/bin/mednafen
buttons         right left up down b a
rom             /home/paul/roms/nes/Mega Man 2 (U).nes
initial_state   /home/paul/.mednafen/mcs/Mega Man 2 (U).0527a0ee512f69e08b8db6dc97964632.nc0
organism_type   tree
organism_size   1000
organism_depth  7
subtree_depth   4
granularity     5
population_size 500
generations     30
parallelism    2
debug           no
export_tail    1200
metric          health    0x06c0 1
metric          enemy     0x06c1 1
functions       megaman2.py
tournament_size             50
homologous_crossover_rate   0
arbitrary_crossover_rate    80
point_mutation_rate         5
subtree_mutation_rate       5
reproduction_rate           10
point_mutation_probability  0.05
crossover_leaf_probability  0.20

Here, Wallace Jr. is starting off with 500 randomly generated expression trees of maximum depth 7 — i.e., each starting tree has one root and up to seven levels below it. Each new generation is produced through crossover (swapping one node [and its children] with a node [and its children] from another tree), point mutation (randomly changing the content of some nodes without changing its arity), subtree mutation (replacing one node and its children with a randomly generated subtree), and reproduction (copying unchanged from the previous generation).

The fitness graph over 30 generations looked like this:

Fitness over 30 generations

It’s interesting how quickly things plateaued at a maximum fitness of 12 so quickly and never improved from there. Something similar happened last time, but not nearly so quickly. I’m not entirely sure why there was no improvement past a fitness of 12, but I suspect it’s because doing better than that actually requires fairly precise play. A fitness of 12 is about what you wind up with if you consistently deal damage to Air Man but get hit by a couple tornadoes between his jumps.

A little manual play lends some support to this hypothesis: taking the controller myself, my own fitness ranged from 12-16 most of the time, with 20 being a personal best. That 20 was very hard to come by, requiring some very carefully timed jumps to avoid near-unavoidable tornadoes. In this scenario, a fitness of 12 could be the point of diminishing returns, where each incremental improvement in fitness starts being much harder to come by.

Actually, when I said the expression I showed earlier was one generated by Wallace Jr., I lied. It’s actually an optimized version of this mess, which was the top performer in the fourth generation (and the first to achieve a fitness of 12):

Top performer of generation 3
(click for full version)

It’s quite a bit larger than the optimized version, since it’s doing silly things like computing (& 0x19 0x4a) each time instead of just using a constant 0x08. That’s pretty much all the optimizer in Wallace Jr. does, mainly to make the trees it generates more readable, or at least less unreadable, by simplifying things as much as possible.

After the third generation, there wasn’t any improvement in peak fitness. However, the expressions being generated tended to get bigger and bigger. Here’s the top performer of the thirtieth generation (or really, one of the many many trees tied for top performer), optimized:

Generation 29 top performer, optimized
(click for full version)

And in its full, unoptimized glory:

Generation 29 top performer, unoptimized
(click for full version)

Interestingly, if you compare the optimized versions of the Generation 29 top performer with the Generation 3 top performer shown above, you’ll see that the top four levels of the tree are, with the exception of one node, identical. This suggests the Generation 29 winner is descended from the Generation 3 winner. It would be interesting to study the full scope of their similarities, and the similarities with the winners in the intervening generations, to what parts of the expression are so seemingly essential that they’ve been preserved from one generation to the next.

For the time being, doing so is also left as an exercise for the reader.