Going in reverse

In case you’re wondering what I’ve been up to with the relative silence here lately, for some reason I’ve decided to write a reverse engineering tool for NES games. This might eventually lead to answering a long-standing question, or even lead into a revival of Wallace.

One of the challenges with writing such a tool is figuring out just what the various bytes in a ROM image even mean. In most operating systems, the internal structure of executable files contains some structural information. In particular, the executable code itself (aka .text, counterintuitively) is stored separately from global data (aka .data) that the executable code uses. So right off the bat, you can find the machine code and disassemble it back into something fairly readable.

No such luck with a NES ROM image. While there is segmentation in the file format, it’s based on whether the memory bank in question is part of CPU memory (PRG, or “program”, banks) or PPU memory (CHR, or “character”, banks). CHR banks contain the pattern tables that store the game’s graphics, so those are pretty straightforward to interpret.

However, PRG banks store both code and data in whatever layout the developers chose to. All you have to go by is the fact that the uppermost 6 bytes of CPU memory contain pointers to three pieces of code: the non-maskable interrupt handler (invoked by the PPU), the reset handler (invoked after power-on or reset), and the software interrupt handler (invoked by the BRK instruction). That’s all you know for sure about the memory layout.

In principle, this is enough. You can start at the addresses indicated by those pointers and start reading off instructions, following any branches or jumps along the way. We aren’t actually executing anything, just stepping through the structure of the code to find where all the instructions are. We don’t know what the program will actually do when it executes — for example, we don’t know which way a branch will go — but for our static analysis, we can just trace both code paths until we run out of code. When we’re done, everything we’ve reached is an instruction and can be disassembled, and anything else is data (or filler).

Of course, since nothing can be easy, this static control flow analysis fails if the program does anything tricky. Chances of running into something tricky are pretty good. For example, take a look at this bit of code from a full disassembly of Super Mario Bros.:

      lda OperMode     ;this is the heart of the entire program,
      jsr JumpEngine   ;most of what goes on starts here
      .dw TitleScreenMode
      .dw GameMode
      .dw VictoryMode
      .dw GameOverMode

JSR is the “jump to subroutine” instruction: it pushes the address of the following instruction onto the stack and jumps to the specified address. When an RTS (“return from subroutine”) is executed, that return address gets popped off the stack and jumped to. So, normally, the code after a JSR will get executed once the called subroutine does an RTS.

But clearly, that’s not what’s happening here, since we have eight bytes of data immediately following the JSR, instead of code. In particular, four 16-bit addresses. Needless to say, my current executable-code-detection algorithm chokes on this. What’s going on? The code for the JumpEngine “subroutine” reveals all:

       asl          ;shift bit from contents of A
       pla          ;pull saved return address from stack
       sta $04      ;save to indirect
       sta $05
       lda ($04),y  ;load pointer from indirect
       sta $06      ;note that if an RTS is performed in next routine
       iny          ;it will return to the execution before the sub
       lda ($04),y  ;that called this routine
       sta $07
       jmp ($06)    ;jump to the address we loaded

For those who don’t speak 6502, this is basically a clever way of implementing a jump table. On entry, the A register contains the index of the table, and the top of the stack contains the address of the jump table itself. The routine multiplies A by 2 (the size in bytes of an address) and stores it in index register Y. It then writes the Yth entry in the jump table to memory address $0006, and then jumps to the address stored there.

The slightly convoluted routine to implement this is needed to get around the limitations of the 6502 processor — in particular, the fact that registers are too small to store an address. But for our purposes, this sort of thing demonstrates that we can’t blindly assume a JSR invokes a “normal” subroutine — we have to check whether the alleged subroutine does any kind of stack trickery like that, and if it does, not assume that the instructions following the JSR will get called.

Of course, since now I have a need to do two kinds of static analysis (the “executable code detector” and the “tricky subroutine detector”), ad hoc navigation methods through the PRG banks won’t cut it. Luckily, it looks like the Boost Graph Library will give me a fairly easy way to make the PRG banks look like a proper graph (without actually having to build one explicitly), at which time I can implement static analysis routines using standard graph traversal algorithms. That shouldn’t be too hard — most problems there will probably come from learning how to write the necessary wrappers around my classes.

Also, note how I’ve been talking as though I know for certain where each of the PRG banks will appear in memory. Well, nothing can be easy. NES ROMs make use of mappers, which swap different PRG and CHR banks into memory at runtime, sort of like how a virtual memory system does paging. Dozens of different mappers exist, each with different possible behavior. With just static analysis, all I can go off of is knowing which set of banks might be mapped to which ranges of addresses, without knowing which combinations will actually exist at runtime.

In fact, if writing an emulator, implementing the CPU is arguably the easiest part. The real pain comes in trying to implement each of the mappers, which vary widely from ROM to ROM based on whichever crazy hardware the developers used to get around the Nintendo’s memory limitations. No standardization whatsoever.

Of course, there’s lots of other challenges to reverse engineering a ROM. I haven’t even covered everything about just figuring out which bytes are for what — figuring out the text table, for example, has its own interesting issues. And that’s before we get to the point of actually figuring out what the ROM does when you run it.

If this were a program to generate driving directions, it’s still at the point of trying to figure out which squiggles on the map are roads and which are rivers. A critical thing, sure, but still a long way from the goal.