Please do not jump on the table

Last time, I mentioned one obstacle I ran into when writing a reverse engineering tool for Nintendo games. Namely, just because a piece of code is called using the JSR (Jump to SubRoutine) instruction doesn’t mean the code being called actually is a subroutine. Subroutines eventually execute an RTS (ReTurn from Subroutine) instruction to go back to the code immediately following the JSR. This bit of code in Super Mario Bros., however, doesn’t:

JumpEngine:
       asl          ;shift bit from contents of A
       tay
       pla          ;pull saved return address from stack
       sta $04      ;save to indirect
       pla
       sta $05
       iny
       lda ($04),y  ;load pointer from indirect
       sta $06      ;note that if an RTS is performed in next routine
       iny          ;it will return to the execution before the sub
       lda ($04),y  ;that called this routine
       sta $07
       jmp ($06)    ;jump to the address we loaded

As I mentioned before, the above code treats the “return address” that JSR pushed onto the call stack as the address of a jump table, with the index into the jump table passed in the A register.

The reverse engineering tool needs to be aware of this, since otherwise it won’t know about the jump table, and thus won’t find the executable code pointed to by the jump table’s entries.

To solve this problem, I added a little static analysis routine that inspects all the code called by a JSR, checking for stack-based shenanigans. A normal subroutine may push and pop values on and off the call stack, but it will leave the return address alone, and when the RTS is reached, the return address will be at the top of the stack. A jump table engine like the above will pop the “return address” off the stack and end not with a RTS but with an indirect JMP (JuMP). Truly weird code will do something else.

My program now uses the Boost Graph Library to build a control flow graph of the executable code it’s located, showing all the possible paths of execution through the program. The analysis runs as a depth-first traversal of each alleged subroutine, checking what the depth of the call stack will be at each instruction.

Naturally, the analysis is only a heuristic. It’s possible for the code to be a normal subroutine and look like a jump table engine if, say, it manually pops the return address off the call stack and does an indirect JMP to it instead of an RTS. I’m making the assumption that the code isn’t deliberately obfuscated like that.

During testing, I tried running the jump table engine detector against Mega Man to see what would happen, and look at what it found:

PRG4_1562: txa
PRG4_1563: asl A
PRG4_1564: tay
PRG4_1565: iny
PRG4_1566: pla
PRG4_1567: sta $F4
PRG4_1569: pla
PRG4_156A: sta $F5
PRG4_156C: lda ($F4),Y
PRG4_156E: tax
PRG4_156F: iny
PRG4_1570: lda ($F4),Y
PRG4_1572: sta $F5
PRG4_1574: stx $F4
PRG4_1576: jmp ($00F4)

The details of how it works are different (it passes the index in the X register instead of A, and it only uses two bytes of zero-page memory instead of four), but it does the same thing as the jump table engine in Super Mario Bros. The fact that my detector seems to work on two different games written by two different developers is a good sign — both that the detector works, and that the jump-table-via-subroutine trick is likely to be used in many games. (The detector also found a few subroutines that do some other form of stack weirdness, but I haven’t look yet at just what’s going on with those.)

The reverse engineering tool doesn’t actually make use of this new information yet, but that’s the next thing on the to-do list.

Comments Off

Going in reverse

In case you’re wondering what I’ve been up to with the relative silence here lately, for some reason I’ve decided to write a reverse engineering tool for NES games. This might eventually lead to answering a long-standing question, or even lead into a revival of Wallace.

One of the challenges with writing such a tool is figuring out just what the various bytes in a ROM image even mean. In most operating systems, the internal structure of executable files contains some structural information. In particular, the executable code itself (aka .text, counterintuitively) is stored separately from global data (aka .data) that the executable code uses. So right off the bat, you can find the machine code and disassemble it back into something fairly readable.

No such luck with a NES ROM image. While there is segmentation in the file format, it’s based on whether the memory bank in question is part of CPU memory (PRG, or “program”, banks) or PPU memory (CHR, or “character”, banks). CHR banks contain the pattern tables that store the game’s graphics, so those are pretty straightforward to interpret.

However, PRG banks store both code and data in whatever layout the developers chose to. All you have to go by is the fact that the uppermost 6 bytes of CPU memory contain pointers to three pieces of code: the non-maskable interrupt handler (invoked by the PPU), the reset handler (invoked after power-on or reset), and the software interrupt handler (invoked by the BRK instruction). That’s all you know for sure about the memory layout.

In principle, this is enough. You can start at the addresses indicated by those pointers and start reading off instructions, following any branches or jumps along the way. We aren’t actually executing anything, just stepping through the structure of the code to find where all the instructions are. We don’t know what the program will actually do when it executes — for example, we don’t know which way a branch will go — but for our static analysis, we can just trace both code paths until we run out of code. When we’re done, everything we’ve reached is an instruction and can be disassembled, and anything else is data (or filler).

Of course, since nothing can be easy, this static control flow analysis fails if the program does anything tricky. Chances of running into something tricky are pretty good. For example, take a look at this bit of code from a full disassembly of Super Mario Bros.:

OperModeExecutionTree:
      lda OperMode     ;this is the heart of the entire program,
      jsr JumpEngine   ;most of what goes on starts here
 
      .dw TitleScreenMode
      .dw GameMode
      .dw VictoryMode
      .dw GameOverMode

JSR is the “jump to subroutine” instruction: it pushes the address of the following instruction onto the stack and jumps to the specified address. When an RTS (“return from subroutine”) is executed, that return address gets popped off the stack and jumped to. So, normally, the code after a JSR will get executed once the called subroutine does an RTS.

But clearly, that’s not what’s happening here, since we have eight bytes of data immediately following the JSR, instead of code. In particular, four 16-bit addresses. Needless to say, my current executable-code-detection algorithm chokes on this. What’s going on? The code for the JumpEngine “subroutine” reveals all:

JumpEngine:
       asl          ;shift bit from contents of A
       tay
       pla          ;pull saved return address from stack
       sta $04      ;save to indirect
       pla
       sta $05
       iny
       lda ($04),y  ;load pointer from indirect
       sta $06      ;note that if an RTS is performed in next routine
       iny          ;it will return to the execution before the sub
       lda ($04),y  ;that called this routine
       sta $07
       jmp ($06)    ;jump to the address we loaded

For those who don’t speak 6502, this is basically a clever way of implementing a jump table. On entry, the A register contains the index of the table, and the top of the stack contains the address of the jump table itself. The routine multiplies A by 2 (the size in bytes of an address) and stores it in index register Y. It then writes the Yth entry in the jump table to memory address $0006, and then jumps to the address stored there.

The slightly convoluted routine to implement this is needed to get around the limitations of the 6502 processor — in particular, the fact that registers are too small to store an address. But for our purposes, this sort of thing demonstrates that we can’t blindly assume a JSR invokes a “normal” subroutine — we have to check whether the alleged subroutine does any kind of stack trickery like that, and if it does, not assume that the instructions following the JSR will get called.

Of course, since now I have a need to do two kinds of static analysis (the “executable code detector” and the “tricky subroutine detector”), ad hoc navigation methods through the PRG banks won’t cut it. Luckily, it looks like the Boost Graph Library will give me a fairly easy way to make the PRG banks look like a proper graph (without actually having to build one explicitly), at which time I can implement static analysis routines using standard graph traversal algorithms. That shouldn’t be too hard — most problems there will probably come from learning how to write the necessary wrappers around my classes.

Also, note how I’ve been talking as though I know for certain where each of the PRG banks will appear in memory. Well, nothing can be easy. NES ROMs make use of mappers, which swap different PRG and CHR banks into memory at runtime, sort of like how a virtual memory system does paging. Dozens of different mappers exist, each with different possible behavior. With just static analysis, all I can go off of is knowing which set of banks might be mapped to which ranges of addresses, without knowing which combinations will actually exist at runtime.

In fact, if writing an emulator, implementing the CPU is arguably the easiest part. The real pain comes in trying to implement each of the mappers, which vary widely from ROM to ROM based on whichever crazy hardware the developers used to get around the Nintendo’s memory limitations. No standardization whatsoever.

Of course, there’s lots of other challenges to reverse engineering a ROM. I haven’t even covered everything about just figuring out which bytes are for what — figuring out the text table, for example, has its own interesting issues. And that’s before we get to the point of actually figuring out what the ROM does when you run it.

If this were a program to generate driving directions, it’s still at the point of trying to figure out which squiggles on the map are roads and which are rivers. A critical thing, sure, but still a long way from the goal.