Going in reverse

In case you’re wondering what I’ve been up to with the relative silence here lately, for some reason I’ve decided to write a reverse engineering tool for NES games. This might eventually lead to answering a long-standing question, or even lead into a revival of Wallace.

One of the challenges with writing such a tool is figuring out just what the various bytes in a ROM image even mean. In most operating systems, the internal structure of executable files contains some structural information. In particular, the executable code itself (aka .text, counterintuitively) is stored separately from global data (aka .data) that the executable code uses. So right off the bat, you can find the machine code and disassemble it back into something fairly readable.

No such luck with a NES ROM image. While there is segmentation in the file format, it’s based on whether the memory bank in question is part of CPU memory (PRG, or “program”, banks) or PPU memory (CHR, or “character”, banks). CHR banks contain the pattern tables that store the game’s graphics, so those are pretty straightforward to interpret.

However, PRG banks store both code and data in whatever layout the developers chose to. All you have to go by is the fact that the uppermost 6 bytes of CPU memory contain pointers to three pieces of code: the non-maskable interrupt handler (invoked by the PPU), the reset handler (invoked after power-on or reset), and the software interrupt handler (invoked by the BRK instruction). That’s all you know for sure about the memory layout.

In principle, this is enough. You can start at the addresses indicated by those pointers and start reading off instructions, following any branches or jumps along the way. We aren’t actually executing anything, just stepping through the structure of the code to find where all the instructions are. We don’t know what the program will actually do when it executes — for example, we don’t know which way a branch will go — but for our static analysis, we can just trace both code paths until we run out of code. When we’re done, everything we’ve reached is an instruction and can be disassembled, and anything else is data (or filler).

Of course, since nothing can be easy, this static control flow analysis fails if the program does anything tricky. Chances of running into something tricky are pretty good. For example, take a look at this bit of code from a full disassembly of Super Mario Bros.:

OperModeExecutionTree:
      lda OperMode     ;this is the heart of the entire program,
      jsr JumpEngine   ;most of what goes on starts here
 
      .dw TitleScreenMode
      .dw GameMode
      .dw VictoryMode
      .dw GameOverMode

JSR is the “jump to subroutine” instruction: it pushes the address of the following instruction onto the stack and jumps to the specified address. When an RTS (“return from subroutine”) is executed, that return address gets popped off the stack and jumped to. So, normally, the code after a JSR will get executed once the called subroutine does an RTS.

But clearly, that’s not what’s happening here, since we have eight bytes of data immediately following the JSR, instead of code. In particular, four 16-bit addresses. Needless to say, my current executable-code-detection algorithm chokes on this. What’s going on? The code for the JumpEngine “subroutine” reveals all:

JumpEngine:
       asl          ;shift bit from contents of A
       tay
       pla          ;pull saved return address from stack
       sta $04      ;save to indirect
       pla
       sta $05
       iny
       lda ($04),y  ;load pointer from indirect
       sta $06      ;note that if an RTS is performed in next routine
       iny          ;it will return to the execution before the sub
       lda ($04),y  ;that called this routine
       sta $07
       jmp ($06)    ;jump to the address we loaded

For those who don’t speak 6502, this is basically a clever way of implementing a jump table. On entry, the A register contains the index of the table, and the top of the stack contains the address of the jump table itself. The routine multiplies A by 2 (the size in bytes of an address) and stores it in index register Y. It then writes the Yth entry in the jump table to memory address $0006, and then jumps to the address stored there.

The slightly convoluted routine to implement this is needed to get around the limitations of the 6502 processor — in particular, the fact that registers are too small to store an address. But for our purposes, this sort of thing demonstrates that we can’t blindly assume a JSR invokes a “normal” subroutine — we have to check whether the alleged subroutine does any kind of stack trickery like that, and if it does, not assume that the instructions following the JSR will get called.

Of course, since now I have a need to do two kinds of static analysis (the “executable code detector” and the “tricky subroutine detector”), ad hoc navigation methods through the PRG banks won’t cut it. Luckily, it looks like the Boost Graph Library will give me a fairly easy way to make the PRG banks look like a proper graph (without actually having to build one explicitly), at which time I can implement static analysis routines using standard graph traversal algorithms. That shouldn’t be too hard — most problems there will probably come from learning how to write the necessary wrappers around my classes.

Also, note how I’ve been talking as though I know for certain where each of the PRG banks will appear in memory. Well, nothing can be easy. NES ROMs make use of mappers, which swap different PRG and CHR banks into memory at runtime, sort of like how a virtual memory system does paging. Dozens of different mappers exist, each with different possible behavior. With just static analysis, all I can go off of is knowing which set of banks might be mapped to which ranges of addresses, without knowing which combinations will actually exist at runtime.

In fact, if writing an emulator, implementing the CPU is arguably the easiest part. The real pain comes in trying to implement each of the mappers, which vary widely from ROM to ROM based on whichever crazy hardware the developers used to get around the Nintendo’s memory limitations. No standardization whatsoever.

Of course, there’s lots of other challenges to reverse engineering a ROM. I haven’t even covered everything about just figuring out which bytes are for what — figuring out the text table, for example, has its own interesting issues. And that’s before we get to the point of actually figuring out what the ROM does when you run it.

If this were a program to generate driving directions, it’s still at the point of trying to figure out which squiggles on the map are roads and which are rivers. A critical thing, sure, but still a long way from the goal.

6 Responses

  1. As one who wrote an emulator at one time, and who also attempted such a thing, good luck! I gave up on the code analysis pretty quickly once I hit a game (I forget which, but my gut tells me it’s Battletoads, bane of my emulation-writing existence) that builds up a program in RAM and executes from there.

    But the PRG swapping was also a hell of a thing.

  2. If what you say about Battletoads is true, then you’ve managed to find yet another way that game sucks. Self-modifying code may not be the root of all evil, but they play poker together on Tuesdays.

  3. Seriously. Battletoads was one of the most wicked games to emulate in the history of games (not counting games with the crazy-assed mappers like the MMC5 or VRC6/7). It falls under the weird category of games that work with very inaccurate emulators, but for MOSTLY-accurate emulators, a slight timing mistake can cause level 2 to lock up.

    DAMN do I hate that game (but DAMN was I overjoyed when I actually got it working without crazy hackery).

    Also it stands as the only game that I can’t even beat using save states. Stupid down-the-elevator-shaft chase sequence.

  4. As usual, I only followed a fraction of all of that – but I love to see work on Wallace! Thanks for the update.

  5. I’m pathetic in not feeling like thinking through reading the whole article, but a revival of Wallace sounds like a religious thing. I’m not saying, I”m just sayin’.

  6. Maybe Paul should invite Sarah Palin to this revival. I bet she’d show up.

Comments are closed.