<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">

<channel>
	<title>Paul Kuliniewicz &#187; dennis</title>
	<atom:link href="http://www.kuliniewicz.org/blog/archives/tag/dennis/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.kuliniewicz.org/blog</link>
	<description>After all, it could only cost you your life, and you got that for free.</description>
	<lastBuildDate>Wed, 18 Jan 2012 04:01:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/us/</creativeCommons:license>		<item>
		<title>Please do not jump on the table</title>
		<link>http://www.kuliniewicz.org/blog/archives/2008/09/14/please-do-not-jump-on-the-table/</link>
		<comments>http://www.kuliniewicz.org/blog/archives/2008/09/14/please-do-not-jump-on-the-table/#comments</comments>
		<pubDate>Sun, 14 Sep 2008 21:19:55 +0000</pubDate>
		<dc:creator>Paul Kuliniewicz</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[dennis]]></category>
		<category><![CDATA[nes]]></category>
		<category><![CDATA[reverse engineering]]></category>

		<guid isPermaLink="false">http://www.kuliniewicz.org/blog/?p=859</guid>
		<description><![CDATA[Last time, I mentioned one obstacle I ran into when writing a reverse engineering tool for Nintendo games. Namely, just because a piece of code is called using the JSR (Jump to SubRoutine) instruction doesn&#8217;t mean the code being called actually is a subroutine. Subroutines eventually execute an RTS (ReTurn from Subroutine) instruction to go [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.kuliniewicz.org/blog/archives/2008/09/05/going-in-reverse/">Last time</a>, I mentioned one obstacle I ran into when writing a reverse engineering tool for Nintendo games.  Namely, just because a piece of code is called using the <code>JSR</code> (Jump to SubRoutine) instruction doesn&#8217;t mean the code being called actually <em>is</em> a subroutine.  Subroutines eventually execute an <code>RTS</code> (ReTurn from Subroutine) instruction to go back to the code immediately following the <code>JSR</code>.  This bit of code in Super Mario Bros., however, doesn&#8217;t:</p>
<div class="vim"><code><span class="Special">JumpEngine</span>:<br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">asl</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="Comment">;shift bit from contents of A</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">tay</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">pla</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="Comment">;pull saved return address from stack</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">sta</span> <span class="Constant">$04</span>&nbsp; &nbsp; &nbsp; <span class="Comment">;save to indirect</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">pla</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">sta</span> <span class="Constant">$05</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">iny</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">lda</span> (<span class="Constant">$04</span>),<span class="Special">y</span>&nbsp; <span class="Comment">;load pointer from indirect</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">sta</span> <span class="Constant">$06</span>&nbsp; &nbsp; &nbsp; <span class="Comment">;note that if an RTS is performed in next routine</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">iny</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="Comment">;it will return to the execution before the sub</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">lda</span> (<span class="Constant">$04</span>),<span class="Special">y</span>&nbsp; <span class="Comment">;that called this routine</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">sta</span> <span class="Constant">$07</span><br />
&nbsp; &nbsp; &nbsp;&nbsp; <span class="Statement">jmp</span> (<span class="Constant">$06</span>)&nbsp; &nbsp; <span class="Comment">;jump to the address we loaded</span></code></div>
<p>As I mentioned before, the above code treats the &#8220;return address&#8221; that <code>JSR</code> pushed onto the call stack as the address of a jump table, with the index into the jump table passed in the <code>A</code> register.</p>
<p>The reverse engineering tool needs to be aware of this, since otherwise it won&#8217;t know about the jump table, and thus won&#8217;t find the executable code pointed to by the jump table&#8217;s entries.</p>
<p>To solve this problem, I added a little static analysis routine that inspects all the code called by a <code>JSR</code>, checking for stack-based shenanigans.  A normal subroutine may push values onto the call stack and pop them back off, but it will leave the return address alone, and when the <code>RTS</code> is reached, the return address will be at the top of the stack.  A jump table engine like the above will pop the &#8220;return address&#8221; off the stack and end not with an <code>RTS</code> but with an indirect <code>JMP</code> (JuMP).  Truly weird code will do something else.</p>
<p>My program now uses the <a href="http://www.boost.org/doc/libs/1_36_0/libs/graph/doc/table_of_contents.html">Boost Graph Library</a> to build a <a href="http://en.wikipedia.org/wiki/Control_flow_graph">control flow graph</a> of the executable code it has located, showing all the possible paths of execution through the program.  The analysis runs as a depth-first traversal of each alleged subroutine, checking what the depth of the call stack will be at each instruction.</p>
<p>Naturally, the analysis is only a heuristic.  It&#8217;s possible for the code to be a normal subroutine and look like a jump table engine if, say, it manually pops the return address off the call stack and does an indirect <code>JMP</code> to it instead of an <code>RTS</code>.  I&#8217;m making the assumption that the code isn&#8217;t deliberately obfuscated like that.</p>
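<p>The stack-depth check described above can be sketched roughly as follows.  This is an illustration in Python over a plain dictionary, not the Boost Graph Library code the tool actually uses; the names <code>STACK_EFFECT</code> and <code>classify</code>, the graph encoding, and the verdict strings are all made up for the example, and only the push and pull instructions are modelled.</p>

```python
# Net effect of an instruction on the stack, in bytes. Hypothetical table:
# only the 6502 push/pull instructions are modelled in this sketch.
STACK_EFFECT = {"pha": 1, "php": 1, "pla": -1, "plp": -1}


def classify(cfg, entry):
    """Classify the alleged subroutine that starts at node `entry`.

    `cfg` maps node -> (mnemonic, list of successor nodes). Depth counts
    bytes relative to entry: 0 means the return address is on top of the
    stack, -2 means both bytes of the return address have been pulled.
    Returns the set of verdicts found over all paths.
    """
    verdicts = set()
    seen = set()
    work = [(entry, 0)]
    while work:
        node, depth = work.pop()  # popping from the end gives depth-first order
        if (node, depth) in seen:
            continue
        seen.add((node, depth))
        if abs(depth) > 256:  # more than the whole 6502 stack: give up
            verdicts.add("weird")
            continue
        mnemonic, successors = cfg[node]
        if mnemonic == "rts":
            verdicts.add("normal" if depth == 0 else "weird")
        elif mnemonic == "jmp indirect":
            verdicts.add("jump table engine" if depth == -2 else "weird")
        else:
            depth += STACK_EFFECT.get(mnemonic, 0)
            work.extend((succ, depth) for succ in successors)
    return verdicts
```

<p>Feeding it the instruction sequence of <code>JumpEngine</code> as a straight-line graph yields the single verdict <code>"jump table engine"</code>, since two <code>pla</code> instructions precede the indirect <code>jmp</code>.</p>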
<p>During testing, I tried running the jump table engine detector against Mega Man to see what would happen, and look at what it found:</p>
<div class="vim"><code><span class="Special">PRG4_1562</span>: <span class="Statement">txa</span><br />
<span class="Special">PRG4_1563</span>: <span class="Statement">asl</span> <span class="Special">A</span><br />
<span class="Special">PRG4_1564</span>: <span class="Statement">tay</span><br />
<span class="Special">PRG4_1565</span>: <span class="Statement">iny</span><br />
<span class="Special">PRG4_1566</span>: <span class="Statement">pla</span><br />
<span class="Special">PRG4_1567</span>: <span class="Statement">sta</span> <span class="Constant">$F4</span><br />
<span class="Special">PRG4_1569</span>: <span class="Statement">pla</span><br />
<span class="Special">PRG4_156A</span>: <span class="Statement">sta</span> <span class="Constant">$F5</span><br />
<span class="Special">PRG4_156C</span>: <span class="Statement">lda</span> (<span class="Constant">$F4</span>),<span class="Special">Y</span><br />
<span class="Special">PRG4_156E</span>: <span class="Statement">tax</span><br />
<span class="Special">PRG4_156F</span>: <span class="Statement">iny</span><br />
<span class="Special">PRG4_1570</span>: <span class="Statement">lda</span> (<span class="Constant">$F4</span>),<span class="Special">Y</span><br />
<span class="Special">PRG4_1572</span>: <span class="Statement">sta</span> <span class="Constant">$F5</span><br />
<span class="Special">PRG4_1574</span>: <span class="Statement">stx</span> <span class="Constant">$F4</span><br />
<span class="Special">PRG4_1576</span>: <span class="Statement">jmp</span> (<span class="Constant">$00F4</span>)</code></div>
<p>The details of how it works are different (it passes the index in the <code>X</code> register instead of <code>A</code>, and it only uses two bytes of zero-page memory instead of four), but it does the same thing as the jump table engine in Super Mario Bros.  The fact that my detector seems to work on two different games written by two different developers is a good sign, both that the detector works and that the jump-table-via-subroutine trick is likely to be used in many games.  (The detector also found a few subroutines that do some other form of stack weirdness, but I haven&#8217;t looked yet at just what&#8217;s going on with those.)</p>
<p>The reverse engineering tool doesn&#8217;t actually make use of this new information yet, but that&#8217;s the next thing on the to-do list.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kuliniewicz.org/blog/archives/2008/09/14/please-do-not-jump-on-the-table/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Going in reverse</title>
		<link>http://www.kuliniewicz.org/blog/archives/2008/09/05/going-in-reverse/</link>
		<comments>http://www.kuliniewicz.org/blog/archives/2008/09/05/going-in-reverse/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 21:42:44 +0000</pubDate>
		<dc:creator>Paul Kuliniewicz</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[dennis]]></category>
		<category><![CDATA[nintendo]]></category>
		<category><![CDATA[reversing]]></category>

		<guid isPermaLink="false">http://www.kuliniewicz.org/blog/?p=846</guid>
		<description><![CDATA[In case you&#8217;re wondering what I&#8217;ve been up to with the relative silence here lately, for some reason I&#8217;ve decided to write a reverse engineering tool for NES games. This might eventually lead to answering a long-standing question, or even lead into a revival of Wallace. One of the challenges with writing such a tool [...]]]></description>
			<content:encoded><![CDATA[<p>In case you&#8217;re wondering what I&#8217;ve been up to with the relative silence here lately, for some reason I&#8217;ve decided to write a reverse engineering tool for NES games.  This might eventually lead to answering a <a href="http://www.kuliniewicz.org/blog/archives/2004/03/15/congrats-heres-rand-10000-points/">long-standing question</a>, or even lead into a revival of <a href="http://www.kuliniewicz.org/blog/archives/category/coding/wallace/">Wallace</a>.</p>
<p>One of the challenges with writing such a tool is figuring out just what the various bytes in a ROM image even mean.  On most operating systems, the executable file format itself carries some structural information.  In particular, <a href="http://en.wikipedia.org/wiki/Code_segment">the executable code itself (aka .text, counterintuitively)</a> is stored separately from <a href="http://en.wikipedia.org/wiki/Data_segment">global data (aka .data)</a> that the executable code uses.  So right off the bat, you can find the machine code and <a href="http://en.wikipedia.org/wiki/Assembly_language">disassemble it back into something fairly readable</a>.</p>
<p>No such luck with a NES ROM image.  While there <em>is</em> segmentation in the file format, it&#8217;s based on whether the memory bank in question is part of <a href="http://en.wikipedia.org/wiki/Central_processing_unit">CPU</a> memory (PRG, or &#8220;program&#8221;, banks) or <a href="http://en.wikipedia.org/wiki/Picture_Processing_Unit">PPU</a> memory (CHR, or &#8220;character&#8221;, banks).  CHR banks contain the pattern tables that store the game&#8217;s graphics, so those are pretty straightforward to interpret.</p>
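<p>As a rough sketch of that segmentation, here is how the banks could be pulled out of a ROM image, assuming it is in the common iNES format (the format is my assumption here; the names <code>chunks</code> and <code>split_banks</code> are made up for the example).  An iNES file starts with a 16-byte header giving the number of 16 KiB PRG banks and 8 KiB CHR banks.</p>

```python
PRG_BANK_SIZE = 16 * 1024  # iNES counts PRG ROM in 16 KiB units
CHR_BANK_SIZE = 8 * 1024   # iNES counts CHR ROM in 8 KiB units


def chunks(data, start, size, count):
    """Cut `count` consecutive pieces of `size` bytes out of `data`."""
    return [data[start + i * size:start + (i + 1) * size] for i in range(count)]


def split_banks(image):
    """Split an iNES image (a bytes object) into (PRG banks, CHR banks)."""
    if image[0:4] != b"NES\x1a":
        raise ValueError("not an iNES image")
    prg_count, chr_count, flags6 = image[4], image[5], image[6]
    # Bit 2 of flags6 marks a 512-byte trainer between header and PRG data.
    trainer = 512 if (flags6 // 4) % 2 == 1 else 0
    prg_start = 16 + trainer
    chr_start = prg_start + prg_count * PRG_BANK_SIZE
    return (chunks(image, prg_start, PRG_BANK_SIZE, prg_count),
            chunks(image, chr_start, CHR_BANK_SIZE, chr_count))
```

<p>The header also carries the mapper number, which this sketch ignores.</p>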
<p>However, PRG banks store both code and data in whatever layout the developers chose.  All you have to go by is the fact that the uppermost 6 bytes of CPU memory contain pointers to three pieces of code: the <a href="http://nesdevwiki.org/index.php/NMI">non-maskable interrupt</a> handler (invoked by the PPU), the reset handler (invoked after power-on or reset), and the interrupt request handler (invoked by a hardware IRQ or by the <code>BRK</code> software interrupt instruction).  That&#8217;s all you know for sure about the memory layout.</p>
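<p>Reading those three pointers is simple enough to sketch.  The function name <code>read_vectors</code> and the dictionary keys are made up for the example, and it assumes you already know which PRG bank is mapped to the top of CPU address space, so that its last six bytes land at $FFFA through $FFFF.</p>

```python
def read_vectors(bank):
    """Return the three handler addresses stored at the end of a PRG bank.

    Assumes `bank` (a bytes object) is the bank mapped at the top of CPU
    memory. The vectors sit at $FFFA (NMI), $FFFC (reset), and $FFFE
    (IRQ/BRK), each as a 16-bit little-endian pointer.
    """
    tail = bank[-6:]
    names = ("nmi", "reset", "irq_brk")
    return {name: tail[2 * i] + 256 * tail[2 * i + 1]
            for i, name in enumerate(names)}
```

<p>The three addresses it returns are the starting points for any code-finding pass.</p>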
<p>In principle, this is enough.  You can start at the addresses indicated by those pointers and start reading off instructions, following any branches or jumps along the way.  We aren&#8217;t actually executing anything, just stepping through the structure of the code to find where all the instructions are.  We don&#8217;t know what the program will actually do when it executes &#8212; for example, we don&#8217;t know which way a branch will go &#8212; but for our static analysis, we can just trace both code paths until we run out of code.  When we&#8217;re done, everything we&#8217;ve reached is an instruction and can be disassembled, and anything else is data (or filler).</p>
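<p>A minimal sketch of that tracing pass might look like the following.  The opcode table is a tiny hand-picked subset of the 6502 instruction set, just enough to run the example, and <code>OPCODES</code> and <code>find_code</code> are names invented here.  Note that it naively assumes every <code>JSR</code> eventually returns to the instruction after it.</p>

```python
# Tiny subset of the 6502 opcode map: opcode -> (mnemonic, length, kind).
OPCODES = {
    0xA9: ("lda #imm", 2, "plain"),
    0x8D: ("sta abs", 3, "plain"),
    0xCA: ("dex", 1, "plain"),
    0xD0: ("bne rel", 2, "branch"),
    0x4C: ("jmp abs", 3, "jump"),
    0x20: ("jsr abs", 3, "call"),
    0x60: ("rts", 1, "stop"),
}


def find_code(memory, entry_points):
    """Return the set of addresses where a reachable instruction starts.

    `memory` maps address -> byte value. Nothing is executed; the loop
    only follows every possible successor of every instruction it reaches.
    """
    seen = set()
    work = list(entry_points)
    while work:
        pc = work.pop()
        if pc in seen or pc not in memory:
            continue
        info = OPCODES.get(memory[pc])
        if info is None:
            continue  # opcode outside the sketch's table: stop this path
        seen.add(pc)
        _, length, kind = info
        following = pc + length
        if kind == "branch":
            offset = memory[pc + 1]
            if offset >= 0x80:
                offset -= 0x100  # branch offsets are signed 8-bit values
            work += [following, following + offset]  # trace both outcomes
        elif kind == "jump":
            work.append(memory[pc + 1] + 256 * memory[pc + 2])
        elif kind == "call":
            # Naive assumption: the callee returns to the next instruction.
            work += [following, memory[pc + 1] + 256 * memory[pc + 2]]
        elif kind == "plain":
            work.append(following)
        # kind == "stop" (rts) contributes no successors
    return seen
```

<p>Any address in memory that never shows up in the returned set, or inside an instruction that starts at one of those addresses, is treated as data or filler.</p>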
<p>Of course, since nothing can be easy, this static control flow analysis fails if the program does anything tricky.  Chances of running into something tricky are pretty good.  For example, take a look at this bit of code from a full disassembly of Super Mario Bros.:</p>
<div class="vim"><code>OperModeExecutionTree:<br />
&nbsp; &nbsp; &nbsp; lda OperMode&nbsp; &nbsp;&nbsp; ;this is the heart of the entire program,<br />
&nbsp; &nbsp; &nbsp; jsr JumpEngine&nbsp;&nbsp; ;most of what goes on starts here<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp; &#0046;dw TitleScreenMode<br />
&nbsp; &nbsp; &nbsp; &#0046;dw GameMode<br />
&nbsp; &nbsp; &nbsp; &#0046;dw VictoryMode<br />
&nbsp; &nbsp; &nbsp; &#0046;dw GameOverMode</code></div>
<p><code>JSR</code> is the &#8220;jump to subroutine&#8221; instruction: it pushes a return address onto <a href="http://en.wikipedia.org/wiki/Call_stack">the stack</a> and jumps to the specified address.  (Strictly speaking, the 6502 pushes the address of the last byte of the <code>JSR</code> instruction, which is one less than the address of the following instruction.)  When an <code>RTS</code> (&#8220;return from subroutine&#8221;) is executed, that address gets popped off the stack, incremented by one, and jumped to.  So, normally, the code after a <code>JSR</code> will get executed once the called subroutine does an <code>RTS</code>.</p>
<p>But clearly, that&#8217;s not what&#8217;s happening here, since we have eight bytes of data immediately following the JSR, instead of code.  In particular, four 16-bit addresses.  Needless to say, my current executable-code-detection algorithm chokes on this.  What&#8217;s going on?  The code for the JumpEngine &#8220;subroutine&#8221; reveals all:</p>
<div class="vim"><code>JumpEngine:<br />
&nbsp; &nbsp; &nbsp;&nbsp; asl&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ;shift bit from contents of A<br />
&nbsp; &nbsp; &nbsp;&nbsp; tay<br />
&nbsp; &nbsp; &nbsp;&nbsp; pla&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ;pull saved return address from stack<br />
&nbsp; &nbsp; &nbsp;&nbsp; sta $04&nbsp; &nbsp; &nbsp; ;save to indirect<br />
&nbsp; &nbsp; &nbsp;&nbsp; pla<br />
&nbsp; &nbsp; &nbsp;&nbsp; sta $05<br />
&nbsp; &nbsp; &nbsp;&nbsp; iny<br />
&nbsp; &nbsp; &nbsp;&nbsp; lda ($04),y&nbsp; ;load pointer from indirect<br />
&nbsp; &nbsp; &nbsp;&nbsp; sta $06&nbsp; &nbsp; &nbsp; ;note that if an RTS is performed in next routine<br />
&nbsp; &nbsp; &nbsp;&nbsp; iny&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ;it will return to the execution before the sub<br />
&nbsp; &nbsp; &nbsp;&nbsp; lda ($04),y&nbsp; ;that called this routine<br />
&nbsp; &nbsp; &nbsp;&nbsp; sta $07<br />
&nbsp; &nbsp; &nbsp;&nbsp; jmp ($06)&nbsp; &nbsp; ;jump to the address we loaded</code></div>
<p>For those who don&#8217;t speak <a href="http://en.wikipedia.org/wiki/MOS_Technology_6502">6502</a>, this is basically a clever way of implementing a <a href="http://en.wikipedia.org/wiki/Branch_table">jump table</a>.  On entry, the <code>A</code> register contains the index into the table, and the top of the stack contains the address pushed by the <code>JSR</code>, which points at the byte just before the jump table.  The routine multiplies <code>A</code> by 2 (the size in bytes of an address) and stores it in index register <code>Y</code>, then increments <code>Y</code> once to step past that extra byte.  It then copies the selected two-byte entry of the jump table to memory addresses $0006 and $0007, and jumps to the address stored there.</p>
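<p>For those who would rather read Python than 6502, here is a rough model of what the routine computes.  The function name <code>jump_engine_target</code> and the way memory and the stack are represented are made up for the example.  The extra increment of <code>y</code> is there because the address <code>JSR</code> pushes is that of its own last byte, one short of the table.</p>

```python
def jump_engine_target(memory, stack, a):
    """Mimic JumpEngine and return the address it would jump to.

    `memory` maps address -> byte; `stack` is a list whose last element
    is the most recently pushed byte; `a` is the table index passed in
    the A register. The zero-page scratch bytes are not modelled.
    """
    y = (a * 2) % 256             # asl / tay: two bytes per table entry
    low = stack.pop()             # pla / sta $04 (JSR pushes the low byte last)
    high = stack.pop()            # pla / sta $05
    base = low + 256 * high       # address of the last byte of the JSR
    y = (y + 1) % 256             # iny: step past that byte into the table
    target_low = memory[base + y]     # lda ($04),y / sta $06
    y = (y + 1) % 256             # iny
    target_high = memory[base + y]    # lda ($04),y / sta $07
    return target_low + 256 * target_high  # jmp ($06)
```

<p>With <code>OperMode</code> equal to 2, for instance, this would pick the third entry of the table that follows the <code>jsr JumpEngine</code>, which is <code>VictoryMode</code>.</p>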
<p>The slightly convoluted routine to implement this is needed to get around the limitations of the 6502 processor, in particular the fact that its 8-bit registers are too small to store a 16-bit address.  But for our purposes, this sort of thing demonstrates that we can&#8217;t blindly assume a <code>JSR</code> invokes a &#8220;normal&#8221; subroutine.  We have to check whether the alleged subroutine does any kind of stack trickery like that, and if it does, not assume that whatever follows the <code>JSR</code> is code that will get executed.</p>
<p>Of course, since now I have a need to do two kinds of static analysis (the &#8220;executable code detector&#8221; and the &#8220;tricky subroutine detector&#8221;), ad hoc navigation methods through the PRG banks won&#8217;t cut it.  Luckily, it looks like the <a href="http://www.boost.org/doc/libs/1_36_0/libs/graph/doc/table_of_contents.html">Boost Graph Library</a> will give me a fairly easy way to make the PRG banks look like a proper graph (without actually having to build one explicitly), at which point I can implement static analysis routines using standard graph traversal algorithms.  That shouldn&#8217;t be <em>too</em> hard; most problems there will probably come from learning how to write the necessary wrappers around my classes.</p>
<p>Also, note how I&#8217;ve been talking as though I know for certain where each of the PRG banks will appear in memory.  Well, nothing can be easy.  NES ROMs make use of mappers, which swap different PRG and CHR banks into memory at runtime, sort of like how a <a href="http://en.wikipedia.org/wiki/Virtual_memory">virtual memory</a> system does paging.  Dozens of different mappers exist, each with different possible behavior.  With just static analysis, all I can go off of is knowing which set of banks might be mapped to which ranges of addresses, without knowing which combinations will actually exist at runtime.</p>
<p>In fact, if you&#8217;re writing an emulator, implementing the CPU is arguably the easiest part.  The real pain comes in trying to implement each of the mappers, which vary widely from ROM to ROM based on whichever crazy hardware the developers used to get around the NES&#8217;s memory limitations.  No standardization whatsoever.</p>
<p>Of course, there are lots of other challenges to reverse engineering a ROM.  I haven&#8217;t even covered everything about just figuring out which bytes are for what; figuring out the text table, for example, has its own interesting issues.  And that&#8217;s before we get to the point of actually figuring out what the ROM <em>does</em> when you run it.</p>
<p>If this were a program to generate driving directions, it would still be at the point of trying to figure out which squiggles on the map are roads and which are rivers.  A critical thing, sure, but still a long way from the goal.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kuliniewicz.org/blog/archives/2008/09/05/going-in-reverse/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

