The myth of signed arithmetic wraparound

Pop quiz time! Suppose you’re on a machine where int is 32 bits wide and thus INT_MAX, the maximum possible value for an int, is 2147483647. What is the output of the following C program?

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
 
int main (int argc, char *argv[])
{
        int sum = INT_MAX + 1;
        printf("INT_MAX + 1 == %d\n", sum);
        return EXIT_SUCCESS;
}

If you’re like most C programmers, you probably answered the following:

INT_MAX + 1 == -2147483648

And if pressed to explain how you could add two positive numbers together and wide up with a negative number, you’d explain that it’s because of how two’s complement arithmetic works:

 2147483647 == 0111 1111 1111 1111 1111 1111 1111 1111
+         1 == 0000 0000 0000 0000 0000 0000 0000 0001
-----------    ---------------------------------------
-2147483648 == 1000 0000 0000 0000 0000 0000 0000 0000

It turns out that you’d be wrong. In C, the behavior of signed arithmetic overflow — that is, what happens when the result of an arithmetic operation is outside the range of what’s representable with that data type — is undefined. You might witness the wraparound behavior you’d expect from two’s complement arithmetic, but it’s certainly not guaranteed, since C doesn’t require the use of two’s complement arithmetic.

But, you might protest, that’s an awfully pedantic point to make, since pretty much any system you’ll encounter these days uses two’s complement arithmetic, with alternative representations like one’s complement or sign-and-magnitude being extremely difficult to find outside of legacy hardware. Two’s complement and arithmetic and the wraparound behavior it exhibits may not technically be part of the C standard, but in any practical application it’s a de facto standard because of the use of two’s complement arithmetic by the underlying processor, right?

Actually, no, not at all; real-world compilers will exploit the fact that the C standard does not define the behavior of integer arithmetic overflow when it’s optimizing your program. As a result, you can’t at all rely on wraparound happening. For instance, consider the following program:

#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
 
int process_something (int fd, int size)
{
        // Catch integer overflow
        if (size > size + 1)
                abort();
 
        char *string = malloc(size + 1);
        int count = read(fd, string, size);
        string[count] = 0;
        int len = strlen(string);
        free(string);
        return len;
}
 
int main (int argc, char *argv[])
{
        int fd = open("/etc/issue", O_RDONLY);
        int len = process_something(fd, INT_MAX);
        close(fd);
        return EXIT_SUCCESS;
}

The function process_something wants to allocate size + 1 bytes and is trying to be careful not to accidentally pass a negative value to malloc, which would happen if size == INT_MAX. The code tries to exploit how two’s complement works by checking size > size + 1 to detect integer overflow, and aborting if that’s the case.

This code does not work when compiled with gcc, depending on the optimization level. At -O2 and above, gcc realizes that size > size + 1 should never be true — think about it mathematically — and so eliminates the entire if block. What about if there is integer overflow? The C standard says the behavior of the program is undefined if that happens, so the compiler doesn’t need to worry about it.

When compiled at optimization level -O0 or -O1, the output of the program is:

Aborted

Which is what happens when abort gets called. This is the behavior the programmer expected.

But when compiled at optimization level -O2 or -O3, the code that calls abort gets optimized away, and this happens (or at least, could happen) instead:

Segmentation fault

Why? On x86 Linux, malloc returns NULL when it receives a negative argument, since size + 1 is negative due to two’s complement wraparound. Since malloc returned a NULL pointer, a segmentation fault happens when the following code attempts to dereference it.

This is at least what happens on my computer. But since the program invokes undefined behavior, it could do anything at all and be considered correct, from a technical standpoint. Other possibilities are: successfully reading the input file anyway, failing silently without crashing, sending threatening e-mails to your boss, drinking your coffee when you’re not looking, or making demons fly out of your nose. Undefined behavior is undefined. Anything is possible.

The point is, relying on undefined behavior is a bad idea, even if you think you know what’s actually going to happen. Part of the reason that languages like C provide areas where undefined behavior can occur, as opposed to languages like Java that try to specify what must happen in every circumstance, is that it gives compilers a lot more flexibility in optimizing code. A compiler can assume, for example, that a < a + 1 without having to first prove that a will never be the maximum possible value for its type; if process_something in our previous example were in a separate library, the compiler would have no way of knowing what a could be at runtime, and would have to assume the worst.

For more about the gotchas of undefined behavior in C, read What Every C Programmer Should Know About Undefined Behavior (part 2) (part 3) on the LLVM Project Blog, which inspired this post and showed me why I would have answered the question I opened this post with incorrectly.