Does your computer appreciate ? Day?

It’s that time of year again, when we all reflect on what the ratio of a circle’s circumference to its diameter means to us. But have you ever stopped to ponder what it means… to your computer?

For C programmers at least, the answer might first seem to be found in the the standard library’s math.h header:

# define M_PI           3.14159265358979323846  /* pi */

The M_ prefix denotes it as a constant provided by the math part of the standard library, though you can pronounce it as “mmmm, pie” if you like. Either way, though, that certainly seems like a perfectly reasonable approximation to the transcendental number.

But is it?

To find out, first we need to consider how your computer represents real numbers internally. (OK, technically only rational numbers, and even then only a subset of them.) Most likely, your computer uses the standard floating point representation, which works pretty much like a binary version of scientific notation.

As you’ll recall, since scientists often work with really large or really small numbers that are obnoxious to write in normal notation, they like to move the decimal point around to make the number between 1 and 10, and then multiply the result by the appropriate power of 10 to compensate. For example, instead of writing the number of atoms there are in a gram of 12C as about 602,214,790,000,000,000,000,000, they’d write 6.0221479 × 1023.

Floating point works pretty much the same way, but in binary instead of decimal. For example, the number 13.40625 would be 1101.011012 (using the subscript 2 to denote a binary number), or 1.101011012 × 23.

Of course, floating point stores both the fractional part and the exponent as binary integers, but the precise details aren’t important for this discussion. What is important is how many bits are used to store the fraction and the exponent, and in C we have a few options:

Type Sign bit Exponent bits Fraction bits Total bits
float 1 8 23 + 1 32
double 1 11 52 + 1 64
long double 1 15 64 80

The sign bit denotes whether the number is positive or negative. The more exponent bits we have, the larger and smaller (in the “distance from zero” sense) the numbers we can represent, since we can move the binary point (not a decimal point!) farther to the left or right. The more fraction bits we have, the more precisely we can represent a number, just like 2.718281828459045 is more precise than 2.718.

If you’re paying attention, you might protest that the rows in that table don’t add up. How can we have 1 sign bit, 8 exponent bits, and 24 fraction bits in a 32-bit value, where clearly that’s 33 bits? Easy. In scientific notation, remember how we move the decimal point until the fraction part is between 1 (inclusive) and 10 (exclusive)? In floating point, we move the binary point until the fraction part is between 12 (inclusive) and 102 (exclusive). That means that the fraction part always looks like 1.something2. Since that first bit is always a 1, we don’t need to bother storing it. Woo, free bit! That’s why I wrote 23 + 1 in the table and not 24: 23 stored bits plus 1 implied bit.

The long double type is the oddball, since it’s much more platform-dependent than the other two. On x86 machines, it’s the 80-bit floating point value that the FPU uses internally. Other architectures will probably use something different. Also, the long double format explicitly stores that guaranteed-to-be-a-1 bit at the front of the fraction, presumably because that makes the FPU’s work easier.

Now, with that background out of the way, back to our question: just how accurately can our computer represent ?, without rolling our own data type? Here’s how the three types fare, compared with the true value:

Type Value
float 1.100100100001111110110112 × 21
double 1.10010010000111111011010101000100010000101101000110002 × 21
long double 1.1001001000011111101101010100010001000010110100011000010001101012 × 21
? 1.10010010000111111011010101000100010000101101000110000100011010011…2 × 21

(If you want to check for yourself, here’s the source code I used.)

Now, I may be a master of computer science, but even I have trouble looking at those numbers and seeing how they compare to our buddy M_PI, a.k.a. 3.14159265358979323846. Fortunately, it’s not too hard to compute the maximum rounding error in our floating point values. In general, the magnitude of the rounding error in base b is the number with digit b/2 in the position immediately after the last digit in the rounded number, and zeroes everywhere else. If it were otherwise, the number would’ve been rounded to something else. For a decimal example to make that clearer, the error in rounding ? to 3.14 is at most ±0.005: the true value of ? must be between 3.135 and 3.145, otherwise it wouldn’t've been rounded to 3.14.

Crunching the numbers for our three data types, we get:

Type Maximum error in ?
float ±2-23 ±10-6.9237 ±0.0000001192092895507811309
double ±2-52 ±10-15.6536 ±0.0000000000000002220446049
long double ±2-63 ±10-18.9649 ±0.0000000000000000001084202

To make it clearer still, let’s look at what the decimal values of ? in those types look like, both written out to 21 digits and truncated before we run into any errors:

Type Value Truncated Accurate Digits
float 3.14159274101257324219 3.141592 7
double 3.14159265358979311600 3.141592653589793 16
long double 3.14159265358979323851 3.141592653589793238 19

Moving from float to double roughly doubles the precision we get before losing accuracy, and moving on to long double adds a few more.

For comparison:

Constant Value Accurate Digits
M_PI 3.14159265358979323846 21
M_PIl 3.1415926535897932384626433832795029L 35

It turns out that M_PI is more precise than any of our options for storing it in a floating point value! That makes the GNU extension M_PIl, specifying a long double version of the constant, a bit silly, since we can’t use those extra 14 digits anyway! The only functional difference between the two is the L suffix at the end, telling the compiler to treat the constant as a long double instead of an ordinary double. Without that suffix, assigning M_PI into a long double would still only give us the precision of a double, with those extra fraction bits set to 0.

Of course, in C, storage limits of the different data types is machine-dependent. It’s certainly possible for those same definitions to be used on a machine with larger floating point types that can make use of those extra digits. But for poor old kryten here, it is not meant to be.

For extra ? Day fun, consider that many programs don’t want to depend on having a math.h header recent enough to define M_PI for them, and so define their own version of the constant as a backup. A Google Code search for C programs with “#define M_PI” in them turns up no shortage of hits. Amusingly, many programs have their own ideas as to how many digits M_PI should be defined with.

One particularly fun one is this hit for a 97-digit version of M_PI, if only for the comment in front of it:

 * Ok, so some systems require magic incantations for M_PI to be defined.
 * It's not important enough to worry about.
#ifndef M_PI
#define M_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117

Also amusing are the “you fail at math” examples, like this one:

#ifndef M_PI
#define M_PI 3.14159263

OK, it’s not as bad as the Indiana state legislature’s infamous attempt to implicitly define ? as 3.2. To say nothing of defining PI to be 12. But still.

So now you know: the great thing about ? is that there’s so many different values for it lurking in your computer. Of course, all the x86 assembly programmers are laughing about this whole mess, since they’ll just use the FLDPI instruction to have the FPU use its value for ? and be done with it.

Comments Off