The Dark Side of Type Inference

If you can’t figure out what you’re writing, the compiler won’t do any better.

I’ve already written about SML/NJ before. Since then, I’ve gotten more practice with it, seeing as how the project for Grad-Level Compilers is written in it, as is an increasing amount of work in Programming Languages.

Anyway. SML/NJ’s type system has a feature known as “type inference.” In a nutshell, type inference can look at a program and figure out what the type of everything is.

For example, let’s say you wanted to write a recursive factorial function. In C, it’d be something like:

int factorial (int n) {
    if (n == 0)
        return 1;
        return n * factorial(n - 1);

On the first line, you’ll notice two type declarations. First, we specify that the function factorial returns an int. Second, we specify that the argument n is also an int. In C, and many other typed languages, you need to explicitly say what the type of everything is.

Now look at the eqivalent in ML:

fun factorial n =
    if n = 0
    then 1
    else n * factorial (n - 1)

It’s basically the same thing (modulo the different syntax), but notice what’s different: nowhere did we say what type the function returns, and what type n is. The ML compiler can figure this out on its own:

val factorial = fn : int -> int

The compiler has figured out that n must be an integer (since we compare it to the integer 0) and that the function also returns an integer (since the integer 1 is a possible return value).

Ideally, type inference combines the best features of a strong type system and a weak type system. You don’t need to spend time explicitly stating the type of everything, but the compiler can still type-check the program and find type errors at compile time. The best of both worlds, right?

Well, not exactly. What happens if there is a type error? Consider this:

fun factorial n =
    if n = 0
    then 1.0
    else n * factorial (n - 1)

The compiler lets us know that this isn’t a well-typed program:

stdIn:2.5-4.31 Error: types of if branches do not agree [literal]
  then branch: real
  else branch: int
  in expression:
    if n = 0 then 1.0 else n * factorial (n - 1)

Here, the problem is basically that we’re returning a real if we take the true branch, but we’re returning an int if we take the false branch. This isn’t well-typed, and the compiler managed to figure it out for us and point out exactly where the problem is.

Sadly, for more complicated programs, it doesn’t work out quite so neatly. Say you have several functions, each of which take several arguments and call each other in a mutually recursive fashion. There’s also some polymorphic higher-order functions thrown in for good measure. You try to compile, but the compiler reports a type error on line XYZ. Or is there?

Well, it depends on how you look at it. The compiler first noticed a type error on line XYZ. Say, for instance, that it thinks you’re passing an incorrect type into a function. But how does the compiler know what type the function is expecting? Well, it performed type inference on that function. And here lies the problem. It’s entirely possible that the type error is in fact here and not on line XYZ. The compiler wasn’t able to catch the type error here, though; instead, it deduced the wrong type for the function. (Remember, in a type system like ML’s, you can have polymorphic functions that can be applied to several different types.) Thus, the error isn’t on line XYZ at all, but the compiler didn’t find it there.

Such a red herring can be a pain to debug, especially when the actual type error is an improper return type used in one of the cases of a dozen-case case expression. (I’d post the code that caused this, but since it’s for a class project, that wouldn’t sit too well with the university.) I only managed to track it down by eschewing type inference and explicitly stating what I thought the types of all the arguments and return values should be. With those in place, the real type error popped right up, no longer able to hide behind erroneous type inference.

So, the moral is that type inference can be handy, but it does have its drawbacks. If you’re unlucky, the compiler may be able to infer types that you couldn’t possibly have meant, causing it to give some really strange error messages elsewhere in the code. A good rule of thumb, I’ve found, is that if you can’t easily infer the types from looking at the code, write them in explicitly; the compiler may not be able to figure it out for you either (or at least not in such a way as would provide useful error messages).

Also, explicitly stating types even in a type-inferred language has documentative properties as well. You can tell just by looking at the code what the interface to a function looks like. Although the compiler will report this information right away in interactive mode, when looking at a file full of source code such information won’t be available. Again, if it’s not obvious to you what the types should be just from looking at it, you’re probably better off writing them in explicitly. The longer the function is, the more likely type inference can cause you a half-hour’s worth of misguided debugging.

2 Responses

  1. In other words, either use a weakly-typed language or use explicit typing. I hate that wussy type-inference middle ground which so many academic “perfect” languages attempt to take. (Prolog is also pretty bad about this stuff, for example.)

    Of course, weak typing has definite pitfalls as well, especially when the typing system is exceedingly weak and broken like in PHP, the only language I know of which resolves a type mismatch between string and int by converting the string to an int… (And would it have killed them to put in a simple way to make it throw an error if you try to read from an uninitialized variable?! That seems to be the cause of 99% of all aggravating PHP debug sessions!)

  2. Basically, yeah. If you name things clearly enough so that it’s “obvious” what type they’re supposed to have, type inference can save you redundancies like “BufferedReader reader = new BufferedReader()” in Java. And if you’re writing a polymorphic function, the benefit of explicitly typing arguments with type variables may not really buy you much. But if a typo causes the compiler to think two variables should be the same type when they really shouldn’t, you quickly run into problems.

    If you want PHP to warn you about uninitialized variables, put “error_reporting = E_ALL” in your php.ini. By default, it doesn’t print warnings like that. The problem is, since warnings aren’t on by default, you’ll find plenty of code out there (and even libraries from PEAR) that spit out all sorts of warnings. (Read: I ran into that when writing my quick-and-dirty RSS aggregator.)

    My main problem with PHP is that the only way to configure most of its behavior is with the site-wide config file, and lots of settings can completely break PHP code that relies on a certain setting. (magic_quotes_gpc, anyone?) If you have two pieces of code that depend on different settings, you get your choice of which code to rewrite.

Comments are closed.