In Defence of the Unitype
Created on April 23, 2014, 3:47 p.m.
Robert Harper says that dynamically typed languages are actually statically typed languages with a single type - the unitype. This type is a union of all of the runtime types, and so contains all of their combined values. Because of this, Harper says that statically typed languages are a superior superset of dynamically typed ones: in a statically typed language one could use a union type to create this unitype, and program in it as if it were a dynamic language. But of course no one does this, as it would be crazy, and therefore statically typed languages must be better.
This argument is basically correct in its assertions. Looking at dynamically typed languages as unityped gives a nice insight into how they work, and one really could program in a statically typed language as if it were a dynamic language fairly comfortably. But the reason language designers still create dynamic languages isn't because they are too stupid, stubborn, or trendy to allow for multiple types. And it certainly isn't because "dynamic" sounds cool, while "static" sounds lame (is this primary school?).
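The unityped view really does explain how dynamic languages work: every value carries a runtime tag, and operations dispatch on that tag - exactly the pattern match a static language would perform over one big sum type. A minimal sketch in Python (the function name and the set of cases are invented for illustration):

```python
# A sketch of addition under the unityped view: every value belongs to
# one big tagged union, and each operation dispatches on the runtime
# tag, just as a pattern match over a sum type would.

def uni_add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a + b                  # the "integer" arm of the union
    if isinstance(a, str) and isinstance(b, str):
        return a + b                  # the "string" arm
    if isinstance(a, list) and isinstance(b, list):
        return a + b                  # the "list" arm
    raise TypeError("cannot add %s and %s"
                    % (type(a).__name__, type(b).__name__))

uni_add(1, 2)           # 3
uni_add("uni", "type")  # "unitype"
```

A statically typed language would make the union and the pattern match explicit in a declared sum type; the dynamic language simply performs the same dispatch implicitly at run time.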
For better or worse, there are real differences between unityped and polytyped languages. Here are some of them.
Polytypes require static scope
An often forgotten aspect of polytyping is that it relies on static scope. Under static scope, just from looking at the source code one can infer, at any point, which variables are defined and which are not, including their types.
Unityped languages such as Python or Ruby typically don't support static scope. In these languages variables are stored in a dictionary of local values, which can be modified in numerous different ways at run time. Static scope would require changes to these languages at their core. Harper might laugh off any dynamically scoped language as being useless, but this at least explains why polytypes couldn't just be added to such languages as some people propose.
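This can be seen directly in Python, where module-level variables really are entries in a dictionary that code can create and destroy at run time (the module name below is invented for illustration):

```python
# Module-level variables in Python live in a dictionary, and that
# dictionary can be modified at run time - so which variables exist at
# a given point, and with what types, cannot be read off the source.
import types

mod = types.ModuleType("example")
mod.__dict__["x"] = 41           # conjure a variable into existence
exec("y = x + 1", mod.__dict__)  # code in the module sees it via the dict
print(mod.y)                     # prints 42
del mod.__dict__["x"]            # and it can vanish again
```

No static analysis of the string passed to `exec` could know, in general, whether `x` exists or what type it holds at the moment the code runs.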
In fact, even when intended, static scope can be difficult to add. Early Lisps were designed with the possibility of static scope in mind, but dialects such as Common Lisp and Emacs Lisp still don't support it fully, due to complex and tricky edge cases.
Polytypes require unified models
Polytyped languages have two systems the programmer must build a mental model of. The first is the computation model: what happens when code is executed. The second is the type model: how the types in the language are used and manipulated by the compiler. In a unityped system there is no type model, and the computation model is the only thing to learn. If it is simple, as in languages such as Lua, this can be really nice.
There is no issue with learning two mental models; programmers learn hundreds in their lifetime. The tricky part is that in a polytyped system these two models need to be unified. At run time, a function really must return the same type as the one the compiler inferred, otherwise everything will break. Although not always obvious, this unification can come at a cost: often the type model must be made more complex, or the computation model must, or the expressiveness of the language must be reduced.
For example, the computation model of duck typing is really simple. The method name is looked up in a dictionary and, if it exists, the method is called. Easy to understand, easy to implement, great! But if we want duck typing in a polytyped system we need to introduce more machinery to unify this computation with the type system. We could introduce type classes, or union types, and get the user to declare things explicitly, but there is no way to add it for free, exactly because of this unification.
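That dictionary lookup can be sketched in a few lines of Python (`call_speak` and the classes are invented for illustration; real attribute lookup also walks the inheritance chain and handles descriptors):

```python
# Duck typing's computation model: a method call is a dictionary
# lookup on the object's class, followed by a call if the name exists.

class Duck:
    def speak(self):
        return "quack"

class Robot:
    pass

def call_speak(obj):
    # roughly what obj.speak() does, ignoring inheritance and descriptors
    method = type(obj).__dict__.get("speak")
    if method is None:
        raise AttributeError("no speak method")
    return method(obj)

call_speak(Duck())     # returns "quack"
# call_speak(Robot()) would raise AttributeError
```

The whole mechanism is one runtime dictionary lookup; making a compiler guarantee that the lookup always succeeds is where the extra type-system machinery comes in.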
Type systems fit the model of computation really well in almost all cases, but not perfectly. They require extensions and awkward additions in some edge cases. To get around this polytyped languages tend to squeeze their computation model to fit the type model better. Designers of unityped languages on the other hand can squeeze their computation model to fit to other things they like, such as natural language (Python), string operations (Perl), or lists (Lisp).
Another alternative is shown in languages such as Haskell, which have used concepts such as pure functions and lazy evaluation to keep the type system simple. This has complicated the computation model, to the point where understanding how Haskell code actually executes can be very difficult. I like extreme languages, so I really like Haskell, but I do feel sorry for the suffering of the developers of GHC.
But most polytyped languages fall somewhere in-between, letting you express some common things and restricting the expression of others, being sort of type-ish and sort of computational. I find this middle ground a frustrating compromise.
Polytypes are less extensible
All features in polytyped languages must be language features.
A type system must necessarily be simpler than its computation model, otherwise it would contain just as many errors as it is trying to remove. One way to keep a type system simple is to limit its extensibility.
Computation models, on the other hand, should not be limited. There shouldn't be a program that cannot be expressed by the computation model. The computation model should be as expressive and useful as possible! Here unityped languages have the most flexibility, because the computation model does not need to be unified with a type system. They can add complex extensions on a whim, or let users define extensions themselves! To add computational features to polytyped languages, the type model often requires hard-coded unification, which means changes to the compiler.
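Python shows this kind of user-level extension: because there is no type model to keep in sync, behaviour can be grafted onto existing classes at run time (the `Vec` class is invented for illustration):

```python
# With no type model to unify against, users can extend the computation
# model themselves - here by adding an operator to a class at run time.

class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

# a user-supplied extension, bolted on after the class was defined
Vec.__add__ = lambda a, b: Vec(a.x + b.x, a.y + b.y)

v = Vec(1, 2) + Vec(3, 4)
print(v.x, v.y)  # prints 4 6
```

In a polytyped language, giving `+` a new meaning for a new type must be reconciled with the type checker; here it is just another dictionary update.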
Again this probably does not bother Harper, but as someone who likes exploration and freedom of expression in computation, it bothers me.
Unitypes are worse is better
I prefer to work with polytyped languages, and I have no doubt that they are the right thing (tm). But they have suffered from a bad case of worse is better. Unityped languages are far simpler to create, and often simpler to program in. They work well enough, and so have been adopted and spread. They have obvious flaws, but users have contributed workarounds and solutions to their many problems. Just like the Unix virus, they have spread, adapted and evolved.
It isn't true that unityped languages are a subset of polytyped languages. The type system does not define the language, and having a unitype has knock-on effects in all other areas of the language - syntax, semantics, implementation, design, even style, practices and community. This makes them distinct.
In my mind the best argument for polytypes is still the original, practical one: that type systems, although far from perfect, are still the best way we know of finding errors at compile time. For this they are fantastic, and I appreciate all the work that has been done on them by people such as Harper.