Typed languages, units.

I recently picked up programming again. I used to do it a lot before I went to university, but the constant mind-numbing programming assignments quickly put me off of programming. Apart from the occasional quick bug fix for software I use myself, I haven’t done any serious coding for years.

Until recently, when I needed something coded up during my research. I decided to learn Python, and I like it. It is easy to use, the libraries are extensive and user-friendly, and ipython is a useful tool. There is just one thing that draws my ire: the weak type system. Studying math has given me an appreciation for type checking that is even stricter than most languages.

An example: my length in centimeters plus the outside temperature in °C right now equals 180. This calculation makes no sense, because the units don’t match: you can’t add centimeters to degrees Celcius. But then there’s Python, which just lets you do that.

In [1]: length = 170

In [
2]: temperature = 10

In [
3]: length + temperature
Out[
3]: 180

Most bugs that stem from typos are of this sort. Those bugs are possible because the type system is too weak. If you have two loops, one iterating over i and one over j, basic unit-based type checking would probably flag any instance of i in a place where you should have typed j instead. If you intend to query A[i][j] then it should be possible to let i have row-index type and j have type-index type, making A[j][i] raise a type error.

Another example: Let A \in \mathbb{R}^{n \times n}, x \in \mathbb{R}^n, and we’re interested in the quantity Ax \in \mathbb{R}^n. If you’re like me and you can’t remember what rows and what columns are, then that doesn’t have to impact your ability to symbolically do linear algebra: the quantities xA = A^{\mathsf{T}}x, Ax^{\mathsf{T}} and A^{-1} x don’t “compile”, so any mathematician that reads it will know you typo-ed if you wrote one of those latter expressions. All operations might be matrices acting on vectors, but the matrices A^{-1} and A^{\mathsf{T}} fundamentally take input from different copies of \mathbb{R}^n than the ones that x and x^{\mathsf{T}} live in. That is why matrix operations make sense even if the matrices aren’t square or symmetric: there is only one way to make sense of any operation. Even if you write it wrong in a proof, most people can see what the typo is. But then there’s Python.

In [4]: import numpy as np 

In [5]: x = np.array([1,2])

In [6]: A = np.array([[3,4],[5,6]])

In [7]: np.dot(A,x)
Out[7]: array([11, 17])

In [8]: np.dot(A,np.transpose(x))
Out[8]: array([11, 17])

In [9]: np.dot(x,A)
Out[9]: array([13, 16])

I am like me and I can’t remember what rows and what columns are. I would like the interpreter to tell me the correct way of doing my linear algebra. At least one of the above matrix-vector-products should throw a type error. Considering the history of type systems, it is not surprising that the first languages didn’t introduce unit-based types. Nonetheless, it is a complete mystery to me why modern languages don’t type this way.