A lot of people are the smartest person in their country

556 words

Definition For a point set S \subset \mathbb{R}^n and a point x \in S, we say that x is Pareto-optimal if, for every y \in S with y \neq x, there exists an index i \leq n such that x_i > y_i.

Exercise For any probability distribution P on \mathbb{R}^n whose coordinates are independent random variables with continuous distributions, and for independent samples x_1,\dots,x_m \sim P, the expected number of Pareto-optimal points among them is \sum_{i_1 = 1}^m \frac{1}{i_1} \sum_{i_2 = 1}^{i_1} \frac{1}{i_2} \cdots \sum_{i_{n-1}=1}^{i_{n-2}} \frac{1}{i_{n-1}}. (Hint: induction on n.)
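A quick way to convince yourself of the formula before attempting the induction: the Monte Carlo sketch below (my own code, assuming numpy is available) samples m points with independent coordinates, counts the Pareto-optimal ones straight from the definition, and compares the average against the nested sum written out for n = 3.

import numpy as np

def num_pareto_optimal(points):
    # Directly apply the definition: x is Pareto-optimal unless some other
    # point y satisfies y_i >= x_i in every coordinate.
    m = len(points)
    return sum(
        not any(j != i and np.all(points[j] >= points[i]) for j in range(m))
        for i in range(m)
    )

rng = np.random.default_rng(0)
m, n, trials = 50, 3, 500
empirical = np.mean([num_pareto_optimal(rng.standard_normal((m, n))) for _ in range(trials)])

# The nested sum from the exercise, written out for n = 3.
exact = sum((1.0 / i) * sum(1.0 / j for j in range(1, i + 1)) for i in range(1, m + 1))
print(empirical, exact)  # the two numbers should be close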


Principal component analysis is a standard tool for anyone who wants to do quantitative science. You start with a pile of data points that vary along different axes, find the axis of most variation in the data, find the axis of second most variation, and so on. Bonus points if afterwards you pronounce that this statistical ghost is a real and meaningful quantity. See: Spearman’s g, big-five personality traits, among others.

If your data is normally distributed, and let's face it, all data is normally distributed if you squint enough, then the principal components are exactly the eigenvectors of your covariance matrix. That is, when you re-parametrize your data so that the principal components align with the coordinate directions, the coordinates of any data point become independent random variables.
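A minimal sketch of that claim (my own code, assuming numpy is available; the covariance numbers are invented): draw correlated Gaussian data, take the eigenvectors of the sample covariance matrix as the principal components, and check that in the rotated coordinates the data are (nearly) uncorrelated.

import numpy as np

rng = np.random.default_rng(0)

# Correlated 2D Gaussian data.
cov = np.array([[3.0, 1.2],
                [1.2, 1.0]])
data = rng.multivariate_normal(mean=[0, 0], cov=cov, size=10_000)

# Principal components = eigenvectors of the sample covariance matrix.
sample_cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(sample_cov)

# Re-parametrize: express each point in the principal-component basis.
projected = data @ eigvecs

# Off-diagonal entries should be near zero: the new coordinates are uncorrelated
# (and, for Gaussian data, therefore independent).
print(np.cov(projected, rowvar=False))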


The different quantities resulting from a principal component analysis are always orthogonal directions. The first principal components are the axes of largest variation in the data set (the single component of Spearman’s g, the five personality traits), but they don’t come equipped with any sense of being the “most important”. They just have the biggest variance, a notion that is well-defined only when you choose the units of the original coordinates well. Therefore, we cannot say that one of the components is “the real one”, i.e., we cannot impose a meaningful linear ordering of quality after calculating the principal components.


Anyone can tell you that intelligence varies along different dimensions. Easy example: some people are good at math but bad with languages, while for others it is the other way around. How many distinct directions are there? I don’t think it is a stretch to say there are at least 11 different ones. Turning the exercise’s formula into a computation, we can now count the expected number of Pareto-optimal points for given m and n.

# count(m, n) evaluates the nested sum from the exercise:
# sum_{i_1=1}^m 1/i_1 * sum_{i_2=1}^{i_1} 1/i_2 * ... * sum_{i_{n-1}=1}^{i_{n-2}} 1/i_{n-1}.

def initialize(m):
    # Innermost layer: tab[k-1] = (k, 1/k) for k = 1..m.
    tab = []
    for k in range(1, m + 1):
        tab.append((k, 1.0 / k))
    return tab

def iterate(tab):
    # Peel off one nested sum: replace tab[k-1] by (k, (1/k) * prefix sum of the previous layer up to k).
    s = 0.0
    for i in range(len(tab)):
        s = s + tab[i][1]
        tab[i] = (tab[i][0], s / tab[i][0])
    return tab

def final(tab):
    # Outermost sum over i_1 = 1..m.
    out = 0.0
    for (k, value) in tab:
        out = out + value
    return out

def count(m, n):
    # Expected number of Pareto-optimal points among m points in R^n.
    if n == 1:
        return 1
    table = initialize(m)
    while n > 2:
        table = iterate(table)
        n = n - 1
    return final(table)
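A quick sanity check (my own usage example): for n = 2 the nested sum collapses to the harmonic number H_m.

print(count(10, 2))    # harmonic number H_10, roughly 2.929
print(count(100, 3))   # expected number of Pareto-optimal points among 100 points in R^3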

In our intelligence interpretation, this counts how many people can be said to be “the smartest”, in the sense that nobody else is smarter in all 11 different ways. That is, we can call someone “the smartest” if their location in intelligence space is Pareto-optimal among all people.

The above code is limited in the maximum size of m by available memory, but we can still calculate count(4882495, 11) = 427462.8, where 4882495 is the population of Ireland. Making the appropriate division, we conclude that one in every 11.4 Irish people is the smartest person in Ireland.

Discrete optimization

129 words

Professor: Welcome back to Algorithms class. Today, we will cover Dijkstra’s shortest path algorithm, the core of all navigation software. Next week’s assignment is to write code to compute shortest paths in a number of real world instances.

Professor: Suppose we have a road network described by a graph with n vertices and m edges and a vector c \in \mathbb{R}^E assigning a travel time to every edge. We want to minimize total travel time, and –

Student: I have a question. What if we want to take other things into account, like finding a simple route with few turns?

Professor: The formalism allows for that by changing the objective and graph. Anyway, so we want to minimize travel time and the first step is to [blah blah blah]
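For reference, here is roughly what the assignment’s single-objective routine looks like (a minimal sketch in plain Python with heapq; the adjacency-list format is my own choice). Note how thoroughly the objective “total travel time” is baked in: there is no slot for counting turns.

import heapq

def dijkstra(adj, source):
    # adj[u] is a list of (v, travel_time) pairs with non-negative travel times.
    # Returns a dict mapping each reachable vertex to its shortest travel time from source.
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # outdated heap entry
        for v, cost in adj.get(u, []):
            new_dist = d + cost
            if new_dist < dist.get(v, float("inf")):
                dist[v] = new_dist
                heapq.heappush(heap, (new_dist, v))
    return dist

# Example: dijkstra({"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}, "a")
# gives {"a": 0.0, "b": 2.0, "c": 3.0}.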

Operations Research

98 words

Professor: Welcome back to Operations Research class. Last week we discussed the mathematical fundamentals of optimization theory, and this week we will see how to apply it to practice. Suppose we have n machines and m employees working on k different products. We want to maximize total profit, and –

Student: I have a question. What if we want to take other things into account, like work satisfaction among employees?

Professor: The formalism allows for that by changing the objective and constraints. Anyway, so we want to maximize profit and the first step is to [blah blah blah]

Case study in the politics of mathematics: optimality of Nash equilibria

580 words

Back in the Soviet era, Leonid Vitaliyevich Kantorovich got himself in trouble. He developed a theory of optimization problems, originally for the sake of optimizing plywood production, but he saw a potential for using his techniques to optimize the entire Soviet economy. So he sent letters to the central planning bureau to convince them to make use of his ideas.

He has been involved in advanced mathematical research since the age of 15; in 1939 he invented linear programming, one of the most significant contributions to economic management in the twentieth century. Kantorovich has spent most of his adult life battling to win acceptance for his revolutionary concept from Soviet academic and economic bureaucracies; the value of linear programming to Soviet economic practices was not really recognized by his country’s authorities until 1965, when Kantorovich was awarded a Lenin prize for his work.

Excerpt from original CIA file on Kantorovich.

There was just one problem, arising from the theory of linear programming duality. For any linear optimization problem, we can derive a dual problem. If you solve both the primal and dual problems, they turn out to have the same optimal value, and moreover the optimal solution to one certifies optimality of the solution to the other. These “dual solutions” can have clear interpretations. In the case of optimizing resource allocations, the dual solution can be interpreted as market prices.
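A toy illustration of that duality (a minimal sketch, assuming scipy is available; the numbers are invented): a small “resource allocation” LP and its dual solved side by side, giving the same optimal value, with the dual variables playing the role of prices on the resources.

import numpy as np
from scipy.optimize import linprog

# Primal: maximize profit c'x subject to resource constraints A x <= b, x >= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([14.0, 18.0])
c = np.array([3.0, 5.0])

# linprog minimizes, so negate the objective for the primal.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual: minimize b'y subject to A'y >= c, y >= 0 ("prices" on the resources).
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)  # both roughly 37.2: the same optimal value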

LP duality theory connects the notion of optimal resource allocation as used for central planning with the efficient market hypothesis and Nash equilibria. This connection can be interpreted as “capitalist markets find the optimal allocation of goods and services”. Obviously, the communists did not like that.

This interpretation has been popular in the US as well, viewed there in light of the slightly weaker “first welfare theorem” and Smith’s invisible hand.

Of course there are numerous solid arguments for why these interpretations are bogus. To name just a few: markets are not convex, barriers to market entry are non-zero, humans are not perfectly rational nor omniscient, computation and communication are neither free nor instantaneous, negative externalities are common, the dual LP only takes prices into account and not net cash flow of individuals, and the “welfare function” doesn’t necessarily exist.

All of this was known 50+ years ago, but apart from the 1950 cutesy example of the prisoner’s dilemma, game theorists didn’t do much of note to dispute the notion that Nash equilibria are typically good. Nothing really got formalized until 20 or so years ago, with the introduction of the price of anarchy. The price of anarchy (stability) of a game is the ratio of social welfare in the optimal solution to the social welfare in the worst (best) Nash equilibrium. The PoA and PoS can be unboundedly bad, and recent decades have seen a lot of exciting work happening here.
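To make the definition concrete, here is a small sketch (my own code; the game is the standard prisoner’s dilemma with invented payoff numbers) that enumerates the pure Nash equilibria of a two-player game and computes the price of anarchy as optimal welfare divided by the welfare of the worst equilibrium.

from itertools import product

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff); a prisoner's dilemma.
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
actions = ["cooperate", "defect"]

def is_pure_nash(a_row, a_col):
    # Neither player can gain by unilaterally switching actions.
    row_ok = all(payoffs[(a_row, a_col)][0] >= payoffs[(alt, a_col)][0] for alt in actions)
    col_ok = all(payoffs[(a_row, a_col)][1] >= payoffs[(a_row, alt)][1] for alt in actions)
    return row_ok and col_ok

welfare = {profile: sum(payoffs[profile]) for profile in product(actions, actions)}
nash = [p for p in welfare if is_pure_nash(*p)]

optimal = max(welfare.values())
worst_nash = min(welfare[p] for p in nash)
print(nash, optimal / worst_nash)  # [('defect', 'defect')], price of anarchy = 3.0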

Mathematics is political. When we want to apply mathematical theorems to the real world, we need to make certain simplifying assumptions. Or actually, the same goes for thinking any thoughts about the real world at all, but whatever. The simplifications we make determine the theorems we can prove. We can work on theorems that “prove” that capitalism is good or we can work on theorems that “prove” that capitalism is bad. Both would be valid mathematics, but both individuals and communities can decide to value one over the other, resulting in undergrads being taught either “10/10 scientists agree: capitalism is good” or “beware: capitalism can be arbitrarily bad”.

What does it mean to call something a social construct?

430 words

Many explanations of the term “social construct” are severely lacking. Let me explain what I personally mean when I call something a social construct. Largely, it means that we could have done it differently.

Gender is a social construct. Our ideas of which things are for men and which are for women have differed over place and time. Little boys used to wear dresses, women used to be considered mentally unfit to own property, and gender nonconforming people (like women wearing pants) used to be shunned by their community. And that is just Western countries over the past century, without considering different millennia or cultures. We could change our current notions of gender without the gods coming down to interfere with our ambitions.

Money is a social construct. By choosing to redistribute or not, we shape the role of money and the way it influences the rich and poor. Same for (dis)allowing usury, theft or private ownership of the means of production, to name a few socially constructed aspects of private property and thus of money. Every facet of money is constructed through a social process, but the resulting structure can impact a person’s life as much as the law of gravity.

Mathematics is a social construct.1If you’re a platonist, read this as “our practice and understanding of mathematics.” I refer to the mathematics we know, not the mathematics that can be. There is no linear “tech tree” forcing people to develop Euclidean geometry before graph theory or predator/prey models. No law of nature tells pure mathematicians to abhor the thought of their work finding practical use. Instead of sorting computational problems into various Turing degrees,2Computable problems have the lowest Turing degree. The halting problem has a higher Turing degree. There are infinitely many other Turing degrees, some of which are higher or lower than the halting problem and some of which are incomparable. we could have been doing something useful instead. Peer reviewers use their own judgement to decide whether a result is novel, interesting and proven with enough rigor, with no objective measurement devices involved. The choice of what is Real Mathematics and what is not is similarly arbitrary.

I tell people that a thing is socially constructed when I want to remind them that it could be different; when they’re talking about the thing as if it is a fixture of life and I want to invite them to imagine how it could be changed. Because no matter your philosophy of mathematics or external reality, the way we understand and value various ideas is always different among individuals and communities.

We can’t simulate nematodes with only 302 neurons

148 words

C. elegans is a wonderful creature. It’s a nematode, so it looks like a tiny worm. It is one of biologists’ favorite model organisms and we know a lot about it.

Bob Goldstein, 2007, CC-BY-SA, via Wikimedia Commons.

It has roughly 1000 cells, and its cell lineage is almost identical for every individual. This creature has been mapped out more thoroughly than any other species on the planet. You can browse a map of its cells through the OpenWorm Browser. The hermaphrodite has 302 neurons and its connectome is completely known.

And faithfully simulating a nematode remains far out of reach. The state of the art is concerned with modeling basic aspects of its locomotion, like its different gaits on different surfaces.

It is silly to think we’ll be simulating human minds any time soon, when people have been trying to simulate C. elegans for decades now.

The O(n³) time phenomenon

199 words

I am fond of things that should not work but still do. Stuff like induction, SAT solvers, or the preservation of approximate truth under logical deduction. One more example is the O(n³) phenomenon.

To fix notation, we say an algorithm runs in time O(n³)3Read: in time of order n cubed. if there are numbers a, b > 0 such that, given an input of size n, the algorithm, run on a reasonable theoretical computer model with random-access memory, takes time upper bounded by a * n³ + b, for any n. This value n might be the number of letters in an input string, the number of vertices or edges in an input graph, the ambient dimension of a convex body, the number of variables in your 3SAT instance, or any other such number measuring input size.

In CS theory, the O(n³) time phenomenon is the observation that any problem which is solvable in polynomial time has an algorithm solving it in O(n³) time. It is false, and showing so is a nice undergrad exercise in diagonalization. It is also kind of true, as its prediction comes true again and again for almost all problems that we care about.

Deepmind will start playing fair?

195 words

In case you missed it, last week brought the most exciting piece of AI news of the past couple years. Deepmind’s Alphastar will be playing real non-NDA’d humans for multiple games (via Ars Technica).

This is cool news. Partly because they finally seem intent on fixing their cheating with superhuman micro, but mostly because repeat games could allow humans to develop counter strategies. The exhibition matches earlier this year only had the humans play one match against any given bot, thus giving the bot a huge informational advantage.4The neural nets were trained through supervised learning on human matches before being pitted against each other. These new matches will hopefully end up being more fair. However, Blizzard’s press release is vague enough that Deepmind could still decide to play only a handful of games, which would prevent humans from gaining enough knowledge to devise counter strategies. Worse, the Alphastar team could decide to stop playing once they observe online message boards sharing good tactics between human players, or engage in any other nasty hacking of p’s.

The hype labs haven’t been doing well in fair play or reproducibility so far, but I’m willing to hope they’ll improve.

12. Why Kolmogorov complexity is useless

377 words

Mathematical hobbyists tend to be fascinated with information theory, Kolmogorov complexity and Solomonoff induction. This sentiment is very understandable. When I first learned of them, these subjects felt like they touch upon some fundamental truth of life that you don’t normally hear about. But for all that they feel like a fundamental property of life and understanding, mathematicians treat them as a cute mathematical curiosity at most. In this post I will explain some of the reasons why so few mathematicians and computer scientists have cared about Kolmogorov complexity over the past 50 years.

The zeroth reason is that it depends on your choice of encoding. You cannot cover this up by saying that any Turing machine can simulate any other with constant overhead, because a 2000 bit difference is not something you can compensate for in data-constrained settings.

The first reason is obvious and dates back to ancient times. Kolmogorov complexity is not computable. Properties that we can literally never know are pretty useless in everyday life.

The second reason is related: we cannot compute Kolmogorov complexity in practice. Even time-constrained variants are hellishly expensive to compute for large data sets.
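The best you can do in practice is a crude upper bound, for instance the output length of an off-the-shelf compressor. A throwaway sketch (my own code, standard library only): the bound is computable, but nothing tells you how far it sits above the true Kolmogorov complexity.

import os
import zlib

def compressed_size(data: bytes) -> int:
    # Length of a zlib encoding: a computable *upper bound* on Kolmogorov
    # complexity (up to the fixed cost of a decompressor), with no guarantee
    # of being anywhere near the true value.
    return len(zlib.compress(data, 9))

print(compressed_size(b"abab" * 1000))    # very regular data: compresses well
print(compressed_size(os.urandom(4000)))  # random bytes: barely compresses at all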

The third reason is more typical of modern thinking in computer science theory. Namely, that any theory of information needs a theory of computation to be useful in practice. This is directly related to the difference between computational and statistical indistinguishability, as well as the myth that your computer’s entropy pool could run out. Cryptography is safe not because it is information-theoretically impossible to retrieve the plaintext but because it is computationally infeasible to retrieve the plaintext. The Kolmogorov complexity of a typical encrypted data stream is low, but it would be a mistake to think that anyone could compute a short description. Along another route, once I have told you an NP-complete problem (with a unique solution), it adds essentially no new information if I then tell you the answer. But you would still learn something by getting the answer from me, because you couldn’t compute it yourself even knowing all the requisite information.

Kolmogorov complexity is useless based on classical CS theory, practice and modern CS theory. This is how you know that anyone who proposes that it is an integral part of rational thought is full of shit.

11. Did MIRI cause a good thing to happen?

1134 words

The Future Perfect podcast from Vox did an episode proclaiming that AI Risk used to be a fringe concern but is now mainstream; that this is why OpenAI did not open up GPT-2 to the general public;5Or to academic peer review, for that matter. that this was good; and that it is all thanks to Jaan Tallinn and Eliezer Yudkowsky. I cannot let this go uncontested.

Today: why are the people who made our writing bot so worried about what it could do? The short answer is they think that artificial intelligence models like this one can have major unintended consequences. And that’s an idea that’s moved from the fringe to the mainstream with the help of philanthropy.

[0:02:21-0:02:43]

This here is the central claim. Tallinn’s money is responsible for people thinking critically about artificial intelligence. Along the way we hear that Tallinn acquired his AI worries from Yudkowsky. Hence, Yudkowsky did something good.

AI might have unintended consequences, like taking our jobs or messing with our privacy. Or worse. There are serious researchers who think that AI could lead to people dying: lots of people. Today, this is a pretty mainstream idea. It gets a lot of mentions in any round-up by AI experts of their thinking on AI, and so it’s easy to forget that a decade ago this was a pretty fringe position. If you hear this kind of thing and your reaction is like “Come on, killer robots? Really, that sounds like science fiction”, don’t worry, you are part of a long tradition of dismissing the real-world dangers of AI. The founders of the field wrote papers in which they said, as an aside: “Yes, this will probably, like, transform human civilization and maybe kill us.” But in the last decade or so, something has started to change. AI Risk stopped being a footnote in papers because a small group of people and a small group of donors started to believe that the risks were real. Some people started saying: wait, if this is true, it should be our highest priority and we should be working on it. And those were mostly fringe people in the beginning. A significant driver of the focus on AI was Eliezer Yudkowsky.

[0:04:17-0:05:50]

So the driving force behind all worries about AI is said to be Yudkowsky. Because of his valiant essay-writing, Tallinn got convinced and put his money towards funding MIRI and OpenAI. Because of course his real fears center around Mickey Mouse and Magic Broomstick, not on algorithms being biased against minorities or facial recognition software being used to put the Uyghur people in China in concentration camps. Because rational white men only focus on important problems.

Yes, so here are a couple of examples: every year the Pentagon discovers some bugs in their system that make them vulnerable to cybersecurity attacks. Usually they discover those before any outsiders do and are therefore able to handle them. But if an AI system were sufficiently sophisticated, it could maybe identify the bugs that the Pentagon wouldn’t discover for years to come, and therefore be able to do things like make it look to the US government like we’re being attacked by a foreign nuclear power.

[0:13:03-0:13:34]

This doesn’t have anything to do with my point; I just think it’s cute how people from America, the country whose army of cryptographers and hackers (all human) developed and lost the weapons responsible for some of the most devastating cyberattacks in history, worry that other countries might be able to do the same things if only they obtain the magical object that is speculated to exist in the future.

[GPT-2] is a very good example of how philanthropic donations from people like Jaan Tallinn have reshaped our approach to AI. The organization that made GPT-2 is called OpenAI. OpenAI got funding from Jaan Tallinn among many others, and their mission is not just to create Artificial Intelligence, but also to make sure that the Artificial Intelligence it creates doesn’t make things worse for Humanity. They’re thinking about, as we make progress in AI, as we develop these systems with new capabilities, as we’re able to do all these new things, what’s a responsible process for letting our inventions into the world? What does being safe and responsible here look like? And that’s just not something anybody thought about very much, you know, they haven’t really asked what the safe and responsible approach to this is. And when OpenAI started thinking about being responsible, they realized “Oh man, that means we should hold off on releasing GPT-2”.

[0:17:53-0:19:05]

This is bootlicking journalism, completely going along with the narrative that OpenAI’s PR department is spinning, just like Vox’s original coverage of that puff piece and all of the Effective Altruism community’s reaction. There is something profoundly absurd about taking a corporate lab’s press release at face value and believing that those people live in a vacuum. A vacuum where nobody had previously made unsupervised language models, as well as one where nobody had previously thought about what responsible release of ML models entails. OpenAI is fundamentally in the business of hype, to stroke their funders’ egos, and providing compute-heavy incremental progress in ML is just the means to this end.

It’s kind of reassuring that this organization is a voice at the table saying, hey, let’s take this just a little slower. And the contributions from donors like Jaan Tallinn helped to put that cautionary voice at the table, and they put it there early. You know, I think it mattered. I think that the conversation we’re having now is probably more sophisticated, more careful, a little more aware of some of the risks than it would have been if there hadn’t been these groups starting 10-15 years ago to start this conversation. I think it was one of those cases where something was only ever going to be funded from the fringe, and where it really didn’t matter that it got that funding from the fringe.

[0:20:18-0:20:53]

The writing makes a clear statement here: the people on the fringe (Yudkowsky et al.) are a significant part of the reason why people are thinking about this. I can hardly imagine how a journalist could say this after having done any research on the topic outside of their own cult-bubble, so I think they didn’t.

People in EA, people in ML and the staff at Vox seem almost willfully ignorant of all previous academic debate on dual-use technology, none of which derives from MIRI’s fairy tales of evil genies. I blame this phenomenon on the rationalists’ contempt for the social sciences. If Yudkowsky contributed anything here, it might mainly be in making socio-political worries about technology seem marginally more exciting to his tech bro audience. But the counterfactual is unclear to me.