*I am skeptical of AI Safety (AIS) as an effective cause area, at least in the way AIS is talked about by people in the effective altruism community. However, it is also the cause area that my skills and knowledge are the best fit for contributing to, so it seems worthwhile for me to think my opposition to it through.*

Previously: [1] [2] [3] [4] … [latest].

*Epistemic status: this argument has more flaws than I can count. Please don’t take it seriously. [See the post-script]*

Let’s answer this abstract philosophical question using high-dimensional geometry.

I’ll assume for simplicity that there is a single property called intelligence and the only variation is in how much you have of it. So no verbal intelligence vs visual intelligence, no being better at math than at languages; the only variation is in how much intelligence we have. Let us call this direction of variation g, scaled to have \|g\| = 1, and pretend that it is roughly the thing you get from a singular value decomposition/principal component analysis of humans’ intelligence test results.

A typical neural net has many parameters. For example, VGG-19 has ~143 million of them. Now suppose that we train a VGG-19 net to classify images. This is an optimization problem in \mathbb{R}^{143 \text{ million}}; let’s call the optimal parameter setting x. By definition, the trained net has an intelligence of exactly the inner product g^{\mathsf{T}}x.^{1} ^{2}

The trained net is intelligent to exactly the extent that intelligence helps you recognize images. If you can recognize images more efficiently by not being intelligent, then the trained net will not be intelligent. But exactly how helpful would intelligence be in recognizing images? I’d guess that a positive amount of intelligence would be better than a negative amount, but other than that I have no clue.

As a good subjective Bayesian, I’ll hence consider the vector \omega of goodness-at-recognizing-images to be chosen uniformly from the unit sphere, conditional on having non-negative intelligence, i.e., uniformly chosen from \{\omega\in\mathbb{S}^{143\text{ million} - 1} : g^{\mathsf{T}}\omega \geq 0\}. For this distribution, what is the expected intelligence \mathbb{E}[g^{\mathsf{T}}x]? Well, we know that x maximizes \omega^{\mathsf{T}}x over the allowed parameters, so if the set of allowed parameters is nice we would get g^{\mathsf{T}}x \approx g^{\mathsf{T}}\omega \cdot \|x\|,^{3} where \|x\| is how good the net is at recognizing images. We can calculate this expectation and find that, up to a constant factor, \mathbb{E}[g^{\mathsf{T}}\omega] \approx \frac{2}{\sqrt{2e\pi(143\text{ million}-1)}}.

So the trained VGG-19 neural net is roughly 10^{-5} times as intelligent as it is good at recognizing images. Hence, it is probably not very smart.
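For anyone who wants to poke at the number, here is a minimal sketch that evaluates the expectation three ways. It assumes g is a unit vector and uses the standard fact that, for \omega uniform on the half-sphere \{g^{\mathsf{T}}\omega \geq 0\}, g^{\mathsf{T}}\omega has the same distribution as the absolute value of one coordinate of a uniform point on the whole sphere. The function names are made up for this illustration and are not part of the original argument.

```python
import math
import random


def exact_mean_abs_coordinate(n: int) -> float:
    """E|w_1| for w uniform on the unit sphere in R^n.

    By symmetry this equals E[g^T w] for w uniform on the half-sphere
    {w : g^T w >= 0} when ||g|| = 1. The closed form is
    Gamma(n/2) / (sqrt(pi) * Gamma((n+1)/2)); log-gamma is used so the
    ratio does not overflow for very large n.
    """
    log_ratio = math.lgamma(n / 2) - math.lgamma((n + 1) / 2)
    return math.exp(log_ratio) / math.sqrt(math.pi)


def post_estimate(n: int) -> float:
    """The expression quoted in the post: 2 / sqrt(2 e pi (n - 1))."""
    return 2 / math.sqrt(2 * math.e * math.pi * (n - 1))


def monte_carlo_mean(n: int, samples: int, seed: int = 0) -> float:
    """Sample uniform points on the sphere and average |first coordinate|.

    A normalized standard Gaussian vector is uniform on the sphere.
    Only practical for small n; it is a sanity check, not the real thing.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        total += abs(v[0]) / norm
    return total / samples


if __name__ == "__main__":
    n = 143_000_000  # approximate VGG-19 parameter count used in the post
    print("exact expectation:", exact_mean_abs_coordinate(n))
    print("post's expression:", post_estimate(n))
    print("small-n check, exact:", exact_mean_abs_coordinate(20))
    print("small-n check, Monte Carlo:", monte_carlo_mean(20, 5000))
```

Both the closed form and the expression from the post come out on the order of 10^{-5} for 143 million dimensions, and for large n their ratio is close to 1/\sqrt{e}, which fits the "up to a constant factor" caveat.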

- Note that the projection of g into this 143 million-dimensional space might be much shorter than g itself; that depends on the architecture of the neural net. If this projection is very short, then every parameter setting of the net is very unintelligent. By the same argument that I’m making in the rest of the post, we should expect the projection to be short, but let’s assume that the projection is long for now.
- I’m assuming for simplicity that everything is convex.
- I have to point out that this is by far the most unrealistic claim in this post. It is true if x is constrained to lie in a ball, but in other cases it might be arbitrarily far off. It might be true for the phenomenon I describe in the first footnote.

It must be noted that while the argument is BS, I do fully believe the basic intuition. This holds both for “intelligence” and for any other less-anthropomorphizing property such as “capacity for creating catastrophes”.