**Definition** For a point set S \subset \mathbb{R}^n and a point x \in S, we say that x is *Pareto-optimal* if, for any y \in S, x \neq y, there exists an index i \leq n such that x_i > y_i.
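
The definition translates almost word for word into a brute-force check. The following is a minimal sketch, assuming the points are given as equal-length tuples in a finite list; the function names `is_pareto_optimal` and `pareto_optimal_points` are made up for illustration.

```python
def is_pareto_optimal(x, points):
    """True if, against every other point y, x is strictly larger in some coordinate."""
    return all(
        any(xi > yi for xi, yi in zip(x, y))
        for y in points
        if y != x
    )


def pareto_optimal_points(points):
    """All Pareto-optimal members of a finite collection of equal-length tuples."""
    return [x for x in points if is_pareto_optimal(x, points)]


sample = [(1, 2), (2, 1), (0, 0)]
print(pareto_optimal_points(sample))  # [(1, 2), (2, 1)]
```

This compares every pair of points, so it is only practical for small sets.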

**Exercise** For P any continuous probability distribution on \mathbb{R}^n where the different coordinates are independent random variables (so that ties occur with probability zero) and independent x_1,\dots,x_m \sim P, the expected number of Pareto-optimal points among them is \sum_{i_1 = 1}^m \frac{1}{i_1} \sum_{i_2 = 1}^{i_1} \frac{1}{i_2} \cdots \sum_{i_{n-1}=1}^{i_{n-2}} \frac{1}{i_{n-1}}. (hint: induction on n)
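
For tiny m and n the claim can be sanity-checked exactly, which is not a proof but is reassuring. The sketch below assumes ties have probability zero, so that only the relative order of the points in each coordinate matters and each order is a uniformly random permutation, independent across coordinates. By symmetry the first coordinate can be fixed in sorted order. The names `nested_sum` and `average_front_size` are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product


def nested_sum(m, n):
    """Evaluate the nested sum from the exercise exactly, using n - 1 nested sums."""
    g = [Fraction(1)] * (m + 1)  # g[k] is the value of the empty product; index 0 unused
    for _ in range(n - 1):
        running = Fraction(0)
        new_g = [Fraction(0)] * (m + 1)
        for k in range(1, m + 1):
            running += g[k] / k  # add the term (1/k) * (inner sums with upper limit k)
            new_g[k] = running
        g = new_g
    return g[m]


def average_front_size(m, n):
    """Average number of Pareto-optimal points over all rank patterns of m points."""
    orders = list(permutations(range(m)))
    total = 0
    patterns = 0
    for combo in product(orders, repeat=n - 1):
        # Point j has rank j in the first coordinate and rank p[j] in each other one.
        points = [(j,) + tuple(p[j] for p in combo) for j in range(m)]
        total += sum(
            all(any(a > b for a, b in zip(x, y)) for y in points if y != x)
            for x in points
        )
        patterns += 1
    return Fraction(total, patterns)


print(nested_sum(3, 3), average_front_size(3, 3))  # 85/36 85/36
```

The enumeration visits (m!)^(n-1) rank patterns, so it is only feasible for very small inputs.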

Principal component analysis is a standard tool for anyone who wants to do quantitative science. You start with a pile of data points that vary along different axes, find the axis of most variation in the data, find the axis of second most variation, and so on. Bonus points if afterwards you pronounce that this statistical ghost is a real and meaningful quantity. See: Spearman’s g, big-five personality traits, among others.

If your data is normally distributed, and let's face it, all data is normally distributed if you squint enough, then the principal components are, in expectation, exactly the eigenvectors of your covariance matrix. That is, when you re-parametrize your data such that the principal components align with the coordinate directions, then the coordinates of any data point are independent random variables.

The different quantities resulting from a principal component analysis are always orthogonal directions. The first principal components are the axes of largest variation in the data set (the single component of Spearman’s *g*, the five personality traits), but they don’t come equipped with any sense of being the “most important”. They just have the biggest variance, which is a notion that is well-defined only when you choose the units of the original coordinates well. Therefore, we cannot say that one of the components is “the real one”, i.e., we cannot impose a meaningful linear ordering of quality after calculating the principal components.
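
To see how much "biggest variance" depends on units, here is a toy illustration with made-up numbers for four hypothetical people. It compares only per-coordinate variances rather than running a full principal component analysis, and the helper `widest_axis` is a name invented for this sketch.

```python
from statistics import pvariance

# Made-up toy measurements for four hypothetical people.
heights_m = [1.60, 1.70, 1.80, 1.90]
weights_kg = [60.0, 65.0, 70.0, 75.0]


def widest_axis(columns):
    """Return the name of the coordinate with the largest population variance."""
    return max(columns, key=lambda name: pvariance(columns[name]))


in_metres = {"height": heights_m, "weight": weights_kg}
in_centimetres = {"height": [100 * h for h in heights_m], "weight": weights_kg}

print(widest_axis(in_metres))       # weight
print(widest_axis(in_centimetres))  # height
```

Nothing about the people changed between the two calls, only the unit of height.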

Anyone can tell you that intelligence varies along different dimensions. Easy example: some people are good at math but bad with languages, for others it is the other way around. How many distinct directions are there? I don’t think it is a stretch to think there are at least 11 different ones. Turning our proof into a computation, we can now count the expected number of Pareto optimal points for given m,n.

```
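def initialize(i):
    # Table of pairs (k, 1/k) for k = 1..i.
    tab = []
    for k in range(1, i + 1):
        tab.append((k, 1. / k))
    return tab


def iterate(tab):
    # Replace each weight by (running sum of the old weights) / k.
    # Each pass adds one more level of nesting to the sum.
    s = 0.
    for i in range(len(tab)):
        s = s + tab[i][1]  # reads the old weight before it is overwritten
        tab[i] = (tab[i][0], s / tab[i][0])
    return tab


def final(tab):
    # The outermost sum: add up all weights.
    out = 0.
    for (x, y) in tab:
        out = out + y
    return out


def count(m, n):
    # Expected number of Pareto-optimal points among m points in n dimensions.
    if n == 1:
        return 1
    table = initialize(m)
    while n > 2:
        table = iterate(table)
        n = n - 1
    return final(table)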
```

In our intelligence interpretation, this counts how many people can be said to be “the smartest”, in the sense that nobody else is smarter in all 11 different ways. That is, we can call someone “the smartest” if their location in intelligence space is Pareto-optimal among all people.

The above code has the maximum size of `m` limited by memory space, but we can calculate `count(4882495,11) = 427462.8`, where 4882495 is the population count of Ireland. Making the appropriate division, we conclude that one in every 11.4 Irish people is the smartest person in Ireland.