What does it mean to call something a social construct?

430 words

Many explanations of the term “social construct” are severely lacking. Let me explain what I personally mean when I call something a social construct. Largely, it means that we could have done it differently.

Gender is a social construct. Our ideas of which things are for men and which are for women has differed over place and time. Little boys used to wear dresses, women used to be considered mentally unfit to own property, gender nonconforming people (like women wearing pants) used to be shunned by their community. And that is just Western countries over the past century, without considering different millennia or cultures. We could change our current notions of gender without the gods coming down to interfere with our ambitions.

Money is a social construct. By choosing to redistribute or not, we shape the role of money and the way it influences the rich and poor. Same for (dis)allowing usury, theft or private ownership of the means of production, to name a few socially constructed aspects of private property and thus of money. Every facet of money is constructed through a social process, but the resulting structure can impact a person’s life as much as the law of gravity.

Mathematics is a social construct.1If you’re a platonist, read this as “our practice and understanding of mathematics.” I refer to the mathematics we know, not the mathematics that can be. There is no linear “tech tree” forcing people to develop Euclidean geometry before graph theory or predator/prey models. No law of nature tells pure mathematicians to abhor the thought of their work finding practical use. Instead of sorting computational problems into various Turing degrees,2Computable problems have the lowest Turing degree. The halting problem has a higher Turing degree. There are infinitely many other Turing degrees, some of which are higher or lower than the halting problem and some which are incomparable we could have been doing something useful instead. Peer reviewers use their own judgement to decide whether a result is novel, interesting and proven with enough rigor, no objective measurement devices involved. The choice of what is Real Mathematics and what is not is similarly arbitrary.

I tell people that a thing is socially constructed when I want to remind them that they could be different. When they’re talking about the thing as if it is a fixture of life and I want to invite them to imagine how it could be changed. Because no matter your philosophy of mathematics or external reality, the way we understand and value various ideas is always different among individuals and communities.

Beyond Condorcet winners: Majority Judgement

631 words

There have been rumours on Wikipedia that Michael Balinksi passed away this Februari 4th. If this is indeed the case, then my heartfelt sympathies go out to those around him. I never knew him personally, but he was an amazing scientist and I’ve used results of his more than once. He was quite a phenomenal speaker.

Today I want to talk about a single-winner voting system that Balinksi introduced with Rida Laraki. It is called majority judgement and it is so brilliant that it almost makes me wonder what voting theorists could have been doing both before and since then.

One big concept in social choice theory is majority rule: if most people think that A is better than B, then A should be the winner. Most multi-candidate voting systems generalize this in various ways, always preserving that if candidate A would beat every other candidate B, C, etc, in a pairwise competition, then A should win the election. If candidate A satisfies this criterium, we call A a Condorcet winner. The leading wisdom in social choice theory was that any good voting rule should let the Condorcet winner win (if it exists).

According to my informal sample from the Effective Altruism community, EA’s favourite voting system seems to be approval voting, which is one of these voting systems that generalizes majority rule to multiple candidates.

The genius of majority judgement is that it moves past the Condorcet winner paradigm and considers a perspective beyond that.

To illustrate, let’s assume we’re running an election among two candidates, the red candidate and the blue candidate, and every voter gets a ballot with for each candidate the options “Amazing”, “Good”, “Mediocre” and “Terrible” to express how good of a president they think the candidate would be. Let us for simplicity assume that our voting population consists of 5 groups, the A’s, B’s, C’s, D’s and E’s, and everyone in a group votes the exact same way. The outcome of the election is in the table below.

ABC
% of population402040
Red candidateAmazingMediocreTerrible
Blue candidateGoodTerribleAmazing

The red candidate wins the Condorcet-style election: 60% of the population prefers the red candidate over the blue candidate. But the blue candidate is clearly better: 80% of the population considers the blue candidate to be “Good” or better, while 60% of the population considers the red candidate to be “Mediocre” or worse.

Majority judgement is a voting rule designed to have the blue candidate win in the above election. The actual algorithm is a bit involved, but the first step is to compare the median vote: if at least 50% of the voters think the blue candidate is “Good” or better and at least 50% of the voters think that the red candidate is “Mediocre” or worse, than the blue candidate will win. If the median opinion is a tie, a more complicated tie-breaking rule is entered. ((The exact rule satisfies a number of optimality criteria and is the only rule to do so. For this post I want to skip the details.))

I think the concept is very elegant, and I believe that the outcomes really would be better than with a system that elects Condorcet winners. In a talk that Balinksi gave, which I was lucky enough to attend, he pointed out another advantage of the majority judgement rule: it allows voters to express what they think of the candidates. You wouldn’t be asking anyone to “vote for the lesser evil”, everyone can keep their conscience clear. Majority judgement admits a clear way of expressing frustration with both candidates: rank both of them very badly. It also helps that the different options are given in words instead of ranking by numbers, for the latter turns out to entice voters to rate their favourite candidate 10/10 and all others 0/10.

Gaussian tail bounds

812 words

One-dimensional tail bounds

The standard normal distribution N(0,1) has probability density function \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. There is no way to integrate this function symbolically in a nice way, but we do at times want to (upper) bound expressions of the form \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t. How can we do this?

One way is to follow this approach. Since t\geq x everywhere, we can upper bound \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t \leq \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{t}{x} e^{-t^2/2} \mathrm{d}t = \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}.

There is another tail bound which is a bit weaker for large x, but I like the proof better. We’ll give a tail bound by looking at the moment-generating function \lambda \mapsto \mathbb{E}[e^{\lambda X}], where X \sim N(0,1) is our normally distributed random variable. We can explicitly calculate this expectation and find \mathbb{E}[e^{\lambda X}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{\lambda x - x^2/2}\mathrm{d}x = \frac{1}{\sqrt{2\pi}}e^{\lambda^2/2}\int_{-\infty}^\infty e^{-(x-\lambda)^2/2}\mathrm{d}x. The last term is just the entire Gaussian integral shifted a bit and hence \mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2} Now we use Chernoff’s bound (an easy corrollary of Markov’s inequality) to find \mathbb{P}[X \geq t] \leq \mathbb{E}[e^{\lambda X}]e^{-\lambda t}, which we can now minimize over the choice of \lambda, setting \lambda=t, and we conclude that \mathbb{P}[X \geq t] \leq e^{-t^2/2}.

Multi-variate Gaussians

Let X \in \mathbb{R}^d be N(0,I_d) normally distributed, i.e., X is a vector with iid Gaussian N(0,1) entries. What tail bounds do we get on \|X\|? We start off with Markov’s inequality again. \mathbb{P}[\|X\| > t] = \mathbb{P}[e^{\lambda\|X\|^2} > e^{\lambda t^2}] \leq \frac{\mathbb{E}[e^{\lambda\|X\|^2}]}{e^{\lambda t^2}}.

Deriving the moment generating function \lambda \mapsto \mathbb{E}[e^{\lambda\|X\|^2}] of X^2 is an elementary calculation. \int_{-\infty}^\infty e^{\lambda x^2} \cdot e^{-x^2/2} \mathrm{d}x = \int_{-\infty}^\infty e^{\frac{-x^2}{2(\sqrt{1-2/\lambda})^2}}\mathrm{d}x = \frac{\sqrt{2\pi}}{\sqrt{1-2\lambda}}.

The coordinates of X are iid, so \mathbb{E}[e^{\lambda\|X\|^2}] = \mathbb{E}[e^{\lambda X_1^2}]^d = (1-2\lambda)^{-d/2}. The minimizer is at \lambda=(1-1/t^2)/2, and we find, requiring t \geq 1 for the last inequality,\mathbb{P}[\|X\| > t] \leq e^{-d(t^2-2\log t - 1)/2} \leq e^{-d(t-1)^2}.

Operator norm of Gaussian matrices

The operator norm or spectral norm of a n \times n matrix M is defined as \|M\| := \max_{x \in \mathbb{R}^n} \frac{\|Mx\|}{\|x\|}.

Now if M were a matrix with every entry independently N(0,1), what would the largest singular value of this random Gaussian matrix be? I’ll give an easy tail bound based on a net argument.

An \eta-net, \eta > 0, on the sphere is a subset N \subset \mathbb{S}^{d-1} such that for every point x \in \mathbb{S}^{d-1} there is a net element n \in N such that \|x-n\| \leq \eta, but every two net elements are at distance at least \eta from each other. A greedy algorithm can construct an \eta-net, and any \eta-net has size at most (4/\eta)^d. 1See e.g., Jiří Matoušek, Lectures on Discrete Geometry (Springer, 2002), page 314. The proof is based on a simple packing argument where balls of radius \eta/2 around each net element have to fit disjointly inside the ball of radius 1+\eta/2 \leq 1 centered at the origin.

Now let N\subset \mathbb{S}^{d-1} be a 1/2-net. By the above, the size of the net is bounded by |N| \leq 8^d.

The function x \mapsto \|Mx\| is \|M\|-Lipschitz. Hence we can bound \|M\| \leq \max_{x\in\mathbb{S}^{d-1}} \min_{\omega \in N} \|M\omega\| + \|M\|\cdot\|x-\omega\| \leq \max_{x\in\mathbb{S}^{d-1}} \min_{\omega \in N} \|M\omega\| + \|M\|/2. So we have now proved that \|M\| \leq 2\max_{\omega\in N} \|M\omega\|.

Now, as M\omega is N(0,I_d) normally distributed for any \omega\in\mathbb{S}^{d-1}, we can use the union bound over all points of N and conclude that, for all t \geq 1, \mathbb{P}[\|M\| \geq 2t\sqrt{d}] \leq 8^d e^{-d(t-1)^2/2}.

Maximum of n Gaussians

The distribution of the maximum \mathbb{P}[\max_{i \leq n} X_i \geq t] of n independent identically distributed variables X_1,\ldots,X_n \sim N(0,1) is, up to a constant factor, tight with the union bound \mathbb{P}[\max_{i \leq n} X_i \geq t] \leq ne^{-t^2/2}.

From this, we can find that the expected maximum is \mathbb{E}[\max_{i \leq n} X_i] = O(\sqrt{\ln n}). We derive this by bounding the integral \mathbb{E}[\max_{i \leq n} X_i] = \int_0^\infty \mathbb{P}[\max_{i \leq n} X_i \geq t] {\rm d}t. Split the integral at t = \sqrt{\ln n}, bound the integrand in the first part by 1 (it is a probability) and in the second part by ne^{-t^2/2}.

Average width of the simplex

Let x_1,\dots,x_{d+1} \in \mathbb{R}^d be the vertices of a regular simplex such that \|x_i\| = 1 for all i \in [d+1]. If \omega \in \mathbb{S}^{d-1} is chosen uniformly at random, the expectation of the difference \max_{i,j\in[d+1]} |\omega^{\mathsf{T}}(x_i-x_j)| is called the average width of the simplex. We can bound this up to a constant factor using our knowledge of Gaussians. Let H_t := \{y\in\mathbb{R}^d : \omega^{\mathsf{T}}y = t\}. The d-2-dimensional volume of H_t\cap \mathbb{S}^{d-1} is (1-t^2)^{(d-1)/2} times the volume of \mathbb{S}^{d-2} by Pythatoras’ theorem. Recalling that (1+1/\lambda)^\lambda \approx e, you can prove that the distribution of \omega^{\mathsf{T}}x_i is approximately N(0,1/\sqrt{d-1}). The Gaussian tail bound now says that the average width of the simplex is O(\frac{\sqrt{\ln d}}{\sqrt d}).