## Two conflicting concepts

Sometimes you hear a word or concept that changes how you look at the world. For me, these include speciecism and epistemic injustice.

Speciecism is analogous to racism and sexism, but for species: treating another being differently because they are of another species. Speciecism is about intent; if you eat chickens because they are chickens and not humans, that is speciecist, but if you eat chickens because you concluded from observation that they are incapable of suffering, that is not speciecist.

Epistemic injustice is when someone is wronged in their capacity as a knower. If you unjustly limit somebody’s ability to access or express knowledge, like forbidding them from learning to read or speak, that is an epistemic injustice.

I am an outspoken anti-speciecist and I think we should do what we can to prevent epistemic injustice in all forms. But some animals have learned enough language to meaningfully communicate with humans. Does that mean I should find it reprehensible that there are no schools for animals? I think I should and I think I do, but I feel hesitant to firmly claim the position.

## Typed languages, units.

I recently picked up programming again. I used to do it a lot before I went to university, but the constant mind-numbing programming assignments quickly put me off of programming. Apart from the occasional quick bug fix for software I use myself, I haven’t done any serious coding for years.

Until recently, when I needed something coded up during my research. I decided to learn Python, and I like it. It is easy to use, the libraries are extensive and user-friendly, and ipython is a useful tool. There is just one thing that draws my ire: the weak type system. Studying math has given me an appreciation for type checking that is even stricter than most languages.

An example: my length in centimeters plus the outside temperature in °C right now equals 180. This calculation makes no sense, because the units don’t match: you can’t add centimeters to degrees Celcius. But then there’s Python, which just lets you do that.

In [1]: length = 170

In [
2]: temperature = 10

In [
3]: length + temperature
Out[
3]: 180

Most bugs that stem from typos are of this sort. Those bugs are possible because the type system is too weak. If you have two loops, one iterating over i and one over j, basic unit-based type checking would probably flag any instance of i in a place where you should have typed j instead. If you intend to query A[i][j] then it should be possible to let i have row-index type and j have type-index type, making A[j][i] raise a type error.

Another example: Let $A \in \mathbb{R}^{n \times n}, x \in \mathbb{R}^n$, and we’re interested in the quantity $Ax \in \mathbb{R}^n$. If you’re like me and you can’t remember what rows and what columns are, then that doesn’t have to impact your ability to symbolically do linear algebra: the quantities $xA = A^{\mathsf{T}}x$, $Ax^{\mathsf{T}}$ and $A^{-1} x$ don’t “compile”, so any mathematician that reads it will know you typo-ed if you wrote one of those latter expressions. All operations might be matrices acting on vectors, but the matrices $A^{-1}$ and $A^{\mathsf{T}}$ fundamentally take input from different copies of $\mathbb{R}^n$ than the ones that $x$ and $x^{\mathsf{T}}$ live in. That is why matrix operations make sense even if the matrices aren’t square or symmetric: there is only one way to make sense of any operation. Even if you write it wrong in a proof, most people can see what the typo is. But then there’s Python.

In [4]: import numpy as np

In [5]: x = np.array([1,2])

In [6]: A = np.array([[3,4],[5,6]])

In [7]: np.dot(A,x)
Out[7]: array([11, 17])

In [8]: np.dot(A,np.transpose(x))
Out[8]: array([11, 17])

In [9]: np.dot(x,A)
Out[9]: array([13, 16])

I am like me and I can’t remember what rows and what columns are. I would like the interpreter to tell me the correct way of doing my linear algebra. At least one of the above matrix-vector-products should throw a type error. Considering the history of type systems, it is not surprising that the first languages didn’t introduce unit-based types. Nonetheless, it is a complete mystery to me why modern languages don’t type this way.

## A case for donation splitting

TLDR: if welfare compounds then risk-aversion is good.

Within EA circles, the question of splitting donations pops up every once in a while. Should you donate all your money to the singular top-rated charity your singular top-rated cause area, or is there reason to split your donations between various different causes or interventions?

People other than me have written and talked about this under various headers, I’ll list a small subset. Reasons not to diversify (Giving What We Can)Reasons to diversify: the value of information, explore vs exploit (Amanda Askell @ 80k)Reasons both for and against: risk aversion, diminishing returns, EV maximization (Slate Star Codex). In-depth blog post with mahy arguments both for and against (EA forum). Not listed but probably talked about before: splitting your donations gives you extra practice at donating which might lead to you making better donation decisions in the future.

In this post I want to make an argument in favour of splitting donations based on compounding economic returns and measurement error. Specifically, compounding returns favour more consistent growth over a slightly higher but variable growth.

Let’s consider a 100-year time horizon. Suppose that there are 100 charities, $C_1,\dots,C_{100}$, whose effectiveness is heavily-tailed: donating \$1000 to charity $C_i$ allows them to produce $i*\1000$ in welfare after a year. Charity evaluator BestowCapably measures the effectiveness of every charity $C_i$ every year $j$ and finds an effectiveness of $i + s_{i,j}$, where the $s_{i,j}$ are independently normally $N(0, \sigma^2)$ distribution. Let’s assume BestowCapably’s measurement error $\sigma$ does not go down over time.

The way I think of these quantities is that effectiveness is a heavy-tailed distribution and that measurement error is multiplicative (instead of additive).

We assume all welfare gains are reinvested in charity the next year, so that the gains compound over years. The initial welfare is $1$. We consider three different donation strategies: donate everything to the single best rated charity, split the donation between the top three rated charities, or split the donation between the top ten rated charities. We plot the compounded welfare after 100 years versus $\sigma$ below.

In the above graph, we see that,for low measurement error, donation splitting is worse than donating everything to the best charity, but for high measurement error, the situation reverses and splitting donations wins out.

## Section of doubt

The code I’ve used (included below) to simulate the scenario has a couple researcher degrees of freedom. It is unclear whether measurement error should scale with charity effectiveness. I used Gaussian noise without any justification. My choice of range of $\sigma$ to plot was chosen to have a nice result. The range of charity effecicies has close to no justification. The same stable result can be gotten by donating everything to AMF and nothing to speculative cause areas. The splitting incentive I illustrated only holds at the margin, not for the average donation. Because $\sigma$ is fixed, the magnitude of the effect of donation splitting in this model depends heavily on the number of charities (less charities means greater effect).

Nonetheless, if you care about multi-year impacts, it might be wise to consider more than just short-term expected value. Risk-aversion translates to expected counterfactual impact when results compound.

## Appendix: Python code

import random
import matplotlib.pyplot as plt
import math

charitycount = 100
yearstocompound = 100

# The charities are {1,...,n}
# Charity i has effectiveness i
# Effectiveness measurement carries exp noise of size stddev
# Outputs list of (i, i + noise)
def measurecharities(n, stddev):
charities = []
for effectiveness in range(1,n+1):
charities.append((effectiveness,random.gauss(effectiveness,stddev)))
return charities

# Given list of tuples (x, y),
# calculates the average of x's for
# the k tuples with highest y value.
def avgtop(list, k):
sortedlist = sorted(list, key=lambda tup: tup[1], reverse=True)
sum = 0.0
for i in range(k):
sum += sortedlist[i][0]
return sum/k

# Split donations among k charities
for k in [1,3,10]:
x = []
y = []
# We plot the effect for different noise magnitudes
for stddev in range(1,251):
logwelfare = 0.0
for i in range(yearstocompound):
welfaregain = avgtop(measurecharities(charitycount, stddev), k)
logwelfare += math.log(welfaregain)
x.append(stddev)
y.append(max(1,logwelfare))
plt.plot(x, y,label=k)
plt.legend()
plt.xlabel('Error in measuring effectiveness')
plt.ylabel('Log(' + str(yearstocompound) + '-year compounded welfare gains)')
plt.title('Donating to top k out of ' + str(charitycount) + ' charities')
plt.show()