Gaussian tail bounds

812 words

One-dimensional tail bounds

The standard normal distribution N(0,1) has probability density function \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. This density has no elementary antiderivative, but we do at times want to (upper) bound tail probabilities of the form \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t. How can we do this?

One way is the following. Since t\geq x on the domain of integration, we can, for x > 0, upper bound \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t \leq \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{t}{x} e^{-t^2/2} \mathrm{d}t = \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}.

There is another tail bound which is a bit weaker for large x, but I like the proof better. We’ll give a tail bound by looking at the moment-generating function \lambda \mapsto \mathbb{E}[e^{\lambda X}], where X \sim N(0,1) is our normally distributed random variable. We can explicitly calculate this expectation and find \mathbb{E}[e^{\lambda X}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{\lambda x - x^2/2}\mathrm{d}x = \frac{1}{\sqrt{2\pi}}e^{\lambda^2/2}\int_{-\infty}^\infty e^{-(x-\lambda)^2/2}\mathrm{d}x. The last integral is just the entire Gaussian integral shifted a bit, and hence \mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2}. Now we use Chernoff’s bound (an easy corollary of Markov’s inequality) to find \mathbb{P}[X \geq t] \leq \mathbb{E}[e^{\lambda X}]e^{-\lambda t} for any \lambda > 0, which we can minimize over the choice of \lambda by setting \lambda=t, and we conclude that \mathbb{P}[X \geq t] \leq e^{-t^2/2} for t \geq 0.
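As a quick numerical check (my own addition, not part of the original argument), the following Python snippet compares both bounds with the exact tail probability, computed via the complementary error function:

import math

# Exact Gaussian tail P[X >= x] for X ~ N(0,1), via erfc.
def gaussian_tail(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

for x in [0.5, 1.0, 2.0, 4.0]:
    exact = gaussian_tail(x)
    bound_integral = math.exp(-x**2 / 2) / (x * math.sqrt(2 * math.pi))  # first bound, needs x > 0
    bound_chernoff = math.exp(-x**2 / 2)                                 # moment-generating-function bound
    print(f"x={x}: exact={exact:.3e}, integral bound={bound_integral:.3e}, Chernoff bound={bound_chernoff:.3e}")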

Multi-variate Gaussians

Let X \in \mathbb{R}^d be N(0,I_d) normally distributed, i.e., X is a vector with iid Gaussian N(0,1) entries. What tail bounds do we get on \|X\|? We start off with Markov’s inequality again: for any \lambda \in (0,1/2), \mathbb{P}[\|X\| > t] = \mathbb{P}[e^{\lambda\|X\|^2} > e^{\lambda t^2}] \leq \frac{\mathbb{E}[e^{\lambda\|X\|^2}]}{e^{\lambda t^2}}.

Deriving the moment-generating function \lambda \mapsto \mathbb{E}[e^{\lambda\|X\|^2}] of \|X\|^2 is an elementary calculation. For a single coordinate and \lambda < 1/2, \int_{-\infty}^\infty e^{\lambda x^2} \cdot e^{-x^2/2} \mathrm{d}x = \int_{-\infty}^\infty e^{-\frac{x^2}{2(1/\sqrt{1-2\lambda})^2}}\mathrm{d}x = \frac{\sqrt{2\pi}}{\sqrt{1-2\lambda}}.

The coordinates of X are iid, so \mathbb{E}[e^{\lambda\|X\|^2}] = \mathbb{E}[e^{\lambda X_1^2}]^d = (1-2\lambda)^{-d/2}. Applying the bound above with threshold t\sqrt{d}, the minimizer is at \lambda=(1-1/t^2)/2, which lies in [0,1/2) whenever t \geq 1, and we find \mathbb{P}[\|X\| > t\sqrt{d}] \leq e^{-d(t^2-2\log t - 1)/2} \leq e^{-d(t-1)^2/2}.
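As a sanity check (again my own addition), a small Monte Carlo simulation confirms that the bound holds, if somewhat loosely, for moderate d and t:

import numpy as np

rng = np.random.default_rng(0)
d, samples = 20, 200_000
# Norms of N(0, I_d) samples.
norms = np.linalg.norm(rng.standard_normal((samples, d)), axis=1)

for t in [1.1, 1.3, 1.5]:
    empirical = np.mean(norms > t * np.sqrt(d))
    bound = np.exp(-d * (t - 1) ** 2 / 2)
    print(f"t={t}: empirical tail={empirical:.4f}, bound={bound:.4f}")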

Operator norm of Gaussian matrices

The operator norm or spectral norm of a d \times d matrix M is defined as \|M\| := \max_{x \in \mathbb{R}^d, x \neq 0} \frac{\|Mx\|}{\|x\|}.

Now if M were a matrix with every entry independently N(0,1), what would the largest singular value of this random Gaussian matrix be? I’ll give an easy tail bound based on a net argument.

An \eta-net, \eta > 0, on the sphere is a subset N \subset \mathbb{S}^{d-1} such that for every point x \in \mathbb{S}^{d-1} there is a net element \omega \in N with \|x-\omega\| \leq \eta, while every two net elements are at distance at least \eta from each other. A greedy algorithm can construct such an \eta-net, and any such \eta-net has size at most (4/\eta)^d (see e.g. Jiří Matoušek, Lectures on Discrete Geometry, Springer, 2002, page 314). The proof is a simple packing argument: the balls of radius \eta/2 around the net elements are disjoint and fit inside the ball of radius 1+\eta/2 centered at the origin, so |N| \leq ((1+\eta/2)/(\eta/2))^d = (1+2/\eta)^d \leq (4/\eta)^d for \eta \leq 2.

Now let N\subset \mathbb{S}^{d-1} be a 1/2-net. By the above, the size of the net is bounded by |N| \leq 8^d.

The function x \mapsto \|Mx\| is \|M\|-Lipschitz. Hence, for any x \in \mathbb{S}^{d-1} and its nearest net element \omega \in N, we can bound \|Mx\| \leq \|M\omega\| + \|M\|\cdot\|x-\omega\| \leq \max_{\omega\in N} \|M\omega\| + \|M\|/2. Taking the maximum over x \in \mathbb{S}^{d-1} gives \|M\| \leq \max_{\omega\in N} \|M\omega\| + \|M\|/2, so we have now proved that \|M\| \leq 2\max_{\omega\in N} \|M\omega\|.

Now, as M\omega is N(0,I_d) normally distributed for any \omega\in\mathbb{S}^{d-1}, we can use the union bound over all points of N and conclude that, for all t \geq 1, \mathbb{P}[\|M\| \geq 2t\sqrt{d}] \leq 8^d e^{-d(t-1)^2/2}.
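To get a feel for the 2\sqrt{d} scale, here is a small simulation sketch (my addition, not part of the net argument) comparing the empirical operator norm of Gaussian matrices with 2\sqrt{d}:

import numpy as np

rng = np.random.default_rng(0)
for d in [10, 50, 200]:
    # Largest singular value of a d x d matrix with iid N(0,1) entries.
    norms = [np.linalg.norm(rng.standard_normal((d, d)), ord=2) for _ in range(20)]
    print(f"d={d}: mean operator norm = {np.mean(norms):.2f}, 2*sqrt(d) = {2 * np.sqrt(d):.2f}")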

Maximum of n Gaussians

The tail probability \mathbb{P}[\max_{i \leq n} X_i \geq t] of the maximum of n independent identically distributed variables X_1,\ldots,X_n \sim N(0,1) satisfies the union bound \mathbb{P}[\max_{i \leq n} X_i \geq t] \leq ne^{-t^2/2}, and this bound is essentially tight.

From this, we can find that the expected maximum is \mathbb{E}[\max_{i \leq n} X_i] = O(\sqrt{\ln n}). We derive this by bounding \mathbb{E}[\max_{i \leq n} X_i] \leq \int_0^\infty \mathbb{P}[\max_{i \leq n} X_i \geq t] {\rm d}t. Split the integral at t = \sqrt{2\ln n}, bound the integrand in the first part by 1 (it is a probability) and in the second part by ne^{-t^2/2}.
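A quick simulation (my addition) illustrates the \sqrt{\ln n} growth of the expected maximum:

import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 10_000]:
    # Empirical expected maximum of n standard Gaussians, over 1000 trials.
    maxima = rng.standard_normal((1_000, n)).max(axis=1)
    print(f"n={n}: E[max] ~ {maxima.mean():.2f}, sqrt(2 ln n) = {np.sqrt(2 * np.log(n)):.2f}")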

Average width of the simplex

Let x_1,\dots,x_{d+1} \in \mathbb{R}^d be the vertices of a regular simplex such that \|x_i\| = 1 for all i \in [d+1]. If \omega \in \mathbb{S}^{d-1} is chosen uniformly at random, the expectation of the difference \max_{i,j\in[d+1]} |\omega^{\mathsf{T}}(x_i-x_j)| is called the average width of the simplex. We can bound this up to a constant factor using our knowledge of Gaussians. Let H_t := \{y\in\mathbb{R}^d : \omega^{\mathsf{T}}y = t\}. By Pythagoras’ theorem, H_t\cap \mathbb{S}^{d-1} is a (d-2)-dimensional sphere of radius \sqrt{1-t^2}, so its volume is (1-t^2)^{(d-2)/2} times the volume of \mathbb{S}^{d-2}. Recalling that (1+1/\lambda)^\lambda \approx e, you can use this to show that the distribution of \omega^{\mathsf{T}}x_i is approximately Gaussian with variance of order 1/d. The Gaussian tail bound, combined with a union bound over the d+1 vertices, now says that the average width of the simplex is O(\frac{\sqrt{\ln d}}{\sqrt d}).
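Here is a simulation sketch (my addition, not from the original argument): it builds a regular simplex with unit-norm vertices inside the hyperplane of \mathbb{R}^{d+1} orthogonal to the all-ones vector, draws uniformly random directions in that hyperplane, and compares the empirical average width with \sqrt{\ln d}/\sqrt{d}. The two columns should agree up to a constant factor.

import numpy as np

rng = np.random.default_rng(0)

def simplex_vertices(d):
    # Rows are e_i minus the centroid, normalized: a regular simplex with
    # d+1 unit-norm vertices in the hyperplane orthogonal to the all-ones vector.
    v = np.eye(d + 1) - 1.0 / (d + 1)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def average_width(d, samples=1000):
    v = simplex_vertices(d)
    ones = np.ones(d + 1) / np.sqrt(d + 1)
    g = rng.standard_normal((samples, d + 1))
    g -= np.outer(g @ ones, ones)                     # restrict to the simplex's hyperplane
    w = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform random directions in that hyperplane
    proj = w @ v.T                                    # inner products omega^T x_i
    return np.mean(proj.max(axis=1) - proj.min(axis=1))

for d in [10, 100, 1000]:
    print(f"d={d}: average width ~ {average_width(d):.3f}, sqrt(ln d / d) = {np.sqrt(np.log(d) / d):.3f}")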

4. Who is worried about AI Risk?

429 words

I am skeptical of AI Safety (AIS) as an effective cause area, at least in the way AIS is talked about by people in the effective altruism community. However, it is also the cause area that my skills and knowledge are the best fit for contributing to, so it seems worthwhile for me to think my opposition to it through.

Previously: [1] [2] [3][latest].

There are many people talking about the risks of artificial intelligence. I want to roughly split them into three groups for now, because they worry about very different issues and tend to talk past each other, confusing outsiders.

The LessWrong-aligned view seems most popular in the EA community. Exemplified by the paperclip maximizer argument, LW-aligned worriers are concerned that an Artificial General Intelligence (AGI) would accomplish its objective in unforeseen ways, and as a consequence should be treated like you should treat an evil genie, except it’d be worse because it would have less understanding of basic words than philosophers have. The principles that AI should satisfy are listed by the Future of Humanity Institute. [Though I suspect at least some of the signatories to have the FATML-aligned view in mind.] A popular book on this is Superintelligence by Nick Bostrom.

Fairness, Accountability and Transparency in Machine Learning (FATML) is a subfield of machine learning, concerned with making algorithmic decision making fair, accountable and transparent. Exemplified by Amazon’s recent recruiting debacle, FATML-aligned worriers are concerned that modern algorithmic decision making will exacerbate existing social, economic and legal inequalities. The principles that AI should satisfy are listed by The Public Voice, and these Google ML guidelines fit as well. [Though I suspect at least some of the signatories to have the LW-aligned view in mind.] Popular books include Weapons of Math Destruction by Cathy O’Neil, Algorithms of Oppression by Safiya Noble and Automating Inequality by Virginia Eubanks.

The third group consists of other AI-related worries commonly heard in the media. I want to separate these from the previous two categories because they are more about politics and less about technical problems. Worries include killer drones, people losing their jobs because AI replaced them, and who the self-driving car should run over given the choice.

In the next couple of posts on AI-related topics, I will focus on the first two categories. My aim is to use the FATML-aligned view to compare and contrast the LW-aligned view, hopefully gaining some insight in the process. The reason I separate the views this way is that I agree with the FATML-aligned worries and disagree with the LW-aligned worries.

3. Finding meaning in a perfect life

249 words

[This badly written blog post has been superseded by a slightly better written forum post over on the EA forum.]

I am skeptical of AI Safety (AIS) as an effective cause area, at least in the way AIS is talked about by people in the effective altruism community. However, it is also the cause area that my skills and knowledge are the best fit for contributing to, so it seems worthwhile for me to think my opposition to it through.

Previously: [1] [2][latest].

My background makes me prone to overrate how important AI Safety is.

My fields of expertise and enjoyment are mathematics and computer science. These skills are useful for the economy and in high demand. The general public is in awe of mathematics and thinks highly of anyone who can do it well. Computer science is the closest thing we have to literal magic.

Wealth, fun, respect, power. The only thing left to desire is cosmic significance, which is exactly the sales pitch of the astronomical waste argument. It would be nice if AI-related existential risk were real, for my labour to potentially make the difference between a meaningless lifeless universe and a universe filled with happiness. It would give objective significance to my life in a way that only religion would otherwise be able to.

This is fertile ground for motivated reasoning, so it is good to be skeptical of any impulse to think AIS is as good as it is claimed to be in cost-effectiveness estimates.

2. How do we talk about AI?

525 words

[This badly written blog post has been superseded by a slightly better written forum post over on the EA forum.]

I am skeptical of AI Safety (AIS) as an effective cause area, at least in the way AIS is talked about by people in the effective altruism community. However, it is also the cause area that my skills and knowledge are the best fit for contributing to, so it seems worthwhile for me to think my opposition to it through.

Previously: [1][latest].

All sentences are wrong, but some are useful. I think that a certain emotional salience makes us talk about AI in a way that is more wrong than necessary.

A self-driving car and a pre-driven car are the same thing, but I can feel myself thinking about the two in completely different ways.

Self-driving cars are easy to imagine: they are autonomous and you can trust the car like you trust cab drivers; they can make mistakes but probably have good intent; when they encounter an unfamiliar situation they can think about the correct way to proceed, and if something goes wrong then the car is at fault.

A pre-driven car is hard to imagine: it has to have a bunch of rules coded into it by the manufacturer and you can trust the car like you trust a bridge; it does exactly what it was built to do, but if it was built without proper testing or calculations, things will at some point go wrong. When they do, the company and engineers are at fault.

You can make these substitutions on any sentence in which a computer is ascribed agency. In the best case, “The neural network learned to recognize objects in images” becomes “The fitted model classifies images in close correspondence with the human-given labels”. In reality, that description might be too generous.

It helps to keep in mind the human component. “The YouTube algorithm shows you exactly those videos that make you spend more time on the platform” is accurate in some sense, but it completely glosses over the ways in which the algorithm does not do that. When you listen to music using YouTube’s autoplay, it isn’t hard to notice that suggestions tend to point backwards in time compared to the upload date of the video you’re watching right now, and that, apart from preventing repeats, autoplay is pretty Markovian (that is mathspeak for the algorithm not doing anything clever based on your viewing history, just “this video is best followed by that video”). Both of those properties are clearly a result of the way in which YouTube’s engineers modelled the problem they were trying to solve. I would describe YouTube’s suggestions as “The YouTube autoplay algorithm was made to link you to videos that most people watched and liked after watching the current video”.

When you rewrite AI-related statements, they tend to become more wordy. That is exactly what you would expect, but it does make it unwieldy to have accurate conversations. I leave the search for catchy-but-more-accurate buzzwords as an open problem. I am particularly interested in how to translate the term “artificial general intelligence” (AGI).

Two conflicting concepts

185 words

Sometimes you hear a word or concept that changes how you look at the world. For me, these include speciesism and epistemic injustice.

Speciesism is analogous to racism and sexism, but for species: treating another being differently because they are of another species. Speciesism is about intent; if you eat chickens because they are chickens and not humans, that is speciesist, but if you eat chickens because you concluded from observation that they are incapable of suffering, that is not speciesist.

Epistemic injustice is when someone is wronged in their capacity as a knower. If you unjustly limit somebody’s ability to access or express knowledge, like forbidding them from learning to read or speak, that is an epistemic injustice.

I am an outspoken anti-speciesist and I think we should do what we can to prevent epistemic injustice in all forms. But some animals have learned enough language to meaningfully communicate with humans. Does that mean I should find it reprehensible that there are no schools for animals? I think I should and I think I do, but I feel hesitant to firmly claim the position.

A case for donation splitting

703 words

TLDR: if welfare compounds then risk-aversion is good.

Within EA circles, the question of splitting donations pops up every once in a while. Should you donate all your money to the single top-rated charity in your single top-rated cause area, or is there reason to split your donations between various different causes or interventions?

People other than me have written and talked about this under various headers; I’ll list a small subset. Reasons not to diversify (Giving What We Can). Reasons to diversify: the value of information, explore vs exploit (Amanda Askell @ 80k). Reasons both for and against: risk aversion, diminishing returns, EV maximization (Slate Star Codex). An in-depth blog post with many arguments both for and against (EA forum). Not listed but probably talked about before: splitting your donations gives you extra practice at donating, which might lead to you making better donation decisions in the future.

In this post I want to make an argument in favour of splitting donations based on compounding economic returns and measurement error. Specifically, compounding returns favour consistent growth over slightly higher but more variable growth.

Let’s consider a 100-year time horizon. Suppose that there are 100 charities, C_1,\dots,C_{100}, whose effectiveness is heavy-tailed: donating $1000 to charity C_i allows them to produce i*\$1000 in welfare after a year. Charity evaluator BestowCapably measures the effectiveness of every charity C_i every year j and finds an effectiveness of i + s_{i,j}, where the s_{i,j} are independent N(0, \sigma^2) normally distributed. Let’s assume BestowCapably’s measurement error \sigma does not go down over time.

The way I think of these quantities is that effectiveness is a heavy-tailed distribution and that measurement error is multiplicative (instead of additive).

We assume all welfare gains are reinvested in charity the next year, so that the gains compound over years. The initial welfare is 1. We consider three different donation strategies: donate everything to the single best rated charity, split the donation between the top three rated charities, or split the donation between the top ten rated charities. We plot the compounded welfare after 100 years versus \sigma below.

In the above graph, we see that, for low measurement error, donation splitting is worse than donating everything to the best charity, but for high measurement error, the situation reverses and splitting donations wins out.

Section of doubt

The code I’ve used (included below) to simulate the scenario has a couple of researcher degrees of freedom. It is unclear whether measurement error should scale with charity effectiveness. I used Gaussian noise without any justification. My choice of range of \sigma to plot was chosen to have a nice result. The range of charity effectiveness values has close to no justification. The same stable result can be gotten by donating everything to AMF and nothing to speculative cause areas. The splitting incentive I illustrated only holds at the margin, not for the average donation. Because \sigma is fixed, the magnitude of the effect of donation splitting in this model depends heavily on the number of charities (fewer charities means a greater effect).

Nonetheless, if you care about multi-year impacts, it might be wise to consider more than just short-term expected value. Risk-aversion translates to expected counterfactual impact when results compound.

Appendix: Python code

import random
import matplotlib.pyplot as plt
import math

charitycount = 100
yearstocompound = 100

# The charities are {1,...,n}
# Charity i has effectiveness i
# Effectiveness measurement carries Gaussian noise with standard deviation stddev
# Outputs list of (i, i + noise)
def measurecharities(n, stddev):
    charities = []
    for effectiveness in range(1,n+1):
        charities.append((effectiveness,random.gauss(effectiveness,stddev)))
    return charities

# Given list of tuples (x, y),
# calculates the average of x's for
# the k tuples with highest y value.
def avgtop(measurements, k):
    sortedlist = sorted(measurements, key=lambda tup: tup[1], reverse=True)
    total = 0.0
    for i in range(k):
        total += sortedlist[i][0]
    return total/k

# Split donations among k charities
for k in [1,3,10]:
    x = []
    y = []
    # We plot the effect for different noise magnitudes
    for stddev in range(1,251):
        logwelfare = 0.0
        for i in range(yearstocompound):
            welfaregain = avgtop(measurecharities(charitycount, stddev), k)
            logwelfare += math.log(welfaregain)
        x.append(stddev)
        y.append(max(1,logwelfare))
    plt.plot(x, y,label=k)
plt.legend()
plt.xlabel('Error in measuring effectiveness')
plt.ylabel('Log(' + str(yearstocompound) + '-year compounded welfare gains)')
plt.title('Donating to top k out of ' + str(charitycount) + ' charities')
plt.show()

Best Things of 2018

703 words

Not (best (things of 2018)) but ((best things) of 2018), because recommendations get more interesting if they are rate-limited and less interesting if a recency constraint is imposed.

Best interactive web essay

Parable of the Polygons, by internet creators Vi Hart and Nicky Case. Cute little triangles and squares get segregated in ways none of them ever intended, against their best wishes.

Best portrait article

Portraying one of the most important trans people of the past few years, Vice Broadly’s piece on Caitlyn Jenner was a nice read.

Best economist’s story

On why setting maximum prices is bad. They Clapped by Michael Munger. Very salient, go read it.

Best academic talk

I see a lot of talks from computer science researchers, and CS people are surprisingly good at giving captivating talks. But, quoting Virginia Woolf,

[..] one must read [any book] as if it were the last volume in a fairly long series, continuing all those other books that I have been glancing at. For books continue each other, in spite of our habit of judging them separately.

Virginia Woolf, A Room of One’s Own, or page 52 in Penguin’s Vintage Mini “Liberty”

And so a talk must be considered in its social context. Based on this principle, the clear winner for this category is this keynote speech by James Mickens of Harvard University at USENIX Security 2018: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible? Mickens is a captivating orator, the talk is funny and informative and gives a critical view on an important issue of the present day.

Best internet rabbit-hole

An old one for nostalgia. How to spot photo manipulation. Body By Victoria. Do click the links to follow-up posts, and the rest of the website is worth checking out as well.

Best description of psychologists

This text fragment reflects every interaction I’ve had with psychologists anywhere, both my gatekeepers and psychologists I visited for other reasons.

My anorexic patients sometimes complain of being forced into this mold. They’ll try to go to therapy for their inability to eat a reasonable amount of food, and their therapist will want to spend the whole time talking about their body image issues. When they complain they don’t really have body image issues, they’ll get accused of repressing it. Eventually they’ll just say “Yeah, whatever, I secretly wanted to be a ballerina” in order to make the therapist shut up and get to the part where maybe treatment happens.

Scott Alexander, Del Giudice On The Self-Starvation Cycle

Best video essay

This is not really a contest, Contrapoints’ The Aesthetic is the most beautiful piece of film I’ve seen in years. It is an honest expression of feelings and internal dialogue and conflict that trans women experience. It touches on so many uncomfortable issues without having any single clear message. Contrapoints raises the video essay to an art form. There is so much going on on so many levels that I can just keep watching the thing over and over again. Highly recommended watching for both trans and cis people.

The creator got quite some social media backlash on the video. There is exactly one reaction video that I felt was worth watching. Nobody Wins: ContraPoints, The Aesthetic, and Negative Representation by let’s talk about stuff. [This text essay is also pretty good. How Contrapoints Misunderstands Gender.]

Best book

My choice of best book for 2018 is Aphro-ism by Aph Ko and Syl Ko. It is a blog-turned-book, with a number of brilliant essays on veganism and social justice, among other topics. I cannot overstate how much I like this book. I learned a lot from reading it, and not just about the book’s subject matter.

The writings of the Ko sisters are very far from every thought I’ve ever had. This fact is reflected in how much I learned from the book, as well as in how difficult it was to understand it. I’ve re-listened to this book 5 times by now. The first time, I understood literally nothing. Each time after that I understood a bit more, and I feel I understand most parts now. Not yet at the level of being able to explain the ideas, but at the level of seeing good use value in them.

On the value of anecdotes

331 words

What is better, if everyone is wrong about the same 2% of facts, or if everyone is wrong about a different 4% of facts? Depending on how you answer this question, you should act in very different ways. I’ll take vegan advocacy as an example, but the question can be applied more generally.

If you’re in the first group, you would prefer a scientific data-driven approach. You would experiment with many different approaches to advocacy, analyse the data to find the single best way of doing outreach, and make everyone in vegan activism aware that this is the best way to do it.

If you prefer the 4% case, a local algorithm is the way to go. Think about what drove you to become vegan, and continue this strategy. If you were shocked into becoming vegan by a Cube of Truth, you should be participating in Cubes of Truth. If you became vegan after your friendly vegan neighbour exemplified that veganism is a totally normal lifestyle and allowed you to pick their brain about why they became vegan themselves, then you should become the friendly vegan acquaintance of the people you know.

One interesting question if you enact the local algorithm is how to weigh anecdotes. The local algorithm described above only considers your own data; one alternative algorithm is to use the approach that was effective on the majority of your direct friends that became vegan before you. Another algorithm looks at all the friends of your friends, or everyone within distance 3 in the friendship graph. If everyone is connected to everyone by a friendship path of length 6, then the distance 6 algorithm is exactly the data-driven approach from the second paragraph.

Evolutionary theory suggests that the small-distance algorithms are effective, for the best outreach strategy will eventually out-compete all others. But for the distance 0 or 1 cases, you’re basically working on anecdotal evidence. I’m not sure anymore what the correct value is to place on anecdotes.

The life expectancy of trans women is not 35 years

904 words

Every once in a while I read an article or comment like this:

The average lifespan of trans women in the US is 35 years. Source

This number is wrong in so many ways, but many people seem to fall for it. The most viral variant consists of screencaps of this tumblr post.

User fuckallies on tumblr: 'On average, you have a 1 in 18,989 chance of being murdered A trans person has a 1 in 12 chance of being murdered The average life span of a cis person is about 75-90 The average life expectancy of a trans person is 23-30 years old 75% of people killed in anti LGBT hate crimes are poc Think about this the next time you go crying over “cisphobia” and “reverse racism”'

Every murder is a tragedy, and it is particularly sad when someone gets murdered just for being trans. That is all the more reason to get our numbers right: exaggeration doesn’t help the cause. The lifespan statistics above are absurd. There is no way that they can be true, and it is telling of our maths and science education that people believe them.

Let’s do a basic sanity check on these figures. If 50% of trans folks were to reach the age of 60, the other half would have to be dead by age 10 on average. But measuring “age of death of trans people” is really tricky, because a person will only enter the trans population after transition: if they die before that, nobody knows they’re trans. So realistically, all trans people counted in such a statistic die at age 16 or older. If the trans people who get murdered do so at the earliest possible age (16), how many trans people can maximally reach the age of 75? Well, we solve for the fraction x: 75x + (1-x)*16=35, which gives x = 19/59, roughly a third.

So roughly two in three trans people would have to get killed over their lives. Does that sound plausible? Of course not. The only place the 23-30 year figure could realistically come from is if the number described life expectancy conditional on being killed, or some other low-probability conditional.

Media reporting

Most journalists aren’t good with numbers. This affects their propensity to cite the low life expectancy for trans folks. The Guardian:

a 2014 report concluded that the average life expectancy of trans women in the Americas is between 30 and 35.

NPR:

“Transgender people have an average life expectancy of about 30 to 32 years,” Balestra says.

Huffington Post:

The statistic said the average life expectancy for a trans woman of color is 35.

The difference is striking: “transgender people”, “trans women in the US”, “trans women in the Americas”, “trans women of colour”. They’re all misreporting the same number in different messed-up ways.

The first popular English medium to cover the number, and the source that most chains of web pages end up on, is Washington Blade, which reports one realistic number and also the unrealistic one.

The commission indicates 80 percent of trans murder victims in the Americas during the 15-month period were 35 years old or younger. Its report further concludes the average life expectancy of trans people in the Western Hemisphere is between 30-35 years.

Washington Blade cites a report by the Inter-American Commission on Human Rights of the Organisation of American States (press release, report). The report mentions this:

According to the data collected in the Registry of Violence, eighty percent of trans persons killed were 35 years of age or younger.

The data (xlsx, Spanish) contain 770 reports of violence against LGBT people from a lot of countries in the Americas; all incidents happened in the 15 months between January 1st 2013 and March 31st 2014. Not all countries are in the dataset and no skin colours of the victims are listed, but names are. Some victims are deadnamed, others appropriately named. Searching the names online, there are indeed a lot of trans women of colour among the listed US victims.

The data contain 282 murders of trans people, of which 212 have an age of death attached. The average age of death among those 212 is 28.7. Among the 14 murder cases in the US, all trans women, the average age was 29.8. Filtering the America-wide data for listed ages of murder of 35 and under, we find 168 cases, and 168 out of 212 is 80%. This is the source of the claim that 80% of victims were 35 years old or younger, but keep in mind that this is as a fraction of cases that have ages attached to them.

Where does the life expectancy of 30-35 years come from? The IACHR report just says this:

In terms of the age of the victims, the IACHR notes that while it seems gay men of all ages are targeted, in the case of trans women, it is mostly younger trans women who are victims of violence. In this regard, the IACHR has received information that the life expectancy of trans women in the Americas is between 30 and 35 years of age.

No source is listed. I’m calling bullshit. The murdered trans women in the commission’s own data died at around 29 on average, so a life expectancy of 30-35 would imply that getting murdered reduces life expectancy by at most a handful of years. There is no way that is true.

Section of doubt

The 1 in 12 murder rate up top is interesting because it cannot be refuted in a conservative street-fighting calculation.  The US has 325 million citizens and a life expectancy of 79 years, so every year roughly 4.1 million US citizens die. If 1 in 40000 people is trans, then you’d expect 100 US trans people to die each year. 2018’s TDOR has 23 murder cases from the US listed, so we’d estimate that trans people have a 1 in 4 probability of death by murder, not accounting for skin color.
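For transparency, here are the same back-of-the-envelope numbers in a few lines of Python (my own restatement of the calculation above, using the same often-cited 1-in-40000 prevalence figure):

us_population = 325e6
life_expectancy = 79
deaths_per_year = us_population / life_expectancy    # roughly 4.1 million deaths per year
trans_deaths_per_year = deaths_per_year / 40000      # roughly 100, assuming 1 in 40000 people is trans
tdor_us_murders_2018 = 23
print(deaths_per_year, trans_deaths_per_year, tdor_us_murders_2018 / trans_deaths_per_year)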

I don’t trust the 1:40000 statistic within an order of magnitude, but it is an often cited one. So while I don’t take the 1:4 number to be remotely close to the truth, it is understandable if people believe the 1 in 12 figure.