6. Astronomical waste, astronomical schmaste

2880 words

[Part of this badly written blog post has been superseded by a slightly better written forum post over on the EA forum. I might clean up the other parts in the future as well, and if so I’ll publish those at the EA forum too.]

Previously: [1] [2] [3] [4] [5] [latest].

Epistemic status: there is nothing wrong with writing your bottom line first. The purpose of this article is to get my initial thoughts on AI risk down before I start reading more about the topic, because I fear that I might otherwise unknowingly grant AI risk proponents the implicit assumptions they’re making. As I procrastinated a lot on writing this post, a number of articles have been put out in the meantime that I did not read. I do not intend this document to be a conclusive argument against AI risk so much as an attempt to justify why it might be reasonable to think AI risk is not real.

Is this text too long? Click here for the summary of the argument.

In this post, I want to tackle the astronomical waste argument as used to justify AI-related existential risk prevention as an EA cause area. I will first describe the argument that people make. After that, I will discuss a number of meta-heuristics for being skeptical of it. Lastly, I want to take the astronomical waste argument head-on and describe why it is so absurdly unlikely for AI risk to be simultaneously real and preventable that the expected value of working on AI safety (AIS) is still not very good.

Astronomical waste

The astronomical waste argument as most people tell it goes roughly like this: the potential good that could be realized if happy beings colonized the entire universe would be huge, so even a tiny risk of space colonization not happening costs a lot of value in expectation. Moreover, if we can decrease that risk by just a tiny bit, the expected utility generated is still big, so doing so might be a very cost-effective way to do good.

As many wise people have said before me, “Shut up and calculate.” I will be giving rough estimates without researching them in depth, because these quantities are not that well-known to humanity either. For the duration of this post, I will be a speciesist and all-around awful person, because that simplifies the estimates. Bostrom roughly estimates that colonizing the Virgo supercluster would yield 10^{38} human lives per century. The Virgo SC is one of about 10 million superclusters in the observable universe, and we have roughly 10^{9} centuries left before entropy runs out, making a total of roughly 10^{54} \approx 2^{180} potential human lives left in the universe.

I will try to argue that donating \$5000 \approx \$2^{13} to an AI risk charity today will counterfactually produce less than one life saved in expectation. To make that happen, we collect 180 bits of unlikeliness for the hypothesis that donating that sum of money to AI Safety organizations saves a life.

You need to collect fewer bits if your counterfactual cause area is more cost-effective than malaria prevention. Possibly \log_2(5000/0.20) \approx 15 bits fewer with a charity like ALLFED.
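These back-of-the-envelope numbers are easy to sanity-check. A minimal sketch in Python, using only the round figures from the text:

```python
import math

lives_per_century = 1e38  # Bostrom's rough estimate for the Virgo supercluster
superclusters = 1e7       # superclusters in the observable universe
centuries_left = 1e9      # centuries before entropy runs out

total_lives = lives_per_century * superclusters * centuries_left
print(math.log2(total_lives))  # ≈ 179.4, i.e. roughly 2^180 potential lives

# Discount for a counterfactual charity more cost-effective than malaria
# prevention at ~$5000 per life, e.g. ALLFED at ~$0.20 per life:
print(math.log2(5000 / 0.20))  # ≈ 14.6 bits fewer needed
```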

On meta-uncertainty

Some of my LessWrong-reading friends would argue that it is impossible to have credence 2^{-200} in anything, because my own thinking is fallible and I’ll make mistakes in my reasoning with probability much higher than that. I reject that assertion: if I flip 200 coins, then my credence in most particular sequences of outcomes should inevitably be close to 2^{-200}, because all 2^{200} outcomes are mutually exclusive and their probabilities must sum to at most 1.
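This is easy to check numerically: the probability of any one specific sequence of 200 fair coin flips is exactly 2^{-200}, which is tiny but still comfortably representable as a double-precision float:

```python
# Probability of one specific sequence of 200 independent fair coin flips.
p_single_sequence = 0.5 ** 200
print(p_single_sequence)  # ≈ 6.2e-61

# The result is a power of two, so the floating-point value is exact.
print(p_single_sequence == 2.0 ** -200)  # True
```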

Discounting the future (30 bits)

Inhabiting the observable universe might take a really long time, and in all this time there is some probability of going extinct for reasons other than AI risk. Hence we should discount the total spoils of the universe by a decent fraction. 30 bits. More importantly, if the Alignment Problem were to be solved, you’d still need to be able to force everyone to implement the solution.

Independent AGI developers would need to be monitored and forced to comply with the new AGI regulations. This is hard to do without a totalitarian surveillance state, and such governance structures are bad to live under. 15 bits.

And then there are adversaries, negative utilitarians, who will actively try to build unsafe AGI to destroy the universe. They will keep trying for the rest of human existence. Preventing this for all time seems unlikely without going into real Orwell-level surveillance. 15 bits.

Biases (20 bits)

I expect many EAs to be wrong in their utility calculations, so I think I should propose mechanisms that cause so many EAs to be wrong. Two such mechanisms are described in previous entries in this series, [2] (9 bits) and [3] (1 bit), and I want to describe a third one here.

When we describe how much utility could fit in the universe, our reference class for numbers is “how many X fits in the universe”, where X ranges over things like {atoms, stars, planets}. These numbers are huge, typically expressed as 10^n for n \in \mathbb{N}.

When we describe how likely certain events are, the tempting reference class is “statements of probability”, typically expressed as ab.cdefghij... \%. Writing things this way, it seems absurd to have your number start with more than 10 zeros.

The combination of these vastly different scales, together with anchoring being a thing, means we should expect people to overestimate the probability of unlikely events and hence the expected utility of prevention measures.

I expect myself to be subject to these biases still, so I think it is appropriate to count a number of bits to counteract this bias. 20 bits.

Counterfactual actions (-1 bit)

Nothing is effective in and of itself; effectiveness is relative to a counterfactual action. For this blog post, the counterfactuals will be working on algorithmic fairness and/or digital rights campaigning/legislation, and mainstream machine learning research and engineering. -1 bit.

When is AI risky? (tl;dr)

This is a rough sketch of my argument. AI safety can only be an effective cause area if

  1. The future of the non-extinct universe would be good.
  2. The probability of an AI-related extinction event is big.
  3. It is possible to find ways to decrease that probability.
  4. It is feasible to impose those risk mitigation measures everywhere.
  5. The AI risk problem won’t be solved by regular commercial and/or academic AI research anyway.
  6. A single AI-related extinction event could affect any lifeform in the universe ever.
  7. Without AI first causing a relatively minor (at most country-level) accident.
  8. Presently possible AI safety research should be an effective way of decreasing that probability.

I upper bounded the quantity in 1 by 2^{180} good lives. Properties 2 and 3 are necessary for AI Safety work to be useful. Property 5 is necessary for AI safety work to have meaningful counterfactual impact. Property 6 is necessary because otherwise other happy life forms might fill the universe instead, and the stakes here on earth alone are nowhere near 2^{180}. If property 7 does not hold, it might mean that people will abandon the AI project, and it would be too easy to debug risky AIs. Property 8 is in contrast to AI safety work only becoming possible after major progress in AI capabilities research, and is hence a statement about the present day.

The basic premise of the argument is that there is an inherent tension between properties 2 through 6 being true at once: AI risk should be big enough for properties 2 and 6 to hold, but small enough for 3 and 5 to hold. I think this is a pretty narrow window to hit, which would mean that AI safety is very unlikely to be an effective cause area, or at least not for its potential to save the universe from becoming paperclips. I am also highly skeptical of both 7 and 8, even assuming that 2 through 6 hold.

AI is fake (8 bits)

I think it is likely that we won’t be making what we now think of as “artificial intelligence”, because current conceptions of AI are inherently mystical. Future humans might one day make something that present-day humans would recognize as AI, but the future humans won’t think of it like that. They won’t have made computers think; they will have demystified thinking to the point where they understand what it is. They won’t mystify computers, they will demystify humans. Note that this is a belief about the state of the world, while [2] is about how we think about the world. Hence, I think both deserve bits separately. 5 bits.

I am not sure that intelligence is a meaningful concept outside principal component analysis. PCA is a statistical technique that finds the largest components of variation in a population, independently of whether those axes of variation have an underlying cause. In particular, that might mean that superhuman intelligence cannot exist. That does not preclude thinking at superhuman speeds, but it would still imply serious bounds on how intelligent an AI can be. 1 bit.

No matter the above, all reasonably possible computation is restricted to polynomial-time solvable problems, fixed-parameter tractable problems, and whatever magic modern ILP-, MINLP-, TSP- and SAT-solvers use. This puts real upper bounds on what even the most perfect imaginable AI could do. The strength of AI would lie in enabling fast and flexible communication and automation, not in solving hard computational problems. I hereby accuse many AI enthusiasts of forgetting this fact, and will penalize their AI-risk fantasies for it. 2 bits.

AI x-risk is fake (31 bits)

The risks of using optimization algorithms are well-documented, and practitioners have a lot of experience in handling such software responsibly. This practical experience literally dates back to the invention of optimization, in by far my favourite anecdote I’ve ever heard. Optimization practitioners are more responsible than you’d think, and with modern considerations of fairness and adversarial input they’ll only get more responsible over time. If there are things that must be paid attention to for algorithms to give good outcomes, practitioners will know about them. 3 bits.

People have been using computers to run ever more elaborate optimization algorithms pretty much since the introduction of the computer. ILP-solvers might be among the most sophisticated pieces of software in existence. And they don’t have any problems with reward hacking. Hence, reward hacking is probably only a fringe concern. 3 bits.

Debugging is a long and arduous process, both for developing software and for designing the input for the software (both the testing input and the real-world inputs). That means that the software will be run on many different inputs and computers before going in production, each an independent trial. So, if software has a tendency to give catastrophically wrong answers, it will probably already do so in an early stage of development. Such bugs probably won’t survive into production, so any accidents are purely virtual or at most on small scales. 5 bits.

Even if AI were to go wrong in a bad way, it has to go really, really wrong to be an existential threat. For example, one thing that is not an existential threat is an AI deciding to release poison gas from every possible place in the US. That might kill everyone there, but even if the poison gas factories could run indefinitely, the rest of the world could just nuke all of North America long before the whole global atmosphere became poisonous. 10 bits.

Moreover, for the cosmic endowment to be at risk, an AI catastrophe would have to impact every lifeform that would ever come to exist in the lightcone. That is a lot of ground to cover in a lot of detail. 10 bits.

AI x-risk is inevitable (28 bits)

Okay, let’s condition on all the above things going wrong anyway. Is AI-induced x-risk inevitable in such a world? Probably.

  • There should be a way of preventing the catastrophes. 5 bits.
  • Humans should be able to discover the necessary knowledge. 3 bits.
  • These countermeasures have to be universally implemented. 10 bits.
  • Even against bad actors and anti-natalist terrorists. 10 bits.

AI becomes safe anyway (15 bits)

Let’s split up the AI safety problem into two distinct subproblems. I don’t know the division in enough detail to give a definition, so I’ll describe them by association. The two categories roughly map onto the distinction from [4], and also roughly onto what LW-sphere folks call the control problem and the alignment problem.

| Capitalist’s AI problem | Social democrat’s AI problem |
| --- | --- |
| x/s-risk | Cyberpunk dystopia risk |
| Must be solved to make money using AI | Must be solved to have algorithms produce social good |
| Making AI optimal | Making algorithms fair |
| Solving required for furthering a single entity’s values | Solving required for furthering sentient beings’ collective values |
| Only real if certain implausible assumptions are true | Only real if hedonistic utilitarianism is false, or if bad actors hate hedonistic utility |
| Prevent the light cone from becoming paperclips | Fully Automated Luxury Gay Space Communism |
| Specific to AGI | Applies to all algorithms |
| Fear of Skynet | Fear of Moloch |
| Beating back unknown invaders from mindspace | Beating back unthinkingly optimistic programmers |
| Have AI do what we want | Know what we want algorithms to do |
| What AIS-focussed EAs care about | What the rest of the world cares about |
I’m calling 20 bits on the capitalist’s problem getting solved by capitalists, and 15 bits on the social democrat’s problem getting solved by the rest of humanity. We’re interested in the minimum of the two. 15 bits.

Working on AIS right now is ineffective (15 bits)

There are two separate ways of being ineffective to account for. AIS research might be ineffective right now no matter what, because we lack the knowledge to do useful research, or AIS work might in general be less effective than work on, for example, FAT algorithms.

The first idea is justified by the viewpoint that making AI will mostly involve demystifying the nature of intelligence, rather than obtaining the mystical skill of producing intelligence. Moreover, it is reasonable to think so given that current algorithms are not intelligent. 5 bits. (Note that this argument is different from the earlier one under the “AI is fake” heading. That argument was about the nature of intelligence and whether it permits AI risk to exist; this one is about our capability to resolve AI risk now versus later.)

The second idea concerns whether AI safety will be mostly a policy issue or a research issue. If it is mostly policy with just a bit of technical research, it will be more effective to practice getting algorithms regulated in the first place. We can gain practice, knowledge and reputation for example by working on FATML, and I think it likely that this is a better approach at the current moment in time. 5 bits.

Then the last concern is that AIS research is just AI capabilities research by another name. It might not be exactly true, but the fit is close. 5 bits.

Research is expensive (13 bits)

Let’s get a sense of scale here. You might be familiar with these illustrations. If not, check them out. That tiny bit is the contribution of four researcher-years. One researcher-year costs at least $50,000.

Or consider the list of projects getting an ERC Starting Grant of 1.5 million euros each. Compared to ambitious projects like “make AI safe”, the recipients’ ambitions are tiny and highly specialised. What’s more, these are grant applications, so they are necessarily an exaggeration of what will actually happen with the money.

It is not a stretch to estimate that it would cost at least $50 million to make AI safe (conditional on all of the above being such that AIS work is necessary). So a donation of $5000 would be at most 0.0001 of the budget. 13 bits.
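That last bit count is quick to verify, assuming the $50 million and $5000 figures above:

```python
import math

donation = 5_000
budget = 50_000_000  # rough lower bound on the cost of "making AI safe"

fraction = donation / budget
print(fraction)                      # 0.0001
print(math.log2(budget / donation))  # ≈ 13.3, hence the 13 bits
```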

Hard to estimate issues (10 bits)

I’ll aggregate these because I don’t trust myself to put individual values on each of them.

  • Will the universe really get filled by conscious beings?
  • Will they be happy?
  • Is it better to lead a happy life than to not exist in the first place?
  • Is there a moral difference between there being two identical happy universes A and B versus them being identical right up to the point where A’s contents get turned to paperclips but B continues to be happy? And how does anthropic bias factor in to this?
  • Has any being ever had a net-positive life?

Sub-1-bit issues (5 bits)

I listed all objections where I was at least 50% confident in them being obstacles. But there are probably quite a number of potential issues that I haven’t thought of because I don’t expect them to be issues with enough probability. I estimate their collective impact to count for something. 5 bits.

Conclusion

It turns out I only managed to collect 174 bits, not the 180 bits I aimed for. I see this as weak evidence for AIS being better than malaria prevention but not better than something like ALLFED. Of course, we should keep in mind that all the numbers are made up.
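Tallying the counts from the section headings above confirms the total:

```python
# Bits of unlikeliness assigned in each section of this post.
bits = {
    "Discounting the future": 30,
    "Biases": 20,
    "Counterfactual actions": -1,
    "AI is fake": 8,
    "AI x-risk is fake": 31,
    "AI x-risk is inevitable": 28,
    "AI becomes safe anyway": 15,
    "Working on AIS right now is ineffective": 15,
    "Research is expensive": 13,
    "Hard to estimate issues": 10,
    "Sub-1-bit issues": 5,
}
print(sum(bits.values()))  # 174, short of the 180 aimed for
```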

Maybe you disagree with how many bits I handed out in various places, maybe you think I double-counted some bits, or maybe you think that counting bits is inherently fraught and inconsistent. I’d love to hear your thoughts via email at beth@bethzero.com, via Reddit at u/beth-zerowidthspace, or at the EA forum at beth.

Post-scripts

  1. Rereading this post, I am not super happy with it. I no longer agree with all the numbers I put here. I don’t think I broke up the different parts of the argument in the most intuitive way. And I think that naively countering the big number of potential utility with a really small probability is not the best approach altogether.
    I do like the idea of how I structured this, with the bit counts and stuff, but I am not sure that it makes sense in practice.

    I might write a better long-form refutation of the AI Safety cause at some point in the future, but for now that plan is on indefinite hiatus.

    If you want to commission me to do this, send me a message through email or Reddit.

  2. I updated the bit on discounting the future. I don’t know why I wrote what I did back then, because I think the new text was my reason for counting 30 bits there in the first place.
