Evaluating the AI Safety cause area: a two-year plan

My pretending to have readers makes this blog a self-commitment scheme: if I write that I will do something, the imagined possibility of social shaming will make me more committed to actually doing the thing. That is why I’ll lay out my plans for evaluating whether I believe the AI Safety cause area, as perceived by EAs, is any good.

I used to think the arguments were pretty convincing, and I liked how they made my skills in math and CS super relevant for a morally important thing. But then I listened to Bostrom’s Superintelligence: Paths, Dangers, Strategies and found the book’s arguments thoroughly lacking. There is an asymmetry of passion in AI risk circles: the believers hype the cause area, while the non-believers don’t interact with the ideas at all.

The following step-by-step plan is based on the assumption that the LW-aligned view of most EAs is misguided, but that there is important work to be done to make sure that what we are currently calling AI is used properly. Every step in the plan is meant to defend part of my beliefs, and I trust myself to notice when my argumentation is shaky. If my view changes, I should feel free to change the plan to look into my new views instead.

After completing a step, I will write a blog post describing the outcome and related thoughts. I will try to complete at least one step every two months so that the plan is done before I re-enter the job market after the end of my PhD. I don’t have to do the steps in the listed order.

  1. Make a plan listing the steps to think through the AIS cause. Completed.
  2. Uneducated guessing: directly oppose the astronomical waste argument. See if you can collect the necessary entropy to argue that, even if astronomical waste would be astronomically bad, supporting the AI Safety cause is still bad in expectation. Read at least what Bostrom, Beckstead and FRI have written on the topic, maybe more.
  3. Reread Superintelligence. Try to list the major points in the arguments and list objections. Does it actually argue for a Yudkowskian view, or is everyone misinterpreting it and does it secretly argue for a more mainstream view?
  4. Read various online sources to get more of an idea of what the mainstream view is among AIS EAs. See if you can list research agendas on AIS different from MIRI’s.
  5. Argue that MIRI is not an effective charity regardless of the status of AIS as a field. I am not sure I truly believe this, but they seem so deeply incongruent with the standard academic practice that I should spend some time thinking about them. I kind of expect that this post will feel like punching down.
  6. I think Paul Christiano is a smart and serious person. He co-authored one of the best papers in your field of the past decade so he is not a crank. Read some of his writing on AIS to see if it holds up to scrutiny.
  7. Argue that average utilitarianism is superior to total utilitarianism, and astronomical waste cannot exist.
  8. Argue in more familiar terms why a paperclip maximizer wouldn’t act as some people fear it would.
  9. Argue that the capitalist’s alignment problem, insofar as it is meaningful and solvable, will be solved by the market.
  10. Argue that the social democrat’s alignment problem is a meaningful concept.
  11. Educated guessing: redo the entropy gathering from earlier, but now while having more knowledge.
  12. Cast mainstream “AI”-related research in TCS in terms of the social democrat’s alignment problem. See what is out there at conferences like STOC/FOCS, COLT and NIPS. Check O’Neil’s Weapons of Math Destruction, Dwork et al.’s line of work on algorithmic fairness, the work on learning non-discriminatory predictors, etc.
