A few weeks ago, I had a conversation with Jeffrey Ladish and Eli Tyre, two AI safety researchers and activists who put out a call for open conversations with people who are skeptical of “AI risk” arguments:

I’m looking for people to have 30-60 minute conversations with about AI. I’m especially interested in talking with people who have heard some of the AI risk arguments, and have used ChatGPT, but don’t see why or how AI could actually take over or seize control from humanity

My plan is to briefly walk through the reasons why I am concerned about near-term existential AI risk, with space for you to ask questions or state objections, and then mostly have an informal conversation about whatever seems most relevant and interesting to the two of us

The discussion was interesting because I’ve had pretty minimal exposure to AI “existential risk” arguments. I remain skeptical of those arguments, but I’m interested in continuing to learn more.

A few of my personal takeaways from the conversation and from reflecting on it afterward:

  • The format was great. Compared to other interview studies I’ve been involved in, Jeffrey and Eli were more interested in posing specific arguments and hearing me respond to them. They seemed to be most interested in figuring out which of the many premises needed to support existential risk claims seemed plausible vs implausible to people like me who haven’t thought deeply about them.
    • At the end of the call, I asked them for recommendations of learning resources (which I link below). My hope is that the data they collect from these conversations will inform more useful learning resources for people working adjacent to ML who are interested in learning more about AI safety.
  • Existential risk arguments rely fundamentally on logical or philosophical arguments. Relatively few of the empirical questions relevant to existential AI risk can be answered even in principle, and those that can might only be easy to answer in hindsight.
    • In general, I’ve done almost no thinking about the relevant philosophical issues. For that reason, social consensus among the people I perceive to be experts (e.g. philosophers of AI, computer science researchers, etc.) is important to me.
    • For example, is it possible to build a silicon mind? I have no idea. As someone with very little exposure to ideas like this, I find it essentially impossible to imagine “consciousness, but not human-like”. But existential risk arguments seem to rely on non-human-like consciousness.
    • Similarly, I find “super intelligence” hard to reason about as a concept, and I was generally skeptical of claims about systems that meaningfully outperform existing groups of humans (e.g. teams, corporations).
    • Generally, I agreed with Jeffrey and Eli about their empirical description of the current world.
  • Existential risk from AI is fundamentally about making predictions about the future, and I believe that humans are very bad forecasters.
  • A lot of our conversation focused on group coordination.
    • Humans have gotten a lot better at group coordination, but we notably don’t seem to be improving our coordination exponentially.
    • Could AI systems improve themselves exponentially through better coordination? I’d expect them to hit significant coordination friction, in much the same ways that humans do, but maybe AI systems have qualities that would let them sidestep coordination problems (see the toy sketch after this list). For example:
      • Direct access to your own and others’ memories
      • Faster communication
      • Shared goals, or at least communicable goals
  • It seems like getting a good signal for self-improvement is hard.
    • A system with a reliable self-improvement signal and sufficient resources could probably improve itself.
  • It seems like there’s something distinct about the signal you get from “real life”, as opposed to doing a lot of computing on data you already have, but it’s hard to say what that is.
    • A Zoom call is a real reward signal! An AI system could certainly improve itself by talking with humans via Zoom, in the same way that we improve ourselves by talking with each other via Zoom. But it’s slow; intuitively, you’re constrained by communication speed.
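
A toy way to make the “coordination friction” point concrete (this is my own illustrative sketch, not a model anyone proposed on the call): suppose each agent contributes one unit of work per step, but every pairwise communication channel costs some overhead, and the number of channels grows quadratically with group size. The per-channel cost below is an arbitrary assumption.

```python
# Toy model of coordination friction (illustrative only). Each agent
# produces 1 unit of work, but every pairwise communication channel
# costs a small, arbitrary overhead.

def group_output(n: int, overhead_per_channel: float) -> float:
    """Net output of n agents after paying pairwise coordination costs."""
    channels = n * (n - 1) / 2  # pairwise channels grow quadratically
    return n * 1.0 - overhead_per_channel * channels

for n in (2, 10, 50, 100, 200):
    with_friction = group_output(n, overhead_per_channel=0.01)
    frictionless = group_output(n, overhead_per_channel=0.0)  # e.g. direct memory sharing
    print(f"n={n:3d}: net output {with_friction:7.1f} with friction, {frictionless:7.1f} without")
```

With any nonzero per-channel cost, net output eventually peaks and then falls as the group grows; driving that cost toward zero (direct access to memories, faster communication, shared goals) is exactly the kind of advantage speculated about above.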

Resources

Resources recommended by Jeffrey and Eli at the conclusion of the call:

Other resources I’ve stumbled across:

Notes on Richard Ngo’s AGI safety from first principles

I’m finding Ngo’s AGI safety from first principles to be a useful and even-handed introduction.

In the chapter on Superintelligence, Ngo makes several claims that I found surprising or non-obvious.

I think it’s difficult to deny that in principle it’s possible to build individual generalisation-based AGIs which are superintelligent, since human brains are constrained by many factors which will be much less limiting for AIs.

I don’t think it’s at all obvious that this is true. There may be many factors that constrain humans and won’t constrain hypothetical AGIs, but if those factors aren’t the primary bottleneck to greater-than-human intelligence, then AGIs may not be qualitatively more capable than humans. For example, I suspect I would be more capable if I could think 100x “faster” than I currently do, but it’s not at all clear to me how much more capable I would be.
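
One rough way to formalize my doubt is Amdahl’s-law-style arithmetic (my framing, not Ngo’s): if only some fraction p of what makes me capable is actually bottlenecked by raw thinking speed, then thinking 100x faster buys far less than 100x overall. The values of p below are hypothetical.

```python
# Amdahl's-law-style sketch: speeding up only the thinking-limited
# fraction p of a task by a factor s gives a much smaller overall gain.

def overall_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.1, 0.5, 0.9, 0.99):  # hypothetical thinking-limited fractions
    print(f"p={p:.2f}: thinking 100x faster gives ~{overall_speedup(p, 100):.1f}x overall")
```

Unless thinking speed is nearly the whole bottleneck, the overall gain is modest, which is the shape of my uncertainty about the quoted claim.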

There is little reason to believe that [humans] have reached the peak of [the ability to coordinate and benefit from cultural learning], or that AGIs couldn’t have a much larger advantage over a human than that human has over a chimp, in acquiring knowledge from other agents.

This seems like an empirical claim about group dynamics and hypothetical limits on coordination. Because I don’t know what the practical barriers to improving human coordination are, it’s not clear to me whether silicon agents would be substantially more effective at this.

For both of these claims, I would like to better understand the relevant human dynamics before assuming that AGIs can easily overcome those dynamics.