> AI development doesn't have clear red lines
At the beginning of the "How to Maintain a MAIM Regime" section, we discuss how states will want to start articulating what sorts of AI projects they would consider to pose imminent destabilizing risks if pursued.
The AI _application_ of a fleet of thousands of AIs doing fully automated AI research (an intelligence recursion) is one such red line. A recursion would likely take months, which makes it discernible and leaves it disruptable through sabotage.
Red lines can evolve over time and be supplemented with communication between states through Track II and Track I dialogues, confidence-building measures (CBMs), and so on, both to limit misunderstandings and to communicate which risk sources states find most concerning.
Thanks Dan, I agree it is possible to make progress on defining red lines, but we may still disagree about how much of this will happen by default, or how easy it will be.
The intelligence recursion doesn't seem like a clear red line to me, more like a slippery slope (depending on how sudden takeoff is, of course - hard to predict). My expectation is that first we will get models that automate a decent chunk of AI R&D but not all of it, and so provide a ~2x speedup to the pace of AI progress, and then we will get 3x accelerator models, then 5x and 10x and so forth. Plausibly the line we want to draw is the 'fully automated' point you mention, but I think things might be quite far gone already by then, with humans providing only occasional high-level research-taste comments while mostly-automated AI researchers make very fast progress, so this seems pretty late in the game to start MAIMing.
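To make the worry concrete, here is a toy calculation (a minimal sketch; every number in it is an assumption for illustration, not a forecast). Suppose the speedup climbs through the multipliers above and each milestone represents an equal chunk of research effort:

```python
# Toy model (assumed numbers): if the AI R&D speedup climbs through
# 2x, 3x, 5x, 10x as successive milestones of equal research effort are
# completed, calendar time per milestone shrinks as 1/speedup, so most
# of the wall-clock warning is spent before the nominal "fully
# automated" red line is reached.

speedups = [1, 2, 3, 5, 10]   # assumed acceleration while working on each milestone
baseline_years = 1.0          # assumed human-only time per milestone

times = [baseline_years / s for s in speedups]
total = sum(times)
for s, t in zip(speedups, times):
    print(f"{s:>2}x speedup: {t:.2f} years ({t / total:.0%} of total calendar time)")
print(f"Total: {total:.2f} years; the final 10x stretch is only {times[-1] / total:.0%}")
```

On these made-up numbers, the stretch just before 'full automation' accounts for only a few percent of the total calendar time, which is why waiting for that point to start MAIMing looks late.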
Appreciate seeing thoughtful rebuttals of the paper! This post points to several real phenomena that make MAD a much more stable mutual vulnerability than MAIM, but it arguably overstates the obstacles to making MAIM work.
On "ASI is not necessarily world domination," I don't think this disagrees with the paper. It is precisely the AI capabilities gap between nations, rather than the absolute capabilities of any one nation, that determines whether a state's security is gravely imperiled by rival AI. If the capabilities gap between the US and China would remain small with high confidence through the entire process of development, there is less of an incentive for either party to maim. The complicating factor is that if the rate of development accelerates, perhaps during an intelligence recursion, a small time gap in AI development may still translate to a large relative capabilities gap, which probably would be actionable. This is what makes attempting a recursion a "bid for dominance" in the parlance of the paper in a way that other types of development might not be. Also, in this framing, stealing AI weights is arguably a type of maiming attack: it reduces the relative capabilities gap between states.
Is MAIM descriptive or normative? This could have been clearer in the paper. My view is that MAIM descriptively points to "fortunate" features of AI development and geopolitics which make deterrence attractive: states are fully militarily capable of surveilling and crippling rival AI projects that require vast centralization of compute, and they are incentivized to do so in the face of a large rival capabilities gap or a high perceived loss-of-control risk from a rival AI project. All current large-scale development is credibly vulnerable to severe disruption by states as it approaches superintelligence.
The normative claim is that states should work to preserve the mutual vulnerability of AI projects even as the "facts on the ground" of AI development change: for example, AI development that requires less compute or less centralized compute, heavily securitized projects that are harder to surveil or steal weights from, or hardened AI infrastructure. States that prefer the stability of a mutual vulnerability might agree to centralize their compute anyway; implement various transparency measures on the amount of compute they have, where it is, and what it's doing; not harden their AI infrastructure in various ways; etc. Some of these conditions probably can be imposed unilaterally, e.g. a nation can threaten to maim a project attempting to build a large underground datacenter. But others need mutual buy-in from the relevant parties.
I agree that a state which wants to subvert MAIM would have several attractive ways to do so: threatening all-out retaliation in response to maiming attacks (I have been calling this "effective hardening" and think it's a plausible though destabilizing option), hardening and dispersing compute as much as possible, concealing activities and progress to reduce rivals' confidence that attacks will succeed, etc. If we reach a point where AI is sufficiently geopolitically salient, I can totally imagine this being seen as an act of war, though I agree that this would be a reversal from the status quo.
I was a bit confused by some of the points on the MAIM analogy to MAD--"Offensive AI use could be difficult to attribute" and "Ability to retaliate is unclear"--as they seem to imply that MAIM means "If your AI attacks me, my AI attacks you" as opposed to "Rival AI development projects are mutually vulnerable to sabotage." MAIM is not about attacks that AIs perform. Kinetic strikes on an AI project are attributable, and I don't see why sabotaging a rival AI project would in general prevent them from sabotaging yours (separate from the valid question of whether a state can be confident that maiming attacks will work).
Otherwise, I broadly agree that limited visibility into rival activities and capabilities, and uncertainty about the reliability of a maiming attack, make MAIM less stable than MAD. These don't seem like issues in the current regime of terrible security and few frontier datacenters, but they will eventually need to be actively addressed.
Thanks Adam!
I agree with your points re AI and world domination - the capabilities gap does seem to be what matters most and could grow a lot during an intelligence explosion. In fact we just published a short reading list on AI and a decisive strategic advantage: https://humangeneralintelligence.substack.com/p/ai-the-spectre-of-decisive-advantage
Re descriptive and normative claims, that makes sense, but it is notably the opposite prescription from that of many other frameworks. E.g. most people think improving cybersecurity is net positive and very important. I'm unsure on this point, but it's a good reminder that cybersecurity isn't all good if it leads to countries feeling the need to escalate to kinetic attacks sooner.
Ah, good point - yes, I think those offensive AI use points we made are actually less relevant. It is of course still interesting that AI-mediated cyberattacks (like other cyberattacks, but perhaps more so) may be difficult to attribute, but you're right that you might want to start MAIMing before your adversary has actually launched any attacks, just as their AI project seems to be approaching take-off. Also, I suppose in many worlds, even if you can't directly attribute attacks, there might be only one other party in the world sophisticated enough to have carried out the cyberattack against you, so it may not be hard to guess.
Overall, our views seem closer together (though probably not identical) than I initially thought, actually :)
MAIM seems like the plausible default path we will take, assuming that countries become AGI-pilled. Right now we are still pretty early - so early that SOTA models can't even play Pokemon well. As AI capabilities advance and as people begin to internalize the implications, countries may take AI more seriously.
Picture a scenario a few years into the future where models have become more capable and China gains a 3-9 month lead over the US, maybe due to something like having significantly more energy. Trump is in office, and his advisors / Musk / others are warning him that this could be game over. What is his response?
Something along the lines of "Turn off your data centers / give us transparency, or I'm going to take extreme measures". This could involve concrete threats, like targeting data centers with hypersonic missiles. It would be surprising if he just rolled over and let China get ASI first. This behavior is basically MAIM.
Now, why would China do anything different? Experts like Dominic Cummings think "China will take extreme measures and extreme risks to stop that from happening". Source: https://www.youtube.com/watch?v=EoG5EammWI4&t=1408s
The lack of a clear red line is not a dealbreaker. The Cuban Missile Crisis didn't happen because a red line like the Soviet Union launching nukes was crossed; it happened because a completely arbitrary thing (missiles in Cuba) made the US uncomfortable. Similarly, all that needs to happen for MAIM to work is for countries to state: "If I become uncomfortable, I will take extreme measures. Here are the things that would make me uncomfortable. If you don't e.g. take these transparency measures within x months, I will become uncomfortable".
Thanks, that's a good point! In that case, the psychology of key individuals potentially matters a lot more - how ASI-pilled Xi is (or will be in a few years), how willing he is to risk everything to carry out MAIM, at what threshold, etc. In a sense the scenario you outline is even scarier, as war is more likely without clear Schelling points to stop escalating at.
I agree. IMO, if you think MAIM is a likely default path, then it could be advantageous to try to get everyone thinking in terms of MAIM / publishing escalation ladders / thresholds / etc. as early as possible. With time to think and negotiate, we may arrive at clear Schelling points.
Though that also makes MAIM a self-fulfilling prophecy.