Discussion about this post

Dan Hendrycks:

> AI development doesn't have clear red lines

At the beginning of the "How to Maintain a MAIM Regime" section, we discuss how states will want to start articulating which sorts of AI projects they would consider to pose imminent destabilizing risks if pursued.

One such red line is the AI _application_ of running a fleet of thousands of AIs doing fully automated AI research (an intelligence recursion). A recursion would likely take months, making it discernible and disruptable through sabotage.

Red lines can evolve over time and be supplemented with communication between states through Track II and Track I dialogues, confidence-building measures (CBMs), and so on, to limit misunderstandings and convey which risk sources states find most concerning.

Adam Khoja:

Appreciate seeing thoughtful rebuttals of the paper! This post points to several real phenomena that make MAD a much more stable mutual vulnerability than MAIM, but it arguably overstates the challenges to making MAIM work.

On "ASI is not necessarily world domination," I don't think this disagrees with the paper. It is precisely the AI capabilities gap between nations, rather than the absolute capabilities of any one nation, that determines whether a state's security is gravely imperiled by rival AI. If the capabilities gap between the US and China would remain small with high confidence through the entire process of development, there is less of an incentive for either party to maim. The complicating factor is that if the rate of development accelerates, perhaps during an intelligence recursion, a small time gap in AI development may still translate to a large relative capabilities gap, which probably would be actionable. This is what makes attempting a recursion a "bid for dominance" in the parlance of the paper in a way that other types of development might not be. Also, in this framing, stealing AI weights is arguably a type of maiming attack: it reduces the relative capabilities gap between states.

Is MAIM descriptive or normative? This could have been clearer in the paper. My view is that MAIM descriptively points to "fortunate" features of AI development and geopolitics that make deterrence attractive: states are fully militarily capable of surveilling and crippling rival AI projects that require vast centralization of compute, and they are incentivized to do so in the face of a large rival capabilities gap or a high perceived loss-of-control risk from a rival AI project. All current large-scale development is credibly vulnerable to severe disruption by states as it approaches superintelligence.

The normative claim is that states should work to preserve the mutual vulnerability of AI projects even as the "facts on the ground" of AI development change: for example, AI development that requires less compute or less centralized compute, heavily securitized projects that are harder to surveil or steal weights from, or hardened AI infrastructure. States that prefer the stability of a mutual vulnerability might agree to centralize their compute anyway; implement various transparency measures on the amount of compute they have, where it is, and what it's doing; not harden their AI infrastructure in various ways; etc. Some of these conditions probably can be imposed unilaterally, e.g. a nation can threaten to maim a project attempting to build a large underground datacenter. But others need mutual buy-in from the relevant parties.

I agree that a state which wants to subvert MAIM would have several attractive ways to do so: threatening all-out retaliation for maiming attacks (I have been calling this "effective hardening" and think it's a plausible though destabilizing option), hardening and dispersing compute as much as possible, concealing its activities and progress to reduce the confidence that attacks will succeed, etc. If we reach a point where AI is sufficiently geopolitically salient, I can totally imagine a maiming attack being seen as an act of war, though I agree that this would be a reversal from the status quo.

I was a bit confused by some of the points on the MAIM analogy to MAD--"Offensive AI use could be difficult to attribute" and "Ability to retaliate is unclear"--as they seem to imply that MAIM means "If your AI attacks me, my AI attacks you" rather than "Rival AI development projects are mutually vulnerable to sabotage." MAIM is not about attacks that AIs perform. Kinetic strikes on an AI project are attributable, and I don't see why sabotaging a rival's AI project would in general prevent that rival from sabotaging yours (separate from the valid question of whether a state can be confident its maiming attacks will work).

Otherwise, I broadly agree that limited visibility into rival activities and capabilities, and uncertainty about the reliability of a maiming attack, make MAIM less stable than MAD. These don't seem like issues in the current regime of terrible security and few frontier datacenters, but they will eventually need to be actively addressed.
