This is quite in-depth. Thanks for writing :)
> The gap between performing well on clean, well-defined software tasks versus navigating the messy uncertainty of real-world problems remains significant.
Not all software tasks are clean and well defined. Indeed, most of the valuable ones are not. But it is exactly software tasks that will cause "Feedback loops" (aka RSI, aka FOOM). I think this is a strong enough predicted effect to not assume "business as usual". You make a note of it, but then ignore it in your modelling. Is that because of the complexity of including extra factors, or do you discount the effect?
> This could help us see massive AI progress and give us more time to prepare.
I don't want to be mean, but anyone who hasn't already seen the massive AI progress and wanted us to stop and prepare before going further seems... like they need to justify their position more than they have.
> it is exactly software tasks that will cause "Feedback loops" (aka RSI, aka FOOM)
Yes, but I think a different kind of software task than the tasks in METR's task suite.
> I think this is a strong enough predicted effect to not assume "business as usual". You make a note of it, but then ignore it in your modelling.
I don't ignore it in the modeling - this is exactly why the model contains the possibility of a superexponential trend.
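(For a concrete picture of that distinction, here is a minimal toy sketch in Python. The functional form and every parameter value are my own illustrative assumptions, not the post's fitted model; it just contrasts a plain exponential trend in time horizons with a superexponential one where each doubling takes less time than the last.)

```python
# Toy illustration only: parameters are made up, not taken from the post's model.
# "Horizon" = minutes of human task time an AI can complete.

def exponential_horizon(years, h0=30.0, doubling_years=0.6):
    """Horizon if it doubles at a constant rate (plain exponential trend)."""
    return h0 * 2 ** (years / doubling_years)

def superexponential_horizon(years, h0=30.0, doubling_years=0.6, shrink=0.9):
    """Horizon if each successive doubling takes `shrink` times as long as the
    previous one. The doubling times form a geometric series, so the horizon
    diverges in finite time (~6 years with these made-up numbers)."""
    horizon, elapsed, dt = h0, 0.0, doubling_years
    while elapsed + dt <= years:
        horizon *= 2
        elapsed += dt
        dt *= shrink
    return horizon * 2 ** ((years - elapsed) / dt)  # partial final doubling

for y in range(1, 5):
    print(y, round(exponential_horizon(y)), round(superexponential_horizon(y)))
```

The only point of the contrast is that a shrinking doubling time eventually dominates any constant-rate extrapolation, which is why allowing for a superexponential trend pulls the forecast earlier.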
> anyone who hasn't already seen the massive AI progress and wanted us to stop and prepare before going further seems... like they need to justify their position more than they have.
I don't think the risks of going further are demonstrated clearly enough yet for people to want to stop.
I don't think the safety of going further is demonstrated clearly enough yet to justify not stopping.
Excellent article. I assume the author will update his forecasts & assumptions as newer models are released?
Yes! That's the most exciting part, I think.
Well done! Your view here is very similar to my colleague Eli Lifland's. For AI 2027 we depicted something a bit more bullish, due to my somewhat shorter timelines. My timelines are somewhat shorter because (a) I think that the trend will be inherently superexponential -- at some point AIs will be good enough at error correction, planning, memory, etc. that their time horizons will explode, similar to humans -- and (b) I think that partial automation of AI R&D will start to speed things up by around 2026 - 2027. Maybe also (c) I place a lot of weight on the point that agency training for agentic coding tasks is only just now really getting started.
Do you have any public commentary reacting to AI-2027.com? My model of the authors thinks that software development, or more specifically AI research, is the most important task to consider when forecasting capabilities, since it's the relevant one for recursive self-improvement. If AI in 2027 is still bad at making omelettes compared to humans, but is much more able to make more powerful AIs, this seems to me like it could accelerate the curve compared to extrapolating what we currently see.
"Forecaster reacts: AI2027" is in my to-do list!
I'd clarify that software development (especially of the kind METR measures) and AI research are probably fairly different tasks. But the point about omelette making is valid.
I assume you were mainly speaking for Raelifin's benefit, and I will simply note that the authors of AI 2027 are of course keenly aware of the distinctions between METR's task measures, software programming more generally, and AI research even more generally.
A surprising number of people who are aware of, or even who have claimed to read AI 2027, seem unaware of the extensive research/forecasting that went into it, such as in the timelines forecast supplement.
That's right! I agree they have a very sophisticated approach. I've read the entire forecast supplements. I look forward to diving in more but it will take time.
Another forecasting model suggesting AGI roughly at the end of this decade, the beginning of the next one, or not long after that, while taking zero account of even modest future RSI effects before then (such as when junior/senior programming work gets fully automated, pure math relevant to AI research, AI research work as a whole, etc.).
Increasingly it seems likely that the ~2030 compute/data runway left for continued scaling will be just enough, or will get us close enough that a slowdown proves little obstacle. Initial modest RSI, the possibility of one (or multiple!) minor paradigm advancements on the level of reasoning models, etc., seem to add up to pre-2035 median timelines being the obvious choice, even with (some) deference given to outside-view considerations.
Although I don't currently place that much weight on possibilities like a near-term invasion of Taiwan, the Trump admin seriously screwing up American AI development efforts in a variety of ways, an economic recession on a scale greater than the Great Recession, intense regulation/international treaty efforts strongly delaying capability advancements, etc., which might be biasing my estimates too early...
BTW, it's not true that I take "zero account of even modest future RSI effects"; that's what motivates the possibility of a superexponential trend and/or sustained fast doubling rates.
On the other hand, the plausible compute/data slowdown around 2030, the Taiwan issue, and the Trump variance are all not factored into this model.
Oh yeah, my bad, that's true. I suspect it is nonetheless an underestimate of AI-research-relevant productivity gains, since GPT-3.5 and earlier were certainly providing ~nothing, and most if not ~all of the recent speed-up seems like it's from the reasoning model paradigm. Plus, there's a step change from small task-efficiency increases/partial automations on the way to full agentic job automation like junior dev work.
But I guess that could be interpreted as simply the more aggressive superexponential/fast-doubling-time options. And of course there's some chance that modest RSI doesn't impact AGI timelines a great deal, though it's harder to see how that happens other than in a "totally out-of-model long timelines, AI Winter + New Paradigm needed" scenario.
Edit: great analysis btw, I appreciate how thorough, well sourced, and balanced your articles are, especially on AI. :)
Excellent points about exponential forecasts. I remember that in 1975, Rockwell and Boeing were predicting freight to geosynchronous orbit at $4/lb by 1995. Around then the Space Shuttle was supposed to deliver freight to low Earth orbit for $25/lb by 1985.
I'm broadly skeptical about the external validity of this benchmark, but it does seem useful for measuring the potential productivity gains from AI in highly specific domains.
Agree. I'm pretty skeptical too, as I hope I made clear in the post.