This is quite in-depth. Thanks for writing :)
> The gap between performing well on clean, well-defined software tasks versus navigating the messy uncertainty of real-world problems remains significant.
Not all software tasks are clean and well defined. Indeed, most of the valuable ones are not. But it is exactly software tasks that will cause "Feedback loops" (aka RSI, aka FOOM). I think this is a strong enough predicted effect to not assume "business as usual". You make a note of it, but then ignore it in your modelling. Is that because of the complexity of including extra factors, or do you discount the effect?
> This could help us see massive AI progress and give us more time to prepare.
I don't want to be mean, but anyone who hasn't already seen the massive AI progress and wanted us to stop and prepare before going further seems... like they need to justify their position more than they have.
> it is exactly software tasks that will cause "Feedback loops" (aka RSI, aka FOOM)
Yes, but I think a different kind of software task than the tasks in METR's task suite.
> I think this is a strong enough predicted effect to not assume "business as usual". You make a note of it, but then ignore it in your modelling.
I don't ignore it in the modeling - this is exactly why the model contains the possibility of a superexponential trend.
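(For a concrete picture of that distinction, here is a minimal toy sketch in Python. The functional form and every parameter value are my own illustrative assumptions, not the post's fitted model; it just contrasts a plain exponential trend in time horizons with a superexponential one where each doubling takes less time than the last.)

```python
# Toy illustration only: parameters are made up, not taken from the post's model.
# "Horizon" = minutes of human task time an AI can complete.

def exponential_horizon(years, h0=30.0, doubling_years=0.6):
    """Horizon if it doubles at a constant rate (plain exponential trend)."""
    return h0 * 2 ** (years / doubling_years)

def superexponential_horizon(years, h0=30.0, doubling_years=0.6, shrink=0.9):
    """Horizon if each successive doubling takes `shrink` times as long as the
    previous one. The doubling times form a geometric series, so the horizon
    diverges in finite time (~6 years with these made-up numbers)."""
    horizon, elapsed, dt = h0, 0.0, doubling_years
    while elapsed + dt <= years:
        horizon *= 2
        elapsed += dt
        dt *= shrink
    return horizon * 2 ** ((years - elapsed) / dt)  # partial final doubling

for y in range(1, 5):
    print(y, round(exponential_horizon(y)), round(superexponential_horizon(y)))
```

The only point of the contrast is that a shrinking doubling time eventually dominates any constant-rate extrapolation, which is why allowing for a superexponential trend pulls the forecast earlier.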
> anyone who hasn't already seen the massive AI progress and wanted us to stop and prepare before going further seems... like they need to justify their position more than they have.
I don't think the risks of going further are demonstrated clearly enough yet for people to want to stop.
I don't think the safety of going further is demonstrated clearly enough yet to justify not stopping.
Excellent article. I assume the author will update his forecasts & assumptions as newer models are released?
Yes! That's the most exciting part, I think.
Well done! Your view here is very similar to my colleague Eli Lifland's. For AI 2027 we depicted something a bit more bullish, due to my somewhat shorter timelines. My timelines are somewhat shorter because (a) I think that the trend will be inherently superexponential -- at some point AIs will be good enough at error correction, planning, memory, etc. that their time horizons will explode, similar to humans -- and (b) I think that partial automation of AI R&D will start to speed things up by around 2026 - 2027. Maybe also (c) I place a lot of weight on the point that agency training for agentic coding tasks is only just now really getting started.
Do you have any public commentary reacting to AI-2027.com? My model of the authors thinks that software development, or more specifically AI research, is the most important task to consider when forecasting capabilities, since it's the relevant one for recursive self-improvement. If AI in 2027 is still bad at making omelettes compared to humans, but is much more able to make more powerful AIs, this seems to me like it could accelerate the curve compared to extrapolating what we currently see.
"Forecaster reacts: AI2027" is in my to-do list!
I'd clarify that software development (especially of the kind METR measures) and AI research are probably fairly different tasks. But the point about omelette making is valid.
I assume you were mainly speaking for Raelifin's benefit, and I will simply note that the authors of AI 2027 are of course keenly aware of the distinctions between METR's task measures, software programming more generally, and AI research even more generally.
A surprising number of people who are aware of, or even who have claimed to read AI 2027, seem unaware of the extensive research/forecasting that went into it, such as in the timelines forecast supplement.
That's right! I agree they have a very sophisticated approach. I've read the entire forecast supplements. I look forward to diving in more but it will take time.
Another forecasting model suggesting AGI roughly at the end of this decade, the beginning of the next one, or not long after that, while taking zero account of even modest future RSI effects before then (such as when junior/senior programming work gets fully automated, pure math relevant to AI research, AI research work as a whole, etc.).
Increasingly it seems likely that the ~2030 compute/data runway left for continued scaling will be just enough, or will get us close enough that a slowdown proves little obstacle. Initial modest RSI, the possibility of one (or multiple!) minor paradigm advancements on the level of reasoning models, etc., seem to add up to pre-2035 median timelines being the obvious choice, even with (some) deference given to outside-view considerations.
Although I don't currently place that much weight on possibilities like a near-term invasion of Taiwan, the Trump admin seriously screwing up American AI development efforts in a variety of ways, an economic recession on a scale greater than the Great Recession, intense regulation/international treaty efforts strongly delaying capability advancements, etc., which might be biasing my estimates too early...
BTW, it's not true that I take "zero account of even modest future RSI effects"; that's what motivates the possibility of a superexponential trend and/or sustained fast doubling rates.
On the other hand, the plausible compute/data slowdown around 2030, the Taiwan issue, and the Trump variance are all not factored into this model.
Oh yeah, my bad, that's true. I suspect it is nonetheless an underestimate of AI-research-relevant productivity gains, since GPT-3.5 and earlier were certainly providing ~nothing, and most if not ~all of the recent speed-up seems like it's from the reasoning model paradigm. Plus, there's a step change from small task-efficiency increases/partial automations on the way to full agentic job automation like junior dev work.
But I guess that could be interpreted as simply the more aggressive superexponential/fast-doubling-time options. And of course there's some chance that modest RSI doesn't impact AGI timelines a great deal, though it's harder to see how that happens other than in a "totally out-of-model long timelines, AI Winter + New Paradigm needed" scenario.
Edit: great analysis btw, I appreciate how thorough, well sourced, and balanced your articles are, especially on AI. :)
Excellent points about exponential forecasts. I remember that in 1975, Rockwell and Boeing were predicting freight to geosynchronous orbit at $4/lb by 1995. Around then the Space Shuttle was supposed to deliver freight to low Earth orbit for $25/lb by 1985.
I'm broadly skeptical about the external validity of this benchmark, but it does seem useful for measuring the potential productivity gains from AI in highly specific domains.
Agree. I'm pretty skeptical too, as I hope I made clear in the post.