AGI by 2030? What Policy Leaders, Tech Leaders, and Pokémon Say
Three podcasts and one Pokémon reset show where AI is going
Former White House advisor Ben Buchanan suggests AGI might arrive within Trump's presidential term. Meanwhile, Anthropic CEO Dario Amodei predicts it by 2026-2027, while DeepMind CEO Demis Hassabis gives AGI a 50% chance by 2030. And at the same time, Claude's Pokémon adventures reveal both impressive capabilities and surprising limitations of today's systems. What should we make of this? Let’s dig in.
…The Government knows AGI is coming
Ben Buchanan served as the special adviser for artificial intelligence in the Biden White House, a powerful position that directly coordinated Biden’s efforts on AI. So I was super curious to hear the podcast conversation between Ben Buchanan and Ezra Klein. Here are my favorite parts:
What is AGI? Klein defines AGI as systems “capable of doing basically anything a human being could do behind a computer — but better”. Buchanan defines AGI as “a system capable of doing almost any cognitive task a human can do.” Buchanan adds that such systems would have the “capacity to, in some cases, exceed human capabilities, regardless of the cognitive discipline.” However you define it, we’re talking about a major change in how labor works, with new risks and large benefits for governments to track and navigate.
AGI could be momentous: Klein doesn’t mince words here – “while there is so much else going on in the world to cover, I do think there’s a good chance that, when we look back on this era in human history, AI will have been the thing that matters. [...W]e are on a path to creating transformational artificial intelligence capable of doing basically anything a human being could do behind a computer — but better. [...] If you’ve been telling yourself this isn’t coming, I really think you need to question that. It’s not web3. It’s not vaporware. A lot of what we’re talking about is already here, right now.”
We don’t know how to prepare: Klein continues – “I think we are on the cusp of an era in human history that is unlike any of the eras we have experienced before. And we’re not prepared in part because it’s not clear what it would mean to prepare. We don’t know what this will look like, what it will feel like. We don’t know how labor markets will respond. We don’t know which country is going to get there first. We don’t know what it will mean for war. We don’t know what it will mean for peace.” (I agree. The stakes and uncertainty are why I’ve been urgently researching the question.)
AGI could happen in Trump’s term: Klein also notes that he has talked with a lot of people who think AGI might arrive in this current presidential term. A lot of these people are at AI companies and may be incentivized to hype their own products. But Ben Buchanan has a similar prediction and is not part of the corporate hype – he was in the White House, never at an AI company. (Personally, I put the odds of AI capable of automating all remote work emerging within Trump’s term at 25%, though I still expect his term to be a key period during which AI capabilities heat up, and we do need to be prepared for AGI to come very fast.)
The Biden government was paying close attention: Buchanan says that the Biden government saw “very clear trend lines” while he was inside.
The shared priority – beat China: Buchanan mentions that everyone in DC politics basically agrees the US needs to lead in AI, mostly to keep China from dominating. Buchanan, echoing President Kennedy, suggests the US be in the lead to shape whether AI is “a sea of peace or a terrifying theater of war.”
Cyber risks from advanced AI: Klein is concerned that AI can speed up hacking. Buchanan agrees that AI can find software exploits quickly and that old or legacy systems will be most at risk. But AI can also harden code by automatically finding and patching security holes, so the net effect is still uncertain.
Security of AI companies: Klein points out that it’s annoying to adopt fully secure procedures, so vulnerabilities remain. Engineers don’t want to have to work out of secure compartmentalized facilities! Buchanan worries that in Silicon Valley’s casual environment, advanced methods might leak at house parties where people talk too freely. The National Security Memorandum of October 2024 asked labs to tighten security, and Buchanan argued that the Biden government aimed to help rather than hamper them.
On DeepSeek: Buchanan says DeepSeek surprised Americans with a seemingly less compute-intensive but high-performing model. But the Biden White House had been watching DeepSeek since its first publicly released model in 2023 and wasn’t surprised. Buchanan says: “When you look at what DeepSeek has done, I don’t think the media hype around it was warranted, and I don’t think it changes the fundamental analysis of what we are doing. They still are constrained by computing power.” On what to do: “We should tighten the screws and continue to constrain them. They’re smart. Their algorithms are getting better. But so are the algorithms of U.S. companies.” (See my Ten Takes on DeepSeek for more on this.)
On labor impacts: Both Buchanan and Klein see many knowledge jobs being reshaped or partially replaced in the near-term. And AGI by definition should be able to do a wide variety of human jobs, leaving minimal places for humans to “reskill” into. Klein shared some frustration that we’ve known about possible labor displacement from AI for years, but the “quality of thinking” or policymaking is lacking.
On potential upsides from AI: AI might find novel molecules or drastically shorten the time to develop new treatments. The real bottleneck then becomes the painfully slow real-world testing process (animal studies, human trials). Other domains where AI could excel include education, climate modeling, and biotech. Klein’s critique is that institutions (like the FDA or big bureaucracies) are not set up to handle a sudden surge of AI-driven innovation. Klein and Buchanan discuss how AI could transform what humans are capable of doing and be amazing for small businesses.
Closing statement: Buchanan closes by mentioning that under Biden, they tried to lay an institutional foundation for safety, modest controls, and to keep the US ahead of China. Ultimately, the Trump administration must make major decisions as advanced systems roll out.
~
…Amodei and Hassabis sitting on a couch
These government predictions align with what we're hearing from leading AI CEOs. Once you get over the weird gap between the couches, it’s cool to see Dario Amodei (CEO of Anthropic) and Demis Hassabis (CEO of DeepMind) in a joint conversation with Economist editor-in-chief Zanny Minton Beddoes.
My takeaways:
Yet more definitions of AGI.
Amodei defines AGI as when a model can do everything a human can do, at the level of a Nobel laureate across many fields. He estimates arrival in 2026 or 2027 (!!!).
Hassabis uses a similar definition: a system with all the cognitive capabilities humans have. However, Hassabis argues that such a system must be able not only to solve known tasks but also to originate ideas on par with Einstein’s creation of general relativity. Hassabis thinks this form of AGI might be “a bit further out”, giving it a 50% chance by the end of the decade (i.e., ~2030).
Because they’re using slightly different definitions, it’s hard to understand how much they disagree. Such is the issue with AGI.
Will there be a “threshold moment” where AI takes off rapidly? This hypothetical involves AI iteratively developing more advanced successor AIs such that whoever develops the first “runaway AI cycle” will get an unassailable lead. Amodei essentially thinks this could occur, and says this is why the US should pursue a lead in AI over China.
What does Amodei think we should do about risks from AI? AI companies should create demonstrations of risk within labs, such as testing whether models can help would-be terrorists produce biological weapons or whether AI would be intentionally deceptive. Amodei thinks we might need “10 times stronger evidence” of risk to truly galvanize world leaders.
How does a fractured global environment affect the arrival of AGI? Amodei argues that governments need more awareness of what’s at stake with AI. While the near-term hype is huge, Amodei thinks that people still underestimate what’s coming in the medium/long term.
What does Hassabis think about the regulatory approach to AGI? Hassabis aims for a middle path between the US-style “full speed ahead” approach and the EU’s caution. He thinks we should embrace the massive benefits and use AI to advance science and medicine. Hassabis is excited for huge productivity gains and time-savings from AI agents becoming mainstream and being able to autonomously do much more for you than current chatbots. But at the same time, we should acknowledge and mitigate major risks: bad actors who could use powerful AI to enable large-scale harm, and AGI itself having harmful goals.
What does Hassabis think we should do to get global cooperation on AGI? He re-introduces the idea of a “CERN for AGI” — a large-scale, international collaboration on the final steps to creating safe AGI. Hassabis envisions a “neutral” space where top researchers can share knowledge and keep each other in check.
Do Amodei and Hassabis worry about turning into a ‘Robert Oppenheimer’ figure? Hassabis admits losing sleep over these questions and feels a huge responsibility. Both of them believe no single person should carry that responsibility and yet that is basically where things are at right now. This whole section of the conversation feels like a cry for help.
~
…Claude plays Pokémon got a reset
While these timelines might seem aggressive, they're coming from the very people building these systems. But what's the reality of current capabilities? I enjoyed following Claude Plays Pokémon on Twitch, which I discussed in Weekend Links #5. It’s an interesting puzzle to me for three reasons:
It's revealing that an AI system can outperform PhDs on scientific quizzes and solve very complex math problems, yet struggles with Pokémon. This game, which 7-year-olds can beat with minimal guidance, repeatedly stumps Claude despite the AI having extensive knowledge about Pokémon in its training data.
If government and tech leaders are right that we will get AGI within 2-5 years, these struggles are interesting to see. It’s certainly possible they could get addressed quickly, and we have already seen rapid progress in under one year. But there’s still a lot more to go.
If AI agents are to be the next big thing, how capable should we expect those agents to be? What kinds of tasks get them stuck? What can be done to get them unstuck in a way that generalizes to other tasks? Claude Plays Pokémon gives us a lens into this.
So it was fascinating to see Claude get through Mt. Moon after 80 hours of continuous gameplay and quickly score the second gym badge… only to then get stuck in the same spot in Cerulean City for three days, overthinking everything and looping around talking to the same two NPCs over and over. Based on this, a vote was held to reset the game all the way back to the beginning and try again, this time with an upgraded memory tool to help Claude navigate a bit better. As of writing, Claude has been in Viridian Forest for about ten hours, headed toward the first gym battle with Brock.
To learn more about this, I turned to a podcast with David Hershey from Anthropic, who runs “Claude Plays Pokémon” as a personal project.
Rapid improvement: Hershey notes that “Claude Plays Pokémon” began in June 2024 with Claude 3.5. Early attempts were not great; the model would often fail basic steps like leaving the house. Over time, with new versions of Claude, it steadily improved, reaching new milestones like finally being able to select a starter Pokémon. Though it still gets stuck at times (like the extended slog through Mt. Moon and now the Cerulean City reset), the improvement over the past eight months is undeniable.
A real-time benchmark: Hershey emphasizes that “Claude Plays Pokémon” isn’t just fun, it’s a way to benchmark AI capabilities. By watching Claude’s behaviors over millions of tokens, they see where it fails, where it improves, and how it handles extended tasks. Observing how Claude addresses challenges offers insight for real-world applications, especially around agent-based tasks.
How does it work? Hershey clarifies that the harness is fairly straightforward and not heavily optimized to ensure it’s a pure environment for testing Claude’s capabilities. The system includes three key components: a conversation loop with the model, a toolkit for in-game actions, and a knowledge base that allows Claude to store and recall long-term facts. The system prompt is minimal, mostly providing instructions on how to use the tools.
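For intuition, here is a minimal sketch of what a harness with those three components might look like, written against Anthropic’s Messages API. To be clear, this is my own reconstruction from Hershey’s description: the tool names, schemas, model alias, and system prompt below are assumptions, not the project’s actual code.

```python
# Hypothetical reconstruction of the harness described above: a conversation
# loop, tools for acting in-game, and a knowledge base. Names are my guesses.
import anthropic

client = anthropic.Anthropic()
knowledge_base: dict[str, str] = {}  # Claude's long-term memory across turns

TOOLS = [
    {
        "name": "press_buttons",
        "description": "Press a sequence of Game Boy buttons (a, b, start, up, down, left, right).",
        "input_schema": {
            "type": "object",
            "properties": {"buttons": {"type": "array", "items": {"type": "string"}}},
            "required": ["buttons"],
        },
    },
    {
        "name": "remember",
        "description": "Save a long-term fact to the knowledge base for later recall.",
        "input_schema": {
            "type": "object",
            "properties": {"key": {"type": "string"}, "fact": {"type": "string"}},
            "required": ["key", "fact"],
        },
    },
]

def run_tool(name: str, args: dict) -> str:
    """Execute one tool call. The emulator is stubbed out here."""
    if name == "press_buttons":
        # The real harness would drive an emulator and return the new screen state.
        return f"Pressed {args['buttons']}."
    if name == "remember":
        knowledge_base[args["key"]] = args["fact"]
        return "Saved."
    return "Unknown tool."

def step(messages: list) -> list:
    """One turn of the conversation loop: Claude proposes actions, we execute them.

    The caller appends the latest screenshot and game state as a user message
    before each call, and summarizes old turns to keep the context manageable.
    """
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # assumed alias for the current model
        max_tokens=2048,
        system="You are playing Pokémon Red. Use your tools to act and to keep notes.",
        tools=TOOLS,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    tool_results = [
        {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
        for b in response.content
        if b.type == "tool_use"
    ]
    if tool_results:
        messages.append({"role": "user", "content": tool_results})
    return messages
```

The point of keeping the scaffolding this thin is that successes and failures can be attributed to the model itself rather than to clever engineering around it.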
Claude’s mistakes: Claude struggles to interpret the tiny Game Boy screen, occasionally mixing up walls and open spaces. Hershey tried numerous ways to prompt around this confusion, but Claude still attempts to walk through walls. Claude also hallucinates — for example, it might see an NPC and declare it is “Professor Oak” even when it is not. The knowledge base sometimes helps, but it can also introduce new hallucinations if Claude’s “memory” is incorrect: Claude is prone to occasionally writing incorrect things to the knowledge base, which can get it stuck.
Improvements via tool use: The biggest improvement for movement came from letting Claude pick a coordinate, then automatically pressing the correct directions so it does not repeatedly slam into walls. Without this, Claude might forever try to walk through obstacles, not understanding why it fails.
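As a sketch of how such a navigation tool could work (again my own reconstruction, not the project’s actual implementation): Claude names a target tile, and ordinary breadth-first search over the map’s walkable tiles turns that choice into button presses.

```python
# Hypothetical navigation tool: Claude supplies a target (x, y) coordinate and
# the harness computes the button presses, so Claude never walks into walls.
from collections import deque

# Moving up on the Game Boy decreases y; moving down increases it.
DIRECTIONS = {(0, -1): "up", (0, 1): "down", (-1, 0): "left", (1, 0): "right"}

def navigate_to(start, goal, walkable):
    """Return button presses from start to goal, or None if goal is unreachable.

    `walkable` is the set of (x, y) tiles the player can stand on, which the
    harness would read from the emulator's collision map.
    """
    came_from = {start: None}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos == goal:
            break
        for dx, dy in DIRECTIONS:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in walkable and nxt not in came_from:
                came_from[nxt] = pos
                queue.append(nxt)
    if goal not in came_from:
        return None  # report failure instead of letting Claude slam into walls
    # Walk back from the goal, converting each step into a button press.
    presses, pos = [], goal
    while came_from[pos] is not None:
        prev = came_from[pos]
        presses.append(DIRECTIONS[(pos[0] - prev[0], pos[1] - prev[1])])
        pos = prev
    return list(reversed(presses))

# e.g. in an open 3x3 room, navigate_to((1, 2), (1, 0), {(x, y) for x in range(3) for y in range(3)})
# returns ["up", "up"]
```

With this in place, Claude only has to decide where to go; the harness handles how to get there.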
This is expensive: Running “Claude Plays Pokémon” 24/7 with a large context window can cost thousands of dollars. Hershey notes that the project receives internal support from Anthropic and would be a big financial commitment for most individuals.
Claude changing behavior: Interestingly, once Claude started nicknaming its Pokémon, it became protective of them and would heal them more often. This suggests that small personal touches or emotional attachments can influence the model’s choices.
Overall, this suggests that models are still struggling with longer-term tasks. The issue seems to be about memory management and turning strategy into tactics. Claude seems to have a good general strategy for beating the game. Indeed, it can recite a complete plan from start to finish based on knowledge in its training data. Claude also handles individual actions well, such as taking steps, navigating toward objects, and interacting sensibly. However, it struggles with mid-level planning: it doesn’t effectively combine individual actions into tactical approaches that advance small parts of its overall strategy, which is how it gets stuck in loops.
The convergence of government insiders and tech leaders on aggressive AGI timelines suggests we're approaching a technological inflection point. However, Claude's struggles with Pokémon demonstrate that significant hurdles remain in translating knowledge into effective real-world action. This tension between theoretical capability and practical execution will likely determine whether AGI arrives in 2-3 years (Amodei), by 2030 (Hassabis), or beyond. But we all know that AI can make large improvements in a small amount of time — Claude gets much further through Pokémon than it did six months ago, and will likely get much further again in another six months.
And regardless of exact timing, the preparations Buchanan, Amodei, and Hassabis describe — from export controls to institutional readiness to watching the benchmarks and trend lines — seem prudent given the transformative impact experts agree is coming.