Numbers, Not Hype: Talking AI Policy on the CAIP Podcast
Seven risks, massive opportunities, and why 2025 is make‑or‑break
I recently had the honor of being on the Center for AI Policy Podcast, hosted by Jakub Kraus. We discussed forecasting, the strategic landscape of AI, and the implications of both forecasting and AI strategy for US policy.
You can listen to the podcast on YouTube, Apple Podcasts, or Substack by going to the original show page here.
Otherwise, here are my notes on what we discussed:
Why explicit probabilities beat vague predictions
We are all constantly making implicit forecasts in our daily lives and in policymaking — for example, judging what a policy will accomplish or if it will backfire. The goal of formal forecasting, as practiced on platforms like Metaculus, is to make these judgments explicit, specific, and quantifiable.
Implicit forecasting has major problems. Philip Tetlock's research found that many political experts' predictions were no better than random chance.
Another key issue is what Tetlock calls vague verbiage — using terms like "it may happen" or "it might happen," which can be interpreted in wildly different ways. This creates counterproductive confusion.
For the Bay of Pigs invasion, the Joint Chiefs told President Kennedy there was a "fair chance of success." They meant a probability of less than 50%, but Kennedy interpreted it as a much higher likelihood. Using explicit numbers could have prevented this misunderstanding.
The key benefit of explicit forecasting is developing a quantitative track record. This allows you to identify who is actually good at predicting the future and to learn from your own mistakes, ideally leading to better decision-making.
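As a toy illustration of what a "quantitative track record" means in practice (the forecasts and outcomes below are invented for illustration, not from the podcast), each resolved prediction can be scored with a Brier score and the scores averaged over time:

```python
# Toy illustration: scoring a forecasting track record with Brier scores.
# The forecasts and outcomes below are made-up examples, not real predictions.

forecasts = [
    # (question, stated probability, outcome: 1 = it happened, 0 = it didn't)
    ("Candidate X wins the election", 0.70, 1),
    ("Bill Y passes this session",    0.20, 0),
    ("Treaty Z is signed by June",    0.90, 0),
]

def brier(prob: float, outcome: int) -> float:
    """Squared error between the stated probability and the outcome (0 = perfect, 1 = worst)."""
    return (prob - outcome) ** 2

scores = [brier(p, o) for _, p, o in forecasts]
print(f"Mean Brier score: {sum(scores) / len(scores):.3f}")  # lower is better
```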
How forecasting skill develops through practice
Forecasting isn't magic; like any skill, it improves with dedicated practice.
The process involves making explicit predictions, tracking your performance, and conducting “post-game analysis” to reflect on what you got right and wrong.
Forecasting is in part a generalist skill: someone practiced in the art of quantifying uncertainty can often outperform a domain expert at forecasting within the expert's own field, because forecasting is distinct from subject-matter expertise. The domain expert may know more about the domain but struggle to quantify that knowledge.
The best approach, according to Tetlock's research, is forecaster-subject expert teaming. In these teams, domain experts are best at analyzing a problem and identifying the key factors, while generalist forecasters are better at taking that analysis and assigning an accurate final probability. This combination produces a stronger result than either party could achieve alone.
I use this method professionally through iterative processes like the Delphi method and the IDEA protocol, where a panel of experts and forecasters make individual predictions, discuss them as a group, and then refine them.
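For a rough sense of what the aggregation step in such a process can look like (a simplified sketch with placeholder names and numbers; real Delphi/IDEA runs involve considerably more structure), here is a minimal two-round median aggregation:

```python
# Simplified sketch of aggregation in a Delphi/IDEA-style process:
# panelists give independent probabilities, discuss as a group, revise,
# and the panel's final estimate is taken as the median of the revised numbers.
from statistics import median

round_1 = {"expert_a": 0.30, "expert_b": 0.55, "forecaster_c": 0.40}   # initial, independent estimates
round_2 = {"expert_a": 0.35, "expert_b": 0.45, "forecaster_c": 0.40}   # revised after group discussion

print(f"Round 1 median: {median(round_1.values()):.2f}")
print(f"Round 2 median: {median(round_2.values()):.2f}")  # the panel's final probability
```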
Forecasting failures: 9/11 and the Iraq War
Forecasting expertise exists within the US government (e.g., in the intelligence community) but doesn't permeate the entire policymaking world.
Many high-profile intelligence failures were ultimately forecasting failures, sometimes stemming from not even asking the right questions in the first place.
Example: 9/11. Key questions — such as the likelihood of a major terrorist attack on US soil using hijacked airplanes — were not adequately conceptualized or assessed. Better forecasting could potentially have prompted increased airline security before the attacks.
The failure was also one of coordination, as different agencies held separate pieces of the puzzle, a flaw that led to the creation of the Director of National Intelligence.
Example: The Iraq War. Did Saddam Hussein possess weapons of mass destruction? That’s a forecasting question. The intelligence assessment that he did was massively overconfident and "devastatingly wrong," leading to a disastrous policy decision.
Why no one can predict AI's exact future
Forecasting is often met with skepticism, but it's not about having a crystal ball. It's about expressing uncertainty with precision.
Analogy: A weather forecast giving a 10% chance of rain doesn't mean it won't rain; it means that in 10 out of 100 similar situations, it will.
This is especially true for AI, a field rife with both hype and genuine uncertainty. Anyone who tells you they know exactly what will happen with AI is not a credible forecaster.
The future contains a wide range of possibilities, including worlds where AI develops much faster than we expect and worlds where it's a much slower burn.
Reacting to AI 2027
On the AI 2027 report by Daniel Kokotajlo and others — it’s an “excellent meticulous piece of work” and a form of “hard science fiction.” It presents a vivid, plausible scenario based on technical analysis, making it a story worth taking seriously.
However, it's just one possible future. The long chain of dependent events in the story makes it incredibly unlikely that things will play out exactly as described. The report is a tool for imagining a possibility, not a definitive prediction.
Building AI threat assessment capabilities
I recommended the US government establish a Rapid Emerging Assessment Council for Threats (REACT) to convene experts and respond to sudden AI-related threats.
Beyond reactive capabilities, there's a need for proactive capacity. The biggest challenge is that AI is a new, critical, and poorly understood technology, and the government simply needs more technical talent to anticipate developments.
Analogy: Fire safety.
Reactive: A fire department is dispatched to put out fires after they start.
Proactive: Building codes and fire risk assessments are used to prevent fires. This involves forecasting — identifying where buildings might catch fire and mitigating that risk.
AI policy needs both. We need a "fire department" for when AI systems "catch on fire," but we also need the "building codes" to ensure we're developing AI safely from the start.
Seven key AI risks facing government
People tend to think of AI risk as a single thing, but I outline seven major categories of AI risk that the government needs to monitor:
Misuse: Bad actors using AI for things like massive cyberattacks or designing novel bioweapons.
Loss of Control to Rogue AI: As AI systems become more autonomous, we face a genuine risk of losing control over their actions.
Loss of Control to Adversaries: China could win the AI race and use AI to dominate the world. Other foreign adversaries like Russia could steal, poison, or manipulate US AI systems to wreak havoc.
War over AI: The race for AI supremacy could lead to direct military conflict.
Concentration of Power: AI could lead to an unhealthy concentration of power, threatening democratic structures.
Strategic Surprise: AI could develop in ways we don't anticipate, leading to major societal destabilization.
Societal Impacts: Even in a positive scenario, the displacement of human workers by AI will create immense societal adaptation challenges.
AI’s opportunities and benefits
I’m not just a “negative Nancy” on AI. There's a tremendous AI opportunity for America. AI could usher in a golden age of innovation by assisting scientists in medicine, materials science, logistics, and more.
AI has the potential to create “AI-powered abundance,” where every individual has access to unbounded creativity, no longer limited by personal resources or abilities.
Crucial caveat: This future must be built on a fundamental commitment to human freedom. It cannot be a paternalistic utopia where AI systems dictate our choices for our own good. AI should enhance human values and preserve our ability to make our own decisions.
Why employment projections may miss AI’s impact
The conversation zooms in on the US Bureau of Labor Statistics (BLS) incorporating AI into its projections. Kraus points out their methods assume a pace of technological change consistent with the past, which might not hold for AI.
I agree AI will likely not be a gradual, incremental technology. I suggest the BLS should consider models that account for rapid change.
Labor market effects are complex, and AI's impact on labor will involve more than mass firings. It could manifest as companies hiring less, not backfilling open roles, or reorganizing tasks around what AI can and cannot do.
Kraus mentions seeing a claim on Twitter that translator employment has actually grown rather than shrunk, even though one might expect AI translation to cut into it.
I suggest this could be the Jevons paradox: as AI makes translation cheaper and more efficient, the overall demand for translation services increases. This shifts the role of human translators to supervising AI, potentially increasing the total number of jobs.
This is similar to how the ATM did not eliminate bank tellers — their jobs just changed.
AI software capabilities double every 7 months
The discussion turns to a paper from METR (Model Evaluation and Threat Research) which found that the complexity of software engineering tasks that AI agents can reliably complete has been doubling roughly every seven months. Preliminary evaluations of newer models like OpenAI's o3 and o4-mini suggest the pace might be accelerating.
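For intuition about what a seven-month doubling time implies if it holds, here's a quick back-of-the-envelope projection (the starting task length is an assumed placeholder for illustration, not METR's measured figure):

```python
# Back-of-the-envelope: how a 7-month doubling time compounds over a few years.
# The starting task length is a placeholder assumption, not METR's measured value.
DOUBLING_MONTHS = 7
start_task_hours = 1.0   # assumed length of task AI agents can reliably complete today

for months in (12, 24, 36, 48):
    task_hours = start_task_hours * 2 ** (months / DOUBLING_MONTHS)
    print(f"After {months:2d} months: ~{task_hours:5.1f}-hour tasks")
```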
I wrote a Substack post reacting to the paper that offers important nuance:
Caveats: The tasks in the study are highly specific — they are solitary, static, and have clear definitions and success criteria. Real-world work is messy, collaborative, and often happens in low-feedback environments. AI struggles when it can't get immediate, clear feedback on whether it failed and why.
Counter-Caveats: Don't dismiss the findings. We are seeing models succeed at tasks that were impossible for AI just a few years ago. AI is now better at many software tasks than I am, even though I'm a former professional software engineer.
If AI masters software engineering, it could accelerate its own development, potentially creating a runaway feedback loop.
When AI forecasters will beat humans
A recent Metaculus competition analysis found that top AI bots were closing the gap with human "Pro Forecasters," with the difference no longer being statistically significant.
I created a Metaculus question on this topic back in 2021; the community median currently predicts this will happen by 2031.
How good is AI at forecasting today?
It's exceptional at the first step — rapidly gathering and synthesizing vast amounts of information. I use it for this frequently.
AI is likely better than a typical person who has little forecasting practice.
AI is very good at identifying the "revealed wisdom" or conventional view on a topic.
However, AI is not yet at the level of the best human forecasters. True skill often lies in finding a counterintuitive insight that the conventional wisdom has missed. AI can't do that yet.
I generally agree with the 2031 timeline; after that point, I may not have much to add to an AI's forecast, just as teaming up with a top chess engine wouldn't help me beat another top engine.
2025 as a pivotal year for AI development trends
I see 2025 as a key "make-or-break" year that will give strong signals about the future pace of AI development.
Indicator 1: AI Agents. Currently, AI agents that can book flights or shop for groceries are "kind of crap."
Bullish Signal: If these tools become genuinely helpful and widely used by the end of 2025, it suggests a very rapid AI future.
Bearish Signal: If they are still not working well, it suggests widespread labor impacts may be decades away, not years.
Indicator 2: Reasoning Skills. We've seen progress in AI for math and science, but it's unclear how well this will scale.
Bullish Signal: If we continue to see rapid improvements in these skills through 2025, we should expect even faster progress ahead.
Bearish Signal: If progress stalls, it suggests a slower road or that fundamental new innovations are needed.
AI will transform society more than the internet
While there is a wide range of possible futures, we know AI will be a tremendously important technology.
I say this with the precision of a forecaster, not the enthusiasm of a hypester: AI will be much more transformative than the internet.
Because of this, it is the responsibility of every citizen — not just policymakers — to become informed about AI's benefits, risks, and societal implications so we can be thoughtful about how it is deployed.