AI security is important practice for when the stakes go up
Why today's safeguards matter for future AI capabilities
xAI’s latest model, Grok 3, recently demonstrated a concerning capability: generating detailed instructions for chemical weapons production1, complete with supplier lists and workarounds for regulatory controls.
While this incident isn’t cause for immediate alarm, it serves as a crucial wake-up call about the broader challenges of AI security. Much of this information is already scattered around the internet, and Grok 3’s plans likely aren’t detailed enough to fill in all the gaps in the complex task of assembling and deploying these weapons without detection2. It also sounds like xAI is taking the incident seriously and updating its AI security guidelines, so hopefully this will be an example of good external red teaming leading to better security outcomes.
But here’s the thing: we’re currently in a practice round for when AI becomes genuinely dangerous, and it’s important to start getting things right now. The consequences of getting this wrong today are minimal, but if AI companies are right that their models will rapidly become more capable, security will need to scale up to match.
AI is getting more capable
Biosafety isn’t the only domain with possible future concerns. Google’s Project Zero found that AI could discover a novel zero-day vulnerability in a widely used open-source project, one that humans had missed despite significant effort to identify vulnerabilities. It matters that the “good guys” found this first, but we cannot rely on that. A hostile nation could use AI to rapidly find vulnerabilities in power grids, voting systems, or military networks.
And we certainly know that AI is rapidly improving. MMLU, a benchmark testing AI knowledge across 57 subjects, launched in 2021 with AI scores far below human experts. Today, AI systems match human experts on MMLU.
Fed up with MMLU being too easy, David Rein et al. developed GPQA, short for “Google-Proof Q&A”: a set of science questions in biology, physics, and chemistry designed to be so hard that they can only be answered with significant domain expertise and cannot be solved even by Google searching. PhD experts typically score only 80% on questions within their own field, while non-experts barely exceed 30% accuracy even with full web access. In short, when an AI answers GPQA correctly, it is handling extremely challenging “Google-proof” problems that require genuine deep reasoning rather than surface-level lookup.
In July 2023, according to data from Epoch AI, GPT-4 scored about 30% on this test, matching non-experts. A mere 18 months later, AI systems like OpenAI o1 already perform at, and possibly exceed, expert human level.
Notably, these improvements in expertise span not only math and humanities, but also complex topics like virology and hacking.
And this is just the beginning. With major companies announcing enormous AI investments (Stargate at $100B/yr, Microsoft at $80B/yr, Amazon at $100B/yr, Google at $75B/yr, and Meta at $65B/yr), we should expect continued strong improvements from future models.
Where this might go next
Anthropic recently introduced their 'AI Safety Levels' (ASL) framework, a system that classifies the risks posed by AI models based on their capabilities. In the system card for their latest model, Claude 3.7, Anthropic stated that their next model has “a substantial probability” of meeting their “ASL-3” threshold.
What does ASL-3 actually mean? All of today’s models, including Claude 3.7 and Grok 3, are “ASL-2”, which means they might help with isolated harmful tasks. ASL-3 is the next big step up — models that substantially increase catastrophic misuse risk. In practical terms, Anthropic says that ASL-3 systems will be advanced enough to potentially enable highly sophisticated cyberattacks, automate complex parts of bioweapon development, or facilitate other activities that could cause large-scale harm. ASL-3 systems could effectively help threat actors coordinate across multiple domains to overcome existing security barriers in a way that current models cannot.
No currently deployed commercial AI system has reached ASL-3 status, according to Anthropic's assessment. However, the timeline for crossing this threshold appears to be accelerating: from “years away” in previous assessments to “the next model” in current projections.
This means that at some point in the not-too-distant future, advanced AIs may become capable of giving instructions for a bioweapon. This could provide would-be terrorists with a step-by-step plan plus custom troubleshooting support, allowing them to get much further than if they had to learn on their own; it would be the difference between consulting a variety of textbooks on a problem and having an on-call tutor. Once we reach this level of capability, we need to be ready for a much larger pool of threat actors who can do large-scale harm, and we need to be able to contain the danger while still allowing the positive uses of AI to move forward unhindered.
What we can do now
We’re lucky right now to be playing on an “AI safety” tutorial level with real but manageable stakes. Beyond the immediate concerns about terrorism, the AI security challenge isn’t just about safety; it’s fundamentally about American leadership and national security. While the US currently maintains a competitive edge in AI development, China’s military-civil fusion strategy has explicitly targeted AI as a domain for strategic advantage. China and North Korea could attempt to steal or hack American AI and use it to threaten the US, and we need to be ready.
Companies like Anthropic and OpenAI have been rising to this challenge, implementing increasingly sophisticated red-teaming protocols and testing frameworks to contain exploits before deployment. Additionally, techniques like Constitutional AI and Deliberative Alignment have shown some effectiveness in reducing the possibility of harmful outputs without sacrificing model performance. However, we need to be careful to ensure that these protocols are strong enough to withstand people who would seek to use AIs to do large-scale harm, and that other companies like xAI and DeepSeek are developing similar protocols. And we need security to be more robust to all the various ways models can be broken, stolen, or hijacked. We will only be as secure as the weakest link and there are still a lot of weak links we don’t know how to protect.
According to Anthropic, “ASL-3” models will require stronger safeguards than we have today: robust misuse prevention and enhanced security. Building these will take more R&D and investment. This can be done with a pragmatic approach that leverages America’s private-sector innovation, further developing industry standards and norms with targeted government coordination. Without stronger norms and stronger investments in safety, capabilities could outpace our ability to safeguard them. This isn’t about restricting technology; it’s about hardening our systems against potential exploitation by adversaries.
The bottom line is that ASL-3 represents a critical threshold where AI safety shifts from being primarily a corporate responsibility to becoming a legitimate national security concern. At this level, the security protocols developed by individual companies may no longer be sufficient without increased coordination between industry and government.
To be clear, current AI is definitely far from an existential threat. But treating it like one might be exactly what we need to become ready. The real question isn't whether today's AI is dangerous — it's whether we're using this practice round effectively enough to be ready for what comes next.
The original tweet mentions “chemical weapons production,” but from context it appears to be about botulinum toxin, the agent behind botulism. This is a biotoxin, not a chemical weapon, and information about it is already easily available online. However, I will keep calling it a chemical weapon since that is what the original tweet states.
Grok 3 isn’t an imminent threat, but I must admit it isn’t perfectly clear whether it can provide an “uplift” (an increase in ability) to a threat actor relative to existing 2023-era resources (e.g., Google and textbooks). This isn’t a clear binary, and some of it depends on the identity of the threat actor and their existing knowledge. xAI also hasn’t published its precise testing on this. For now, I am treating Grok 3 as not providing non-trivial uplift, since Anthropic did not give Claude 3.7 the ASL-3 designation and I would expect Grok 3 and Claude 3.7 to be of comparable ability.