3 Comments

Great writeup! I recommend other readers see Teortaxes’ response here: https://x.com/teortaxestex/status/1885695040825016664

As well as Epoch AI’s writeup on similar themes: https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1


Excellent article; just one thing I'd add. As Josh You pointed out to me (link below), the 3x per year figure you cite covers pretraining algorithmic efficiency only; once post-training improvements are included, the current rate of algorithmic efficiency improvement is closer to 10x/year: https://x.com/justjoshinyou13/status/1884295329266426255


The quote implies the 3x figure already includes all "post-training enhancements," some of which will also rely on additional compute rather than algorithmic efficiency alone.
