Discussion about this post

Filip:

Great writeup! I recommend other readers see Teortaxes’ response here: https://x.com/teortaxestex/status/1885695040825016664

As well as Epoch AI’s writeup on similar themes: https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1

Tim Duffy:

Excellent article; just one thing I'd add. As Josh You pointed out to me here, the 3x per year you cite is for pretraining algorithmic efficiency only; with post-training improvements included, the current rate of algorithmic efficiency improvement is more like 10x/year: https://x.com/justjoshinyou13/status/1884295329266426255
