Excellent article; just one thing I'd add. As Josh You pointed out to me here, the 3x per year you cite is for pretraining algorithmic efficiency only; once post-training improvements are included, the current rate of algorithmic efficiency improvement is more like 10x/year: https://x.com/justjoshinyou13/status/1884295329266426255
Great writeup! I recommend other readers see Teortaxes’ response here: https://x.com/teortaxestex/status/1885695040825016664
As well as Epoch AI’s writeup on similar themes: https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1
The quote implies the 3x figure already includes all "post-training enhancements", some of which will also involve additional compute rather than purely algorithmic gains.