Discussion about this post

kenakofer

What do you think of the benchmarks SWE-bench and SWE-bench Verified for tracking real-world software engineering skill? Those scores are rising, but I'm not sure how easy it is to game them.

Carolyn Meinel

Has anyone ever tested AI programming on tasks that require numerical analysis? For example: systems of nonlinear differential equations, in which the number of iterations needs to be limited as a function of how quickly results begin diverging, or whether convergence is rooted in reality?
