Why can’t they train it to get some “points” higher than zero when it admits it just doesn’t know an answer?
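One way to make that question concrete: standard 0/1 grading gives a model nothing for abstaining, so guessing always weakly dominates honesty. A scoring rule that pays a small positive reward for an explicit "I don't know" and penalizes confident wrong answers flips that incentive. Here is a minimal sketch; the point values (+1.0 / +0.2 / -1.0) are illustrative assumptions, not any lab's actual training objective:

```python
def score_answer(answer: str, truth: str) -> float:
    """Toy abstention-aware scoring rule (illustrative values).

    +1.0  correct answer
    +0.2  explicit abstention ("I don't know")
    -1.0  confident wrong answer

    Abstaining beats guessing whenever the model's chance of
    being right is below (0.2 + 1.0) / 2.0 = 60%.
    """
    if answer.strip().lower() == "i don't know":
        return 0.2
    return 1.0 if answer == truth else -1.0

# A blind guess among 4 options has expected score
# 0.25 * 1.0 + 0.75 * (-1.0) = -0.5, so an honest
# "I don't know" (+0.2) is the better strategy.
print(score_answer("I don't know", "Paris"))  # 0.2
print(score_answer("Paris", "Paris"))         # 1.0
print(score_answer("London", "Paris"))        # -1.0
```

Under plain 0/1 grading the same blind guess scores an expected 0.25 versus 0.0 for abstaining, which is exactly the incentive the question is complaining about.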
I've found bridge (the card game) analysis to be another case where LLMs fail.
I've given 4o / o3 bridge questions; they failed miserably. 4o couldn't even get the cards right from a screenshot. o3 did, but its idea of how to play bridge was impossible.
We've had very decent bridge-playing software for many years, though.
I remain concerned about calling that "think" function "reasoning." https://www.merriam-webster.com/dictionary/reasoning By comparison, see Jeremy Lichtman here: https://bestworld.net/videos
Also, that think function has been around for a while; it's not novel to o3. Here's an example dated March 31, 2025: https://bestworld.net/canada-election-march-31
Once a true reasoning function is added to a generative large language model, we will likely be in big trouble. So perhaps we should be cautious about accepting marketing-speak claims of "reasoning."