
@polynoamial
Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o-series 🍓 reasoning models
I'm happy GPT-5.5 tops this eval. I'm even happier that it's still doing the best when measured vs tokens, cost, or wall-clock time! x.com/dawnsongtweets…
We've known about LLM test-time compute scaling since @OpenAI o1. Yet 2 years later labs still report scalar evals for models; safety orgs are still surprised when a scaffold does better via 100x inference; and RSPs still ignore inference budget when deciding critical thresholds. x.com/polynoamial/st…
After AlphaGo, the skill of human Go players noticeably improved. I suspect we will see a similar pattern in math. x.com/wtgowers/statu…
Source: henrikkarlsson.xyz/p/go