Monitored profile

Noam Brown

@polynoamial

Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o-series 🍓 reasoning models

Posts collected: 19
Latest post: Jul 29, 21:11
Frequency: Sync · 15 min

29 Jul · 21:11 · view on X
Autocompaction is extremely effective x.com/thsottiaux/sta…
25 Jul · 00:35 · view on X
This was one of the bigger open questions in quantum cryptography x.com/42_gravity/sta…
20 Jul · 14:42 · view on X
Long-running models can solve hard open-ended problems, but their persistence can create safety risks that shorter-horizon evaluations miss. We’re sharing what we learned from studying a long-running model, and how those findings are shaping our approach to evaluations, alignment, monitoring, and user control. t.co/yePIzJGsAU
16 Jul · 11:29 · view on X
2023: LLMs struggle with 4th grade word problems
2024: LLMs can do high school math
2025: LLMs get a gold medal at the IMO
Now, GPT-5.6 solves famous frontier math/stat questions. The IMO is today and 5.6 one-shotting a perfect score isn't even news. Where will we be next year? x.com/EdgarDobriban/…
10 Jul · 15:26 · view on X
More test-time compute leads to greater intelligence. But as we push ttc from seconds to weeks, latency becomes a bottleneck. GPT-5.6 Sol Ultra scales parallel ttc. The time taken to generate a proof to a 50-year-old problem drops from perhaps a whole day to a single hour. x.com/__eknight__/st…
10 Jul · 15:19 · view on X
GPT-5.6 Sol Ultra produced a proof of a 50 year old math conjecture. Unlike the Erdős Unit Distance Problem, this was done with a model publicly available *today*. I look forward to seeing what scientists and researchers are able to do with this model! x.com/__eknight__/st…
10 Jul · 21:04 · view on X
There is also a Lean formalization of the proof: x.com/__eknight__/st…
6 Jul · 19:30 · view on X
I'm at ICML this week and I'll be doing Q&A today (Tuesday) from 3-4pm at the @OpenAI booth with my reasoning research colleagues. Come by and ask us a question!
3 Jul · 02:04 · view on X
Excellent work from @AISecurityInst investigating the impact of test-time compute budgets for frontier AI model evaluations. They make the case even more convincingly than I could! x.com/aisecurityinst…
26 Jun · 14:37 · view on X
GPT-5.6 is incredibly strong and fast for coding. I hope we can make it available to everyone soon. x.com/openai/status/…
18 Jun · 13:50 · view on X
When we announced @OpenAI o1 some researchers from other labs told me we made a strategic mistake and should have kept it secret so we could accelerate ourselves and pull farther ahead of the competition. Studies like these make me confident we made the right choice. x.com/openai/status/…
18 Jun · 13:14 · view on X
Kevin is one of the best journalists covering AI. I especially appreciate how he takes the time to *use* frontier AI and deeply understand its capabilities and limitations. I’m excited to see what he does next! x.com/kevinroose/sta…
18 Jun · 12:58 · view on X
I can think of no better person to help shape frontier AI policy than @deanwball. He has a clear understanding of where AI is headed. I look forward to working with him at @OpenAI! x.com/deanwball/stat…
17 Jun · 21:21 · view on X
I'm always thrilled to have more Noams at @OpenAI, but I'm especially thrilled to welcome @NoamShazeer! x.com/NoamShazeer/st…
11 Jun · 14:35 · view on X
I'm happy GPT-5.5 tops this eval
I'm even happier it's still doing the best when measured vs tokens, cost, or wall-clock time! x.com/dawnsongtweets…
9 Jun · 12:35 · view on X
We've known about LLM test-time compute scaling since @OpenAI o1. Yet 2 years later labs still report scalar evals for models; safety orgs are still surprised when a scaffold does better via 100x inference; and RSPs still ignore inference budget when deciding critical thresholds. x.com/polynoamial/st…
9 Jun · 01:57 · view on X
x.com/i/article/2057…
28 May · 06:39 · view on X
After AlphaGo, the skill of human Go players noticeably improved. I suspect we will see a similar pattern in math. x.com/wtgowers/statu…
28 May · 06:41 · view on X
Source: henrikkarlsson.xyz/p/go