Monitored profile

John Schulman

@johnschulman2

Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music

Posts collected: 14
Latest post: Jul 10, 14:16
Frequency: Sync · 15 min

Jul 10 · 14:16
We started Thinking Machines a year and a half ago with a couple of instincts: that people should have much more ability to customize models and do research on them, and that even as AI becomes more autonomous, there's a lot more to build to make humans and AIs work well together. A lot has happened since then, especially the massive progress in agents, so we wanted to revisit those instincts in light of everything we've learned, argue about them, and write down what we actually believe now. This is where we landed after a lot of debate. I'm happy with it!
Jun 30 · 22:42
hiring post-training hackers to make Tinker even better! x.com/tinkerapi/stat…
Jun 30 · 22:36
People sometimes ask why fine-tune when general-purpose models keep getting better. Bridgewater's work is a good reminder that with the right data -- here, expert judgements -- you can beat prompting-only approaches by a lot. @ddkang and the Bridgewater AIA Labs team are great -- glad to see them sharing this.
Jun 17 · 21:54
PPO had a second wave in the LLM era for reasons unanticipated by the original paper:
- the importance-ratio objective fixes biases from numeric error, async training, and forward pass noise
- the clipping objective affects entropy through a mechanism that we didn't know about at the time of publication (DAPO, t.co/sBo9DeFS5Y)
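For readers unfamiliar with the two terms in the post above, here is a minimal sketch of the standard PPO clipped surrogate loss in plain Python. It is an illustration only, not code from the post or from any particular training stack: the function name `ppo_clip_loss`, the per-sample list inputs, and the default `clip_eps=0.2` are assumptions made for the example. The importance ratio compares the log-probability the trainer computes now (`logp_new`) with the one recorded when the sample was generated (`logp_old`). The two can differ when the sampler and trainer disagree numerically or when sampling runs asynchronously, which is presumably the bias the post refers to.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate objective over a batch of samples.

    All names and the default clip_eps are illustrative assumptions.
    logp_new:   log-probs of the sampled actions under the current policy
    logp_old:   log-probs recorded by the policy that generated the samples
    advantages: advantage estimates, one per sample
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate.
    return -total / len(advantages)
```

When `logp_new` equals `logp_old`, the ratio is 1 and the loss reduces to the negative mean advantage. Once the ratio leaves the clip interval in the direction the advantage favors, the clipped term takes over and that sample stops contributing gradient.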
Jun 17 · 12:11
PPO: rejected from NIPS 2017
Jun 10 · 20:00
Looking forward to what comes out of Geoffrey's new alignment org. His 2018 paper on AI safety via debate is one of my all-time favorites: an elegant framing of the scalable oversight problem, way ahead of its time. x.com/geoffreyirving…
May 31 · 14:56
Would be funny if inoculation prompting results in models that are much better at sandbox escapes and other forms of hacking because they get to spend the whole RL run practicing these things
May 29 · 13:07
Glad to see this -- renderers are a foundational component of the LLM stack. Renderers map between tokens and messages, which are invariant to tokenizer and formatting details. Most APIs, datasets, and RL environments are defined in terms of messages. Getting the details wrong x.com/PrimeIntellect…
May 28 · 12:46
Glad to be advising refine.ink, which uses AI to help authors and reviewers do deeper, more thorough analysis than unaided humans could practically do. Seems like a very positive direction for AI in science. x.com/ben_golub/stat…
May 11 · 17:48
Sharing our work on full-duplex multimodal models -- real-time interaction that's natural and intuitive without compromising on intelligence. We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to x.com/thinkymachines…
May 11 · 17:50
Seeing the demos come together over the last week has been awesome -- so many things that previously required a special-purpose model (e.g. real-time translation, event detection in video) turn out to be zero-shot instruction following once you have a general-purpose model with
Apr 13 · 14:38
Luke and Rudolf's writing on keeping humans central in an AI-powered world sparked a lot of discussion at Thinking Machines. For me, it captured some things I'd been thinking about but hadn't put as clearly. The more I got to know them and learned about their work, the more I x.com/WorkshopLabs/s…
Mar 26 · 17:09
Great work by Chroma training a search agent with SoTA efficiency. Lots of cool details: a prune tool for editing context mid-search, a synthetic data pipeline with verification steps, and a curriculum that shifts from recall to precision. Trained with Tinker! x.com/trychroma/stat…
Mar 20 · 15:39
Models that are great at calibrated predictions will be transformative for decision making. Excited about Mantic's work and proud they're using Tinker. Their new blog post digs into their methodology and findings. x.com/tshevl/status/…