The next big breakthrough will be AIs learning on the job

Jun 26, 2026 · 19 min

AI recap

A preview of why “learning on the job” may be AI’s next leap

Based on the show notes, this episode explores a major AI research bet: systems that improve through real work rather than only pretraining. It appears to touch on grindability, RLVR, updating model weights, “dreaming,” and a look ahead to 2027.

*This is a preview based only on the published show notes and metadata, not a recap of the audio.* This episode looks aimed at listeners who want a compact tour of one possible next paradigm in AI: models that don’t just answer questions, but get better through doing useful tasks. The title and chapter list suggest the core argument is that the next big breakthrough may come from AIs effectively learning while they work.

The timestamps hint at the structure. It starts with “the big research bet the labs are making,” then moves into a comparison between **grindability** and **verifiability**, suggesting the episode may argue that it’s not enough for tasks to be checkable; they also need to support repeated effort and improvement. From there, it appears to ask whether **RLVR alone** can generalize, before turning to the challenge of “getting the learning back to the weights,” which sounds like a key technical bottleneck.

Two later sections stand out: **“Dreaming”** and **“What 2027 looks like.”** Based on the notes alone, those chapters suggest a mix of mechanism and forecasting: first, how systems might internally generate useful learning experiences, and second, what near-term progress could look like if this research direction works.

If you’re interested in AI progress, training paradigms, or where labs may be placing their biggest bets, this episode seems likely to be a forward-looking listen. If you want a practical takeaway, the notes suggest the value here is less about product news and more about a conceptual framework for how future AI systems might improve.

About this episode

Read it <a target="_blank" href="https://www.dwarkesh.com/p/the-next-paradigm">here</a>.

Thanks to Mercury for sponsoring this essay. <a target="_blank" href="https://mercury.com/">Mercury</a> has automated basically my entire bill pay process for my business. I just give contractors a dedicated email address, and when they send an invoice, Mercury automatically creates a draft payment for me to review. I no longer have to hunt through my inbox for invoices or deal with messy spreadsheets to track my bills. Mercury handles it all. Learn more at <a target="_blank" href="http://mercury.com">mercury.com</a>.

Timestamps:

(00:00:00) – The big research bet the labs are making
(00:02:12) – Grindability is just as important as verifiability
(00:06:10) – Will RLVR alone generalize?
(00:08:41) – Getting the learning back to the weights
(00:15:22) – Dreaming
(00:17:23) – What 2027 looks like

Get full access to Dwarkesh Podcast at <a href="https://www.dwarkesh.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4">www.dwarkesh.com/subscribe</a>