Dwarkesh Patel: Next-Gen AI Likely Built by Getting Things Done

Dwarkesh Patel, a 25-year-old tech podcast host named to the 2024 TIME100 AI list, sums up the current bet of frontier AI labs in one term: RLVR (Reinforcement Learning with Verifiable Rewards), in which a model learns by trial and error on tasks whose outcomes can be judged right or wrong automatically. Patel argues, however, that verifiability alone is probably not enough for next-generation AI. A task must also be "grindable," meaning it can be attempted repeatedly and run as rollouts at large scale. Code and math problems are grindable because their training environments can be parallelized, reproduced, and reset, which he says explains the rapid progress in those fields. Progress on "using a computer" has been slower: a task such as placing an order is verifiable, but it is hard to replicate and replay at scale, because real websites detect bots and change state, and building simulators remains costly and hard to scale.
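
To make the distinction concrete, here is a minimal Python sketch of a task that is both verifiable and grindable. It is purely illustrative and not drawn from Patel or any lab's training stack: the names ArithmeticTask, noisy_policy, rollout, and grind are hypothetical, and the "model" is a random stand-in. The point is only that the problem can be regenerated from a seed, checked automatically, and attempted many times in parallel.

```python
from __future__ import annotations

import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ArithmeticTask:
    """Hypothetical toy task: add two integers derived from a seed."""
    seed: int

    def reset(self) -> tuple[int, int]:
        # Resettable and reproducible: the same seed always yields the same problem.
        rng = random.Random(self.seed)
        return rng.randint(0, 99), rng.randint(0, 99)

    def verify(self, answer: int) -> float:
        # Verifiable: the reward is computed automatically, with no human judge.
        a, b = self.reset()
        return 1.0 if answer == a + b else 0.0


def noisy_policy(problem: tuple[int, int], rng: random.Random) -> int:
    """Stand-in for a model (assumption): correct about half the time."""
    a, b = problem
    return a + b if rng.random() < 0.5 else a + b + 1


def rollout(job: tuple[int, int]) -> float:
    """One independent attempt at one task, returning its reward."""
    task_seed, attempt = job
    task = ArithmeticTask(task_seed)
    # The attempt's randomness is also seeded, so every rollout can be replayed.
    rng = random.Random(task_seed * 1000 + attempt)
    return task.verify(noisy_policy(task.reset(), rng))


def grind(num_tasks: int, attempts_per_task: int) -> float:
    """Run many independent rollouts in parallel and return the mean reward."""
    jobs = [(t, k) for t in range(num_tasks) for k in range(attempts_per_task)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        rewards = list(pool.map(rollout, jobs))
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    print(f"mean reward over 400 rollouts: {grind(100, 4):.2f}")
```

A task like placing an order on a live website would satisfy verify but not reset: each attempt changes the real state of the site, so the same episode cannot be cheaply regenerated or run thousands of times side by side.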