DeepSeek’s R1 has just rewritten the rulebook for open large language models. Published Wednesday in Nature, the Chinese mainland start-up’s achievement marks the first time a major LLM has passed formal peer review, and it did so by charting its own path, trained without relying on the outputs of rival models.
Launched in January, R1 was engineered for reasoning-heavy tasks like mathematics and programming. DeepSeek trained the model with pure reinforcement learning, rewarding it for producing accurate answers rather than for mimicking human-annotated examples. The result? A high-performance tool that competes with U.S. tech giants at a fraction of the usual cost.
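To make that idea concrete, here is a minimal, hypothetical sketch of the kind of rule-based accuracy reward such training can rely on. The \boxed{} answer format and the function name are assumptions for illustration, not DeepSeek's published code:

```python
import re

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the final \\boxed{...} answer matches the reference,
    else 0.0; nothing here rewards resembling a human-written solution."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# A correct final answer earns the full reward, regardless of style.
print(accuracy_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
print(accuracy_reward("I think it might be 42.", "42"))           # 0.0
```

Because the signal depends only on whether the answer checks out, the model is free to discover its own reasoning strategies rather than imitate demonstrations.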
As an open-weight model, R1 is free for developers and enthusiasts worldwide to download and build on, with no paywall and no gates. It quickly climbed the charts on the Hugging Face platform, surpassing 10.9 million downloads and becoming one of the most downloaded AI models in the community.
Behind the scenes, DeepSeek disclosed that training R1 cost just $294,000, on top of roughly $6 million spent building the foundation model it was trained from. Compare that to the tens of millions often spent by other labs, and R1’s efficiency becomes impossible to ignore.
One of R1’s secret sauces is group relative policy optimization (GRPO), an approach in which the model scores a group of its own sampled answers against one another, using the group average as a baseline instead of leaning on a separate critic network. This blend of self-evaluation and trial-and-error pushed R1’s reasoning skills to new heights.
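A minimal sketch of the group-relative advantage calculation at the heart of GRPO follows. The PyTorch framing, function names, and toy reward values are illustrative assumptions, and the full published objective also includes a clipped probability ratio and a KL penalty that are omitted here:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Advantage of each of G completions sampled for the same prompt:
    (r_i - mean(r)) / (std(r) + eps). The group mean serves as the
    baseline, so no separate learned value network (critic) is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_policy_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Simplified policy-gradient loss weighted by group-relative advantages.
    logprobs: (G,) summed token log-probs of each completion under the
    current policy. The full GRPO objective also clips a ratio against an
    old policy and penalizes divergence from a reference model."""
    advantages = group_relative_advantages(rewards).detach()
    return -(advantages * logprobs).mean()

# Toy usage: 4 sampled answers to one prompt, scored 1.0 if correct else 0.0.
logprobs = torch.tensor([-12.3, -9.8, -11.1, -10.4], requires_grad=True)
rewards = torch.tensor([0.0, 1.0, 0.0, 1.0])
loss = grpo_policy_loss(logprobs, rewards)
loss.backward()  # pushes probability mass toward above-average answers
```

The design choice matters for cost: comparing samples within a group replaces an entire second network that other RL pipelines must train alongside the policy.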
“This is a very welcome precedent,” said Lewis Tunstall, a machine-learning engineer at Hugging Face, who reviewed the study. “A public peer-review norm helps us spot risks and benchmarks alike, making the AI space safer and more transparent.”
Now, researchers around the globe are eyeing R1’s playbook to supercharge existing models and extend advanced reasoning beyond code and calculations. By setting this open, cost-cutting standard, R1 has kick-started a revolution in AI development that could reshape the next wave of innovation.
Reference: "DeepSeek's R1 sets benchmark as first peer-reviewed major AI LLM," cgtn.com