BEIJING, Sept 18 (Reuters) – Chinese AI startup DeepSeek has reignited the global AI debate with a surprising revelation. The company disclosed that it spent just $294,000 to train its R1 reasoning model, a figure that sharply contrasts with the hundreds of millions of dollars reportedly spent by U.S. rivals, such as OpenAI and Anthropic.
The disclosure came in a peer-reviewed paper published in Nature, marking the first time the Hangzhou-based company has shared a cost estimate for one of its large models.
A Rare Glimpse Into DeepSeek’s AI Operations
DeepSeek has been a relatively quiet player since its headline-making debut in January 2025, when it released lower-cost AI models that sent shockwaves through global markets. Investors panicked, fearing China-based innovation might disrupt the dominance of U.S. firms like Nvidia, Microsoft, and OpenAI.
Founder Liang Wenfeng and the team have kept a low profile since then, releasing only a handful of updates. This new paper, co-authored by Liang, offers a rare look into the company’s approach.
The paper states that R1 was trained on 512 Nvidia H800 chips, which Nvidia designed specifically for the Chinese market after U.S. export restrictions cut off access to its more advanced H100 and A100 chips.
Training ran for just 80 hours, a fraction of the time usually required for large-scale AI training, raising eyebrows across the global AI community.
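For context, the reported figures can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope calculation based only on the numbers cited in this article; the cost per GPU-hour it derives is an implied average, not a rate DeepSeek has published.

```python
# Back-of-envelope check of the figures reported above (not taken from the paper itself).
num_gpus = 512          # Nvidia H800 chips, as reported
hours = 80              # reported training duration
total_cost = 294_000    # reported training cost in USD

gpu_hours = num_gpus * hours
implied_rate = total_cost / gpu_hours

print(f"GPU-hours: {gpu_hours:,}")                         # 40,960
print(f"Implied cost per GPU-hour: ${implied_rate:.2f}")   # about $7.18
```

Even at that implied rate, the total is orders of magnitude below the training budgets U.S. labs have described.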
Training Costs: DeepSeek vs. U.S. AI Leaders
The contrast is stark. In 2023, OpenAI CEO Sam Altman admitted that training foundation models cost “much more than $100 million,” though he didn’t provide specifics.
By comparison, DeepSeek’s $294,000 figure is microscopic. This efficiency, if accurate, could mark a turning point in the economics of AI.
However, some U.S. officials and companies are skeptical. Earlier reports suggested DeepSeek may have procured restricted H100 chips despite U.S. sanctions. Nvidia, meanwhile, has maintained that the company only used lawfully acquired H800s.
In its supplementary documents, DeepSeek admitted for the first time that it owns A100 GPUs. These, it said, were used only in early-stage experiments before the main R1 training began.
Model Distillation: A Shortcut to Cheaper AI?
Another hot-button issue is model distillation, the process by which one AI model is trained to reproduce the behavior of another. Critics, including U.S. AI experts and White House advisers, have accused DeepSeek of “distilling” OpenAI’s models into its own.
DeepSeek defended the practice, saying distillation improves performance while keeping costs manageable. It allows smaller teams to leverage knowledge embedded in earlier, larger models without replicating the enormous training costs.
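To illustrate what distillation means in practice, here is a minimal, hypothetical sketch in PyTorch: a small “student” network is trained to match the softened output distribution of a larger “teacher” network. All model sizes, data, and hyperparameters below are illustrative assumptions, not details of DeepSeek’s or OpenAI’s systems.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical illustration of model distillation: the student learns to
# reproduce the teacher's softened output distribution. Sizes and data are
# made up for the example; nothing here reflects DeepSeek's actual setup.

torch.manual_seed(0)

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's probabilities

for step in range(100):
    x = torch.randn(64, 32)                      # stand-in for real training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)              # teacher provides "soft targets"
    student_logits = student(x)

    # KL divergence between softened distributions is the standard distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the student is far smaller than the teacher, this kind of training is much cheaper than building a comparable model from scratch, which is the cost argument DeepSeek makes for the technique.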
In January, the company acknowledged that it had used Meta’s open-source Llama model for some of its distilled versions.
In the Nature paper, DeepSeek also revealed that training data for its V3 model included web pages containing AI-generated answers—some likely produced by OpenAI systems. The company insisted this was incidental, not intentional.
Why This Matters
DeepSeek’s disclosure has reignited questions about China’s role in the AI race. If companies there can build competitive models at a fraction of the cost, it could reshape the balance of power in AI development.
For businesses and governments, this raises two key questions:
- Will low-cost AI models threaten U.S. dominance?
- Or will concerns over transparency, intellectual property, and security slow adoption of Chinese AI abroad?
For now, DeepSeek’s R1 cost disclosure leaves more questions than answers. However, one thing is certain: the global race to build smarter, more affordable AI has entered a new phase.
FAQs
1. What is DeepSeek’s R1 model?
The R1 is a reasoning-focused AI model developed by DeepSeek, designed to perform advanced problem-solving and decision-making tasks.
2. How much did it cost to train the R1 model?
According to DeepSeek’s Nature paper, the total cost was $294,000, significantly lower than U.S. rivals’ training costs.
3. What hardware was used for training?
The model was trained on 512 Nvidia H800 GPUs for 80 hours, though A100 GPUs were used in preparatory stages.
4. Why is model distillation controversial?
Distillation allows one AI model to learn from another, raising concerns about intellectual property if knowledge from proprietary systems is indirectly used.
5. Why does this matter for the global AI race?
If DeepSeek’s low-cost training is scalable, it could challenge the dominance of U.S. AI firms and accelerate China’s influence in the AI industry.



