China’s AI Shock? What DeepSeek Disrupts (and Doesn’t)

Pacific Money | Economy | East Asia

DeepSeek’s success is not based on outperforming its U.S. counterparts, but on delivering similar results at significantly lower costs. The AI price war has begun.

In December 2024, the Hangzhou-based AI company DeepSeek released its V3 model, igniting a firestorm of debate. The result has been dubbed “China’s AI Shock.”

DeepSeek-V3’s performance, comparable to that of U.S. counterparts such as GPT-4 and Claude 3 at lower cost, casts doubt on U.S. dominance in AI capabilities, which is undergirded by the United States’ current export controls targeting advanced chips. It also calls into question the entrenched industry paradigm, which prioritizes heavy hardware investment in computing power. To echo U.S. President Donald Trump’s remarks, the emergence of DeepSeek represents not just “a wake-up call” for the tech industry but also a critical juncture for the United States and its allies to reassess their technology policy strategies.

What, then, does DeepSeek seem to have disrupted? The cost efficiencies claimed by DeepSeek for its V3 model are striking: its total training cost is only $5.576 million, a mere 5.6 percent of the reported $100 million cost for GPT-4. The training was completed using 2,048 NVIDIA GPUs, roughly eight times the resource efficiency of U.S. companies, which typically require 16,000 GPUs. This was accomplished using the less advanced H800 GPUs instead of the superior H100, yet DeepSeek delivered comparable performance.
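The ratios above can be checked with back-of-the-envelope arithmetic. The sketch below uses only the figures reported in this article (the dollar amounts and GPU counts are the article’s claims, not independently verified data):

```python
# Back-of-the-envelope check of the cost and GPU figures cited above.
# All inputs are the article's reported numbers, not verified data.
deepseek_v3_cost = 5.576e6   # reported DeepSeek-V3 training cost, USD
gpt4_cost = 100e6            # reported GPT-4 training cost, USD
deepseek_gpus = 2048         # H800 GPUs reportedly used for V3
typical_us_gpus = 16000      # GPU count typically cited for U.S. frontier runs

cost_share = deepseek_v3_cost / gpt4_cost   # ~0.056, i.e. ~5.6 percent
gpu_ratio = typical_us_gpus / deepseek_gpus # ~7.8, i.e. roughly eight times

print(f"V3 cost as share of GPT-4: {cost_share:.1%}")
print(f"GPU efficiency ratio: {gpu_ratio:.1f}x")
```

The 2,048-versus-16,000 comparison works out to about 7.8, which the article rounds to eight.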

DeepSeek’s low-cost model thus challenges the conventional wisdom that the sophistication of large models equates to massive computing power accumulation. This development potentially breaks the dependency on the U.S. AI chips amidst semiconductor embargoes, thereby raising questions about the traditional policies centered around high-end computing power control. 

Unclear Costs

Several aspects of the discussion surrounding the DeepSeek-V3 model require further clarification, however. The V3 model is on par with GPT-4, whereas the R1 model, released later in January 2025, corresponds to OpenAI’s advanced model o1. The reported cost of $5.576 million pertains specifically to DeepSeek-V3, not the R1 model. Nor does this figure represent total training costs, as it excludes expenses for architecture development, data, and prior research.

The V3 model was trained using datasets generated by an internal version of the R1 model before its official release. This approach aimed to leverage the high accuracy of R1-generated reasoning data, combined with the clarity and conciseness of regularly formatted data. But the associated costs remain undisclosed, particularly regarding how the expenses for data and architecture development from R1 are integrated into the overall costs of V3.

Incremental Innovation, Not Disruption

From a technological competition standpoint, DeepSeek’s advancements in foundational LLM technologies like Multi-head Latent Attention (MLA) and Mixture-of-Experts (MoE) demonstrate efficiency improvements. But these advancements should not cause excessive concern among policymakers, as these technologies are not tightly guarded secrets.

That said, there is genuine innovation behind the current excitement surrounding DeepSeek’s achievements. MLA technology enhances traditional attention mechanisms by using low-rank compression of key and value matrices. This drastically reduces the Key-Value (KV) cache size, resulting in a 6.3-fold decrease in memory usage compared to standard Multi-Head Attention (MHA) structures, thereby lowering both training and inference costs. DeepSeek also appears to be the first company to successfully deploy a large-scale sparse MoE model, showcasing their ability to boost model efficiency and reduce communication costs through expert balancing techniques.
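The memory saving behind MLA comes from caching a small shared latent vector per token instead of full per-head keys and values. The following is a minimal NumPy sketch of that low-rank idea only; the dimensions and the single down-projection matrix are illustrative assumptions, not DeepSeek’s actual architecture or configuration:

```python
import numpy as np

# Illustrative dimensions (NOT DeepSeek's actual configuration).
d_model = 4096    # hidden size
n_heads = 32      # attention heads
d_head = 128      # per-head dimension
d_latent = 512    # compressed latent dimension, << 2 * n_heads * d_head

rng = np.random.default_rng(0)
h = rng.standard_normal(d_model)                      # hidden state, one token
W_down = rng.standard_normal((d_model, d_latent)) * 0.01

# Standard MHA caches full keys AND values for every token:
mha_cache_floats = 2 * n_heads * d_head               # 8192 floats per token

# MLA caches only the shared low-rank latent; keys and values are
# reconstructed from it via up-projections at attention time:
c_kv = h @ W_down                                     # cached latent vector
mla_cache_floats = d_latent                           # 512 floats per token

print(f"KV cache reduction: {mha_cache_floats / mla_cache_floats:.0f}x")
```

With these toy numbers the cache shrinks 16-fold; the actual reduction depends on the chosen latent dimension, which is why reported figures such as the 6.3-fold decrease differ from this sketch.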

While these developments are notable, they may represent iterative enhancements in the field of AI rather than a disruptive leap that could shift the overall balance of technological power.

Indeed, neither the DeepSeek-V3 nor the R1 model represents the pinnacle of cutting-edge technology. Their advantage stems from delivering performance comparable to their U.S. counterparts but at significantly lower costs. In this regard, it is natural to question the cost-efficiency of the seemingly extravagant development approach adopted by the U.S. tech industry to equate sheer computing power with the sophistication of AI models. 

Yet this type of cost-effective innovation is often not the focus of those at the technological forefront, equipped with abundant, advanced resources. The initial iteration of any innovation typically incurs high expenses. As cost-cutting innovations emerge, however, they drive down expenses, allowing latecomers, particularly in countries like China, to quickly adopt these advancements and catch up with leaders at a reduced cost.

Limits of U.S. Chip Sanctions

DeepSeek’s approach, showcasing the latecomer advantage through reduced training costs, has sparked a debate about the real need for extensive computing power in AI models. Critics question whether China really needs to depend on U.S. advanced chips, challenging the high-end computing-centric policy that guides Washington’s current semiconductor export control scheme. If performance parity can be achieved with lower-tier chips, then the premium for higher-tier chips might be unjustified. 

This might be a misunderstanding, however, as higher-tier chips generally offer greater efficiency. In economic terms, it would be impractical for any China-based company like DeepSeek to forgo more advanced chips if they were accessible.

Furthermore, the reduction in training costs, by potentially lowering user fees, signals a decrease in the financial barriers to AI service adoption. The global AI industry is likely to see an increase, rather than a decrease, in demand for computing power as competition among services intensifies. For China to keep up in the AI race, it will need a continuous supply of more sophisticated, high-end chips.

In this regard, the Scaling Law still holds true. DeepSeek has simply demonstrated that comparable outcomes can be achieved with less capital investment – in mathematical terms at least. On the hardware front, this translates to more efficient performance with fewer resources, which is beneficial for the overall AI industry. And if DeepSeek’s cost-efficiency disruption proves feasible, there is no reason why U.S. AI companies cannot adapt and keep pace.

Exporting China’s AI Pricing Race

What, then, should the United States and its allies truly be concerned about? The key question is: What if Chinese AI services can deliver performance comparable to their American counterparts at lower prices? DeepSeek exemplifies a development scenario that policymakers should closely monitor – China is initiating a global price war in AI services, a battle that has already been underway domestically. 

The actual training costs of the DeepSeek-V3 and R1 models remain unclear. The public also knows very little about whether the company achieved such efficiency using only lower-tier H800 GPUs. The validity of these claims has yet to be established. But it is crucial here not to confuse cost with price. The exact expenditures by DeepSeek are uncertain, and it is not clear whether the company has used American models to train its own in ways that might violate terms of service. One thing we know for sure is that DeepSeek is offering its AI services at exceptionally low prices.

For example, DeepSeek-R1 charges just $0.14 per million input tokens (when using cached data) and $2.19 per million output tokens. In contrast, OpenAI’s o1 model costs $1.25 per million cached input tokens and $10.00 per million output tokens. This means DeepSeek-R1 is nearly nine times cheaper for input tokens and about four and a half times cheaper for output tokens compared to OpenAI’s o1. 
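The price multiples follow directly from the per-million-token rates quoted above. The sketch below uses the article’s reported prices as given; it does not verify them against either provider’s current rate card:

```python
# Price ratios implied by the per-million-token rates quoted above
# (USD, as reported in the article; not verified against rate cards).
deepseek_r1_input_cached = 0.14   # R1, cached input tokens
deepseek_r1_output = 2.19         # R1, output tokens
openai_o1_input_cached = 1.25     # o1, cached input tokens
openai_o1_output = 10.00          # o1, output tokens

input_ratio = openai_o1_input_cached / deepseek_r1_input_cached   # ~8.9
output_ratio = openai_o1_output / deepseek_r1_output              # ~4.6

print(f"Cached input: o1 costs {input_ratio:.1f}x the R1 price")
print(f"Output: o1 costs {output_ratio:.1f}x the R1 price")
```

The ratios come out to roughly 8.9 and 4.6, matching the article’s “nearly nine times” and “about four and a half times.”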

DeepSeek’s competitive pricing, in a sense, can be seen as an international projection of China’s 2024 domestic AI service price war. For instance, Alibaba reduced the price of its Qwen-Long by 97 percent in May last year and further decreased the cost of its visual language model, Qwen-VL, by 85 percent in December. However, unlike DeepSeek, many Chinese AI companies have lowered their prices because their models lack competitiveness, making it difficult to rival U.S. counterparts. Even with these price cuts, attracting high-quality customers remains a challenge. In contrast, DeepSeek offers performance comparable to competing products, making its pricing genuinely attractive.

For democratic allies, the rise of Chinese AI services that are both affordable and highly effective raises two primary strategic concerns, especially in light of recent sovereign AI initiatives. First, there are national security risks, particularly related to data privacy and the potential manipulation of outcomes. Second, China’s aggressive pricing in AI services poses a threat to the development of AI industries in other countries, resembling the dumping practices previously seen with solar panels and electric vehicles in Europe and America.

If this scenario unfolds, one must recognize that China’s AI price advantage is unlikely to be driven solely by reduced training costs, which other companies may soon replicate. Attention should also be given to non-market mechanisms, such as government subsidies, which could give China a competitive edge in the future.
