In case you haven’t heard, DeepSeek, a Chinese company, released a pretty darn good AI at rock-bottom prices. “DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Dario Amodei, chief executive of the AI developer Anthropic, as the cost of building a model.” DeepSeek is also much cheaper to use.
From an economic point of view, this sort of advance was entirely predictable, though it is a bit surprising just how quickly it came.
AI and machine learning are throw-spaghetti-at-the-wall computer programming. Previous generations of computer programming were really careful about efficient algorithms, using as little memory and compute power as possible. Memory and compute power became really cheap, so it became feasible to just throw computing power at a fitting and forecasting problem. That proved fantastically successful, in that these highly inefficient models, multiplied by huge computational capacity, are able to do amazing things. But now that we’re spending hundreds of millions of dollars per training run, all that free computing isn’t so free any more. It was obvious that a huge amount of attention would pour into the next generation of faster computing, and into more efficient algorithms. After all, the human brain does it with about 20 watts. It seems DeepSeek did it with more efficient algorithms, so much so that it could use less powerful chips. Faster computing will be harder, but it will come next. The rewards to faster computing had fallen for a decade or so. They are back on again.
The first iteration of anything is fantastically expensive. Then the cost cutting gets to work. From Apollo to Starship. In two years.
Who gains and loses? Well, obviously, those hundreds of millions of dollars of investment in the first generation of AI training are going to get undercut by the new low-cost competitors. Thanks. And premium prices for NVIDIA chips won’t last either. Predictably, tech stocks tumbled.
“Nvidia fell nearly 13%, wiping out more than $400 billion in market value and weighing on the tech-heavy Nasdaq Composite.” It seems the market can add and subtract, contra the rhetoric of much contemporary finance.
It will be tempting for commentators to add up this loss of stock market “wealth” and say it’s a terrible thing. But no. The winners will not be the producers of AI, which looks to become a marginal-cost commodity with remarkable speed, but the users of AI. And the benefit will show up in quantities, not in monopoly or fixed-cost rents. (I say this because the stocks of companies that can potentially use cheap AI did not jump up. Maybe they will, once more people figure out how to use cheap AI to make profits for a while. Maybe those companies haven’t been founded yet. That is now the Wild West.) The stock market measures the present value of profits, not the present value of social benefits. The profit, and ultimate benefit, of railroads was not so much in the railroads themselves, but in the wheat fields of Kansas.
“China, China, China,” the doomsayers fresh off the TikTok and Nippon Steel bans are sure to warn. And indeed, “Users of DeepSeek’s latest flagship model, called V3 and released in December, have noticed that it refuses to answer sensitive political questions about China and leader Xi Jinping. … ‘The only strike against it is some half-baked PRC censorship.’” Shouldn’t we ban it along with all things China before it poisons young minds? Or worse. AI lives on data; someone must be worrying that DeepSeek is collecting our chatbot interactions for further training, and that somehow the CCP will undermine America as TikTok does with its data on which 13-year-olds watch Taylor Swift videos.
DeepSeek (unlike TikTok) amazingly, and brilliantly, released the source code. I’m not sure how they plan to make money, but thank you. Brilliant because it undermines the case that would surely come for banning it from the US. It would seem one can simply locate the CCP censorship bits and remove them. The WSJ thinks so: “this could be removed because other developers can freely modify the code.”
More deeply, though, imagine the world some of the ascendant right wants to take us to, in which anything “China” is banned from the US. We would not see DeepSeek. Competition is the main source of efficiency, and competition needs to be global in a world of $100 million upfront costs.
Update:
Commenter Di Wang catches an important point. DeepSeek could increase the demand for chips, even AI chips. If demand for training AI expands more than proportionally to the cost reduction, an AI that individually needs fewer chips could lead to a market that uses more chips. (I avoided the word “elasticity,” but you know where to put it.)
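A back-of-the-envelope way to see the point, with made-up numbers and a constant-elasticity demand curve assumed purely for the example:

```python
# Back-of-the-envelope Jevons-style arithmetic (illustrative numbers only).
# Suppose the cost of training one model falls 20x (roughly $100M -> $5M),
# so each model also needs 1/20 of the chips. Whether total chip demand
# rises depends on how strongly the number of models trained responds.

cost_drop = 20.0                          # cost per trained model falls 20x
for elasticity in (0.5, 1.0, 1.5, 2.0):   # hypothetical demand elasticities
    models_trained = cost_drop ** elasticity  # constant-elasticity response
    chips_per_model = 1.0 / cost_drop         # each model needs 1/20 the chips
    total_chips = models_trained * chips_per_model
    print(f"elasticity {elasticity}: total chip demand x{total_chips:.1f}")
# prints x0.2, x1.0, x4.5, x20.0 -- chip demand rises whenever elasticity > 1
```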
I didn’t think it needed saying, but the story has obvious implications for the usual industrial-security strategy: throw hundreds of billions of dollars of subsidies at existing technologies and bar international competition.
A great explainer here, from Morgan Brown. (A toy code sketch of the low-precision and expert-routing ideas follows the thread.)
1/ First, some context: Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on compute. They need massive data centers with thousands of $40K GPUs. It's like needing a whole power plant to run a factory.
2/ DeepSeek just showed up and said "LOL what if we did this for $5M instead?" And they didn't just talk - they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is (as my teenagers say) shook.
3/ How? They rethought everything from the ground up. Traditional AI is like writing every number with 32 decimal places. DeepSeek was like "what if we just used 8? It's still accurate enough!" Boom - 75% less memory needed.
4/ Then there's their "multi-token" system. Normal AI reads like a first-grader: "The... cat... sat..." DeepSeek reads in whole phrases at once. 2x faster, 90% as accurate. When you're processing billions of words, this MATTERS.
5/ But here's the really clever bit: They built an "expert system." Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, AND engineer), they have specialized experts that only wake up when needed.
6/ Traditional models? All 1.8 trillion parameters active ALL THE TIME. DeepSeek? 671B total but only 37B active at once. It's like having a huge team but only calling in the experts you actually need for each task.
7/ The results are mind-blowing:
- Training cost: $100M → $5M
- GPUs needed: 100,000 → 2,000
- API costs: 95% cheaper
- Can run on gaming GPUs instead of data center hardware
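For the curious, here is a toy sketch of the two ideas in the thread: storing numbers at lower precision, and mixture-of-experts routing that wakes up only a few experts per token. The sizes, expert counts, and routing rule below are invented for illustration; DeepSeek’s actual implementation is far more elaborate, and the thread’s “32 vs 8” analogy refers to bits of numerical precision rather than decimal places.

```python
import numpy as np

# --- 1. Lower-precision storage (the "32 vs 8" point) ----------------------
# Storing weights as 8-bit integers plus one scale factor, instead of 32-bit
# floats, cuts memory by roughly 75% at the cost of a small rounding error.
weights = np.random.randn(1_000_000).astype(np.float32)
scale = np.abs(weights).max() / 127.0
weights_int8 = np.round(weights / scale).astype(np.int8)
print("float32 MB:", weights.nbytes / 1e6)        # ~4.0
print("int8 MB:   ", weights_int8.nbytes / 1e6)   # ~1.0
print("max rounding error:", np.abs(weights - weights_int8 * scale).max())

# --- 2. Mixture-of-experts routing (the "expert system" point) -------------
# Many expert networks exist, but a router sends each token to only a few,
# so most parameters sit idle on any given token.
n_experts, d, top_k = 8, 16, 2
experts = [np.random.randn(d, d) for _ in range(n_experts)]  # toy experts
router = np.random.randn(d, n_experts)                       # toy router

def moe_forward(x):
    scores = x @ router                     # how relevant is each expert?
    chosen = np.argsort(scores)[-top_k:]    # wake up only the top-k experts
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                    # normalize the gate weights
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

output = moe_forward(np.random.randn(d))
print("total expert params:    ", n_experts * d * d)   # 2048
print("active params per token:", top_k * d * d)       # 512
```

The point of the second half is only that total parameters and parameters actually used per token can differ by a large factor, which is how a 671B-parameter model can run with about 37B parameters active at once.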
Don't ban... COMPETE. One nice thing I saw on X yesterday was a lot of tech-nerd tweeting of "It's on." One other nice thing: Biden and the Democrats wanted to regulate AI; Trump is hands-off. The Stargate announcement that he hosted was predictably pitchforked by the left-wing press, but to people who work in AI it wasn't that big a deal either, since it overpromised on not only the money but the capability. Yesterday's Chinese release makes Stargate moot.
Yes, they will censor. Try asking about a famous photo of a man standing in front of a tank. It's China. The interesting thing about DeepSeek is that it sure seems like it was built not only with a decentralized management team, but with decentralized logic to make it work. China generally abhors decentralization. It is a command-and-control place.
After you brush off the initial fear, realize that the US has more compute power and a deeper pool of bright minds working on AI. Given the chance to compete, the capitalist society should beat the communist one.
Instead of being fearful, we ought to be thankful for creative destruction. Compete, compete, compete. It's better for all of us.
I am sceptical about this. Neural-network training is an optimization problem: you try to find a global minimum of a cost function. There are numerous algorithms to do this, e.g. genetic algorithms, simulated annealing, and others. So, if true, they must have found a very efficient optimization algorithm. But this is hard, as optimization algorithms are very well researched. I would assign ~2% probability that what they claim is true.
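For readers who want to see what "find a minimum of a cost function" means concretely, a minimal toy sketch (nothing here is DeepSeek-specific; in practice, large models are trained with stochastic gradient descent and backpropagation rather than genetic algorithms or simulated annealing):

```python
# A toy picture of "training is optimization": gradient descent on a
# one-parameter cost function whose minimum is at w = 3. Real models work
# the same way, just over billions of parameters and noisy data batches.
def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)      # step downhill along the gradient
print(w, cost(w))           # w approaches 3, cost approaches 0
```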