007 lessons for India from the new Bond of the tech world, DeepSeek

A Chinese research lab called DeepSeek has unveiled a free, open-source AI model that beats the most powerful ones on the market. In the words of C. Raja Mohan, DeepSeek recalls the Soviet Union's launch of Sputnik, the Earth's first artificial satellite, in October 1957.


Here are some critical lessons for us all to ponder.

Hrridaysh Deshpande
January 29, 2025 6:32 AM

“OH: I can’t believe ChatGPT lost its job to AI” – Zachary Jean Paradis on FB

“Chinese didn’t just surprise the US with better, faster and cheaper AI, they have humiliated India who prides itself in Jugaad!” – Abhijit Thosar on FB

Yesterday came a big jolt that shook the tech world and the stock markets: Nvidia lost $589 billion in market value in a single day. The announcement did not come from one of the established names like OpenAI or Anthropic. It came from the east, from a Chinese research lab called DeepSeek: a free, open-source AI model that beats the most powerful ones on the market.

While the announcement itself was startling, equally notable was the fraction of the cost at which the model was developed, especially in an environment where China has been denied access to cutting-edge technology from the USA. DeepSeek spent just $5.6 million to build version 3 of its model.

Its counterparts, by comparison, have spent billions and continue to do so. The recently trumpeted Stargate announcement began with a $100 billion investment from SoftBank, OpenAI, Oracle and others. The Indian government has lined up Rs 10,371 crore over the next five years to install 10,000 specialized AI processors. OpenAI raised $10 billion in 2023 and burned through it in 18 months; it then raised another $6.6 billion and borrowed $4 billion. By 2029, OpenAI intends to spend $37.5 billion a year. Google expects its capital expenditure in 2024 to soar past $50 billion. Microsoft, not to be left behind, has bet $13 billion on OpenAI.

Compared to this incredible outlay of money to get ahead in the AI race, DeepSeek's scrappy model stands out. On performance, it beats its competitors on inference-time compute and efficiency. It is ahead of Anthropic's Claude 3.5 Sonnet, Meta's Llama, and OpenAI's GPT-4 on wide-ranging tests of accuracy. It aces benchmarks spanning 500-problem math sets, math evaluations, coding competitions, and spotting and fixing bugs in code. DeepSeek also has a reasoning model, R1, which outperforms OpenAI's o1. Reasoning is key for AI: a reasoning model thinks before it generates a response, going beyond pattern recognition to analyze, draw logical conclusions, and solve complex problems. For now, the o1 reasoning model is still cutting edge. But for how long?

How did they assemble the hardware? How did they assemble the data to do all this? These are the questions everyone is now seeking answers to. What DeepSeek did was use a process called distillation, which essentially uses a very large model to teach a smaller model to get smart.
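
DeepSeek's exact recipe is not public, but the core idea of distillation, training a small "student" model to mimic a large "teacher" model's softened output distribution, can be sketched in a few lines (the function names and numbers below are purely illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: a higher T produces "softer" targets
    # that reveal how the teacher ranks the wrong answers too.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between the teacher's softened distribution and the
    # student's: minimizing this trains the student to match the teacher.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(np.sum(p_teacher * np.log(p_teacher / p_student)))

teacher = [4.0, 1.0, 0.2]   # confident large model
student = [2.5, 1.2, 0.4]   # smaller model, not yet matched
print(round(distillation_loss(student, teacher), 4))
```

In practice the student minimizes this loss (usually alongside ordinary cross-entropy on real labels) over millions of examples; the temperature T controls how much of the teacher's knowledge about near-miss answers is exposed.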

To achieve this much under the shadow of US semiconductor restrictions is remarkable. China cannot buy the most powerful AI chips, like Nvidia's H100 GPUs, which are considered essential for building a competitive AI model. DeepSeek built its model using Nvidia's less capable H800s: they took whatever hardware was available to them and used it efficiently. They used available data sets, applied innovative tweaks, and leveraged existing models. They used approximately 2,048 H800 GPUs, equivalent to roughly 1,000 to 1,500 H100 GPUs. That is 20 to 30x less computing power than what was reportedly used for GPT-4.

Their detractors might say they copied. In fact, Sam Altman said as much in a thinly veiled comment: it is easy to copy, but to make something new, with all its risk, is difficult. DeepSeek's model even has an identity crisis. When asked "What model are you?", DeepSeek responds: "I am an AI language model called ChatGPT, developed by OpenAI; specifically, I am based on the GPT-4 architecture."

The reality of the tech industry is that everyone copies. The transformer architecture came from Google, which built some of the first large language models but did not productize them; OpenAI did a similar thing, but as a product. Even though DeepSeek leveraged OpenAI's existing outputs and architectural principles, it introduced its own enhancements and came up with several clever solutions. It trained a mixture-of-experts (MoE) model, which is not easy to do: MoE architectures are notorious for irregular loss spikes and unstable numerics that force teams to restart training from earlier checkpoints, which in turn demands a lot of infrastructure. DeepSeek understood this well and came up with neat solutions to balance the experts without adding extra hacks. They figured out 8-bit floating-point (FP8) training for some of the numerics and engineered numerical-stability measures around the hardware limitations they had inherited. DeepSeek needed only 2.788 million H800 GPU hours for its full training run. They published a paper and released the model to the world as open source. In the paper, they report that most of the training was stable, which means they can rerun the same recipe on more data, or better data.
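
For readers unfamiliar with the architecture: a mixture-of-experts layer routes each token to only a few "expert" sub-networks, which is where the compute savings come from. A toy sketch follows (this is not DeepSeek's implementation; real systems add load-balancing terms and the stability measures described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, W_gate, experts, k=2):
    # The router scores every expert for this token, keeps only the
    # top-k, and mixes their outputs. Only k of the n experts run,
    # so most parameters stay idle for any given token.
    scores = x @ W_gate                      # one score per expert
    top = np.argsort(scores)[-k:]            # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())
    w = w / w.sum()                          # softmax over the chosen experts
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

d, n_experts = 4, 8
W_gate = rng.normal(size=(d, n_experts))
# Each "expert" here is just a small linear layer.
expert_weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_weights]

token = rng.normal(size=d)
out = moe_forward(token, W_gate, experts, k=2)
print(out.shape)  # same shape as a dense layer's output
```

The training difficulty the article alludes to comes from this routing step: if the gate sends most tokens to a few experts, those experts dominate while the rest starve, and the loss can spike unpredictably.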

The founder of DeepSeek is Liang Wenfeng. Their mission, as stated on their website, is to "Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism". And it is not DeepSeek alone: other Chinese AI models are in the making and are making serious progress despite the same challenge of limited technology access. Kai-Fu Lee, former head of Google's operations in China, built a $1 billion start-up, 01.AI, in eight months. Chinese companies are training their models for less than $5 million, whereas training GPT-4 cost $80 to $100 million. Alibaba's Qwen reportedly cost 85% less to train its large language model.

The low cost of the Chinese models is another disruptor. Kai-Fu Lee of 01.AI says, "our inference cost is 10 cents per million tokens, which is one-thirtieth of what a comparable model charges". OpenAI charges $4.40 per million tokens. By these numbers, DeepSeek's model is 10x cheaper than GPT-4 and 15x cheaper than Sonnet.

When OpenAI released ChatGPT to the world in November 2022, it was unprecedented and uncontested. In just 26 months, all that has become less certain. The company now faces not only international competition from Chinese models but also fierce domestic competition from Google's Gemini, Anthropic's Claude, and Meta's open-source Llama models.

The lessons from this groundbreaking event are many.

  1. There is nothing permanent – this episode once again underlines the ultra-competitive nature of the world: success today is no assurance of success tomorrow, leadership is never permanent, and anyone can come up and unseat you. Yet businesses commonly accept the status quo, remaining in their own world and hoping the good run will continue. We disguise complacency as stability. The comfort of past success, good business results and the mindset of "why change it if it's not broken" keeps us happy with incremental innovation, until someone comes along and shakes us out of our slumber.
  2. Innovation is a serious practice – innovation is the most widely used word in the corporate vocabulary, and yet, like a fashion, it goes out and comes back in. When markets are buzzing and capital is earning good returns, innovation falls out of favour; when there is a downturn, it springs back to attention. This ad-hoc, only-when-necessary approach to managing innovation will never yield the desired outcomes. Innovation and growth ought to be part of the corporate planning process.
  3. Value of collaboration – interestingly, coming from China, these models are open source, so anyone can adopt them. This once again underlines the power of collaboration and decentralization, as more and more organizations embrace them to drive innovation faster and more efficiently than proprietary, closed ecosystems. The spread of open-source models allows developers to skip the demanding, capital-intensive steps of building and training models themselves. They can now build on top of existing models, making it significantly easier to jump to the frontier. It also means that any company that claims to be at the frontier today, like OpenAI, could lose that position tomorrow.
  4. Staying on top requires as much creativity as capital – having capital is no assurance of success, and merely committing resources to a project does not deliver it. The creativity of people, the talent pipeline, and how that talent is groomed and encouraged are the distinctive edge that provides long-term competitive advantage. Necessity is the mother of invention, and talent is the genesis of innovation. In the face of adversity and the non-availability of financial and technological resources, DeepSeek figured out workarounds and ended up building something far more efficient. It is, evidently and admittedly, built on existing engines; theoretically, anyone could build it. Their edge is the talent they have assembled, which is hard to emulate.
  5. Innovation is not just about ideas, but how well you execute them – going beyond creativity, which is about generating new ideas, innovation is about turning those ideas into reality that carries economic or social value. An idea counts for little until it is complemented by concrete action and execution.
  6. Timing – Apple, a tech leader, has been criticized for not investing in AI from the massive cash pile it sits on. DeepSeek's feat might vindicate its decision to wait before diving into AI head-on. This approach also echoes Nandan Nilekani, who said that India should not aim to build another LLM but should wait for the tech giants to put in billions of dollars, then use their models to create synthetic data and quickly build and train small models on appropriate data.
  7. Challenge to India – Sam Altman, while visiting India in June 2023, famously remarked that "It's totally hopeless to compete with us on training foundation models". It looks like the challenge was accepted on our behalf by our not-so-friendly neighbour. Seriously, looking back, when has a new, groundbreaking technology come out of India? Either it was 5,000 years ago, or, in present times, it happened in the government domain. We as a country have demonstrated our technological powers numerous times, but each time through a government lab. When India was denied a supercomputer, C-DAC came up with Param in 1991, National Aerospace Laboratories with Flosolver, and DRDO with PACE. Denied cryogenic engines, India proved a point with the 2014 flight of the GSLV powered by a self-developed cryogenic engine. There are many such examples from the government side, but none so noteworthy from the private sector. Perhaps the private sector is too busy debating the working hours in a week and forgot to work.