DeepSeek: Origins, Funding, and Training Costs

Key Points

  • DeepSeek was founded in May 2023 as a spin‑off of the Chinese hedge fund High‑Flyer, which had already invested in AI for its trading strategies and handed the startup the 10,000 Nvidia A100 GPUs it had acquired in 2021.
  • The company claims its latest model was trained on 2,000 GPUs for 55 days at a reported incremental cost of $5.58 million, a figure in line with the expected cost‑curve decline that puts models of this class in the $5‑10 million range.
  • Critics note that the disclosed cost likely excludes major expenses such as R&D labor, electricity, cooling, data acquisition/curation, and storage infrastructure, meaning the true total cost is probably higher.
  • DeepSeek’s rapid model development and open‑source releases have drawn significant attention; at Sam Altman’s recent Washington briefing, the questions reportedly focused on DeepSeek rather than OpenAI.
  • The discussion references Dario Amodei’s essay predicting cheaper training for comparable‑class models, underscoring that DeepSeek’s engineering efficiencies are part of a broader trend toward lower‑cost AI training.

Source: https://www.youtube.com/watch?v=XoM5w8OYlXs
Duration: 00:08:33

Sections

  • 00:00:00 DeepSeek Origins, Chip Supply, Spotlight: DeepSeek, founded in May 2023 as a spin‑off of the Chinese hedge fund High‑Flyer, received 10,000 Nvidia A100 GPUs that the fund had acquired in 2021; the segment covers the company’s training‑cost claims and its sudden prominence after the questions at Sam Altman’s Washington briefing focused on DeepSeek rather than OpenAI.

Full Transcript
Okay, I've taken my time on doing this video because I wanted to get it right. This is a larger look at what DeepSeek is, where they got their chips, and what has been happening. It's timely because apparently, when Sam Altman went to Washington, D.C. for his big briefing yesterday, all the questions were not about OpenAI; they were about DeepSeek. So I thought a bit of a deeper dive might be appropriate.

With that in mind, where did DeepSeek come from? DeepSeek was founded in May of 2023. They spun off from High-Flyer, a hedge fund in China that had integrated AI into its trading strategies (this will come back later), recognizing that AI has potential beyond finance. High-Flyer invested in DeepSeek, specifically having acquired 10,000 Nvidia A100 GPUs in 2021. In other words, DeepSeek was founded in 2023, but the A100s were bought in 2021, because High-Flyer believed in AI, and they handed, or are believed to have handed, those 10,000 Nvidia A100s over to DeepSeek as part of the initial investment. That hardware acquisition positioned DeepSeek for really aggressive AI development, which is what we see as they start to open-source models. If you got surprised by them, you haven't been reading their papers and watching their model capability climb.

This brings me to training costs. DeepSeek claims that their training run cost $5.58 million, used 2,000 GPUs, and took 55 days of training time; they did not share the model, for reasons that should be pretty obvious. The cost they're reporting is much lower than what other models have reported, but not wildly lower, and that's a point Dario Amodei made in his essay, where he said we should expect models in the $5 to $10 million range: if the incremental cost of, for example, Claude a year ago was $10-plus million, we expect models in the same class to get cheaper over time. That's not a surprise, so this is kind of expected. None of that takes away from the brilliant engineering the team is doing; making this stuff cheaper doesn't come for free, you actually have to work at it. But this is in line with previous cost-curve reductions that we've seen.

There are other reasons, though, to be skeptical of whether it was actually $5 million. The $5 million figure is really the incremental cost of the actual training run. It doesn't cover the R&D labor cost, and I've seen a lot of people noting how many people are on those open-source papers. It doesn't cover electricity or cooling, both of which are significant. It doesn't cover pre-training data acquisition and curation; if you think they just suck in the internet and do no curation, that is just not how it works nowadays. And it doesn't cover infrastructure and storage: how do you optimize your storage systems, how do you manage all the data, all of that. The estimate I saw from SemiAnalysis, a firm that analyzes semiconductors for Wall Street, is that DeepSeek's annual AI budget is somewhere around half a billion dollars, which is a bit more than five million. Maybe they're off, but I think it's helpful to understand that when you name a price, that's just a piece of the total price. That's all it is. It's like saying, "I bought my Toyota door for a hundred bucks." I don't think I could actually get my Toyota door for a hundred bucks, but let's just use that phrase: "I bought my Toyota door for a hundred bucks," while not mentioning the fact that you had to buy the whole car. And Wall Street bought that.
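To put the disclosed figure in perspective, here is a minimal back-of-the-envelope sketch in Python. The first three numbers are the ones reported in the video; every figure in the `excluded` dictionary is a purely illustrative placeholder for the cost categories the video says the $5.58 million leaves out, not anything DeepSeek or SemiAnalysis actually disclosed.

```python
# Back-of-the-envelope check on the disclosed training-run figure.
# Reported inputs (from the video):
disclosed_cost_usd = 5.58e6   # claimed incremental training cost
num_gpus = 2_000              # claimed GPU count
training_days = 55            # claimed training duration

gpu_hours = num_gpus * training_days * 24
implied_rate = disclosed_cost_usd / gpu_hours
print(f"GPU-hours: {gpu_hours:,}")                        # 2,640,000
print(f"Implied cost per GPU-hour: ${implied_rate:.2f}")  # ~$2.11

# Hypothetical annualized adders for the excluded categories
# (placeholder magnitudes chosen only for illustration):
excluded = {
    "R&D labor": 150e6,
    "electricity and cooling": 50e6,
    "data acquisition and curation": 40e6,
    "infrastructure and storage": 60e6,
}
total = disclosed_cost_usd + sum(excluded.values())
print(f"Illustrative all-in total: ${total / 1e6:.0f}M")  # ~$306M
```

Even with placeholder adders, the excluded categories dominate the headline number, which is the Toyota-door point in miniature: the $5.58 million is the door, not the car.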
So what about the hardware? The best analysis I've seen, and I've seen this in multiple places, is that DeepSeek has gotten their hands on about 50,000 Nvidia Hopper-class GPUs. That includes H800s, a China-specific version of the H100 with somewhat lower performance, and H20s; the H20 is a restricted variant designed so it can be sold to China and still be compliant. Add to that the significant A100 stockpile from 2021, plus possibly cloud-compute agreements with Chinese providers, which would allow them to get around things. At the end of the day, these chips are slower than the full-power H100, but DeepSeek, ironically, could scale up quantity to maintain performance, at least for a while.
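As a rough illustration of that quantity-versus-speed trade-off, here is a hedged sketch. The per-chip performance factors below are assumed relative values for the sake of the example, not Nvidia's published specs, and the H800/H20 split of the roughly 50,000 Hopper-class chips is likewise an assumption.

```python
# Illustrative fleet-throughput comparison in H100-equivalents.
# All per-chip factors are assumptions, not published benchmarks.
chip_perf = {
    "H800": 0.75,  # assumed: export variant, reduced interconnect/perf
    "H20": 0.20,   # assumed: heavily restricted China-compliant variant
    "A100": 0.33,  # assumed: previous-generation Ampere part
}

def equivalent_h100s(fleet: dict[str, int]) -> float:
    """Aggregate fleet throughput expressed in H100-equivalents."""
    return sum(chip_perf[chip] * count for chip, count in fleet.items())

# Fleet shape loosely based on the video's numbers: ~50,000 Hopper-class
# chips (the 30k/20k split is a guess) plus the 10,000 A100s from 2021.
fleet = {"H800": 30_000, "H20": 20_000, "A100": 10_000}
print(f"{equivalent_h100s(fleet):,.0f} H100-equivalents")  # ~29,800
```

The specific numbers don't matter; the point is that slower chips in sufficient volume can keep aggregate training throughput competitive for a while, which is exactly why enforcement attention falls on quantity as well as chip class.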
That explains why the feds, specifically the FBI, are looking at the "Singapore backdoor," which is under investigation now. The Singapore backdoor is the nickname for the fact that Nvidia has an inordinately large percentage of sales going to Singapore, and it is wondered whether some of those sales are getting renamed and re-exported back to China from Singapore. And yes, the FBI is looking at that. We may see tighter export controls; I don't know, we'll see.

So really, what this comes down to is: are hardware constraints and training costs going to help determine a competitive advantage? What DeepSeek essentially tried to say was no: we can train a model for $5 million, it's fine. What I would argue, and what other model makers are arguing, is that the marginal differences matter a lot. If you start out in this race just a little bit ahead, it adds up, because these models are improving at an accelerating rate. It's like starting a little bit ahead while everyone runs faster and faster over time, and that adds up. If you look at where DeepSeek is from a model-capability perspective, they have just released a model in the class of the models that finished their training runs a year ago, like Claude. So okay, they're about a year behind, and they're somewhat cheaper. And by the way, Dario released the incremental cost for Claude, and, to use my metaphor, it's like saying the Toyota door costs $150: he was saying it's $10 million and change. Okay, fine, maybe it was a little bit more than $10 million, but the point is that you can slice it down and say it only cost a certain number of millions, when the reality is they need a lot of capital for all the other stuff I discussed: the R&D, the labor, the data, the compute, the infrastructure, all of that.

Okay, so I suspect, I'm going to guess, that we are looking at tighter chip export regulations. I don't know that, I'm not a prophet, but it just makes sense to me. I think that, if anything, DeepSeek drawing this much attention to itself might end up being counterproductive for them from an access-to-chips perspective. We will see. I would also call out that this is a trailing-edge indicator: releasing a model is the end of a long process. If you've been able to get around export restrictions successfully for a while and then you release a very widely known model, well, you got around export restrictions successfully for 2021, '22, and '23, and you got some benefit out of it. It is not clear whether that will persist, and if they start to clamp down, that is going to change where DeepSeek will be in a year or two. I think that is the argument the feds will be looking at; we will see how the FBI investigation goes.

So that's the story of DeepSeek. It's actually not that old a firm, but it's super aggressive. And by the way, if you only just heard of them: they've been publishing papers and building model capability since they were founded in 2023, so this is not actually new. They've been in the space, and it's all good engineering work. I do want to call out that none of this is shade on the team; they did great engineering work, and I will continue to say that it was not fake. There's just a lot of other stuff going on, and I wanted to share the full story. Cheers.