Llama 3.1 Debut, GPT‑4o Mini, AI Price War

Key Points

  • Meta released Llama 3.1, the first high‑performance frontier AI model made openly available, sparking excitement about community‑driven model building, business opportunities, and AI‑safety considerations.
  • OpenAI followed with GPT‑4o mini, a tiny, ultra‑cheap model that intensifies the emerging “frontier model” price war and raises questions about the long‑term sustainability of rapid, low‑cost AI launches.
  • The panel highlighted a key technical gap: while OpenAI’s offerings are primarily cloud‑based APIs, the demand for truly embedded, on‑device models remains unsolved, though the company may address it in the future.
  • Hosts introduced a diverse expert lineup—Shobhit Varshney (senior partner, consulting AI), Chris Hay (distinguished engineer, CTO of customer transformation), and newcomer Maryam Ashoori (director of product management at watsonx)—to dissect the week's AI developments.
  • A light‑hearted moment noted Mark Zuckerberg’s new, “surf‑style” public appearance alongside the Llama 3.1 announcement, with unanimous panel preference for the fresh look over his classic nerd‑hoodie image.

Full Transcript

# Llama 3.1 Debut, GPT‑4o Mini, AI Price War

**Source:** [https://www.youtube.com/watch?v=bQzPaRYC9BE](https://www.youtube.com/watch?v=bQzPaRYC9BE)
**Duration:** 00:20:20

## Sections

- [00:00:00](https://www.youtube.com/watch?v=bQzPaRYC9BE&t=0s) **AI News: Llama 3.1 & GPT‑4o Mini** - The host introduces a panel to discuss Meta's Llama 3.1 release and OpenAI's cheap GPT‑4o mini, exploring open-source impacts, business implications, and the sustainability of the frontier-model price war.

## Full Transcript
[0:00] Hello, and happy Friday. This is Mixture of Experts, and I am your host, Tim Hwang. Each week, Mixture of Experts brings together a world-class panel of technologists, engineers, and more to help make sense of the tidal wave of news each week in AI. This week on the show we cover two big stories. First, Meta strikes back with the launch of Llama 3.1, and Zuck is out with a brand new look. We talk about the state of the art in language models in open source, and we talk about the implications for the business of AI and for AI safety. "This is going to be a game changer for the market, because it's enabling the open source community to start building, using a very powerful model that is available to them, to create smaller models and put them back into the market." Second, OpenAI continues its string of launches with GPT-4o mini, a relatively tiny, wildly cheap model. We talk about the ongoing frontier-model price war and how sustainable it is over the long run. "Chris, did I hear you say embedded models, OpenAI on device?" "That's the use case that I can't solve. OpenAI is an API." "I think they'll get there."

As always, I'm joined by an incredible group of panelists that will help us navigate what has been another action-packed week in AI. Today we've got three panelists: Shobhit Varshney, senior partner, consulting AI for US, Canada, and Latin America; Chris Hay, distinguished engineer and CTO of customer transformation; and finally, joining us for the first time, Maryam Ashoori, director of product management at watsonx.
[Music]

[1:34] So our first story today, of course, is the launch of Llama 3.1. This is obviously an enormous technical milestone: it's the first time, arguably, that we've had frontier AI models available in open source, and we're going to talk all about the technical aspects of this and why you should care as a listener. To get us started, though, one of the things I also loved about the announcement is that Mark Zuckerberg personally took to Facebook to announce this, and he didn't only debut Llama 3.1, he also debuted a new look. If you remember classic-version Mark Zuckerberg: very pale, very serious, very nerdy looking, with the hoodie. But he's been looking like kind of a surf beach guy; that's his new look. So just to start, I want to ask each of the panelists: do you like the old Zuck look or the new Zuck look better? Chris, to you first: old Zuck, new Zuck?

New Zuck.

New is the new cool.

And last but not least, Maryam, what do you think, old Zuck or new Zuck?

I'd go for new.

Okay, so we have bold consensus on the new Zuck. I think we're going to come to miss the old Zuck; I really liked old nerdy Zuck. But, you know, seasons change and we have to go with them. So let me just introduce the story of the day. Even if you haven't been watching the news, if you're even remotely connected with AI, I think you'll know that Meta has come out and launched Llama 3.1. It is the latest edition of its Llama class of models, and uniquely, Meta, in comparison to, say, OpenAI or Anthropic, is really pursuing open-source AI in a very big way. One really big thing that we've seen is the launch of the gigantic Llama 405B model, which is a highly capable, state-of-the-art model, now just available for free in open source. The first person I want to turn to is Maryam. You were working on this on day one of the launch for watsonx, and I'd love for you to talk a little bit about that, because it was really easy, right? You just kind of threw it up and people downloaded it. I'm kidding, of course. I'm curious to get your war story of what it felt like to launch a model like that, and whether there's anything you learned or walked away with from that experience.

It was actually nothing I'd call easy, especially because it's a giant model, so we had to run it across nodes, multi-node inferencing. This was the first time that we had a 405-billion-parameter model on our platform. But it was exciting, and I'm super excited not just for our customers but also for the community. The amazing thing that Meta did yesterday with Llama 3.1 405B...

The name is getting longer and longer.

Yeah, they really need to work on the branding for all of these; I don't even know what's going on. It's changing the license to let the market use the model for distillation and for teaching smaller models. This is going to be a game changer for the market, because it's enabling the open source community to start building, using a very powerful model that is available to them, to create smaller models and put them back into the market. So for that reason I'm super excited for the opportunity it unlocks.

Yeah, I'm thrilled by it, just because for a long time open source has been very exciting but has arguably been lagging a little bit in performance, and now suddenly there's what looks to be the possibility that open source is going to be just as powerful, just as exciting, just as state-of-the-art in a lot of ways. Can I just ask a question on behalf of the listeners who may be less familiar with the space, which is: why is Meta doing this? It's incredibly expensive to build one of these models, and if I have it right, they're literally just giving it away for free, which, you know, I'm not a big-city business guy, but I don't even know how that works. Shobhit, do you want to comment on what is going on here? Why is Meta doing this, and do we think they could make money on it? They're losing money, aren't they?

Yeah, absolutely. There are certain vendors, like Meta or Nvidia, that have other sources of revenue. What they'd make by selling the model is going to be a rounding error compared to what Nvidia is going to do with hardware, so they're giving away the Nemotron models; with Meta, they have all these other social properties that they make money on. Just to give you a quick data point on Meta itself: Yann LeCun, the chief AI scientist at Meta, an absolutely incredible personality, was sharing a data point. Two years back, when somebody posted something on Facebook and you were trying to figure out whether it was misinformation or hate speech or abuse and things of that nature, the best-in-class NLP would give you about a 24-25% hit rate on identifying that as bad content. Now, with the Llama models, they're getting close to 92-94%. So they're able to do a lot more good and more filtering with their models, and they're able to use that to enhance their own products. When you're experiencing Instagram or WhatsApp or Facebook, they're embedding AI that is now being helped by the crowdsourcing of everybody else. So they have different avenues of revenue; for them this is not a loss-leader product, so to say. They're trying to build the right community around it that can contribute, and they benefit in their own products by using this AI. Then there's the very, very small subset of vendors who can potentially do this: Google, AWS, Azure, IBM. We're all in the business of selling AI to other businesses, so you wouldn't have thought you'd come to a point where you can just open up your models, and then IBM comes in and says, you know what, mic drop, I'm going to open source my Granite models as well; we want the community to come and help. So this market is about split between companies like Meta and IBM, who are opening up their models completely, and the closed-source models.

Yeah, definitely. Part of the question in my mind is the kind of pressure this creates for the closed-source models, because on some level, if you're OpenAI, you're saying: we've got this crazy machine intelligence and we're going to rent you access to it. This in some ways flips the whole thing on its head; it basically says access to that is going to be free. And I guess, Maryam, I'm curious, as someone who's in the space: do you think this is ultimately going to force an OpenAI or an Anthropic to also go open source in the end? Because it feels like, once it's free, why would you pay for Claude or something like that?

Well, look at what Mistral did yesterday with Mistral Large 2: they put out the weights for research only, and this is their flagship model. We've had a lot of conversations in the past about protecting the weights to make sure they don't go out, and here it is, out for research only. Cohere did something similar a few months ago. So the trend I see is to nurture that openness but reserve the rights for commercial purposes.

I think the final item we might want to touch on before we jump on is, Maryam, you pointed out that Mistral was moving in a very similar direction here in terms of their licensing and research. Presumably part of that research licensing is also to say: hey, you can red-team this model, you can make it better, you can surface all the safety issues. I'm curious how you see platforms like Mistral falling into this ecosystem, because they are not one of the really big corporate players, but they still seem to be able to ride this open-source wave. Any thoughts on how they'll fit into the competition as it continues to evolve?

I think it's important to think about what problem each of these model providers is solving. If you look into Mistral, they are European-born, the favorite of Europe, so that's the market they are supporting: a wide range of European languages, not one specific dialect but a wide range. So if you are in Europe, speaking one of those languages, there is a much higher chance that Mistral is a better-positioned model for you. So I think it's important to understand what the use case is, who is going to be using it, and then what the right model is for that target use case, rather than generalizing: hey, is Mistral Large better than Anthropic?

[Music]

[10:09] So I'm going to move us on to the second story we're going to cover today. We could obviously be talking a lot more about this, but we're going to flip to the other big dimension that we see evolving in the AI market. One of them is from closed to open, which is definitely a big shift, and Llama 3.1 really puts a big Meta flag on that change. The other big change that we've been tracking, and have talked about in the last few episodes, but that I want to hit really hard here, is the movement from gigantic models to very little models: very fast, very cheap, smaller models. The peg for this is that just last week OpenAI announced their latest salvo in this battle, a model called GPT-4o mini. To Maryam's earlier point, they really have to improve the names on these, but that's what they released. What's so striking about this announcement is that the pricing is legitimately crazy: 15 cents per 1 million input tokens, 60 cents per 1 million output tokens, and they point out in their blog post that since 2022 the cost per token has dropped 99%. The first question I want to launch with, and Chris, maybe you're well positioned to answer this: are we just in a price war here? Is this even sustainable? I'm curious how much of this is OpenAI really being able to cut the cost of serving low enough that they can still offer these models at a profit, versus them just pushing the price to zero in a battle against their rivals. Because it kind of feels like this is also part of the open-source competition as well: how can we offer free to keep up with all the other free options out in the world? So, Chris, to you: do you think we're in a price war? I think
we are a little bit in a price war, but I think OpenAI has some other considerations as well. Although we've spent all of our time playing with GPT-4 and GPT-4o, the reality is that the vast majority of people are running GPT-3.5, which is a bigger model, and realistically OpenAI had to remove their free model and run a cheaper one. That's kind of what they've done with GPT-4o mini. So yes, wonderful, here we go, here's a cheaper model, et cetera, but actually they managed to get the very large GPT-3.5 off their books, and now they're running a smaller model to serve the majority of the requests hitting ChatGPT. So I think they needed to do that just for commercial reasons. I think we are pushing towards smaller models all the time; the only time you really need the larger models is for reasoning and planning. With the smaller models, most of the time, with good fine-tuning, you can get the model to do what you want, and I think OpenAI realized that as well. The world realizes it. OpenAI has already seen that a lot of people start with GPT-4 or GPT-4o as their starter model in the enterprise, but very quickly, when they go to production, they bring it down to a smaller model, and that's been eating OpenAI's lunch, and they wanted to stop that as well. And then, as we move to devices, embedded devices, they need to be able to play in that space, so it's critically important for OpenAI to have that smaller model in the marketplace. So I think that's great. But is it a price war? Partially, but it's also a credibility war as well.

Chris, did I hear you say embedded models, OpenAI on device?

That's the use case I can't solve. OpenAI is an API.

I think they'll get there. If you really think about it for a second: if we take a guess at what size the 4o mini model is, it's probably around the 11-billion-parameter mark, maybe a little bit more, maybe a little bit less. There is going to come a point, when you start dealing with Apple, when you start dealing with Google, et cetera, where you are going to have to provide a model to run on a device, and if you don't, you're going to be locked out of a market. We've already seen that with the iPhone and their recent announcements. So they are going to have to do something in that space, and I think this is a move towards that, no doubt. They're not offering embedded just now, but they will in the future.

Yeah, I think that definitely is going to happen. I think another thing in what you're saying, Chris, is: do we have too much intelligence now? That's kind of where you're pointing, right? And Maryam, I don't know if you agree, as someone who's working on watsonx. There's almost always some level of: oh, why don't you want the bigger and better model, it can do so many more things. But Chris, you're making the argument that maybe we don't really need those things; our models are so capable now that they're past the point of what we actually need on an everyday basis.

Well, think about it: the larger the model, the more powerful it is, but also the larger the compute resources it needs. That translates to latency, which is response time if you're an enterprise wanting to use it in production; it translates to carbon footprint and energy consumption, which is the topic of conversation these days; and it translates to cost. So cost is actually just one of the factors; in some highly regulated environments, the other two might be an even bigger blocker to moving forward. But on the comment you made about the price, I feel there are two sides to this. If you are a model provider, you want to set the price as low as you can to increase adoption. If you are a consumer, well, we see half of the market has already moved from exploration to pilots, and 10% to production, and when you get to scale, the cost adds up. For normal prediction use cases you might be running something like 500,000 predictions a day if you're a DoorDash or the company I used to work for. In those environments, if you want to use gen AI, just do the calculation on price per API call: it adds up, so it's really not sustainable. In order to get to production and scale, the model provider has to find a way to set the price low, and the way to implement that is usually by moving to smaller models. So the demand is driving where the market is going, but this is also the right thing for the whole market to do.

And Maryam, just to add to that: this week as well, OpenAI added the ability to fine-tune their mini model, and I have to think these two things are related, because when you go to production, as you say, Maryam, one of the things you're going to want to do to bring that inference cost down and improve reliability is take that model and fine-tune it with your data, rather than having large prompts.

Yeah, the trend I see emerging in the market for enterprises is grabbing a much smaller, trusted model, I would say, and fine-tuning it on their proprietary data: the data about their users and the data about their domain. Because at the end of the day they want to have something differentiated in the market; everyone has access to these large models, and the power of differentiation is really the proprietary data. To get that, you should be able to fine-tune with data no one else has access to.

Yeah, Maryam and I were on a call with a client yesterday and we got into this nice argument. He's the head of AI for a big Fortune company, and he started off by saying: hey, I expect these models to be intelligent, so I don't like the small ones; I really want the big, large ones, so they can actually do something meaningful for me. And we had a nice chat explaining to him how this whole pricing works. Let's take an example for just the pricing part, Maryam, and this is something you and I do quite a bit in Excel, trying to showcase the range. Your favorite example is: take a 30-minute recording of a call and summarize it into one page. If you look at the tokens, make some assumptions, and do this a thousand times, a thousand summarizations of a call transcript into a page, with the 4o mini model that is a dollar. If you look at the best-in-class models from Claude and 4o, that is about $30 to $40, so you get quite a bit of a range. Then you bring in the biggest Llama model, the 405-billion-parameter model: it's going to be 80 bucks. So now you have an open-source model costing $80 for those thousand summarizations, you have the best-in-class frontier models at half of that, 40 bucks, and then you have a dollar if you're using 4o mini. Even hosting your own model, say a Llama 3 8B model, which is much smaller than 4o mini, on the AWS, Azure, IBM, or Google platforms of the world, is going to be something like $3 to $4 for you. So just look at the price points: a dollar for OpenAI's mini, where you don't have any headaches about what's happening, they give you all kinds of indemnification and ways of fine-tuning it to make it your own; $3 to $4 for the free, really small Llama 3 8-billion-parameter model; $40 if you're doing the best in class; and $80 if you're doing the biggest open-source Llama model. It just becomes very real when you start doing a million of these a day.

So we have to wrap up; we're almost at time. As always, we could spend a lot more time talking about this. One of the most interesting things coming out of this conversation is that maybe it becomes worth less and less to train bigger and bigger models. So here's my spicy take; I want to end with a yes-or-no question: at some point in the future, will OpenAI stop training larger and larger models and just focus on the models they have? Chris?

OpenAI is going to build a model that's powered by the sun in the future.

Got it. Shobhit?

OpenAI will keep going at it; there's a lot more to be done to get to human intelligence.

And Maryam?

Regulation is going to stop that at some point.

Wow. Okay, so at some point OpenAI will stop. Well, there's a lot more to get into there; Maryam, we'll just have to have you back on the show at some point, but I hope you had a good time. Chris, Shobhit, again, thanks for joining us. And for all you listeners out there, thanks for joining us as well. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we'll see you next week.
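The back-of-the-envelope pricing comparison at the end of the episode is easy to reproduce. The sketch below uses the GPT-4o mini per-token prices quoted on the show ($0.15 per 1M input tokens, $0.60 per 1M output tokens); the transcript and summary token counts are our own illustrative assumptions, and the `cost` helper is just a local function, not any vendor API.

```python
# Reproduce the episode's example: summarize a 30-minute call
# transcript 1,000 times. Token counts are assumptions, not measurements.

INPUT_TOKENS = 6_000   # assumed tokens in a ~30-minute call transcript
OUTPUT_TOKENS = 700    # assumed tokens in a ~1-page summary
CALLS = 1_000

def cost(price_in_per_m: float, price_out_per_m: float,
         calls: int = CALLS) -> float:
    """Total dollars for `calls` requests, given per-1M-token prices."""
    per_call = (INPUT_TOKENS * price_in_per_m +
                OUTPUT_TOKENS * price_out_per_m) / 1_000_000
    return calls * per_call

# GPT-4o mini prices quoted in the episode.
print(f"GPT-4o mini, 1,000 summaries: ${cost(0.15, 0.60):.2f}")
```

With these assumptions the total comes out to $1.32, in line with the "about a dollar" figure quoted on the show; plugging a larger model's per-token prices into the same helper shows how the gap widens once you run a million of these a day.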
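The license change the panel highlights, using Llama 3.1 405B as a teacher for smaller models, refers to knowledge distillation. Below is a minimal sketch of the soft-label loss at the core of that technique, with toy hand-written logits standing in for real model outputs; it is a generic illustration of the idea, not Meta's actual training pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution.
    A higher temperature softens the distribution, exposing more of
    the teacher's 'dark knowledge' about near-miss classes."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution; minimized when the student matches the teacher."""
    teacher_p = softmax(teacher_logits, temperature)
    student_p = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_p, student_p))

# Toy check: a student that copies the teacher scores a lower loss
# than one that disagrees with it.
teacher = [2.0, 0.5, -1.0]          # hypothetical teacher logits
matching = distillation_loss(teacher, [2.0, 0.5, -1.0])
mismatched = distillation_loss(teacher, [-1.0, 0.5, 2.0])
print(matching < mismatched)  # True
```

In a real pipeline this loss is averaged over a large corpus and backpropagated through the small model's weights, which is exactly the workflow the updated Llama license now permits: query the 405B teacher, train your smaller student on its outputs, and ship the student.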