
Disney Signs AI Licensing Deal

Key Points

  • Disney is striking a three‑year licensing agreement with OpenAI that lets OpenAI use Disney characters in its generative AI models, while Disney takes a roughly $1 billion equity stake in OpenAI and works to steer fan‑made content back onto Disney‑controlled platforms.
  • The deal marks a shift from typical AI licensing (which usually only grants data for training) toward a strategic partnership that gives Disney both creative control and a financial foothold in the AI ecosystem.
  • In the broader AI roundup, driverless robo‑taxis from Google, Amazon and Tesla have gone mainstream, Walmart has moved to Nasdaq to rebrand as an AI‑first enterprise, and IBM has teamed with Kaggle to launch a leaderboard for real‑world AI model performance.
  • Panelists debate why OpenAI would sign a deal that looks modest next to today's multibillion‑dollar AI agreements, noting that the arrangement provides valuable IP access, a new distribution channel for Disney‑sanctioned content, and a potential long‑term revenue stream beyond pure data licensing.


**Source:** [https://www.youtube.com/watch?v=dJTwev1O-mE](https://www.youtube.com/watch?v=dJTwev1O-mE)
**Duration:** 00:38:19

## Sections

- [00:00:00](https://www.youtube.com/watch?v=dJTwev1O-mE&t=0s) **Untitled Section**
- [00:03:37](https://www.youtube.com/watch?v=dJTwev1O-mE&t=217s) **Testing New AI Models with Copyrighted Prompts** - The speaker explains how people routinely challenge fresh image and text models by generating copyrighted characters and quirky test prompts, highlighting legal restrictions, Disney-OpenAI dynamics, and the prospect of author consent for fan-fiction-style outputs.
- [00:07:08](https://www.youtube.com/watch?v=dJTwev1O-mE&t=428s) **AI, Fanfiction, and Disney's IP Strategy** - The speakers discuss how AI-generated fanfiction reshapes traditional authorial social contracts, prompting corporations like Disney to partner with AI firms and monetize the emerging landscape.
- [00:12:25](https://www.youtube.com/watch?v=dJTwev1O-mE&t=745s) **AI Hype Overshadows Technical Progress** - The speaker critiques how this year's AI narrative has been driven by business deals and media hype, leaving genuine research and technical advances on the sidelines.
- [00:18:48](https://www.youtube.com/watch?v=dJTwev1O-mE&t=1128s) **Full-Stack AI Race & Commoditization** - The speakers discuss how AI models are becoming commoditized, sparking a competitive "full-stack" push, led by Nvidia and others, to integrate, distill, and simplify deployment, turning the focus toward economic viability and user-friendly access.
- [00:22:32](https://www.youtube.com/watch?v=dJTwev1O-mE&t=1352s) **Hype vs Architecture of Mixture-of-Experts** - The speakers critique the marketing hype around a new AI model, noting its architecture mirrors earlier designs (transformer, Mamba, mixture-of-experts), and explain how such models use only a fraction of total parameters during inference by activating subsets of experts across sizes like nano, super, and ultra.
- [00:26:50](https://www.youtube.com/watch?v=dJTwev1O-mE&t=1610s) **Embedding Alignment via System Prompts** - The speakers discuss how directly integrating alignment principles into a model's training, rather than relying chiefly on downstream prompting, shapes distinct model personalities and prompts the question of why this approach hasn't been more widely adopted.
- [00:30:06](https://www.youtube.com/watch?v=dJTwev1O-mE&t=1806s) **Anthropic Model Quality and Literary Style** - The speakers debate how the phrasing of Anthropic's documentation contributes to Claude's perceived friendliness, noting the challenge of quantifying such qualitative traits.
- [00:34:50](https://www.youtube.com/watch?v=dJTwev1O-mE&t=2090s) **Prompting: Temporary Trend or Permanent Tool** - The speakers debate whether prompting will remain essential or become obsolete as AI models grow more prescriptive and embed explicit moral philosophy.
- [00:38:17](https://www.youtube.com/watch?v=dJTwev1O-mE&t=2297s) **Teasing Next Week's Mixture of Experts** - The host wraps up the episode and hints at the focus of the upcoming show.

## Full Transcript
0:01 There's the flip side of the deal which, yeah, you can as an individual use Sora to generate things with Disney images, but Disney is going to stream these kind of videos. So they're going right back to Disney, and they're trying to basically have control in some way of that fan generated content and have it come back to Disney instead of proliferating on X or Bluesky. So they're trying to say that before this gets into a whole, oh yeah, look at all these wonderful fan generated shorts. No, no, no, come be on our platform instead. All that and more on today's Mixture of Experts.

0:38 I'm Tim Hwang and welcome to Mixture of Experts. Each week MoE brings together a panel of charming and brilliant minds in technology to distill down what's important in the latest news in artificial intelligence. Joining us today are three incredible panelists. We've got Martin Keen, master inventor, Marina Danilevsky, senior research scientist, and Kush Varshney, IBM Fellow. The year is winding up, but it's still full of AI news. We're going to be covering Disney and OpenAI's new licensing deal, the Time magazine Person of the Year, Nemotron 3, the new Nvidia launch, and this Claude soul document. But first we've got Aili with the news.

1:11 Hi, I'm Aili McConnon, a tech news writer for IBM Think. Here are a few AI headlines you might have missed this week. 2025 has been the year that driverless cars have gone mainstream. Google, Amazon and Tesla each launched their own version of these robo taxis in various US cities. Walmart has moved from the New York Stock Exchange to Nasdaq, a move illustrating the retail giant is trying to transform into an AI powered enterprise. IBM and online platform Kaggle have partnered to create a new leaderboard that evaluates AI models and agents as they solve real world enterprise issues. It's official the... generated content.
For more, subscribe to the Think newsletter linked in the show notes. And now let's go back to our experts.

2:10 I want to begin today's episode by talking about the Disney OpenAI deal, which was just announced this past week. So this is sort of an interesting one. I know at this point we're all kind of very cynical about deals that are less than, you know, $10 billion. But the core of this deal basically is that Disney is about to sign a three year licensing deal with OpenAI to allow its characters and IP to be used in its generative kind of AI models. Additionally, Disney is going to be taking what is effectively kind of like a billion dollar stake in OpenAI itself and become an equity owner. And so I guess maybe, Martin, I'll kick it off with you. Why is OpenAI signing this deal, exactly?

3:02 Yeah, it's such an interesting deal, isn't it, Tim? Because traditionally, the sort of generative AI deals that we've seen up until now have been for training and grounding purposes. So you would purchase, for example, a bunch of news articles. So OpenAI did a deal with the Financial Times for that, and Google did a deal with Reddit where they pay Reddit, I think, $60 million a year for the Reddit training data. But this isn't that. This is the other end of it. This is taking the finished model and actually using the output of it to be able to incorporate the characters in, which is, you know, a really different way of looking at it. And I think from one perspective, you can think, well, as soon as a new image model comes out, people are trying to, like, generate copyrighted material instantly with it and seeing how much the model lets you get away with. And often over time they kind of...

3:51 Yeah, I think one of my first Sora uses was like, Mickey Mouse cartoon.

Exactly.
So, you know, might as well do... deals we might see beyond this if this one works out for Disney and OpenAI. Whenever a new model comes out today, there's always the same sort of test that everybody does on the model. Like, you always see people try to build a vector image of a pelican riding a bike to see how good each model is, right? Or an otter on a plane using Wi-Fi is another pretty popular one. I have my own one of those, which is every time a new model comes out that's supposed to be good at writing, I have it generate a short story in the style of Nelson DeMille. And it's not bad, but always it comes through and says, I can't write in the style of Nelson DeMille, but here is something similar to it that includes some of the themes. It's like a disclaimer, right? But maybe not in future. Maybe in future authors will kind of consent as well. And then I could, like, make fan fiction of my favorite authors. Who knows where this is going?

4:58 Yeah. And I guess, Marina, I think one of the things that comes to mind is, you know, with Disney, the most frustrating thing for Disney for a long time is they've had, like, the Vault, right, where they're like, you can only access certain movies at certain times, and then arbitrarily at certain times, Disney is like, it's in the vault, you can't get access to it. It's a long way of saying, I think, like, Disney's very protective of these characters and this IP. So for them to say, like, yeah, we're okay with a world where anyone's going to use generative AI to, you know, have Mickey Mouse do whatever... oh, Mickey Mouse is a bad case because it's, like, now entering kind of more public domain. What's your thinking on, like, why Disney's finally willing to take this risk?
5:35 Because it's kind of a big deal for them in terms of how they think about and control their intellectual property.

5:39 It is. And Bob Iger did say something about, this is kind of coming anyway, so we want to get ahead of it instead of fall prey to it. And something that I just want to mention is, there's the flip side of the deal, which, yeah, you can as an individual use Sora to generate things with Disney images, but Disney is going to stream these kind of videos. So they're going right back to Disney, and they're trying to basically have control in some way of that fan generated content and have it come back to Disney instead of proliferating on X or Bluesky or wherever you're going to be. So they're trying to say that before this gets into a whole, oh yeah, look at all these wonderful fan generated shorts. No, no, no, come be on our platform instead. So this also reads to me to a large extent like a platform play, where they're like, look, this is going to happen anyway, so let's make sure that people are doing these things on our platform and constantly coming back to that much tighter integration with Disney. So that seemed to be an important part of it to me.

6:32 Yeah, absolutely. And I think, I mean, Kush, this sort of opens up, I think, what is kind of like a really strange world, right? It's like you just turn on, like, the Disney Channel at some point and it's just, like, infinite user generated AI cartoons of these characters. Right. That kind of seems like where we're headed.

6:52 Yeah, I think it is. And for these diehard listeners of the podcast, I mean, maybe last year at some point I was mentioning Foucault and the author function and how, I mean, the whole social contract of authorship is maybe changing and stuff. And I think that's exactly it. Right.
So when you have these thousands or millions or billions of fanfiction out there, the social contract is just completely different now. It's like before, in these oral cultures and stuff, you had these bards singing. They were kind of, I mean, not trying to make money off of it. They were shepherds or farmers or shopkeepers or whatever, and they were just doing it for their own status, kind of, or continuing a tradition. And then with blogs and stuff, this digital reality kind of came about again. And now with the AI tools, I think Disney's just going to capture exactly that same sort of social contract. So, as Marina was saying, I mean, be ahead of the game and kind of just be part of where the world is headed, because I think they're still going to make their money on the amusement parks and the merchandise, licensing and all of that sort of stuff. So just like whatever authorship is turning into, I think they just are a part of it too.

8:14 Martin, where does this all go ultimately? Now that kind of Disney has been willing to jump in, I assume everybody else who owns significant IP is also kind of thinking, well, is this my time to get in and cash in on some of this stuff, to build a partnership with these AI companies? Yeah, I don't know if you have any thoughts on basically where this goes next.

8:33 Yeah, I mean, like you say, Tim, this completely flips the script, because LLMs come out, everybody's thinking, hey, how did this get trained on my data? Do I need to sue to get my data paid for? And so forth. Now this is the exact opposite. Disney are paying a billion dollars in an equity investment to OpenAI, with more to come. So it's completely the opposite way around. So I think this really signals, like, are the eyeballs now going to go more to the Disney IP and not to your own IP?
So, yeah, we may even see kind of the complete reverse of what we've seen before, where everybody is kind of pushing to get their content into these models now.

9:15 Yeah. And I think, Marina, that was like one thing I want to note, end this sort of segment on, is, you know, obviously Disney's Disney, right? One of the biggest, most powerful content companies in the world. Do you have any thoughts on what this means if you're just, like, I don't know, someone creating art online? Because I feel like they're in a very different position, right, from someone who happens to own the rights to Harry Potter or something like that.

9:38 I was going to say K-pop Demon Hunters. Get back to me when... yes, that's right. Playing with my daughter, who's obsessed right now. But I could imagine other content owners following suit and seeing what this is. I mean, I wonder to what extent... I know it's an exclusive license; for a Meta or whoever, for a year, for using their content? Are we going to be chopping things up again? Are we going to see more mergers of this kind of thing? I think there's a lot of economics here that are really interesting and very much not worked out. Like what Kush said, the contract with creators, and what counts as official and what counts as unofficial, that line's going to be way harder to tell. And I think that there's going to be a lot of people involved in that in the next few years trying to draw those lines.

10:30 Well, we'll keep an eye on it. Super useful getting all your opinions on it. So I'm going to move us on to our next segment. Time magazine, as you all know, has an annual Person of the Year feature that they do, and this year's Person of the Year: not a person, but in fact the architects of AI.
And so if you've seen the magazine cover, it's a take on a kind of classic image of construction in New York, but it's sort of all of the kind of various luminaries of AI sitting on, like, a construction girder. And I thought this was so interesting and worth spending some time on, because it lets us talk a little bit about sort of what's been happening this year in AI, but also just how the media representation of AI is evolving. I think the big first thing that stood out to me was you have CEOs and infrastructure providers; kind of not a whole lot of researchers are represented here.

11:21 Yeah. So, yeah, I think, I mean, Lisa Su was on there, which I think was surprising for people.

11:28 Former IBM researcher, by the way. They got to get that name checked.

11:31 Exactly. So, yeah, I mean, I think having the CEOs on there, really, that's the signal. I mean, architects of AI, like, what are they architecting? Right. I think it's the financial aspects, the hype, the business. I mean, that's what's being architected. And if they had chosen to feature, I mean, the scientists or even the data workers, I mean, that would have been a very impactful sort of cover. But yeah, I think just capitalism is forefronted. I mean, that's what architecture means, I think. And is that a good thing or a bad thing? I mean, I'm not going to comment on that, but I think that's the message at least.

12:13 Marina, this is going to be a very pointed question for someone like you. Does research matter anymore? You know, I mean, in the sense of, basically, what Time magazine is saying. And I think it does kind of capture something which is fundamental about the moment in AI, which is almost all the action, all the attention is on the business side. Right.
And so I guess I'm really curious about kind of how you feel about sort of, like, this balance of power, I guess, between, like, the folks advancing, like, the actual technical research, and then, I guess, this increasingly large ecosystem, which is important, no doubt, but kind of sits almost entirely aside from some of what's happening on the latest papers from NeurIPS or what have you.

12:54 I really, really agree with what Kush said, is that this is a signal that it hasn't been as much the year of AI as it's been the year of AI hype. AI communication. AI as business. AI as the financial deals. Not necessarily so much the technical side of things. I mean, look, that's interesting. It's continuing. But also, people were saying that 2025 was gonna be the year where AI agents put all of us out of work. Not quite there, guys. Still getting there, still working on it. But the hype this year, the stories, the way that people talked about it was ridiculous. And a lot of it really centered on these cults of personality, these who could say the most ridiculous things in the news and move the coverage as ping pong balls from, oh, now it's this company. No, it's back here. No, it's back here. No, it's back here. So it was certainly a year of that. And, you know, I also, like Kush, am not completely sure what to make of it. It's reflective of reality. It's maybe not reflective of where the real work and the interesting technical work is happening, but you can't deny the reality, which, yeah, it has been this, for better or worse. Yeah, it has been the hype, basically.

14:07 Martin, thoughts on this? I'm hearing some grumbles, maybe, from your other panelists.

14:12 No, I'm totally on board with what you're saying there, Marina. It does seem like the year of AI hype would be the tag for this, rather than the year of the agent, perhaps.
But I mean, look, the article points out how much of this focuses on infrastructure, how much it's been on infrastructure this year, how much spending there is, just raw spending on AI data centers and so forth. They mentioned in the article over $400 billion in 2025 just on kind of AI activities, which is a huge amount. So the article kind of made the point: are these the next industrial titans? We had the railroads and so forth, and now it's the data center. Is this kind of the next thing? And I think a lot of the focus is on that now. When I saw that this year, it was kind of... I went back and had a look at some of the other Time magazine persons of the year. And this is certainly not the first time it has not been an actual person. And do you know who the person of the year in 1982 was?

15:13 No, I do not.

15:17 It was the computer, popularized by the IBM PC. So we've kind of come full circle from, hey, everyone has a computer, to, hey, now everyone has access to AI in 2025.

15:27 I don't know if you can top 2006's "You" or something, where there was just a mirror on the cover.

15:34 I'm going to move us to kind of our next topic. This was one... I was joking a few weeks ago, I was like, I am tired, because it feels like every few weeks we do a segment which is, a new model's out, what do you think about it? And in fact, the end result... I forget who commented on a previous episode, and the end result is basically, there's a lot of models coming out all the time, they're all really good, and after a while kind of a lot of the distinctions tend to blend into one another. But we're going to do a segment on this. So Nvidia launched its newest generation of its Nemotron open source models, Nemotron 3.
And there's a lot of stuff that we've kind of come to expect from some of the model releases that are coming out in the last few months. They're focusing a lot more on agentic behavior. There's a spread of models from the very largest to the smallest. And there's a bunch of kind of infrastructure and other kind of component accessories they've released with this generation of models. I do want to get into that, but I think I want to start a little bit with just, like, a business question for some of our listeners. Kush, I guess I have a question, basically: why isn't Nvidia always winning when it releases its models? Like, doesn't it have the ability to create models that are ultra optimized for their own hardware that everybody else runs on? And so, kind of implicitly, doesn't it make sense that the Nemotron models would be some of the most successful models out there? But that doesn't really seem to be the case, right? We tend to focus on a lot of other players in the space for their models. And so I guess I'm kind of curious if you can account for Nvidia being this huge hardware leader but, I think arguably and realistically, not necessarily, like, a model leader.

17:21 Yeah, maybe now they will be, actually. So, yeah, I think they've been kind of, I mean, moving up the stack. I mean, starting as a GPU company and then having CUDA, and then part of this announcement was actually an acquisition of SchedMD, and this is workload management, sort of scheduling software and stuff like that. And so, I mean, they keep moving up the stack. They're kind of, like, consolidating everything as they go, and I think they're really just, I mean, it's like a snowball.

18:05 A long time ago, I did an internship at Sun Microsystems, and the key phrase for them was "the network is the computer."
So this was John Gage, I think, was the person who came up with this. And I think they're just rolling up to becoming, for Nvidia, "the AI stack is the computer." And I think that's how it is. And they're kind of controlling the narrative going forward as well. So I think we just need to keep an eye on what comes next. Where else do they go?

18:37 Yeah, so you're actually saying this is actually a pretty big deal. I shouldn't necessarily shrug it off, like, eh, Nemotron 3, much like Nemotron 2, what's another model? You actually think this is actually significant in some ways?

18:48 Maybe. I mean, you can say that maybe the models are being commoditized in some fashion. So what they're doing is probably the same recipe that others are doing, that we're doing. I mean, the Granite 4 architecture is pretty much the same. So maybe it's just, yeah, I mean, connecting everything together is the thing.

19:06 Marina, if you agree with Kush... there's almost kind of like a fun, interesting race going on, right? It seems like, which is, like, Nvidia moving up the stack, trying to gobble everybody at the same time, and everybody's trying to, like, get off of Nvidia onto other hardware platforms. I don't know. Do you agree with Kush's assessment? Where does this all go in your mind?

19:26 I think I definitely do agree with Kush, and the full stack play is something that has been going on. You're right. People coming from different ends, either coming from the bottom or from the top, and trying to do it. Because the quality of these models by themselves has gotten very comparable now, so it matters so much more: the levels of integration, the levels of distillation that you can do for specific things, how do you integrate them together, how do you test things for yourself, see if this works for you or not?
When everything gets commoditized like this, it certainly turns once again much more into, okay, what's the economic play here? And also, what's the ease of use? I think people's expectations for ease of use and ease of trying it out are very, very, very high. So I think that also Nvidia doesn't want to be dependent on other people to choose it or to not choose it, so they are necessarily getting ahead of the game. I think this makes a lot of sense to me. They're not the only ones that are doing this, so I think it makes a lot of sense what they're doing.

20:28 Martin, you may have comments on this. I think one question also just to throw into the mix is, I think a meta story of 2025 is, what are the bounds of kind of like an open release in the space? And it feels like that's kind of one of the questions that's playing out, where people used to say, oh, well, we just released the model, and now people are like, well, the model and the data. And Nvidia is here because they're also releasing a couple training data sets and reinforcement learning libraries. It feels like, if anything, the scope of open is getting broader and broader and broader about what's expected when you do one of these open releases. And curious if you have thoughts on the trends there, or anything that Kush and Marina just said.

21:02 Yeah, I mean, it seems like the openness is going to expand a lot, especially with the EU AI Act coming in next year. That's going to require things like stating what your training data set is and so forth. So that's a big open thing that is not discussed, even with many open models. So that's interesting. Now, Tim, you made the point of, why do Nvidia not have the best model? Because they have the GPUs, and shouldn't it just be easy to combine the two?
But it's interesting if you think about what is probably the top rated frontier model right now: it's probably Gemini 3 Pro, right? And that was trained on exactly zero Nvidia GPUs. So Google have had the advantage of using their own hardware. They trained the entire model on TPUs, and it was a massive pre-training effort as well. I mean, most of the work seemed to go into the pre-training. So that's an example where owning the hardware and building the model has really worked out very well, because Google has just been able to make such advances with that. So it will be interesting to see how these Nvidia models come along as well, I guess.

22:07 Kush, anything you'd want to flag in terms of model architecture otherwise? I know this is again sort of multi agent, what everybody else is kind of doing, but anything unique you think worth pointing out?

22:19 No, I mean, again, on the narrative point, the architecture is this hybrid. It has a bunch of transformer layers, it has some Mamba layers, which is a state space sort of model for long range dependencies, and a mixture of experts. So the name of our podcast is in there. But that combination... the hype again is kind of saying that, oh, this is something unique and new and great, and it can deliver all sorts of performance and efficiency and makes sense for agents and stuff. And all of that is true, I would completely agree with it. But it's not like they're the first to come out with it. So I think all of this is, I mean, coming back to the Time magazine thing, right? I mean, it's like, what are you centering? It's the hype of it in addition to the fundamentals.

23:11 Yeah, that architecture, that sounds an awful lot like Granite 4, doesn't it, Kush? The Mamba, the transformer and the mixture of experts all together.

23:22 Everybody's been very on message today.
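The mixture-of-experts idea the panel keeps returning to, that only a subset of experts fires per token, can be put in back-of-envelope form. The sketch below assumes simple top-k routing over equally sized experts; the function and its numbers are illustrative, not the actual Nemotron or Granite configuration:

```python
def moe_active_params(total_params: float, num_experts: int, top_k: int,
                      shared_fraction: float = 0.0) -> float:
    """Rough active-parameter count for a mixture-of-experts model.

    Assumes the non-shared weights sit in `num_experts` equally sized
    expert blocks, of which `top_k` fire per token; `shared_fraction`
    covers layers (attention, embeddings) that always run.
    """
    shared = total_params * shared_fraction
    expert_pool = total_params - shared
    return shared + expert_pool * (top_k / num_experts)

# "You only use about one tenth of those parameters at inference time":
# a 30B-parameter model routing each token to 1 of 10 experts touches
# roughly 3B parameters per token.
active = moe_active_params(30e9, num_experts=10, top_k=1)
print(round(active / 1e9, 1))  # prints 3.0
```

Real deployments split the budget between always-on layers and experts, which is what the `shared_fraction` knob stands in for; the point is only that total size and per-token compute come apart in these architectures.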
We've got lots of IBM references coming in, I guess. Marina, on a final note, do you want to do any comparison between this and Granite, the actual architecture itself?

23:31 As I was reading, I was like, hold on, am I reading the Nvidia one or am I reading the Granite one? Because it's awfully familiar. And they have the same idea of multiple models as well, which I think a lot of these open models are moving to now. So they have nano, super and ultra, where 30 billion is nano and 500 billion is ultra. But because it's a mixture of experts model, you only use about one tenth of those parameters at inference time. So that 30 billion model, the nano model, actually only uses 3 billion active parameters at inference time. So it does mean that you can take these models and run them on some pretty small devices, which I think is quite interesting.

24:12 I'm going to move us on to our final topic of the day. So this is a fun story, actually, from a number of weeks back. We had scheduled to talk about it when the news broke, but there have been so many other things happening in AI that we're only addressing it now, you know, towards the end of December. But I did think it was a pretty interesting story, and it's a way to talk about what's happening in model alignment and model safety. So kind of a fun little sort of disclosure happened: a kind of independent researcher was digging around with Claude and sort of uncovered a document that is used in the training process that Anthropic calls its Claude soul document. And what the soul document is, is basically it's a very long sort of... unique in a couple ways. I think the way the document is drafted is a lot more sort of narrative and philosophical than a lot of safety documents that you might have seen.
Those tend to say, look, here's a long list of things that we don't want the model to do. Maybe I'll turn to Marina. What should we make of Claude having a Soul Document, and what exactly is going on here from a technical standpoint? What is this actually being used for?

So I think it's very in line with Anthropic's perspective and the way that they want people to think about how they develop their models and their company. And I mean, I like Claude. I've always kind of liked the Claude models. I like the way they write and what they do. So I think that something that's interesting here, actually, is that this document, if I understood the coverage correctly, is being used at fine tuning time, not just something that's in the prompt. And this is interesting because it's different from what I think a lot of other models do, which is just a whole bunch of fine tuning or RL techniques that don't have any kind of framing around the examples. The example is just, here's a really specific task, a really specific question, and then this answer is better than that answer. I'm simplifying, but you know, you keep going kind of that way, without having anything around it of, oh, and you should think about how to answer this in this way, in reference to this, in reference to this, in reference to this. So as a result of doing it during that time, earlier in the stack, think of it that way, this means that this is reinforced over and over and over and over again in sort of the model's own parameters. Now, yes, calling it a Soul Document is cute, but there is a sense of almost like a value and a structure of, no matter what the task is, you ought to be referencing a whole bunch of things beyond just the concrete question and answer. And I think there's something in that.
And what I think is the case is that a lot of people maybe do that without being so explicit about it, where, because of the default system prompt that you choose to have or not have as you train your model, you kind of maybe have your own version of that document. But it's maybe very small and it's maybe not very intentional. Whereas it does seem like what they have at Anthropic has been deployed very intentionally in that sense. It does result in a somewhat different personality of a model, because you end up building different biases, and I use that word in a technical sense. So I think it's just a slightly different perspective on when you put this information into the model, earlier up the stack. And that itself, I think, is worth looking into: how much of a difference does that make?

Yeah, and I have a question, I guess, Marina, which you'll probably have the answer to, which is, why haven't we done model alignment in this way before? Why have we leaned so much on prompting versus just working it into the fine tuning? Because I think what you're saying is really true, right? Which is, well, we kind of want this set of principles embedded into the behavior of the model. But I think the fine tuning approach is a little bit different from how a lot of people do it.

Yeah, maybe Kush will have a perspective on this as well. But I think that one thing is that it becomes a little bit more difficult to do things from an evaluation perspective, of, like, well, if you have all of these things as also part of what you're going after, can you really tell that this particular answer that you're offering is actually better than that particular answer that you're offering?
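Marina's contrast between bare preference pairs and examples framed by a shared values document might be sketched like this. The preamble text, field names, and helper functions are all invented for illustration; this is not Anthropic's actual data format, just a minimal picture of "framing" fine-tuning data versus leaving the task bare.

```python
# Sketch of "framed" supervised fine-tuning data: every record carries a
# shared values preamble, so those principles get reinforced in the
# weights, rather than living only in a runtime system prompt. The
# preamble text and record schema are invented for this example.

VALUES_PREAMBLE = (
    "Before answering, consider honesty, your own uncertainty, "
    "and the user's broader goals, not just the literal question."
)

def framed_record(question: str, answer: str) -> dict[str, str]:
    """Build one fine-tuning record with the values framing prepended."""
    return {
        "prompt": f"{VALUES_PREAMBLE}\n\nUser: {question}\nAssistant:",
        "completion": f" {answer}",
    }

def bare_record(question: str, answer: str) -> dict[str, str]:
    """The more common alternative: just the concrete task, no framing."""
    return {
        "prompt": f"User: {question}\nAssistant:",
        "completion": f" {answer}",
    }

r = framed_record("Summarize this contract.", "Here is a cautious summary...")
print(r["prompt"].splitlines()[0])
```

Because the framing appears in every training example, it is repeated across the whole fine-tuning run, which is the "reinforced over and over in the model's own parameters" effect described above.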
I wonder how much time they spent not only on figuring out this document, but figuring out how they need to change their training data in order to ensure that they're actually training in line with what is there. Because most of the time we don't do that, and then the data sets that we rely on or that we construct don't do that. They're a little more straightforward, they're a little bit more focused, or actually a lot more focused than that.

As we've been doing this for Granite, I mean, we've built up that experience as well, for kind of the safety alignment or morality alignment or what have you. And I think the evaluation is clearly part of this, like, how do you know which behavior is preferred or not, and these sorts of things. But I think there's also the modularity question, because once you do the supervised fine tuning or similar sorts of fine tuning, it's fully baked in. And not every use case is exactly the same, especially for the types of customers and use cases that we often think about. So it's maybe too heavy handed in some fashion, because Anthropic, in their document, state that they want this to be an expert friend of some kind. And not every LLM should be your expert friend. So breaking it up, kind of having the options to turn things on and off, doing things in the way that makes sense for your use case, I think is another driving factor.

Yeah, totally. And I think these trade-offs get very interesting. I think, Marina, your comment that they're actually giving up some evaluation ability by doing it this way is pretty interesting, right? They're kind of like, well, we have a harder time measuring this.
I don't know if this is what you meant, but it's like, we have a harder time measuring this, but we think it's more aligned if we do it this way. Martin, I think Marina was being pretty nice about this, but one of my instincts on reading this was, wow, this is so Anthropic. This is very Anthropic-y, like the most Anthropic document I've ever read. And I think it's easy on some level to kind of eye-roll and be like, oh yeah, Soul Document, very, very Anthropic. But, yeah, I kind of agree with Marina: out of all the models that are currently operating, Claude's just the most pleasant to interact with right now. And I don't know, I guess what I want to talk to you about, obviously you have a very literary bent, from your guest appearances on this show, and I'm really curious about whether or not the way these documents are written has something to do with this very hard to quantify quality that we like in something like Claude.

Yeah, so actually reading the Soul Document was so interesting, because I agree with you there, Tim, that the nicest model to chat to, I've always found, has been Claude, for the last couple of models, I'm thinking, myself. When I prompt a large language model, I'll read back my prompt and I'll think, if I gave this to a human, am I giving them a decent chance that they can actually perform the task I'm asking them to do? Am I giving them enough information? If I'm editing a document and I just respond and say, make it better, well, that hasn't really given the model a very good idea of what to do, right? So I try to be quite specific. So I was interested to see what Anthropic were going to include in this Soul Document that would really guide the model.
And the part that I read, where it said Anthropic generally believes, this is from the Soul Document, it generally believes it might be building one of the most transformative and potentially dangerous technologies in human history, yet presses forward anyway. And I'm thinking, if I tell a human that I'm building one of the most potentially dangerous things in history ever, how does that human proceed? I have no idea. So how that model is supposed to take that information and do something with it, well, I'm looking forward to finding out.

Kush, maybe I'll give you the last comment here. And this kind of goes to what I was just asking Martin about: how much of what's in here is actually impacting model behavior, and how much of it is kind of poetic license, or almost like literary flourish? My feeling is, this document's a real pleasure to read. But where I'm left at the end of the day is, well, how much of this actually impacts how the model behaves?

Yeah, I mean, I think it does have an effect, certainly. So could it have been more concise? Maybe, but I think there are a few different interesting parts. One is that they do discuss uncertainty, value uncertainty and calibration and these sorts of things, in there. And as people, I mean, most of the things that we encounter in life, we're also uncertain about. We don't know what we believe until we kind of encounter it. And so the fact that they're going through all of that sort of uncertainty, or in this case, how you should reason about it, is actually a nice thing. And that can't be done very concisely. So I think that's an aspect of it. Another thing I've been waiting this whole podcast history to talk about a little bit is some of the moral philosophy aspects as well.
So, like, I think there's a little bit of confusion in, like, it's very Anthropic, as you just said. But if you step back and look at the philosophy of it, maybe it's trying to do too many things at the same time. And there's this concept of dualism and non-dualism in a lot of moral philosophy: is the soul of individuals separate, separate for each individual, or is it all universal, all the same thing? And I don't want to get too philosophical, but it's important here. And the reason it is, is because every instance that a person uses the model, it's kind of like a new birth, it's a new session. And so is this Soul Document really meant to be, like, universal, or is it meant to be kind of individualized for the session? And if it's really meant to be universal, then why is it talking about being a brilliant friend? Because every context needs to have a separate sort of soul in that case. So it's just very confusing what the exact goal actually should be.

Well, and so what's the end result? Do you feel like that's a problem for the model?

If you just use it for this very narrow sort of type of use, then it's fine. But if it's really a general purpose technology that you're going to use in a lot of different situations, then I think it's too prescriptive in certain ways.

Yeah, for sure. Yeah. That's a good reminder, Kush. I didn't know you wanted to talk about that. We should definitely, I'm going to work out an MoE segment next year where we just do moral philosophy. Yeah, absolutely. I think it'd make for, like, a super fascinating episode. Marina, I thought of one last question, so maybe we'll actually have the last word go to you.
A few years ago, I know I was very excited about prompting, because I was like, okay, for someone who has over time become more of a writer than a coder, this is very exciting. And at the time I had a couple of researcher friends who were like, don't invest too much in prompting, we're going to figure out how to automate it, prompting is going away, it's just a temporary thing. And here we are at the end of 2025, the architects of AI have done their thing, and AI is bigger than ever. And if anything, Anthropic is doubling down so hard on this kind of document, which specifies, as Kush said, the moral philosophy of these models. Are we going to be living with prompting for much longer, or is this similarly just a temporary thing, in your point of view?

Prompting is a way to get information into the model. It is certainly a very simple and straightforward way. It doesn't mean that you know what effect it has when you get it in. But I will say that a lot of times, when you go into these larger systems, the prompting does kind of start to dial down. Yeah, maybe you specify slightly what you want, but the way that you actually end up executing is not via prompt. You end up telling the model roughly, oh, I kind of want you to do this, and then there are other intermediate steps: oh, now I'm going to do this, this, this, this, this. So instead of prompt engineer, you end up with, like, you know, agentic flow engineer. And so, yes, prompting in this sense is going to be part of it, because you're still trying to get information into the model in a particular way.
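The shift Marina describes, from one fragile mega-prompt to a flow of smaller intermediate steps, can be sketched minimally. The step list and both helper functions are hypothetical; `call_model` here is a stand-in for any real LLM API and simply echoes its input for demonstration.

```python
# Sketch of an "agentic flow" vs. a single prompt: the task is decomposed
# into explicit intermediate steps, each with its own small, focused
# prompt, and each step builds on the previous output. `call_model` is a
# placeholder for a real LLM call, not an actual API.

def call_model(prompt: str) -> str:
    """Placeholder for a real model call; just echoes the prompt."""
    return f"[model output for: {prompt.splitlines()[0]}]"

def run_flow(task: str, steps: list[str]) -> list[str]:
    """Execute a task as a sequence of focused sub-prompts."""
    outputs = []
    context = task
    for step in steps:
        out = call_model(f"{step}\n\nContext so far:\n{context}")
        outputs.append(out)
        context = out  # each step consumes the previous step's result
    return outputs

results = run_flow("Improve this document.",
                   ["Identify weaknesses", "Propose edits", "Apply edits"])
print(len(results))  # one output per step
```

The point of the sketch is only that the "prompt" shrinks to a short per-step instruction, while the orchestration logic, not the wording, carries most of the weight.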
But, yeah, I agree with your friends back then who were like, yeah, we're going to figure this out. Because remember where we started: we started with, you prompted and you put in two spaces instead of one space, and the model was just like, I don't know what to do, I don't understand, I don't understand. Okay, we got past that. We're okay now. But the base idea remains sort of the same: look, you're trying to get information in, and the model either can kind of figure out what you mean or it can't quite figure out what you mean. Now, it just won't tell you that it couldn't figure out what you meant, and it continues to keep going. But I think we are going to be moving beyond this, in terms of ways to inject information into the models, and as we go beyond the models, ways to inject the information that we want into the use of the models. The models themselves are a means to an end most of the time, right? So especially, as Kush was referring to, enterprise solutions and real use case solutions are going to be a means to an end. And at that point in time, yeah, we're going to be moving beyond something as fragile as prompting. And also, I guess, come to think of it, these types of fine tuning approaches as well, right? Where you take a Soul Document and try to fine tune it in. Yes.

Yeah. Incredible episode. This ties together, like, so many threads from the last 12 months. Martin, Kush, Marina, thanks for joining us so late in December. And that's all the time that we have for today. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you next week on Mixture of Experts.