Voice Rights, Transparency Index, Watsonx
Key Points
- The show opens with Tim Hwang introducing three major AI topics: the Scarlett Johansson‑OpenAI “Sky Voice” controversy, Stanford’s new Foundation Model Transparency Index (FMTI), and IBM’s latest Watsonx announcements highlighting enterprise AI and open‑source trends.
- Panelists Marina Danilevsky, Kate Soule, and Armand Ruiz discuss the ethics and legal implications of OpenAI’s use of a voice eerily similar to Johansson’s after she declined to license her voice, questioning consent, likeness rights, and the broader impact on AI product design.
- The conversation shifts to Stanford’s Center for Research on Foundation Models releasing the updated FMTI, explaining how the index aims to evaluate transparency, accountability, and potential risks of large foundation models for researchers and regulators.
- Finally, the team examines IBM’s aggressive push of Watsonx during the AI announcement season, exploring how the platform’s open‑source strategy and enterprise‑focused tools could shape the future of AI adoption in business environments.
Sections
- AI Ethics, Transparency, Enterprise Trends - The episode preview introduces debates on the Scarlett Johansson‑OpenAI voice controversy, explains Stanford’s latest Foundation Model Transparency Index, and examines IBM’s watsonx announcements shaping the future of enterprise AI.
- Debating “Her” Legacy and Voice AI Ethics - Panelists discuss why the film *Her* remains compelling while debating the ethical and regulatory challenges of modern voice‑cloning technology exemplified by the Scarlett Johansson controversy.
- Choosing Appropriate Voice for AI - The speakers argue that, just as visualizations must transparently reflect data uncertainty, AI assistants should convey their nature through thoughtfully selected vocal styles—criticizing overly flirtatious human voices and emphasizing cultural priors that favor clearer, more robotic or transparent tones.
- Risks of AI as Personal Counselors - The speakers caution that using fluent, voice‑cloned language models for informal therapy or companionship raises trust and ethical concerns, urging providers to embed caveats, responsibility, and safeguards against such misuse.
- Discussing the Foundation Model Transparency Index - Tim introduces Stanford’s annual FMTI and asks Kate to explain the index’s structure and IBM’s involvement in assessing model transparency.
- Transparency Index for AI Governance - The speaker questions the secretive nature of AI model development and proposes an index to incentivize openness, asking the panel whether initiatives like FMTI are effectively promoting industry transparency and how far such efforts can realistically go.
- Transparency as AI Differentiator - The speakers note that as benchmark scores plateau, firms are turning to model transparency, trustworthiness, and governance as new competitive advantages, a trend emphasized in recent IBM client discussions.
- Commercial Pressures on FMTI Adoption - The speaker expresses concern that growing industry reliance on the extensive FMTI index may lead buyers to push for simpler, narrower criteria, undermining its broad transparency goals.
- IBM Think Week AI Announcements - Armand provides a rapid overview of IBM's latest AI platform highlights from Think Week, covering open‑source Granite models, the InstructLab customization suite, and a new partnership with Mistral.
- IBM Governance & Open‑Source Granite - The speaker describes IBM’s Watsonx governance integration with AWS SageMaker’s MLOps platform for regulatory compliance and risk management, then announces the open‑source release of Granite code models in several sizes.
- Open‑Source Limits for Massive AI Models - The dialogue debates whether rising pre‑training expenses will eventually prevent companies from open‑sourcing next‑generation models, with one participant challenging the notion that open‑source has a ceiling.
- Fine‑Tuning Takes the Spotlight - Tim explains that the AI race is shifting from massive pre‑training dominance to fine‑tuning and alignment, turning previously low‑prestige work into the new high‑value expertise.
- Open-Source LLM Community Enhancements - Armand Ruiz explains how contributors swiftly extended LLaMA 3’s context window—an often‑overlooked but powerful open‑source innovation—while the hosts wrap up by inviting listeners to suggest future discussions on AI agents.
Full Transcript
# Voice Rights, Transparency Index, Watsonx

**Source:** [https://www.youtube.com/watch?v=F0FHMakREDM](https://www.youtube.com/watch?v=F0FHMakREDM)
**Duration:** 00:38:50

## Sections

- [00:00:00](https://www.youtube.com/watch?v=F0FHMakREDM&t=0s) **AI Ethics, Transparency, Enterprise Trends** - The episode preview introduces debates on the Scarlett Johansson‑OpenAI voice controversy, explains Stanford’s latest Foundation Model Transparency Index, and examines IBM’s watsonx announcements shaping the future of enterprise AI.
- [00:03:07](https://www.youtube.com/watch?v=F0FHMakREDM&t=187s) **Debating “Her” Legacy and Voice AI Ethics** - Panelists discuss why the film *Her* remains compelling while debating the ethical and regulatory challenges of modern voice‑cloning technology exemplified by the Scarlett Johansson controversy.
- [00:06:18](https://www.youtube.com/watch?v=F0FHMakREDM&t=378s) **Choosing Appropriate Voice for AI** - The speakers argue that, just as visualizations must transparently reflect data uncertainty, AI assistants should convey their nature through thoughtfully selected vocal styles—criticizing overly flirtatious human voices and emphasizing cultural priors that favor clearer, more robotic or transparent tones.
- [00:09:28](https://www.youtube.com/watch?v=F0FHMakREDM&t=568s) **Risks of AI as Personal Counselors** - The speakers caution that using fluent, voice‑cloned language models for informal therapy or companionship raises trust and ethical concerns, urging providers to embed caveats, responsibility, and safeguards against such misuse.
- [00:12:37](https://www.youtube.com/watch?v=F0FHMakREDM&t=757s) **Discussing the Foundation Model Transparency Index** - Tim introduces Stanford’s annual FMTI and asks Kate to explain the index’s structure and IBM’s involvement in assessing model transparency.
- [00:15:39](https://www.youtube.com/watch?v=F0FHMakREDM&t=939s) **Transparency Index for AI Governance** - The speaker questions the secretive nature of AI model development and proposes an index to incentivize openness, asking the panel whether initiatives like FMTI are effectively promoting industry transparency and how far such efforts can realistically go.
- [00:18:51](https://www.youtube.com/watch?v=F0FHMakREDM&t=1131s) **Transparency as AI Differentiator** - The speakers note that as benchmark scores plateau, firms are turning to model transparency, trustworthiness, and governance as new competitive advantages, a trend emphasized in recent IBM client discussions.
- [00:21:57](https://www.youtube.com/watch?v=F0FHMakREDM&t=1317s) **Commercial Pressures on FMTI Adoption** - The speaker expresses concern that growing industry reliance on the extensive FMTI index may lead buyers to push for simpler, narrower criteria, undermining its broad transparency goals.
- [00:25:04](https://www.youtube.com/watch?v=F0FHMakREDM&t=1504s) **IBM Think Week AI Announcements** - Armand provides a rapid overview of IBM's latest AI platform highlights from Think Week, covering open‑source Granite models, the InstructLab customization suite, and a new partnership with Mistral.
- [00:28:07](https://www.youtube.com/watch?v=F0FHMakREDM&t=1687s) **IBM Governance & Open‑Source Granite** - The speaker describes IBM’s Watsonx governance integration with AWS SageMaker’s MLOps platform for regulatory compliance and risk management, then announces the open‑source release of Granite code models in several sizes.
- [00:31:11](https://www.youtube.com/watch?v=F0FHMakREDM&t=1871s) **Open‑Source Limits for Massive AI Models** - The dialogue debates whether rising pre‑training expenses will eventually prevent companies from open‑sourcing next‑generation models, with one participant challenging the notion that open‑source has a ceiling.
- [00:34:21](https://www.youtube.com/watch?v=F0FHMakREDM&t=2061s) **Fine‑Tuning Takes the Spotlight** - Tim explains that the AI race is shifting from massive pre‑training dominance to fine‑tuning and alignment, turning previously low‑prestige work into the new high‑value expertise.
- [00:37:26](https://www.youtube.com/watch?v=F0FHMakREDM&t=2246s) **Open-Source LLM Community Enhancements** - Armand Ruiz explains how contributors swiftly extended LLaMA 3’s context window—an often‑overlooked but powerful open‑source innovation—while the hosts wrap up by inviting listeners to suggest future discussions on AI agents.

## Full Transcript
Tim Hwang: Hello and welcome to Mixture of Experts.
I'm your host Tim Hwang.
Each week, Mixture of Experts brings together a brilliant team
of researchers, product experts, engineers, and more working at the
cutting edge of artificial intelligence.
We debate, distill, and break down the biggest news of the week in AI,
from product announcements and the hottest papers on arXiv to industry
gossip and the NVIDIA stock price.
This week, three stories.
First up, Scarlett Johansson versus OpenAI, the Sky Voice Controversy.
Who's right, who's wrong, and what does it tell us about where things are
going in the design of AI products?
Second, who's afraid of the FMTI?
The Center for Research on Foundation Models at Stanford
University have released the latest edition of their Foundation
Model Transparency Index, or FMTI.
What is it, and why does it matter?
And then finally, last but not least, it's announcement season, uh, and announcement
season continues with IBM Think hot on the heels of OpenAI and Google.
watsonx is seeing a bunch of major announcements. What do they tell us about
the future of AI in enterprise, and more specifically about the
future of open source in enterprise?
So the panelists, as always: I'm joined by an S-tier, uh,
set of panelists for us today.
First off, Marina Danilevsky, a senior research scientist.
Welcome back to the show.
Marina Danilevsky: Happy to be here.
Tim Hwang: Kate Soule, Program Director, Generative AI Research.
Thanks and welcome to the show.
Kate Soule: Great to be here, Tim.
Tim Hwang: And finally, Armand Ruiz, Vice President, Product
Management on the AI Platform.
Armand Ruiz: Thank you so much.
Hi, everybody.
Tim Hwang: I want to tackle our kind of first story, which was sort of
the hot news of the week, uh, the ScarJo versus OpenAI controversy.
So hot on the heels of GPT 4o
announcements the other week, um, and basically Sam Altman simply tweeting
"her," there had already been a lot of major speculation that essentially
the Spike Jonze film from about a decade ago, Her, um, was somehow
weirdly ending up being the template for OpenAI's, uh, product development.
Uh, and all of this kind of took a major turn when Scarlett Johansson
herself, uh, released a public statement saying that OpenAI had
approached her to use her voice.
Uh, and then when she had refused, had proceeded to release a, a similar one,
a, a kind of stunningly similar one.
In fact, so similar that people had been like, that sounds like
Scarlett Johansson, uh, when OpenAI was, uh, demoing, uh, GPT 4o.
um, uh, the other week.
And so I think the main question, you know, I think we can get into the who's
right, who's wrong here, but I think the kind of first question that I want,
wanted the kind of panel to opine on is.
You know, the unbelievable thing for me is like, Her is like a movie that's
like a decade old and that like, Scarlett Johansson is still like the cultural
template for like the kind of assistive technologies that people are working
on today to the point that like one of the leading companies in the space
almost explicitly is like really still using that movie as kind of like a
template for their product development.
And I guess I'm kind of curious if any of you have kind of thoughts about like the
persistence of like the vision in Her.
Um, and why it's
still so kind of compelling today, or if actually, you know, you think
it's actually kind of silly that people think that it is so compelling.
Um, uh, I don't know if any, you have any thoughts.
I mean, Armand, you're new to the show, but I don't know if you would want to
jump in first with some thoughts on that.
Armand Ruiz: Uh, I can, I can start.
I mean, uh, look, From my perspective and, um, my, most of my conversations are
always with, uh, within an enterprise set up, but, uh, everything related to voice
imitation and, uh, which is a technology that has been progressing a lot in the
last few years, uh, is, is a big concern.
Um, and I think that's why we're, we're seeing this acceleration on, on
regulations, because these examples are just freaking people out, honestly.
Right. And.
And, uh, we, we, we need to be careful, especially when companies
like, uh, OpenAI, that they have so much, um, reach and hype around them.
And, by the way, that demo was spectacular in every single sense.
And it's a little bit sad that all we're talking about is, is this controversy.
Uh, on, on the resemblance, on the voice with Scarlett Johansson, I think we, there
was this opportunity to just pick another voice or make it, uh, less close to, to
her, given like that they tried actually to, to, um, to get her voice officially in
the, in the system and it didn't work out.
I don't know.
What do you think?
Kate Soule: Yeah.
I mean, I, I think a lot of the draw to having a Scarlett Johansson type voice
is really, you know, an attempt to try and get trust and comfort with these
systems that, you know, that's where I think a lot of the initial ambitions lay.
But if you think about it, like these models are, are tools,
they're not, they're not people, they're not humans, they're tools.
And is this really the right tone?
And even, even, I mean, there are huge issues on data rights to consider, but
even that aside, is this the right tone and mode in which to actually communicate
the value and that these tools can offer?
Tim Hwang: Yeah, for sure.
I think that's kind of one of the funny things.
I mean, to first respond to Armand, I think like, um, you know, everybody
I know who is like more in the machine learning space saw the
demo and they're like, low latency.
It's crazy.
Right.
And then everybody else who kind of saw the demo who are less in
the AI space are like, it's Her.
And it's sort of interesting, like what people pick up from demos,
depending on their level of
familiarity with, uh, the technology.
But I guess, Kate, we'd love to kind of go into the point you just made a little
bit more, you know, I think there's a kind of question of like, should we
be imitating Her in the first place?
Like I think Her is kind of such a fascinating movie, cause I, I watched
it again recently, because I've been talking to a bunch of people being like,
oh yeah, it's like a great product vision. And then you watch Her and like the
whole point of Her is like, this is a bad direction for technology to be going down.
And so it's like very strange to me that like, you know, it's become
a template, uh, in some ways.
Um, and I guess the kind of pick up would, is sort of what you're saying that like
we, we actually might not want technology companies to really kind of, imitate like
a human companion, like that there's some ethical concerns that you have around
that, or, um, or maybe your point is actually maybe in a different direction.
Kate Soule: Well, you know, there's a, a principle in data storytelling and
data visualizations that, you know, the, what you visualize should reflect,
uh, the data and how it was created.
And so if you're uncertain and there's uncertainty, you should visualize error
bars, for example, you know, and I think.
A similar principle applies for large language models, like the mechanism
and the mode of how you communicate the results that the model is saying
and the tone and intonation, everything that you're doing is providing a
lot of information for the user, whether you realize it or not.
And I don't think that human voices should be off the table, but you know,
very flirtatious female human voices for something that's meant to be a
tool and an assistant, you know, is that really the right
mode and mechanism?
And I think there's a right way and a wrong way.
And it's, you know, it's sometimes hard to define exactly what correct and wrong
is, but you know, this one seems to lean a little bit too far to the wrong side.
Tim Hwang: Definitely.
And it's sort of interesting because I think like when you get to the realm of
voice, you really are working on like people's, kind of cultural priors, right?
Like, you can imagine a voice which was like very like sci fi robotic.
It's like very arbitrary what voice we wanted to produce.
And like, you know, I guess in that sense, maybe it communicates more
that it is a computer that you're talking with versus a versus a person.
Um,
Kate Soule: and there's like ways to earn trust, right?
In systems and to make people feel more comfortable.
But like, do you, you know, there's also real reasons to have some skepticism
of what you're hearing from models.
And there's, you know, proper ways to go about, you know, showing that
models are not confident and that there are risks and things that should
be evaluated objectively by humans.
And if we're just being told something in a very trusting, loving voice, then,
you know, are we really doing our due diligence here as model providers and
giving our customers the right, you know, putting them in the right mindset of how
to use these models in a responsible way?
Tim Hwang: Totally.
So, Marina, I want to bring you in.
I know you're a veteran to this show, but I think one of the reasons
I was very excited to have you back on was last time you were talking
about InspectorRAGet, right?
And I think the conversation that we had at that point was, well, how
do we know that RAG is doing well?
We need to build kind of like a dashboard experience for people to
kind of monitor and understand whether or not they should trust the
results coming out of a RAG process.
And I guess as someone who has like worked so deeply with that as a method, right?
Like the dashboard as the way you establish trust versus like
the voice as the interface.
Are you, how do you feel about voice?
Are you like kind of suspicious about it?
Like I get it.
I'm getting suspicious vibes from Kate, but I'm kind of curious about like how
you kind of navigate this as we think about like all the different interfaces
we can have in sort of assessing like model trust essentially, which it
really seems like we're talking about.
Marina Danilevsky: Sure, I will say I don't think that the, the dashboard is the
only way and it tends to be the kind of thing that is, again, more understandable
to model developers, um, normal folks don't understand it and actually often I
think it's a way to have even less trust because if you have somebody who's not
technical, they're going to look at it and be like, what am I supposed to do with all
these numbers, all of this, all of that?
Like, tell me sort of at the end of the day.
So I really, really agree with what
Kate is saying, which is that it's important in how you deliver the
information at the end, finally, to the end user: you should give it in
terms that are obvious to the person receiving it, whether they should
trust it and how they should take it.
So like one direction that's a little bit worrying is the amount of, uh, people
using these language models, for example, as ad hoc psychologists or ad hoc friends
or girlfriends or anything of that kind.
So now we're going to make sure that we have that in
Scarlett Johansson's voice.
That seems, again, maybe not
the right direction to go.
Tim Hwang: Let's just pour some gasoline on this.
Marina Danilevsky: And certainly not in the enterprise setting.
So voice is great, uh, but it should also be a way to communicate, just as in text,
are, you know, to what extent should I be really trusting what you say and can
you give me the appropriate caveats?
There's a responsibility here.
Just as when you read something that reads extremely fluent, you hear
something and it's extremely fluent.
It's got that affect and it sounds human.
Of course, you're going to have a tendency
to, to take it in a particular way.
I think there's a lot of responsibility on the people providing these models
to, to, to do that accordingly.
Tim Hwang: Yeah, I never really... oh, yeah, Armand, go ahead.
Armand Ruiz: I just wanna add, uh, two quick things.
One, I think, um, maybe it's a little bit controversial,
but I'm gonna say it anyway.
Tim Hwang: Please do.
Armand Ruiz: Uh, here we are talking about it, right?
So I think Sam Altman is like Elon Musk, uh, they, they know very well, they are
very smart how to market their products.
And, and they had the Google conference right after this, their, their event.
And they always find ways to be on the headlines.
One way or the other.
So I think that they like a little bit of the controversy.
I think maybe this one is getting a little bit out of control, but it's
not the first one that they faced.
Um, and on the other hand, I think there is also, uh, about voice.
I think voice is, it's been the promise for AI for many years with Siri,
with Alexa, but it was low latency.
It was, it was very robotic.
So that demo and the Google demo, and there was a similar demo a few years
ago, very researchy from Google that was showing already like a more natural voice.
And at some point it's going to always be a problem.
Any voice they put out is going to resemble someone else's voice.
So this is a very difficult conversation in this case is because we're talking
about a celebrity and, and voice cloning from celebrities is going to be a problem.
But we will always have these problems that these voices will
resemble, uh, someone else.
Marina Danilevsky: Actually, I wanted to respond to something that
you said, um, I don't like Elon Musk and Sam Altman speaking for
all of us that are working in AI.
They like controversy.
They, they're kind of, they're very, you know, bro kind of guys.
Okay, great.
But this assumption that, you know, all publicity is good publicity and as long
as you're talking about me, that's great.
It's not.
Reflective, I think, a lot of us that are here, and also the idea
of, well, why wouldn't Scarlett Johansson agree to be the voice?
She should be honored that she was asked.
She was in a sci fi movie about this.
So, clearly, this is the same thing.
That level of assumption and that level of, well, it should be an
honor to participate in anything that I do, that leaves a very
bad taste in a lot of our mouths.
And I just want to, say a lot of us are not pleased and do
not, that doesn't represent us.
Tim Hwang: So we'll move on actually, because we have three topics to get to.
So I'm going to bring up the second topic and Kate, I'll bring you in
to kind of lead us on this, but just to kind of quickly tee us up.
Um, so this is a big week.
Um, there's a group called the Center for Research on Foundation
Models, uh, at Stanford university.
Um, uh, Percy Liang and a number of his collaborators there have been working
for some time on something they call the Foundation Model Transparency
Index, or FMTI for short, um, and it effectively is kind of this annual
index they're doing of like leading foundation models evaluating effectively
their commitment to transparency.
And I guess, um, Kate, I figured, you know, just for our listeners, it's
worth it to kind of talk a little bit about what it is in the first place.
Um, and then I know you were actually working in a, in a pretty
deep way on this just recently.
So we'd love to kind of hear about sort of your involvement and sort
of IBM's involvement in the FMTI.
Kate Soule: Yeah, absolutely.
So Stanford's report, the transparency index, is, uh, a compilation of a
hundred different questions that they basically ask model providers to
understand uh, how transparent and open they are across the model life cycle.
So they look at everything from upstream, how is the data curated, what rights do
you have to the data, are you transparent about what data you use, To the model
itself as the second main category.
So have you evaluated your model for different risks?
Do you describe those risks?
Do you provide mitigations for those risks?
And then also to the downstream uses, like, do you, are you clear what the
usage policies are, do you talk about how you would enforce those usage
policies, where are your models being used and, and those types of applications?
So what it does really well, and what I really appreciate about it, is it's
not trying to evaluate and say, this is an unbiased model, or, this score:
If you score well, that means your models are safe.
What it's doing is trying to look at how open model providers are
about their own technology.
Are people actually sharing what they've built, sharing the degree to which
they've tested different safety aspects?
Um, and are they sharing those with their own customers or not.
And you know, that's something that I'm really, really passionate about,
and the entire team here at IBM that trains Granite models has strongly felt
we need to show up very strongly on.
So, uh, our Granite models were ranked in this report.
We're really excited.
We came in fourth overall.
Uh, and especially on the upstream, like all of the data collection, uh,
and all the work that we do on the curation and transparency around what
data goes into our models, we were one of the top scoring model providers.
So the, the Granite models showed up very well in that report.
And we're really excited, excited by those results.
Tim Hwang: Yeah, congratulations.
I know it was like very competitive actually, like the number of
companies and sort of models that they were covering was like very vast.
Kate Soule: I mean, they cover the top 14 or so model providers.
This is the second time they did the report.
The first time was back in October and they looked at the top eight.
Uh, or so, and, um, it's, it's a really, really exciting area.
Tim Hwang: Yeah, for sure.
So, I think there's a bunch of interesting questions I want to kind
of talk to the panel about, really kind of about sort of like governance in the
universe of AI, because I think there's sort of two very interesting things
going on that I see in FMTI, right?
Like, I think one of them is, uh, You know, even a few years ago, and
it still kind of is this way, right?
Like, I think, like, a lot of the process of, you know, pre training,
fine tuning models is, like, shrouded in mystery, where people are like, oh,
well, you know, I heard they have, like, this thing that they do in the recipe
that really gets these great results.
And so, like, a lot of the way AI development has proceeded in
the past has been very secretive.
It's been the realm of, like, trade secrets.
Um, and so I think like one of the interesting things here is like, can
we create an index that kind of creates sort of like a race to the top, right?
Like avoid a world in which everybody's incredibly closed, uh, about their model.
And yeah.
You know, I guess I'm, I'm curious, you know, uh, Marina, like, uh, as
a researcher in the space, you know, do you feel like it's working, right?
Like the strategy, like, do you think FMTI is like helping to improve
or like encourage companies to be more transparent in the space?
Um, and, and if so, I'm kind of curious about like how far you think it will go?
Because presumably at some point all companies will be like, well, we're
never, we definitely can't tell you about that.
Right.
And so I think we're kind of playing with this line about like, what
do companies owe to the public when they release these models?
And, um, yeah, I was just kind of curious as someone who's kind of
like a researcher researcher in the trenches thinking about this, um, how
you sort of see this type of effort.
Marina Danilevsky: Sure.
So I'll start again with agreeing with Kate that this at least
encourages people because they see, oh, other companies are saying stuff.
So it's maybe okay or good for PR or for adoption reasons for me to also say stuff.
First, everybody had to get to a certain point of quality. So when nobody could
figure out how to get the models there, no one was
going to say anything, just in case they came up with the right secret
sauce, and I'm not going to share.
As people, I think, start to get the technology to be a little bit more,
uh, evolved, a little bit more mature, okay, now there are reasons, including
economic ones, of why you'd want to share, because that'll be the kind of
thing that your clients or customers will, you know, pick you over somebody
else because of aspects of this.
So this kind of public, uh, pressure of, well, they did this,
so I'm going to do this, is actually really good.
Um, there's going to be a limit to how far it's going to go.
Of course, nobody's going to share, uh, like customer data or also anything that
might get them bad PR or anything that might get them potentially into
troubled, uh, legal waters or anything of the kind.
Sure, right, yeah, yeah. People won't share.
But overall, it's a good trend.
It speaks to the, um, evolving maturity of the field to me.
So that's, that's, that's how I see sort of that back and forth going.
Kate Soule: I mean, you can see it in the scores too.
Like the scores from October to, uh, this, this year's latest May report
have gone up across everyone who was evaluated.
Like I think there's this like safety in numbers where also given the regulations
are still evolving and everyone, you know, a lot of case laws still evolving,
people are kind of testing the waters, but as more and more, uh, results are
shared and, and people are more and more transparent, it gives confidence
for, for more people to do the same.
Tim Hwang: It does sort of feel like, like things like transparency, right,
or like things like, you know, chatbot arena, they're in some ways kind of
like of a piece, which is that like early on everybody was sort of like
competing against these like benchmarks.
And essentially like as the benchmarks have become like more and more saturated,
now it seems like everybody's trying to differentiate in different ways.
And like one of the differentiations is like, is your model transparent
or not, right, which is almost like a factor that sits, you know,
somewhat on top of the model, but also like outside of it as well.
Marina Danilevsky: I was just going to say, if you actually look at the
places where the scores are still low, um, I think, uh, off the top
of my head, I know it's about like evaluation of model trustworthiness.
How do you do it?
And then also downstream applications.
These are things that we're not yet very good at, do not understand yet very well.
So those are the cards that people are still kind of holding a little
closer to their chest in case again, this turns into a differentiator.
So again, it just makes the point: lots of the scores have improved.
It's very interesting to see the ones that haven't, because it, again,
gives a sense of, you know, what is the confidence and the maturity of
the technology.
Armand Ruiz: Yeah, for sure.
I'll add this: at the IBM Think conference this week, I talked to maybe
50-plus customers, and governance, transparency, trust, it was in every
single discussion.
Um, and over the last year, Granite and the work that we're doing at IBM
is really a differentiator.
Our customers really appreciate that.
Um, and in fact, if you follow me on LinkedIn, I'm extremely open, and I've
published a few times the research paper for Granite, which I recommend
everyone go check, because it explains very well what went into training
the model: the data collection process, the data preprocessing. We've been
extremely open.
You won't find a paper with such openness on what went into training the model.
Um, so that is actually becoming more and more important, because companies
take these models as a base model, and then they mix them up with their
own enterprise data.
So you need a very good base model that you can trust if you are going to
mix it up with your own data to get outcomes.
Tim Hwang: Yeah, Armand, you're actually anticipating my question, because
I think one of the things I have as a, you know, point of critique as well
is: okay, there's a bunch of academics at Stanford, right?
Like, is this actually impacting how business is behaving?
It sounds like your answer is yes.
Like, actually, it turns out that companies are there looking at the FMTI,
being like, well, this actually is relevant to my purchasing decision,
which is really pretty interesting.
Kate Soule: Well, just to build on your point a little bit, Armand, it's
not like you're just taking these models and then adding a layer on top.
Like, when a model provides a response, you can't pinpoint back whether
the model is using your data or its pre-training data.
You know, it all basically goes into a blender and comes out mixed together
at the other end.
So just because you're going through applications with your own data, and
using RAG patterns or even fine tuning and other things, it doesn't mean
that you have control over all of the history, baggage, and skeletons in
closets you could potentially be inheriting when using some of these models.
Tim Hwang: Yeah, totally.
So Kate, final question.
I'm curious if you have any thoughts, almost talking about the trend, on
where things like the FMTI go in the future.
Um, and I guess I want to relay a fear, and maybe you can allay my fear,
uh, but I don't know if you agree.
I think one of the nice things about the FMTI, having been in a company
that received an FMTI request, is that they have not taken any shortcuts,
right?
They're basically like, how do we know whether or not you're transparent?
I don't know.
It depends on how well you do against these hundred indicators, right?
Which is like a massive project to take on, responding to the FMTI.
Um, and I guess I'm kind of curious, and maybe a little bit worried, about
the commercial pressures on indices like the FMTI.
What I mean by that is, you know, if you're a B2B buyer, an enterprise on
the market looking for a model to use for your internal operations, a
hundred indicators is a lot, you know?
Like, there's a reason people go to Wirecutter and they're like, oh, I will
buy the same fridge that everybody else has.
They're like, I'll buy the same, you know, wire management everybody else has.
Do you worry at all that, as these indices become more and more used by
industry, by businesses, to make purchasing decisions, we'll also see
pressure to kind of narrow?
Like, people do ultimately just want transparent or not.
Or, like, you know, a rubber stamp, yeah.
And so, you know, if you do agree with that, I'm kind of curious if you
have thoughts on how do we keep that aperture open, right?
Because it's important for us to keep the kind of transparency that it
sounds like Percy and team are really chasing after here.
So, a couple of thoughts there, but curious about how you'd navigate that.
Kate Soule: I mean, I think particularly when regulations start to come
into effect, you know, there's going to be tremendous pressure to be able
to put a, you know, a rubber stamp on something and say it's compliant or
not compliant.
Um, and so there will certainly be pressure along those lines.
But, you know, it's a similar risk to the one you bring up with
gamification.
Like, people are going to just start optimizing for a couple of key
things, and how do we make sure that we continue to push forward and to
drive how we're innovating?
And I really think it comes down to making sure that we're continuing to
keep pace, on how we define transparency and safety in models, with how
fast this technology is growing.
So if you look at how we looked at large language models a year ago, what
was considered state of the art, and what was considered safe versus not
safe, a year ago compared to today is an entirely different story.
And it's going to need to continue to evolve.
And we need researchers like those at Stanford helping us articulate what
those risks are, coming up with more nuanced ways as some of these metrics
and indices become saturated.
Once everyone's, you know, sharing all this information, maybe we can
shift our focus onto some of these new emerging things that we need to
continue to keep in mind.
Tim Hwang: Yeah.
And I think that will be the new game: the initial task was getting the
companies to do this, and now the challenge is, well, we don't want you to
game it.
So the criteria may also become this kind of game, where it's like, well,
we won't tell you what the criteria are for certain types of indicators,
in order to retain the benefit of the signal.
But I, yeah, I think...
Kate Soule: That's to our benefit as a field, right?
Tim Hwang: Totally.
Kate Soule: If we don't have some sort of incentive to keep innovating,
um, then you know it's going to become stagnant.
So, uh, we certainly welcome it.
Tim Hwang: Um, so I'll move us on to our last topic here.
And Armand, I'm going to give you center stage, 'cause it sounds like you
had a really busy week presenting all of this.
Um, so this was IBM's Think week.
It continues the season of announcements; everybody's announcing AI stuff
right now.
And so I guess, as an open question, Armand, do you want to quickly give a
thumbnail sketch of everything?
I know in particular you were very excited about the announcements around
watsonx.
Um, if you just want to give a thumbnail sketch to our listeners about
what was announced, because I read the blog post and I was like, this is
going to be a lot to cover in 15 minutes.
Um, but I think you as an expert would probably be able to put us best on
track on, you know, what people should be paying attention to.
Armand Ruiz: Yeah, there is so much.
Uh, I'm going to be a little bit selfish and talk about the area of
watsonx that I cover, which is the AI platform part.
Um, I will start with the Granite models, which are now open source, and
I'll let Kate elaborate on that.
Um, but we're really excited to just jump into the open source movement.
And then we have something called InstructLab that helps customize models.
Um, I'll let Kate explain that a lot better; she's been driving a lot of
that from the research angle.
Then, um, we announced a very exciting partnership with Mistral as well.
We already had the Mistral open source models in our platform, and we
offer those to our customers.
Now we also have the Mistral commercial models.
That includes Mistral Large and Mistral Small.
And, I mean, we love Mistral, and now we're going to be able to offer that
to customers in the cloud and on prem as well.
And specifically in Europe, that's going to be a very big hit.
Um, we released a lot of features.
So many features.
One I would like to highlight, for example, is chat with your documents.
For the classic RAG use case, we introduced a user interface where it is
very easy to add documents or point to a vector database.
You can have thousands of documents there.
And in just a few clicks, you can create your own chat interface to talk
to the documents, which will pinpoint directly to the references and
citations.
And then you can export that as an application, or as an endpoint that
you can integrate with your own applications.
So, um, there are a lot of tools like that that will make the development
of solutions very flexible.
Actually, there are two more I would like to highlight.
One is a toolkit that we're releasing in tech preview for application
developers, to make it extremely easy to develop gen AI applications.
We're all the time talking about LLMs, but LLMs don't make applications
and solutions.
LLMs are just one component.
And coming back to the RAG use case: you need an embeddings model, you
need a vector database, you need a way to chunk the data.
So you need a lot of different things.
So we believe we are creating one of the best toolkits for developers, to
make the development of those use cases extremely simple, with a lot of
templates and access to tools.
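To make the moving parts Armand lists concrete, here is a minimal, hedged sketch of a RAG retrieval pipeline: chunking, an embeddings model, a vector store, and retrieval. This is not the watsonx toolkit or any IBM API; it is a toy illustration that swaps a real embeddings model for word-count vectors so it runs with no dependencies.

```python
# Toy RAG retrieval pipeline: chunk -> embed -> store -> search.
# The "embedding" here is just a bag-of-words Counter, standing in
# for a real embeddings model purely for illustration.
import math
from collections import Counter

def chunk(text, size=40):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Stand-in for a real embeddings model: a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Tiny in-memory stand-in for a vector database."""
    def __init__(self):
        self.items = []  # (chunk_text, vector) pairs

    def add(self, text):
        for c in chunk(text):
            self.items.append((c, embed(c)))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.add("watsonx governance connects to SageMaker for model oversight")
store.add("the RAG pattern retrieves document chunks and cites sources")
hits = store.search("how does RAG retrieve documents", k=1)
print(hits[0])
```

In a production toolkit each stand-in would be replaced by a real component (a learned embeddings model, an actual vector database, a token-aware chunker), and a generation step would consume the retrieved chunks and attach the citations Armand describes.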
And the last one is on governance.
On governance, we also released a lot of stuff.
I'll highlight what we're doing with AWS. We have a very good partnership
with AWS, and SageMaker is a very popular tool for enterprises to build
and deploy machine learning models.
And now you can govern all those models directly with watsonx.governance.
That means you can have your full central MLOps panel, and then you have
those components of regulatory compliance and risk management to make
sure those models perform.
So those are just a few, but there are so many new assistants and other
features. Maybe let me hand it over to Kate; you can explain the open
source angle on Granite and InstructLab.
Tim Hwang: Yeah.
Do you really mean it?
Before you jump in, Kate, it does feel like the big message is you're all
in on open source.
Kate Soule: Yeah, it was really exciting to be there.
You know, it was hosted in my hometown this year, which was
really fun, uh, in Boston.
And the message across every single presentation was IBM
is all in on open source.
And so it was really, really exciting to be there and to be part
of the announcement for the Granite code models that we released.
So we open sourced eight state-of-the-art Granite code models: two
variants for four different model sizes, 3B, 8B, 20B, and 34B parameter
models.
And especially with the 8 billion parameter model, we're really seeing
state-of-the-art performance and an ability to outperform anything else
that has come out there.
We're really thrilled just to be able to create this as a starting point
that the rest of the community can operate under.
And, as Armand mentioned with InstructLab, a lot of our intent behind
releasing the InstructLab open source project, which I think you guys
covered in an earlier episode as well, is giving the open source community
the tools to work on models together.
So allowing them to collaborate and contribute to models, and build
ultimately a better model that benefits from the world working together.
And, you know, I think it gets, Marina, to your earlier point: there are
a lot of big personalities who are trying to define how AI works and how
the world works.
And that's one version of the world, but that's not how IBM sees it.
It's really only through, you know, an open source ecosystem, where we
bring the best that the community has to offer and everyone works
together, that I think we'll really be able to unlock the future
potential here.
Tim Hwang: So I'll reveal a little bit of my own bias here.
Uh, I'm a huge open source head.
I struggled for many years running just Linux; my childhood was running
free and open source software locally.
Um, and one of the things I'd love the three of you to respond to is this
kind of interesting question that I saw popping up on social media this
week, which is: how sustainable is open source, right?
Um, and, you know, I think one of the debates, I mean, speaking about big
personalities that want to define AI, right, there's a bunch of very loud
VCs arguing about this.
But I think that the root of the debate, I think, is an interesting one, right?
Which is that, you know, you look at the pre training costs
of state of the art models.
And as we scale bigger and bigger and bigger, it just gets more and more
and more expensive to like, accumulate the computing clusters you need to
do this, to do the pre training runs.
Um, and I think one of the arguments sort of being made right now is: is
there coming a point where these next-generation models are becoming so,
so expensive that it would be very, very difficult to imagine any company
that originates these models being willing to open source them going
forward?
So essentially the argument is that there's a point at which the upside
from open sourcing is going to be outweighed by the raw pre-training
costs of these models.
The conclusion, as these big personalities on Twitter tell it, is
basically: dot, dot, dot, open source has a ceiling.
Tim Hwang: Do you all buy that argument? If not, why not?
Kate Soule: I don't buy that argument.
I mean, I think there is incentive to continue to drive model performance,
spend more, and create bigger and bigger models.
But I think we're seeing diminishing returns in terms of, you know, the
use cases and the value. You can accomplish a ton, take care of the low
hanging fruit, so to speak, with much smaller models that are going to be
the ones you're actually deploying and using day to day.
Like, that's where I think the economic value is going to be driven, and
that's where open source is really well positioned.
I also think, you know, we're learning a lot this past year, these past
couple of months, in terms of how to unlock value and improve performance
in models.
And most of the cost is spent on pre-training, right?
As you say, you know, burning thousands of GPUs for months to create a
base model.
But then there's a step afterwards called alignment, and that's where the
open source community has really been leading the innovation.
They take these models, like the Llama model series that's incredibly
popular, and they take that base model and iterate on the alignment step.
And that is far less costly, far less compute intensive.
So we're able to drive these step changes without having to resort back
to just burning compute hours for, you know, eons, driving up crazy costs.
So I, I don't think that's, you know, a valid argument in my mind.
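Kate's cost asymmetry can be made concrete with some back-of-the-envelope arithmetic. The numbers below are purely hypothetical, chosen only to illustrate the shape of the argument, not figures from the episode or any real training run.

```python
# Illustrative arithmetic only: hypothetical GPU counts and durations
# showing why iterating on alignment is so much cheaper than
# repeating a pre-training run from scratch.
pretrain_gpus, pretrain_days = 4000, 60   # "thousands of GPUs for months"
align_gpus, align_days = 64, 3            # a modest fine-tuning/alignment job

pretrain_gpu_hours = pretrain_gpus * pretrain_days * 24
align_gpu_hours = align_gpus * align_days * 24

print(pretrain_gpu_hours)                      # 5760000 GPU-hours, paid once
print(align_gpu_hours)                         # 4608 GPU-hours per iteration
print(pretrain_gpu_hours // align_gpu_hours)   # 1250x cheaper per iteration
```

Under these made-up numbers, a community could run over a thousand alignment experiments on an open base model for the cost of a single pre-training run, which is exactly the dynamic Kate describes around the Llama series.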
Tim Hwang: Yeah, for sure.
Armin and Marina, do you want to jump in at all?
Or do you largely agree?
Armand Ruiz: Yeah, no, I fully, I fully agree.
And, and, and, uh, we're all in on open source at IBM and I, I, I see the passion
of the community and that's really, really hard to, to compete with, right?
Like, um, even, even companies are using the open source speech
that, um, to attract talent.
Uh, researchers, they, they want to see their work.
I contributed back to the community and not closed source and behind an API
and their work not represented in any different way than just a commercial API.
So I think there is also that angle as well.
The power of the community is proving to attract the best minds
in the planet to progress on AI.
Tim Hwang: Yeah, it's fascinating to believe that, you know, like the
era of big scale is already over.
Like, essentially, like, the competition has already shifted.
Like, it has turned out that scale was not all you needed.
Like, actually, you needed a lot more than scale in some ways.
I think, Kate, what's also really interesting in what you say makes me
think a little bit about how, in any company using AI, there's this, uh,
hierarchy of prestige, right?
Who's doing the important work?
Who's the rock star?
And for a very long time, pre-training was the rock star job, right?
Like, oh man, they're really using these, you know, F1 computers to
create these beings of pure linear algebra.
But what you're saying is, actually, the future is not that.
Like, in fact, it's turned out that what was traditionally almost low
prestige in the machine learning space, which is, you just do the fine
tuning at the end to make it a nice chatbot, is actually where the action
is going to be.
Do you buy that?
Like, in the future, people are going to be like, oh my God, that
person's a god of fine tuning, this person is so amazing at alignment,
they're the ones, and the commodity stuff is the pre-training?
Kate Soule: Uh, I mean, I think the community is quickly getting there,
if we're not there already.
It is insane, the amount of innovation that's happening at that part of
the process, and there's just so much untapped potential given,
relatively speaking, how cost effective it is.
So, you know, I think that's where we're going to continue to be
incentivized, and where rock stars, as you say, will be made, because
you're going to do what pre-training had to spend millions and billions
of dollars to do at, you know, a fraction of that.
Tim Hwang: So I should ask the question, then: why keep scaling?
Is it an insane thing for the industry to be doing?
Kate Soule: Okay, well, the hidden curse of why alignment's doing so well
is that you need big models to make good small models.
So, you know, there is this paradox here: okay, small models are where
we're incentivized, but at some point, if you don't have a big model, you
can't make a good small model.
But we're also seeing a lot of great, larger open source models come out.
There's a bit of a play there that still has to evolve, I think, in terms
of, you know, the market still needs to feel out how that's going to
fully play out for model providers.
Tim Hwang: So this is great.
Uh, any final thoughts, uh, Armand and, uh, Marina?
Marina Danilevsky: Um, I think that these things go in waves.
So we had a wave of, like, scale, scale, scale, in a way that you just
could never do before, and that was amazing.
So it's very natural that we say, all right, we've hit maybe not a
plateau, but a little bit of a slowing down in the S-curve; all right,
let's see what else we can do.
It's going to come back again, and meanwhile, it's a very reasonable
thing to continue to see, well, what can we do in the meantime with the
hardware, with the acceleration, with everything else?
Because it's going to come up again, for some reason or another.
All right?
So it's very good and natural that you go from scaling, and now it's
like, all right, how do you get small from big?
It's probably going to go again: okay, now what can we turn those small
things into? Once again, something big.
This is normal; it's the pendulum swinging.
Tim Hwang: Basically.
Marina Danilevsky: We're just in that part of the swing.
So it's not that it's not valuable.
It's just a question of where people are innovating most rapidly, and the
pendulum of where the focus is will swing accordingly.
Armand Ruiz: I will add something people don't talk about that much,
which is, for example, when Llama 3 came out, I think Llama 3 had, um,
what was the context window? Like 32,000 or 16,000? But it wasn't super
large.
Uh, and days after the release, the community was already contributing a
version with a technique to increase the context window.
So those are small details that only the practitioners notice, and they
don't get into the headlines, but that is really the power of open
source: those contributions, that innovation, um, that's going to be
really, really hard to stop.
Tim Hwang: Yeah, that's a great note to end on.
Well, that's all the time that we have for today.
Marina, thanks for coming back on the show.
Marina Danilevsky: Pleasure.
Tim Hwang: And Kate, Armand, it's been awesome having you on the show for
the first time, and we hope to have you on again sometime.
Thanks so much.
Kate Soule: Thanks for the great discussion.
Tim Hwang: Thanks for joining Mixture of Experts.
And for the first time, a quick call out to all you listeners.
We're thinking about doing a segment in the next few weeks
that will focus specifically on agents and what's happening there.
Uh, we're always looking for interesting stories and people to talk to,
so if you've seen any cool papers or companies or people working in the
space, um, please drop a line in the comments, and we'd love to pick it
up in a future episode.
Um, see you next time.