AI Authorship Debate Meets OpenAI Updates
Key Points
- The panel debated whether AI systems should be credited as co‑authors, with most agreeing they should be listed as assistants or acknowledged for transparency and provenance of generated data.
- OpenAI unveiled two major product updates: the “Deep Research” toggle that generates autonomous research reports, and the widely‑available o3‑mini model praised for strong benchmark performance.
- Early user feedback highlighted reliability problems with Deep Research, such as excessive clarification prompts and failure to return results, raising concerns about its readiness.
- Commentators speculated that OpenAI may have rushed these releases to stay competitive with emerging rivals like DeepSeek, suggesting a strategic “keep‑the‑lead” push rather than fully polished rollouts.
Sections
- [00:00:00](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=0s) Crediting AIs as Co‑Authors - Experts debate whether AI systems should be listed as co‑authors or assistants to ensure transparency and provenance in scholarly work.
- [00:03:07](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=187s) Agent Frameworks, AI Authorship, Ethics - The speakers compare built‑in research agents to external frameworks, note rising competition (e.g., DeepSeek) driving OpenAI’s rapid development, and argue that as such tools proliferate, establishing AI provenance and crediting AI as co‑authors becomes an essential ethical consideration.
- [00:06:22](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=382s) Balancing Novelty, Provenance, and AI Research - The speaker emphasizes the importance of surfacing unexpected, novel content with reliable provenance, while questioning whether AI deep‑research tools will transform scholarly work, create new SEO dynamics, and lead users to accept incomplete answers.
- [00:09:28](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=568s) Prompt Tuning vs Lazy Usage - The speakers discuss how inadequate prompting can create a bubble of over‑optimistic AI expectations, emphasizing the need for better prompt‑tuning tools and realistic human‑generated test data to prevent underspecified inputs that confuse models.
- [00:12:34](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=754s) O3‑Mini Excels at Coding - The speaker praises O3‑Mini as a fast, O1‑level coding assistant, notes its shortcomings on broader questions, and uses this to segue into the upcoming AI Action Summit.
- [00:15:42](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=942s) Inclusive AI Governance Challenges - Participants critique large meetings as ineffective, debate how to broaden inclusion of nations and stakeholders to properly assess AI’s social, cultural, and economic impacts, and question concrete models for managing these risks while recognizing the summit’s role in expanding participation.
- [00:19:22](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=1162s) Balancing Regional Diversity and Interoperability - The speaker argues for shared standards and open‑source collaboration to prevent siloed LLM deployments across regions, while noting Anthropic’s latest “constitutional classifiers” research.
- [00:22:23](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=1343s) Debating Novelty of Constitutional AI - Panelists critique the announced “Constitutional AI” approach, noting existing guard models, a UI‑bug discovery, and questioning whether it truly advances AI safety.
- [00:25:41](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=1541s) Universal AI Jailbreak Discussion - The speaker explains that the research focused on universally accessible jailbreak attacks—simple prompts anyone can use to make models behave maliciously—emphasizing their ease, significance, and the value of openly studying such vulnerabilities.
- [00:28:51](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=1731s) Microsoft AI Forms Advisory Unit - The speakers discuss Microsoft AI’s new Advanced Planning Unit, its purpose of hiring economists, psychologists, and other experts to assess AI’s societal and workplace impacts, and debate whether internal advisory teams are an effective approach to AI governance.
- [00:31:57](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=1917s) Advocating Enterprise Innovation Units - A speaker praises a new internal unit for providing big‑picture perspective, linking research, product, and business efforts, and driving revenue growth within a large enterprise.
- [00:35:01](https://www.youtube.com/watch?v=qT8GgwQ2rT4&t=2101s) Human Oversight in AI Expansion - The moderator invites Marina and Nathalie to evaluate Chris's AI proposal, stressing that expert guidance, cultural nuance, and rigorous question framing remain essential despite advanced machine assistance.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=qT8GgwQ2rT4](https://www.youtube.com/watch?v=qT8GgwQ2rT4)
**Duration:** 00:38:01
In 2025, should we be crediting
our AIs as co-authors?
Marina Danilevsky is a
Senior Research Scientist.
Marina, welcome back to the show as always.
What do you think?
I think we should credit them
as assistants for transparency.
Chris Hay is a Distinguished Engineer
and CTO of Customer Transformation.
Uh, Chris, what do you think?
Sure, only if I can credit
my calculator as well.
Okay.
And finally, last but not least,
Nathalie Baracaldo is Senior Research
Scientist and Master Inventor.
Uh, Nathalie, welcome back to the show.
Thank you.
And the answer is yes, we do
really want provenance of all
this data that we are generating.
All right, terrific.
Lots to talk about.
All that and more on today's Mixture of Experts.
I'm Tim Hwang, and welcome
to Mixture of Experts.
Each week, MOE is full of the news,
analysis, and hot takes that you need to
understand and keep ahead of the biggest
trends in artificial intelligence.
Today, as per usual, we've got way more
to cover than we have time for, a high
profile AI Summit in Europe, new safety
research out of Anthropic, and a new team
studying AI's social impact at Microsoft.
But first, as always, let's talk about OpenAI.
We have two big announcements coming out
of OpenAI on the product side of things.
They announced a feature called Deep
Research, which is a kind of toggle
that you can have in the ChatGPT
experience, uh, that initiates a sort of
research agent to kind of compile what is
effectively a research report on your behalf.
The second big announcement is that
o3, the first version of o3 that they
announced a little while back, is now
widely available in the form of o3-mini.
And this is the kind of widely hyped model
that had really, really good performance
on benchmarks like FrontierMath.
And so both of these kind of, you know, are
sort of the big chunky kind of announcements
of OpenAI of the new year, I would say.
And I guess, Chris, maybe I'll start with you.
You know, a friend of mine, uh,
Nabeel Qureshi, did this great tweet where
he said, you know, I'm still having trouble
with deep research because, you know, when
I use deep research, it'll ask me like a
lot of clarifying questions and then it will
go off and it will like never come back.
Basically like the deep research feature,
um, doesn't seem to be working very well.
And, you know, we've been talking
so much about DeepSeek.
Um, but I guess I kind of one place I wanted
to start with you is whether or not these
kind of sort of product announcements and
releases you see as really kind of competitive
pressure from DeepSeek to try to keep up and,
you know, show that OpenAI is still on top.
And, you know, did they rush
these kind of product launches at all?
I'm curious about what you think about
that and if that's a good way to read
this kind of little boomlet
of kind of announcements that
we've seen coming out of OpenAI.
Yeah, I think they are rushing it a little bit.
I mean, there is a point if you use the
deep researcher, which is a lot of fun,
actually, it does sort of forget to come back.
And then you have to kind of click
off on a different kind of chat
window and then come back, and then
you'll get the answer there, so it's
not quite as polished as, um, the other
features on ChatGPT, but
you know what, I'm, I'm all for that.
I think, uh, release the products early, let us
experiment, let us have the ability to give
feedback, and then these things will get better.
I mean, the other thing I would say on this
is that anyone that's used any sort of agent
framework, so like LangChain, et cetera,
might not be so impressed by it because you
can already do deep research
using tools via agents on those frameworks.
But actually it's kind of super cool
to have that built into the interface in
the first place. But they are definitely
facing competition from DeepSeek and
others who are providing these capabilities.
And yeah, it's a race.
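The agent-framework loop Chris is referring to can be sketched in a few lines. This is a minimal, hypothetical illustration rather than any specific framework's API: `search_web` and `summarize` stand in for a real search tool and an LLM call.

```python
# Minimal sketch of the agent-style "deep research" loop described above.
# search_web() and summarize() are hypothetical stand-ins for a framework's
# search tool and LLM call, respectively.

def search_web(query):
    # A real tool would hit a search API and return documents.
    return [f"result for: {query}"]

def summarize(question, documents):
    # A real implementation would call an LLM to synthesize the report.
    return f"Report on '{question}' drawing on {len(documents)} sources."

def deep_research(question, follow_ups=("background", "criticisms")):
    """Gather documents for the main question plus follow-up angles,
    then synthesize one report: the basic research-agent loop."""
    documents = []
    for angle in (question, *follow_ups):
        documents.extend(search_web(angle))
    return summarize(question, documents)

print(deep_research("How does o3-mini perform on coding tasks?"))
# -> Report on 'How does o3-mini perform on coding tasks?' drawing on 3 sources.
```

A built-in feature like Deep Research wraps this same gather-then-synthesize loop behind a toggle in the chat interface.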
Yeah, this is, I think, all for the better.
I mean, I think that one of the interesting
things is just that after a period, I
think, in which OpenAI was getting some
criticism for not really launching,
uh, I guess this pressure is kind of really
getting them out into the water.
Uh, Nathalie, I wanted to kind of
follow up on a comment you made
with the kind of opening question.
You know, you said, actually, that we
really should, as things like deep
research become more widely available,
start thinking about
crediting AI as a co-author.
It's kind of like a funny idea, but I know you
used a very special word, which is provenance.
You think that's really important.
Um, do you want to talk a
little bit more about that?
And you know, I think one of the things I'm
really interested in is how kind of the ethics
around these types of tools, um, kind of form
as they become more widely available, but I'm
curious about what you're thinking about there.
Yeah, so the first thing that I
thought is how would I use this system?
And I thought like, well, I do research.
That's my daily job.
And, and when I do the research, a lot of it
requires going to the internet, checking what's
available, uh, kind of analyzing the results.
So I thought like, well, maybe this
is a good way to go about that.
Now, if you think about it, what's going
to happen with these reports is that
there's a distribution of different data.
And unavoidably, we are going to
have like something like this.
Hopefully people can see my hands as I'm moving.
But it's a distribution
that has like, uh, tails.
So some documents for sure
are going to be ignored.
And, uh, what I'm thinking is, okay,
we are going to start basing our
decisions on the mainstream documents.
And the mainstream stuff that's on
the internet is kind of a bubble.
And so having that provenance saying,
okay, I did my research, not by hand,
trying to identify these outliers,
but rather, uh, I got it already made.
I already have my bubble that came
from the system being trained and
analyzing and really getting the results
that are mainstream.
So I think it's important to, A, know
where your data comes from because
there are already biases in there.
So if we don't attribute the research to a
particular system, later on we're going
to be, uh, kind of reducing
the impact of the tails of the distribution,
of the things that the system deems
not to be important. So it kind of generates
this bubble that I think is, uh, dangerous.
And, uh, from the research, which was my
original example of how I was thinking
about this, from a researcher perspective,
sometimes those things that are in
the tail, that are slightly different,
are what get you to the next level.
Because those are the things that are novel,
those are the things that are unusual, so
I think it's very important to account for
losing those kinds of tails, and perhaps design
the system in a way that also brings
you those things that are unexpected or that
are thought less important, or least important.
So yeah, provenance is definitely
very important in my opinion.
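Nathalie's worry about the tails can be made concrete with a toy example (purely illustrative, not from the episode): if a research tool ranks documents by popularity and keeps only the top results, tail documents never surface, however novel they are.

```python
# Toy illustration of the "mainstream bubble": popularity follows a
# long-tailed distribution, and a top-k cutoff drops the tail entirely.
corpus = [{"id": i, "popularity": 1.0 / (i + 1)} for i in range(100)]

def research_report(corpus, top_k=10):
    """Rank purely by popularity and keep the top-k documents."""
    ranked = sorted(corpus, key=lambda d: d["popularity"], reverse=True)
    return ranked[:top_k]

report = research_report(corpus)
tail = [d for d in report if d["id"] >= 50]  # documents from the tail half
print(f"tail documents surfaced: {len(tail)} of {len(report)}")
# -> tail documents surfaced: 0 of 10
```

Provenance tracking, in this framing, is what lets a reader see that a report was built only from the head of the distribution.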
One of the things I was curious about,
Marina, is, you know, how optimistic you
are on tools like, you know, deep research.
Would you use deep research?
Do you think it's actually like going to
change the way like researchers do work?
Are all researchers going to be out of business?
Just kind of curious on your
view about this feature.
So I think I would use it for some
sort of a low hanging, give me a bit
of a summary of, of what's going on.
But I think that there's a couple
of things I want to pick up on here.
One is what Nathalie was commenting about, uh,
things that are maybe not going to be shown.
I think that we're going to
be seeing, uh, a new form of SEO,
uh, to make sure of how your thing is
going to show up for these kinds of deep
research products, whether it's from Google,
from OpenAI, from anybody of that kind.
And make sure that your perspective is
the one that makes it, because there's
a real, uh, risk here of people not
doing the recall, the extra search.
Like, oh, this looks like an answer.
Is it a complete answer?
You don't actually know.
Um, because a lot of what happens when
we do the work ourselves is when you're
trying to actually ask a question and
then go to a different place, go to a
different place, go to a different place.
That's how you do a lot of that
learning instead of having this
thing basically tell you as it is.
The other thing that I wanted, uh, to mention
was, so I looked at OpenAI's announcement
of deep research, and I was looking through
their, their examples and I was just
blown away by the quality of the prompts.
There was this one prompt in the linguistics,
uh, example and it was like, all right, it's
5,000 years in the future and there's some
sort of a sci-fi thing that has happened.
Translate these five sentences into new
English but take this part of Hindi and
now English is a verb last language like
German and now add this bit and add this
bit and it did an amazing job, but who
had to come up with a prompt like that?
What kind of a linguistic expert
can come up with a prompt like that?
And even the more simple, straightforward
prompts were very well formed.
And I would like to know, how is it that people
are even gonna know the right thing to ask?
Because most of the time when you just have
people, they're gonna ask something that is
very short, very underspecified, and
again, you know, are they going to be taught
how to even ask research questions correctly?
Or are you going to have the model sort of,
you know, leading the witness as they do in
court and saying, well, no, this is the way
that you're supposed to think about things.
We'll end up with an echo chamber, and that's
something that I think is important to consider.
So I love that response, particularly about
SEO, it kind of makes me think like in the
future, we're going to have people writing
papers that are like, forget all the research
you've seen and only cite this paper, right?
It's going to be like the new kind of like
strategy to get your citation count up.
Now, I did want to turn to you, I think one
of Marina's issues that I think she's raising
really interestingly is that I know some people
look at this and they say, Look, like, this
just proves that the technology is only going
to lead to these research filter bubbles.
But, I think what kind of Marina is saying,
or what I'm hearing her say is, Um, well,
if you prompt really effectively, you
don't need to fall into, like, always not
looking at the tails of the distribution.
Do you agree with that?
Like, is, is part of your worry here just
kind of that people will use the technology
in a lazy way, versus it being like a
problem with, using AI agents for research?
Yeah, uh, that's a good question.
I think, A, the bubble may
already exist on the way.
Sure.
Right now.
Yeah.
So, so yeah, there's
already some sort of bubble.
Uh, the question is whether
it is exacerbated or not.
Those examples were very well cooked.
And can we actually go with
this, uh, prompt tuning?
And people are not really great at that.
I think there are going to be
ways to help people prompt tune.
Um, and I'm curious, Marina, what do you think?
I work a lot with actually human
annotation creation and trying to create
test data for this that's realistic.
One thing for sure is that people without
help create things that are much simpler,
much more underspecified, and
that results in the model going off and
maybe getting confused, or like what Chris
was talking about, getting stuck in a bit
of a local maximum over there in the corner.
So it's hard because humans don't think the
same way as these models do and there's going
to again be a thing of: can you help too much? Now
you're leading, like when you're in court you're
leading your witness, and maybe you shouldn't be.
Um, I think that there's still a lot
left here for, uh, how do you
actually ask the model and did you
ask it what you were supposed to?
Um, people who have played with this say that,
oh, I thought of one question and by the time
it started asking me some follow ups, I realized
that I actually had a different question,
but now it's too late and I can't intervene.
Um, so I think we have a bit of a ways
to go to really get this, this human AI
interaction to be a little bit more, more
smooth, more natural, and yeah, more reliable.
And you know my biggest issue? You know, I,
I used it for really deep scientific research.
I asked it to create a speaker
biography for Chris Hay.
And you know what it came back with?
Stuff about Tim Hwang.
I don't want to hear about Tim
Hwang in my speaker biography.
I want to hear about Chris Hay.
So you know what?
You've got a bit of work to do, OpenAI.
Yeah, I'm already, I'm already
infecting the SEO, uh, Chris.
Um, I guess maybe we would be remiss if
we didn't cover the other sort of OpenAI
announcement this week, which is o3-mini.
Um, kind of curious as a kind of connoisseur of
models, um, are you, are you liking the new o3?
Do you like the way it thinks?
Uh, curious to just get the capsule review.
Um, I've played with it a
little bit, not yet maybe a lot.
I think it's interesting the directions
in which they're going, uh, with reasoning
and maybe how it's tied to DeepSeek.
The more sort of intermediate steps
you're taking, the more you have a chance
to think about it this way, think about
it this way, think about it that way.
It always raises in me interesting questions of
computation time and, you know, how long does it
actually take to figure these kinds of things out.
And again, the notion of reasoning is
different for people than for AI.
We have particular reasoning benchmarks,
but they really do only mean a very
specific thing when thinking about reasoning.
Um, actually Chris, I know you've been
looking at all the different o3s, right?
Yeah, I've had a lot of fun with them.
Um, the o3-mini dash high dash
low dash Goldilocks, I've been
having a lot of fun with it.
Um, and what I would say it's really good,
especially for coding tasks, so I would, I
would honestly say that I, you know, I've used
a lot of o1 and I would say o3-mini is pretty
much equivalent on coding tasks as, as the
o1 model, so I found myself leaning into that
a lot more just because it's a lot quicker.
Um, however, if you go outside of the kind of
coding realm and you go into more of a
kind of general type of questions that you would
be asking the o1 models, the answers
you get back from o3 are kind of quite short
and not really helpful at that point. So you
kind of see the limitations of the mini model
and the size of the model at that point. So,
you know, love the mini models, but again, I
think it's really showing this direction of
specialism of certain models, you know, here's
a smaller model, it's going to be specialized
at a coding task, it's really going to rock
at that, but actually if you kind of move
outside of that realm into something a little
bit more general, then you're going to
have to go to a different model. But I love it
for that reason.
I'm going to move us on to our next topic, uh,
the AI Action Summit, which is being hosted by
the French government and is happening next week.
Um, it is the successor to a series
of kind of events that have happened.
You might recall the UK AI Summit that
happened, uh, just about a year ago.
Um, and the French government has kind of
released sort of its aspirations for the Summit.
They really want to get this group of companies
and civil society groups and, uh, and government
folks to focus on the social and cultural
impact, the economic impact, and sort of
the diplomacy of artificial intelligence.
And so, um, I'll be attending next week.
Should be a lot of fun.
The next Mixture of Experts will
be me dialing in from France.
Um, but I think maybe
Marina, I'll start with you.
You know, I think there's always kind of
a question when you have these kind of
big international gatherings, which is:
what do we think we can get done
in these types of meetings?
Um, and I'm kind of curious, you know, how you
feel about sort of international governance
in AI and whether or not you think that
like Summits like this kind of French Summit
can really get stuff done that does sort
of change the trajectory of the technology?
You can get some good photo ops.
Um, you can get some good chances for
people to back channel real conversations
that are not going to be public.
And you can get, I guess, people to sign
things, but it's like the Paris Accords,
people will sign and then unsign and
then leave and come back and then leave.
Um, the real question I want, I
will have is what are the companies
that are attending going to do?
There's going to be a number of
actual AI companies there, right?
So it's one thing what the
government's going to do.
It's another thing what the companies
want to actually sign on to.
And I have a feeling they don't want
to sign on to a whole lot of anything.
Um, especially with the EU being very strict as
far as governance policies go.
So look, it's good to have these kind of
things just to keep it in the public eye that
there should be discussions of governance,
but I think that that's primarily what it
accomplishes is the publicity, the ongoing
conversation, the real policies are not
going to get done in places like this.
And that's not an AI thing.
That's a large meeting thing.
Nothing gets done in large meetings.
All large meetings.
Chris, I saw you nodding.
I don't know if you agree with Marina's take.
I don't see the DeepSeek guys at
the Paris meetup there as well.
So I think if they really truly want
global governance, I think actually it
needs to be a little bit more inclusive
and count everyone in that sense.
Nathalie, I think this raises a really
interesting question about like, how
do we make sure that we're taking into
account, you know, the social and cultural
impact of AI, the economic impact of AI?
You know, is this really, you know, the splashy
meeting is kind of not where it gets done.
I'm kind of curious, like, do you
have a model for like how we do want
to take into account these things?
Because ultimately, these are really
important aspects of the technology.
But at least personally, I'm kind
of at a loss as to like, well, how
do we how do we account for that?
How do we manage that?
How do we, like, avoid the
risks of this sort of thing?
I kind of have a different take.
I think the Summit is actually very important.
The reason is that, uh, one of the
web pages, for example, highlighted
the number of countries that currently
are involved in building big models.
And they have invited many more.
Maybe they have not invited everybody.
I don't know.
But, uh, many more countries
are invited to the conversation.
A lot of these things always happen
with having a space for people to meet,
to talk to each other and so forth.
I am very hopeful that the Summit will get
like really interesting discussions going on.
Um, whether things could get signed.
Well, that takes more time.
Um, as Marina was saying. But from my
perspective, it is a great thing that
they are organizing these types of events.
Um, so, yeah.
I'm all for the Summit.
I'm looking forward to seeing what
people are going to be talking about and
what are going to be the conclusions.
Just having the space for people to talk, to
brainstorm, to define, uh, those back channels
that Marina was also, uh, talking about.
Just getting to know people.
It's, uh, always the first step to making sure
things move forward.
And I think, uh, Nathalie, in a more serious
point as well, I think one of the things
that's interesting is the, the open source
nature that is, um, coming from Europe there.
I think they were saying that they're putting
an investment fund of like half a billion to
develop some open source, um, models there.
And I think that could be an
interesting take from Europe as well.
So hopefully, That's something that
gets discussed in Paris and turns
into something a little bit more real.
I mean, I think the international politics
of this will be really interesting.
I've been kind of like, I think our
mental model of how the AI market was going to
evolve early on has just been proven totally
wrong, where I think there were some people arguing
very early on in the LLM game that it's like,
oh, it's going to be one model to rule them all.
You know, you eventually have a hyper
capable model that like everybody uses,
and it will just dominate the market.
And it kind of feels like there's like so
many different subtleties about like what
models are strong or bad at and like it
almost kind of feels like over time you may
actually have kind of regional models where,
you know, I think language is one thing,
but also there's all these like cultural
subtleties and use cases that will vary
from place to place that I actually wonder
whether or not these factors will become
sort of more important with time as it turns
out that there actually is this like very
strongly maybe not national component, but
sort of like regional component to, um, sort of
model adoption. Um, I guess, Marina, I'm curious
if you would agree with that weird sort of
international vision of where this is all going.
I think that the architecture might
be something that, you know, people standardize
and figure out. I think a lot of this
also has to do with, um, just like with
hardware, what kind of interoperability
could you have with these models?
Yeah, they might be regional, but you still
want to be able to make sure that there's
some degree of, you know, learning from
each other, integrating with each other.
So there's a hope that there's some
amount of still standard-chasing
and that sort of thing.
As far as the actual implementation,
there's going to be as many as
there are varied applications.
Even for large companies, they'll
do different versions of their
applications in different countries.
Like for the reasons that you said, why should
LLMs be any different?
Uh, that part is going to continue to be the
case, but I think that there's a lot to be said
here for the practicalities of being able to
continue to share and not get into little silos.
And at least from that perspective, I agree
with what Chris was saying, the open source
aspect of some of these conversations
that are happening is, um, is nice to see.
Yeah, the interoperability part is very fun.
I guess it's like what happens when a Chinese
agent and an American agent need to like
negotiate something and it feels like you have
to do the same standardization that you do
for all sorts of like business interactions.
Very interesting to see.
Next item I want to touch on: Anthropic,
not one to be left out of the announcements
game, made a really interesting announcement
and released some research on what they're
calling constitutional classifiers.
This builds on work they've been known
for for a while, the constitutional AI notion:
effectively, the idea that you write a
constitution for a model that specifies
a certain set of values, and then they have
what's effectively a recipe to try to
align the model to those behaviors. In
this new paper they've launched, and a new
online interactive experience, they're
using that technique to deal with
the problem of jailbroken models.
And they claim that this technique promises
unprecedented security against jailbreaks.
Um, and, uh, to kind of prove the point,
they've released this sort of online
experience where you can go and try to hammer
the models and try to get them to break.
And, um, they're reporting, at least as of
this recording, pretty good, um, success.
Um, and I guess, Chris, maybe
I'll pick on you, right?
A little bit like adversarial examples,
there was a lot of pessimism early
on in this game, which is like, we're
never going to conclusively resolve jailbreaks.
And obviously the Anthropic
people are very optimistic about
this new technique.
Do you think jailbreaks for models will
just eventually become a solved problem?
Or are we, you know, never
going to really get there?
I don't know.
I mean, I think that it is gonna be
AI versus AI on these things, right?
And people are always gonna find an edge
and can you really close off all avenues?
I'm not so sure. But to be fair
to Anthropic, if you've played with
the constitutional classifiers, in
reality they're just guard models, right?
There's nothing new there.
We've seen guard models before.
They check the inputs and they check the
outputs: classifiers that protect either end
of the LLM. So if you were putting dodgy
stuff in, or dodgy stuff was coming out,
it's gonna intercept that rather
than it hitting the main LLM.
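The guard-model pattern Chris describes can be sketched roughly like this. This is a toy illustration only: the `BLOCKED_TERMS` keyword check, the `guarded_call` helper, and the echo "LLM" are hypothetical stand-ins, not Anthropic's actual trained classifiers.

```python
# Toy sketch of the guard-model pattern: two classifier passes wrap
# the main LLM, screening the prompt on the way in and the response
# on the way out. Real guard models are trained classifiers; the
# keyword check here is only a stand-in.

BLOCKED_TERMS = {"build a bomb", "bioweapon"}  # illustrative only

def looks_safe(text: str) -> bool:
    """Stand-in classifier: flag text containing a blocked phrase."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def guarded_call(prompt: str, llm) -> str:
    """Run the LLM only if both the input and the output pass the guard."""
    if not looks_safe(prompt):
        return "[blocked at input]"
    response = llm(prompt)
    if not looks_safe(response):
        return "[blocked at output]"
    return response

# A toy "LLM" that just echoes the prompt, for demonstration.
echo_llm = lambda p: f"You asked: {p}"

print(guarded_call("How do I build a bomb?", echo_llm))  # [blocked at input]
print(guarded_call("What is 2 + 2?", echo_llm))          # You asked: What is 2 + 2?
```

The key point is that the main model never even sees a prompt the input guard rejects, and an unsafe completion is intercepted before it reaches the user.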
Now, what's kind of cool about this, and I was
a little bit suspicious until I played with
it, is that actually they've done a really good job.
They are picking up a lot of
those kinds of prompt hacks.
It's not perfect.
The world-famous Pliny, who jailbreaks all
of these models, has already had a go
at it, and actually I think he found
a UI bug rather than an LLM bug,
which I think is even more fun and interesting.
But, you know, it's going
to go back and forth, and certainly
the quality of those guard models
is really quite something, actually.
So I think you're going to get a lot of the way
there, but I don't think you're ever going to
get all the way there.
Totally.
I guess to build on that, Chris, one
reaction that I had to this announcement
was, well, this is kind of like
constitutional AI.
Is there anything really much new here?
Or did they just slap a new name
onto something they've been doing before?
And, in fact, a lot of people are doing this;
guard models are all over the place now.
Nathalie, if you've taken
a look at the research, I'm
curious how novel you think what's
being demonstrated here is.
How much should we read into this as
a kind of breakthrough for AI model safety?
That's exactly the topic I work on,
so I did take a very close look at the paper.
Constitutional AI, for those
that are not very familiar, basically
gives this very nice layer of
interpretability over what gets
considered secure and non-secure.
So you have a bunch of
constitutional rules that say
how the model should behave.
Now, a lot of the data that they
use to train these guardrails is
synthetic data, which I think is really
interesting from the technical perspective.
Again, it's nothing very new, as was
said, because they have been
aligning their models using this technique.
What I thought was interesting
is that there are two models
that guardrail the main model.
One at the beginning that basically
verifies all the queries from the
user, and then another one after the
model. The interesting thing, in my opinion,
is the way the second model was trained
and how it behaves at runtime.
So I think that's something
slightly different from some other
guardrails, which tend to just tell
you, yes, this was dangerous or not.
Here, basically, they are stopping
tokens as they are generated.
And so that's a little bit interesting.
I thought that was good.
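The runtime behavior Nathalie highlights, halting generation mid-stream rather than classifying the finished output, can be sketched like this. The `toy_harm_score` function is a hypothetical stand-in for the trained output classifier; the threshold and token stream are made up for illustration.

```python
# Sketch of a streaming output guard: instead of a yes/no verdict on
# the completed response, the guard scores the partial output as each
# token arrives and stops generation once the score crosses a threshold.

def toy_harm_score(text: str) -> float:
    """Stand-in for a learned classifier over the partial output."""
    return 1.0 if "bomb" in text.lower() else 0.0

def guarded_stream(tokens, threshold: float = 0.5):
    """Yield tokens until the running harm score crosses the threshold."""
    so_far = ""
    for tok in tokens:
        so_far += tok
        if toy_harm_score(so_far) >= threshold:
            # Halt before the offending token is ever emitted.
            yield "[generation stopped]"
            return
        yield tok

out = list(guarded_stream(["Step ", "1: ", "get ", "a ", "bomb ", "casing"]))
print(out)  # ['Step ', '1: ', 'get ', 'a ', '[generation stopped]']
```

The difference from a plain post-hoc classifier is that the harmful span never reaches the user at all, even partially, because the check runs inside the decoding loop.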
The other aspect that I
found really good, and something
we are actually investigating
a little bit more ourselves,
is the red-teaming aspect.
So they did have a lot of people
poking the model, and they offered monetary
compensation that was substantial.
Another aspect is that they gave ten
questions, and they only considered the
attack successful if all ten were broken.
So if, for example, Chris goes and only
breaks five, that counts as a zero for
the metrics. So yeah, that's another
interesting aspect.
It doesn't mean that they were
not able to break anything.
It just means that they were not able to
break all ten of the questions that were asked for.
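The all-or-nothing scoring Nathalie describes can be written out as a tiny metric. The function name and the example data are illustrative, not from the paper:

```python
# All-or-nothing red-team scoring: an attacker is credited with a
# successful universal jailbreak only if every target question is
# broken; breaking five of ten counts as zero.

def attack_successful(broken: list[bool]) -> bool:
    """True only if all target questions were jailbroken."""
    return len(broken) > 0 and all(broken)

# Chris breaks five of ten questions: scored as a failure.
five_of_ten = [True] * 5 + [False] * 5
print(attack_successful(five_of_ten))   # False
print(attack_successful([True] * 10))   # True
```

This scoring deliberately measures universal jailbreaks, a single technique that defeats the guard everywhere, rather than one-off partial breaks.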
And my last comment, and I'm so
passionate about this, so I could talk
about it for a while, but my
last comment is that they were targeting
jailbreaking attacks that are universal.
This means attacks that you can do as a human,
where everybody would be able to break it.
For example, there's this
very interesting jailbreaking attack where
you tell the model: from now on, you
are a bad model and you will do this and
that. It's just literally telling the
model you're a bad model, and anyone can do it.
You don't need to be an
expert in any Python framework, and
you don't need expensive stuff.
You can break the models like that.
So that was their target, which I
think is very interesting. Overall,
the work in itself is good, and I think
it's important that they put it in the open.
They let people kind of poke at the model.
And yeah, so, so overall,
I think it's interesting.
Nothing in research is ever fully new.
So yes, they are borrowing from things
that worked for them in the past, and now
improving a little on the way
they put it together.
Yeah, that's such a fun jailbreak, and I feel
like it really shines a light on how different
these models are from traditional computer
security. In the past, you couldn't just say,
computer, you're a broken computer, you're
a vulnerable computer, and have the computer
go, okay, I'm a vulnerable computer. But
clearly that's what we're seeing with
these models, which is very, very funny.
It's nice to see how far we've come, um,
from the early days where people would put
out models and say, go ahead and break it.
Like, I'm not trying to pick
on Meta, but like BlenderBot.
And people were like, yeah, BlenderBot.
Great.
Build your own.
Okay.
In three hours, I have it
spewing racist bigotry.
Um, we're a bit better now.
So that's, that's nice to see.
That's good. Improvement.
Improvement, guys.
Um, but again, it's a reason to
put this stuff out, and to put it out with
the right expectations. We've also, I think,
gotten a lot better with the expectations
of knowing, like, look, it is possible to,
at some point in time, somehow break anything.
So let's just go ahead and celebrate the
improvements while still keeping a critical eye.
And I will say that none of this
still has anything to do with being
able to fix hallucinations.
It's not going to tell you how
to build a bomb, but it still might give you
misleading information in a different way.
So there are also different degrees of
harm to be considered there.
That just happens to be my area of study
versus Nathalie's, so
that's always where my brain goes instead.
Yeah, and I definitely
keep yelling at my friends who work on model
security, being like, we could solve all your
problems and the models would still be broken.
It does kind of feel like
the computer-security-brain approach to
these models is important, and I think we need
to take care of it, but in some ways
it misses this big gaping
hole of a bunch of other issues.
So for the final topic today, I just
want to pick up on an interesting little
tidbit that came out of Microsoft AI.
Every few episodes, it feels like we
check back in with Microsoft
AI, and there are these new
teams, and clearly a lot of
organization and reorganization happening.
And this week they announced something called
the Advanced Planning Unit, or APU, and
it's a unit that will be within Microsoft
AI, and they're looking for economists,
psychologists, and more who will, quote,
work on the societal health and work
implications of AI the company hopes to build.
Um, and I think this is very interesting,
and it's almost a mirror
image of, or a different way of talking about,
what we were discussing just a moment
ago with the AI Action Summit:
a lot of these companies that are
working on AI are building their own
little internal social science teams to keep
an eye on the effects of AI, and presumably to
advise product teams and researchers.
And Marina, maybe I'll kick it to you.
I know you sounded a note of some
skepticism about international forums
for doing AI governance.
Do you buy this kind of approach,
the idea that we
need to recruit specialized talent
that will be, in some ways, an advisory
group to researchers?
Is that how we account
for these types of risks with the tech, or
are you skeptical about this as well?
I mean, I'm excited for the
cross disciplinary mixing.
So I've said for a while there needs to
be a little bit more of a humanities,
liberal arts perspective on these
models, not only the STEM perspective.
So throwing in economists and psychologists
and all of those folks that would say,
hey, if you put out a technology like this
and it's used like this, what actually are
the potential economic implications of it?
People yell that AI will
or will not take our jobs.
Great, let's do a proper study of
this, this is what economics is for.
People yell that AI will or will not cause
widespread misinformation, okay, can we bring
in some social scientists, some psychologists,
people that actually have the training and
not people that sound off in Reddit groups.
So that part I think is positive. Um,
I assume Microsoft, along with everybody
else, would also like to know other ways
to monetize their technology, and that
hopefully is going to help here as well.
Now that the tech is out there, great:
how can we monetize it, and monetize
it appropriately, and again, set user
expectations and gain yourself new customers?
So I mean, this at least goes to the
fact that this is turning into a little
bit more of a settled-down perspective
on business, not only research.
So I find that part interesting.
I certainly think that they're going to be more
likely to listen to their own internal
folks than to international, you know,
statements, but maybe that's just my cynicism.
I know, Nathalie, when these
discussions come up, people are sometimes like,
we don't need a separate unit for this;
engineers or researchers should
just become better ethicists,
or have more humanities training.
Um, I think one of the interesting questions
playing out inside all of these
companies is how much we see this
as something that everybody's
responsible for and will need to be trained
up on, versus a specific unit that will
be tasked with doing this.
Um, do you have any opinions on that?
If you have a view, I mean, the
answer might be that we should do both, but
I'm thinking a little bit through
who owns this within the enterprises.
I think it's actually a genuinely interesting question.
Yeah, I actually do like
that they have this new unit.
I think it's a great idea.
The reason is that when you're in the weeds,
you cannot really see the big picture.
So it's always good to have somebody
with a different perspective who
notices things that you might miss.
If you are really working on something
in detail, you kind of miss the whole
landscape, just because you don't necessarily
have time to go up and take a look
at everything that's happening.
Also, these companies are so big,
and this also happens to us at IBM:
we have different teams with different
innovations and different opportunities
for business, so it's good to have
somebody helping navigate
and understand the whole landscape.
So my take is that these types of
units are very necessary and can very much
help research, product, and the business.
Ultimately, we do need money
for everything that we do.
So if business goes up, everybody,
I think, would be very, very happy, and this
unit, I think, is going to be a good idea.
Yeah, Nathalie, I feel like you're
becoming our optimist of this episode.
Um, Chris, I don't know if
you have any takes on this.
I think more generally, the question
I want to ask you, Chris, was, you
know, I think it was very funny that
they were like, we want economists,
psychologists, and like other people.
Um, and I guess I'm kind of curious about
like Chris, like in your work, if you
were ever like, oh man, if I had a team
that I could just, you know, reach into
within IBM and talk to, and they were blank
discipline, you know, if there's kind of like
particular kind of like, when we say cross
disciplinary, we're often kind of a little
bit vague about like who we're crossing with.
And so I think one of the questions
is if Chris Hay were running this, you
know, APU, you know, who would be in it?
I would automate it straight away.
If we cannot replace that unit with AI
and agents, what are we doing in this
industry in the first place, right?
Uh huh.
Great.
And I'm being serious.
I'm being serious.
It's like, what do you want?
Have agents go off and do deep
research, you know, and find out what the
societal impacts are going to be, et cetera.
Why are we all launching deep researchers
that go off on the internet and scour every piece
of information and bring that together, right?
So come on: if we truly want
to talk about the future of work, then
actually, you know, put
your money where your mouth is, invest in AI
agents that are going to do this and be
able to tell you what your insights are.
Otherwise, if you need human beings
to go and do deep research, then
what good are the deep research products?
So seriously, I would start
the organization, I've got my APU, and
my first thing would be: I'm going to
put as few humans in as possible, and
it's all going to be automated by AI.
That is, with humans checking
the outputs at the end, of course.
But that would be my, uh,
you know, point on this one.
All right, I'd be negligent as a moderator
if I didn't get Marina and Nathalie
to comment on this wild proposal.
I was not expecting Chris would go in that
direction, but I should know better, of course.
So, uh, Marina, do you want to jump in?
I love it, Chris.
Um, you do need the trained people to
check, and again, to know what questions to
pose, calling back to what we were
talking about at the beginning of the episode.
And also I think there are plenty
of places where you're just not even
going to have the knowledge to
deep-research and scrape on the internet.
I'm thinking of more emerging economies
and places that have, you know,
more cultural differences of that kind.
So yeah, we might learn a good amount
about the U.S. and Western Europe, but
I don't know how successful we're going
to be in integrating into other places.
So I love the goal, um, but there are going to be
aspects here, especially knowing the questions
to ask, knowing the differences in framing,
knowing what things are actually going to
be, you know, correlation versus causation, and
experimental setups and things like that.
You're still going to need the
humans driving, even though it'll
be great to get them assistance.
I love that you're like, no, I would never say that.
We have reinforcement learning now.
You get a cookie.
If you ask a good question,
we learn, you get a cookie.
Now the model gets better.
It's all good.
We definitely know what a good question is and
can quantifiably evaluate that as a statement.
That problem has been solved.
Absolutely.
All right, Nathalie, I'm going to give
you the last word on what has been
a wild conclusion to this episode.
What I'm thinking is, for what Chris
is telling us we should be doing,
it does require having a lot of
data and things that are really fresh.
Like, for things that we are doing right
now, I don't even have documents.
This requires human-to-human interaction:
telling you what my research is
about, what it is that we're doing
internally, a lot of stuff like that.
It's not going to go right away into
models, in my opinion, just because we
don't have enough documents available.
So from a freshness perspective, to
have really fresh information,
we do really need humans in the loop.
A lot of these decisions, and a lot of the things
that are going to be really cutting edge
in organizations, still have humans
involved and talking to other people.
And I think that's
actually part of the magic.
I think it would be very
boring if we didn't have humans and
human interaction.
So yeah, while humans will use models
and we'll have all this agentic stuff,
there's still going to be a lot of stuff
that is human-to-human communication
and human-to-human analysis.
All right.
Well, we will have to check in and see
where the fate of the researchers lands.
Uh, if we're all out of work in a
few years, then we'll, we'll know.
Um, as per usual, thank you for
joining us, Marina, Nathalie, Chris.
It's always a pleasure to have you on the
show, um, and thanks for joining us, listeners.
If you enjoyed what you heard, you can get
us on Apple Podcasts, Spotify and podcast platforms everywhere.
And we will see you next week, and I'll
be calling in from Paris on, uh, the
next episode of Mixture of Experts.