Prompting Scores and Claude 4 Insights
Key Points
- The hosts ask guests to rate their own prompting skills, with Kate rating herself an 8, while Chris and Aaron dodge the question, highlighting the playful uncertainty around prompt‑engineering expertise.
- The episode of “Mixture of Experts” focuses on recent AI news, including high‑profile collaborations like Rick Rubin with Anthropic, Jony Ive with OpenAI, and Microsoft’s new “agent factory” concept.
- A major discussion centers on the leaked Claude 4 system prompt, noting that Anthropic’s unusually long and publicly annotated prompt serves as both a practical guide and a benchmark for modern prompting practices.
- Chris observes that Anthropic’s transparency—publishing most of the prompt despite some redacted parts—effectively educates users on how to craft effective system prompts, underscoring a shift toward openness in AI model behavior control.
Sections
- Prompting Self‑Ratings on the Podcast - In the opening of the “Mixture of Experts” podcast, host Tim Hwang humorously asks a panel of AI professionals to rate their own prompt‑engineering skill on a 1‑to‑10 scale, underscoring the playful uncertainty about what truly constitutes expertise in prompting large language models.
- Cross-Model Prompting Insights - The speaker highlights how system prompts from Claude 3.5 improve even unrelated Llama models, stresses the need to balance specificity with model autonomy, and discusses embedding safety “red‑flag” awareness into prompts.
- Debating Release of AI Prompts - The speaker weighs the trade‑offs of publishing LLM system prompts—transparency and proof of capability versus security risks and the need for user expertise.
- Evaluating System Prompt Control - The speakers discuss how system prompts—like directives to insert a “thinking block” after function calls—aim to steer model behavior, questioning the extent of their actual impact, the difficulty of thorough testing, and the need for academic access to validate these prompts.
- Leaked System Prompts and Future Exploits - The speaker warns about the dangers of leaked system prompts and obfuscation tactics that can covertly shape model behavior, and notes a new Anthropic collaboration with legendary music producer Rick Rubin.
- Balancing Artistic Prototyping with Robust Engineering - The speaker contrasts using creative “vibe coating” for rapid prototypes with the necessity of solid engineering for scalable, critical‑infrastructure applications, while advocating for diverse, artistic approaches to coding.
- Balancing Technical Rigor and Creative Freedom - The speaker argues that while architectural schematics and engineered processes are essential, preserving a creative, exploratory “vibe coding” mindset—much like music production—is crucial for innovative design.
- Bridging Vibe Coding and Engineering - The speaker likens vibe coding to collaborative invention, emphasizing the need to translate the creative, interdisciplinary brainstorming process into concrete, scalable engineering implementations.
- Designing the Future of AI - The speaker argues that beyond simple collaboration, shaping AI’s multimodal, on‑device future will require AI and design firms—led by visionaries like Jony Ive—to reimagine interaction paradigms, form factors, and agent behavior.
- High‑Stakes AI Talent Deal - A speaker critiques a $6.5 billion investment in a tiny AI firm, emphasizing the speculative $118 million‑per‑employee cost and the race to launch mass‑produced AI companions that could pressure Apple’s emerging intelligence platform.
- Ecosystem Trust vs Data Tradeoff - The speaker argues that while users accept sharing data for utility, they gravitate toward trusted, integrated ecosystems like Apple’s, and OpenAI must embed its services within such cohesive platforms rather than remain isolated.
- Enterprise AI Agents: Democratization and Competition - The speakers discuss how AI agents are becoming commodified, with multiple vendors offering prebuilt agents, and why Microsoft is focusing on training and customizable agents to meet enterprise demand while avoiding vendor lock‑in.
- Azure‑Powered Supercomputer Sparks AI Agent Talk - The speaker highlights Azure’s cloud‑based supercomputer ranking on Top500, emphasizing Microsoft’s compute strength and AI tools as a foundation for a burgeoning AI‑agent market.
- AI Agent Marketplaces & Vibe Coding - The speaker envisions standardized AI agent marketplaces that let specialized, interoperable agents “vibe code” tasks, turning the concept from a toy into a production‑ready factory model, exemplified by the newly released aLoRA technique.
- Podcast Closing and Platform Plug - The hosts wrap up the episode, thank the guests and listeners, and promote where to find the show on major podcast platforms while previewing next week’s “Mixture of Experts” IBM episode.
Full Transcript
# Prompting Scores and Claude 4 Insights **Source:** [https://www.youtube.com/watch?v=e_B91C2vILc](https://www.youtube.com/watch?v=e_B91C2vILc) **Duration:** 00:43:50 ## Summary - The hosts ask guests to rate their own prompting skills, with Kate rating herself an 8, while Chris and Aaron dodge the question, highlighting the playful uncertainty around prompt‑engineering expertise. - The episode of “Mixture of Experts” focuses on recent AI news, including high‑profile collaborations like Rick Rubin with Anthropic, Jony Ive with OpenAI, and Microsoft’s new “agent factory” concept. - A major discussion centers on the leaked Claude 4 system prompt, noting that Anthropic’s unusually long and publicly annotated prompt serves as both a practical guide and a benchmark for modern prompting practices. - Chris observes that Anthropic’s transparency—publishing most of the prompt despite some redacted parts—effectively educates users on how to craft effective system prompts, underscoring a shift toward openness in AI model behavior control. ## Sections - [00:00:00](https://www.youtube.com/watch?v=e_B91C2vILc&t=0s) **Prompting Self‑Ratings on the Podcast** - In the opening of the “Mixture of Experts” podcast, host Tim Hwang humorously asks a panel of AI professionals to rate their own prompt‑engineering skill on a 1‑to‑10 scale, underscoring the playful uncertainty about what truly constitutes expertise in prompting large language models. - [00:03:04](https://www.youtube.com/watch?v=e_B91C2vILc&t=184s) **Cross-Model Prompting Insights** - The speaker highlights how system prompts from Claude 3.5 improve even unrelated Llama models, stresses the need to balance specificity with model autonomy, and discusses embedding safety “red‑flag” awareness into prompts. 
- [00:06:10](https://www.youtube.com/watch?v=e_B91C2vILc&t=370s) **Debating Release of AI Prompts** - The speaker weighs the trade‑offs of publishing LLM system prompts—transparency and proof of capability versus security risks and the need for user expertise. - [00:09:14](https://www.youtube.com/watch?v=e_B91C2vILc&t=554s) **Evaluating System Prompt Control** - The speakers discuss how system prompts—like directives to insert a “thinking block” after function calls—aim to steer model behavior, questioning the extent of their actual impact, the difficulty of thorough testing, and the need for academic access to validate these prompts. - [00:12:19](https://www.youtube.com/watch?v=e_B91C2vILc&t=739s) **Leaked System Prompts and Future Exploits** - The speaker warns about the dangers of leaked system prompts and obfuscation tactics that can covertly shape model behavior, and notes a new Anthropic collaboration with legendary music producer Rick Rubin. - [00:15:30](https://www.youtube.com/watch?v=e_B91C2vILc&t=930s) **Balancing Artistic Prototyping with Robust Engineering** - The speaker contrasts using creative “vibe coating” for rapid prototypes with the necessity of solid engineering for scalable, critical‑infrastructure applications, while advocating for diverse, artistic approaches to coding. - [00:18:32](https://www.youtube.com/watch?v=e_B91C2vILc&t=1112s) **Balancing Technical Rigor and Creative Freedom** - The speaker argues that while architectural schematics and engineered processes are essential, preserving a creative, exploratory “vibe coding” mindset—much like music production—is crucial for innovative design. - [00:21:36](https://www.youtube.com/watch?v=e_B91C2vILc&t=1296s) **Bridging Vibe Coding and Engineering** - The speaker likens vibe coding to collaborative invention, emphasizing the need to translate the creative, interdisciplinary brainstorming process into concrete, scalable engineering implementations. 
- [00:24:39](https://www.youtube.com/watch?v=e_B91C2vILc&t=1479s) **Designing the Future of AI** - The speaker argues that beyond simple collaboration, shaping AI’s multimodal, on‑device future will require AI and design firms—led by visionaries like Jony Ive—to reimagine interaction paradigms, form factors, and agent behavior. - [00:27:42](https://www.youtube.com/watch?v=e_B91C2vILc&t=1662s) **High‑Stakes AI Talent Deal** - A speaker critiques a $6.5 billion investment in a tiny AI firm, emphasizing the speculative $118 million‑per‑employee cost and the race to launch mass‑produced AI companions that could pressure Apple’s emerging intelligence platform. - [00:30:50](https://www.youtube.com/watch?v=e_B91C2vILc&t=1850s) **Ecosystem Trust vs Data Tradeoff** - The speaker argues that while users accept sharing data for utility, they gravitate toward trusted, integrated ecosystems like Apple’s, and OpenAI must embed its services within such cohesive platforms rather than remain isolated. - [00:33:52](https://www.youtube.com/watch?v=e_B91C2vILc&t=2032s) **Enterprise AI Agents: Democratization and Competition** - The speakers discuss how AI agents are becoming commodified, with multiple vendors offering prebuilt agents, and why Microsoft is focusing on training and customizable agents to meet enterprise demand while avoiding vendor lock‑in. - [00:37:02](https://www.youtube.com/watch?v=e_B91C2vILc&t=2222s) **Azure‑Powered Supercomputer Sparks AI Agent Talk** - The speaker highlights Azure’s cloud‑based supercomputer ranking on Top500, emphasizing Microsoft’s compute strength and AI tools as a foundation for a burgeoning AI‑agent market. 
- [00:40:10](https://www.youtube.com/watch?v=e_B91C2vILc&t=2410s) **AI Agent Marketplaces & Vibe Coding** - The speaker envisions standardized AI agent marketplaces that let specialized, interoperable agents “vibe code” tasks, turning the concept from a toy into a production‑ready factory model, exemplified by the newly released aLoRA technique. - [00:43:15](https://www.youtube.com/watch?v=e_B91C2vILc&t=2595s) **Podcast Closing and Platform Plug** - The hosts wrap up the episode, thank the guests and listeners, and promote where to find the show on major podcast platforms while previewing next week’s “Mixture of Experts” IBM episode. ## Full Transcript
How good are you as a prompter on a scale from 1 to 10, with 1 being totally
amateur and 10 being world class?
Kate Soule is a Director of Technical Product Management for Granite.
Kate, welcome back to the show.
Prompting.
How are you at it?
Prompting is never something I wanna be known for, but I do think I'm pretty
good at it, so maybe like a, an 8.
Okay, cool.
Nice.
Chris.
Hey, Distinguished Engineer, CTO, Customer Transformation.
Chris, welcome to the show, uh, your prompting score as a Large
Language Model.
I could not possibly answer that question.
Got it.
And last but not least is Aaron Baughman, IBM, Fellow and Master Inventor.
Aaron, uh, your prompting skill please.
Does prompt engineering really exist?
Yeah, I'm not quite sure.
I always ask LLMs to produce a prompt for me.
Okay.
Everybody's fighting the question.
All that and more on today's Mixture of Experts, a Think podcast.
I am Tim Hwang, and welcome to Mixture of Experts.
Each week, MoE brings together the sharpest team of researchers, engineers,
and product leaders you'll find anywhere in the world of podcasting
to discuss and debate the biggest news in artificial intelligence.
As always, there's a ton to talk about.
We're gonna talk about Rick
Rubin's collaboration with Anthropic.
Jony Ive with OpenAI, uh, Microsoft's new agent factory theory.
Um, but first I really wanted to start by talking about the Claude 4 system prompt.
So you may have gotten our emergency episode where we did a quick review of the
release of Claude 4 um, and true to form.
Uh, pretty soon afterwards, the system prompts leaked.
I think that's kind of just almost standard practice now.
It was pretty interesting.
Simon Willison, uh, did a super interesting blog post where you sort
of annotated the system prompt and I think in general I wanted to kind of get
this group together 'cause we haven't talked about prompting in some time, but
it's also just an interesting document as kind of like a state of the art
on where prompting is at the moment.
Chris, I'll start with you.
Curious if there's anything that kind of stuck out to you reading this
prompt that you felt was different or really kind of indicated where.
You know, kind of the state of the practice was in prompting.
I always find the Claude system prompts super interesting because one, they're
very transparent about it as well.
They publish it.
I mean, there is some stuff they don't publish, but, uh, they're
pretty transparent about that.
But it's long. I mean, this is not a short system prompt, right?
So if you think question, how good are we at system prompting?
This thing is pages and pages long.
So Anthropic are really giving you an education themselves on how to system
prompt, pro, uh, how to prompt properly.
So I think it's pretty good.
There are
a few things that I think is super interesting about it.
Um, the first one is probably just, just simple things like
guidance on how it wants to answer.
It's like, um, you know, if it's a short thing, please just answer there.
Don't use artifacts in this case, et cetera.
Um, there's a lot of guidance perspective and then how to
deal with personality as well.
And you know, if it's a sensitive topic, blah, blah, blah.
But I think the thing that probably.
Makes me laugh the most is how it always talks to Claude in the third party.
You know, Claude, you should do this, Claude, you should do
that, Claude, you should do this.
And I, you know, and, and per AI is gonna have an existential crisis already, you
know, thinking in, in a third party form.
But, um, but I, I think it is worthwhile everybody checking out that system, though,
because you, you can learn a lot from it.
And then I, I remember last year.
Um, when Claude 3, it just came out, I think it was the, called 3.5 Models
At that point, one of the videos I once did was I took the Claude
3.5 system prompts and then I put them on top of the Llama models.
And, and I'm gonna be honest, even though those system prompts were
designed for the Claude models.
They actually improved the, the Llama models as well.
So I, I honestly, I think it's something everybody should really
read up on that will help them.
Yeah, for sure.
So there's a lot there.
And Kate, maybe I'll turn to you.
I mean, taking Chris's first point, I thought one of the most interesting
things in the prompt was the degree to which it really feels like in prompting
we're trying to figure out how much we need to specify versus like leave
up to the knowledge of the model.
So there's is an interesting quote where it's like, Claude should
be cognizant of red flags in the person's message and avoid responding
in ways that could be harmful.
And part of Simon's annotation is like, it just has a notion of what red flags are.
Um, and curious about like how you think about that.
I know Chris is saying that these prompts are very long, but it almost
kind of presages a world where, you know, we can increasingly sort
of rely on model knowledge and keep prompts almost sort of short.
Um, but curious how you think, think about that.
Yeah.
You know, I think what surprised me most was just how much of the Claude, you know,
experience they're leaving up to a single prompt versus breaking some of these
things down into more granular steps.
So you mentioned red flags and you're saying, you know, all right, Claude, pretty
please don't include, you know, don't respond to red flags.
Whatever red flags might be.
And you could easily envision in a different experience that
Anthropic could have built where first there's a step where there's
literal screening by a model whose only job is to screen for red flags
or any other risks or harm and biases.
And they might still be doing this behind the scenes, but, um, you
know, I, I think where I see a lot.
of the world starting to move and where I'd ex would've expected Anthropic to
go a little bit more with Claude 4 and they didn't, is dividing this up into more
steps, running more inferences and leaving less to kind of a really long essay that
you have to maintain and, you know, do basically like security on a prayer.
Like pretty, pretty, please will you, you know, not respond to harmful
content, not respond to messages instead have more verifiable checks
and balances that you can, uh, kind of
articulate via software and more programmatic functions
that you're checking.
Yeah, for sure.
Um, and Aaron, I guess to take the other side of kind of Chris's response
to that first question, you know, you opened up with a, the, the, around the
horn question by saying like, I don't actually really do much prompting at all.
You know, um, and I think Chris is, you know, kind of almost taking the
view that's like, it's good for us to kind of like read and understand
what's going on here, but I don't know if you'd say this is maybe a
little bit too aggressive, but like.
Is it worth it for us to kind of study these prompts, uh, as
someone who just kind of like gets models to generate them for him?
Yeah, I mean, I mean, you know, there's, there's sort of two
schools of thoughts here, right?
And, you know, should these prompts, you know, be released or
not, you know, and if they're not released, then they're potentially
gonna be leaked anyways, you know?
And.
I think one of the schools of thought is we should release the prompts, you know?
Um, because it's, it's proof, you know, that AI can be incredibly smart, but
it can still completely misunderstand the assignment or what you're telling
it to do, unless you understand like that manual of how to use the LLM.
Right.
But on the other hand, maybe you don't wanna release the prompts
because, you know, AI could be like this new intern, right?
Where it's eager, unpredictable, but somehow it's already
running the company, right?
And so we have to be very careful about releasing too much.
Um, and then, and the leaking part of this, right?
From, from what I saw, it looked like that Anthropic did
release some of the system prompts.
But what was really leaked were the tools part, you know, which could
be very, uh, dangerous, you know?
And so.
And so the, the notion that, you know, do people need to read, you know, these
manuals to understand how to use LLMs, um, from the expert level, you know, you
know, when you ask, are you one to 10?
If you're like an 8 to 10, you know, then I think it's good to study it, right?
Um, if you're down on the lower end, one, three -
maybe not, but I do think that, um, you know, whether
or not these prompts, you know, are gonna be released or not, right?
Um, is sort of up in the air, you know, of, you know, should it or should it not.
And there are a lot of inherent risk, right?
About, uh, exposing, you know, these, uh, prompts.
But there's also benefits.
I think it's not a bad thing though.
I mean, to sort of come back to Aaron's point and case for a second, right?
It's like.
It's more of a handbook and a guide for the model.
The model's gonna learn loads and loads of things over time, and it's
gonna be put in different situations.
But like us as humans, we're in different situations.
How I act at a party is gonna be different to how I act to this podcast, right?
So before we came onto this podcast, or a wonderful producer
was like, Tim, make your bed Chris.
Set up straight.
You know, put your camera down, da da da, da.
Here is the guide.
For how you should behave in this scenario.
And that's different in other scenarios.
So I think it's okay for them to say, you know what?
You are, you're not in an enterprise setting at the moment.
And remember, the cloud model is gonna be doing enterprisey stuff.
Maybe it's gonna be doing research, et cetera.
You are now acting as a general chatbot.
You're ask, you're answering general queries and that means.
Average human beings don't want to hear you waffling on about,
you know, life, et cetera.
And what you think of this book, it wants it in a couple of paragraphs and, and it
doesn't want you hallucinating things.
It wants you to go and use the web tool and go, come back with the answers.
So I, I think it's okay to have that in a system prompt to, to
guide, like a handbook of how it should behave in that case.
And because that's how we deal with things as well, right?
In different scenarios, we have different guides of how we should behave.
And I think one of the most interesting things here, and it goes
back to what Kate and I were just talking, a moment ago about is like.
You know, originally I think the idea of these prompts was
to specify in detail, right?
Like what you wanted the model to do.
Um, and I always remember the, the joke I had with a friend was like,
are we just rebuilding programming?
Where you're like, you have to just like say really specifically
what you want the computer to do.
But you know, there's another quote that I had written down here.
So one of the elements in the prompt is if thinking mode is interleaved or
auto, then after function results, you should strongly consider outputting a
thinking block, which is like kind of this very funny thing where you're like,
okay, now the model has thinking mode.
But rather than saying like, under these specific conditions, engage it, it's just
like you should strongly consider it.
Right.
And it, it's, it's sort of interesting on like the degree
to which these prompts like.
Are actually giving us control over what the models are doing or
versus, or versus us just like, kind of like giving it vague rules.
I don't know if, Kate, you wanna respond there?
Well,
I, I mean like maybe the what might look like control.
I think the other thing is like, how much have we really tested?
And if you don't release system prompts, it's really hard for the
academic community to do research and to validate some of this.
But how thoroughly have we really tested if.
Every single line of that system prompt actually has the intended effect.
What is the degradation and performance and how often the model produces
thinking if that line is there or is not.
I see prompts all the time where, you know, people write them based
off of like one weird edge case, and so they add a line and that
one weird edge case disappears.
But do they really impact the model behavior as a whole across,
you know, everything that you're trying to impact and study?
So I think there's also some degree of wishful thinking with system prompts
where the model's been trained for a lot of these behaviors already, like when
to do thinking and all sorts of stuff.
Um, so.
You know, I think we're trying to nudge and steer, but it also makes it seem
like, oh, well if I told the model X, Y, and Z, then X, Y, and Z will happen.
'cause I gave it this nice little playbook and I think it gives us a
false degree of, uh, security that that is actually gonna be followed.
I. I think a lot of these system prompts are probably way too long.
If you actually want something that really is like, there should almost be standards
of, is this system prompt, uh, certified to impact this type of behavior and,
and the degree that it, it specifies.
I do think there's a balance, right?
That um, you know, you know, going back to tool calling, function, calling right?
Is um.
That there's a, there's a huge inherent risk, I think, of leaking
those types of prompts, right?
Because depending upon the, the use case, for example, on the extreme, if you're
doing like, um, robotic surgery, right?
And somebody could have a tool call, right?
And hack the tool, call and bypass different types of, um.
Uh, different types of, let's say, uh, refinements, right?
They could do jailbreaking, refinements, uh, bypass content moderation, um, force
different types of searching, right?
Which, which could have catastrophic, you know, impacts on the patient, right?
So those types of, um, I think system prompts could be obfuscated.
Uh, they could be encrypted within fragments such that they're not.
There, you know, to be used.
Right.
Um, you know, because I don't think some behavior
should be, um, enabled.
Right?
Um, and, uh, released like if you're filing taxes or if you
are sending an email, right?
Uh, not you, but the LLM or genAI doing that.
Right. I certainly wouldn't want it to "Oops.
Um, sorry Kate, I sent an email on your behalf, you know, uh,
because I hacked in, you know, this certain tool call or function call".
Right?
Um, you know, so I ghosted her, right?
So I mean, I mean, so those, those types of more extreme
right exploitations I think just need to be carefully thought of.
And I thought Anthropic was taking that into account by not
releasing some of those, um, system
prompting elements within their original sort of manual, right?
But then they were leaked anyways, right?
So, so there's always, you know, this, that, that, uh, risk and balance that I
think we all need to just think about.
Yeah, for sure.
And I think, I don't know, the, the layers of obfuscation here I
think will get very interesting.
'cause at the end of the day, it's just, it's just tokens, right?
And so you can imagine constructing a prompt where a human reads it and is
like, oh, well these are the rules that guide the system, but actually like,
you know, impose certain other kinds of not written behavior, uh, on the model,
which I think will be like a really interesting, you know, next development
if it hasn't already happened, right?
Uh, because all these companies know that the system prompt's just gonna get leaked
within hours of the model coming out.
So I'm gonna move us on to our next segment.
Um, really interesting collaboration dropped between, uh, legendary music
producer Rick Rubin and Anthropic.
Um, they dropped this kind of document, uh, on thewayofcode.com
and what it appears to be is a rewrite of the Tao Te Ching.
Um, but.
About vibe coding.
Um, and, uh, this is like both a very funny kind of collaboration in some
ways and made me think a little bit about this kind of famous interview
that Rick Rubin did with 60 Minutes, where he said, I have no technical
capability and I know nothing about music.
Um, and, uh, he took a lot of, you know, criticism for this, um, being, you know,
the legendary music producer that he is.
But I kind of love this because it sort of asks the question for vibe coding about
like just how far vibe coding will go.
Um, and whether or not in the future we really will have Rick Rubin like
producers for code, um, in the same way that we have for music where it's really
unclear like what Rick Rubin's skill is.
He just appears to be really good at getting number one hits.
I don't know.
Maybe Aaron, I'll throw it to you first is like, do you feel like
in the future of vibe coding.
We'll see people with zero technical ability be able to do incredible
things with computers just given where things are going with code gen.
Yeah, I mean, I mean there there, there's a, again, a continuum here, right?
And you know, as, as a, as an engineer and scientist, right?
I, I do believe that the mind gets into like these different patterns and
constructs pathways as one develops
and codes and builds.
Right?
And you can think of it as like a flow state, right?
And then if someone just walks into your office, right, when you're in the middle
of it, it's sort of like your flow state collapses and you gotta start all over
and rebuild those constructs, right?
And so that, that to me is
kind of like this vibe coding.
Um, but I think the way that Rick, um, you know, is approaching this is
it's more of an art form or like this cultural phenomenon, you know, where,
you know, you know, I did visit his, his was it way of code site, right?
And looked, um, and it looked like you could go in and actually personalize
some of like the graphics and such, you know, that, that he already sort of
seeded with a vibe coding element, right?
So.
So, so I think in short, you know, if you are building like a production, um, app
application that needs to be at scale, I think pairing vibe coating with good
engineering, you know, is very important.
But if you're just doing it for, um, you know, a prototype to build an
experience that doesn't have to be so precise, maybe this kind of style of
vibe coating, you know, is the way to go.
Okay. Any responses to this?
Um, I know it's always this kind of push pull, right?
I mean, I, I think Aaron's response has a lot in there is where it's like.
Well, this is good, but like we might really need real
engineering at some point.
Um, but curious about what you thought reading through, uh, the way of code.
Yeah.
You know, I, I think in many ways, you know, coding can be viewed
reasonably so as an art form.
It's creating in kind of the act of creation.
I think, uh, it is inherently artistic and creative.
And so from that perspective, I think there is something
interesting about how do we unlock
future developers who don't have the same backgrounds, who bring different
experiences to find new ways to solve some of the thorny challenging problems.
And I think that's kind of the spirit that, uh, Rick is coming
from that, that I've seen.
But I also, you know, think if we talk about
critical infrastructure and you know what the world runs on.
Like, you know, there's a big difference between art and, uh, you know, mainframe
systems that run, you know, all of the financial transactions in the world.
And, you know, there's different degrees of reliability and,
and trust in everything else.
So, you know.
I think it's important to make sure that there's kind of a balanced approach
at the end of the day.
It's not saying the world is going to be vibe coding and only vibe coding, but how
do we use this rather as a tool to engage more with the community, uh, with people
who come from less traditional backgrounds that traditionally don't know how to code.
But could bring really new, unusual and powerful ideas that could be,
if you know, are being gonna be implemented in some sort of critical
capacity, implemented maybe with more, uh, knowledgeable traditional means.
Yeah, I love that.
How basically, you know, maybe in an earlier era kind of computer code,
you sort of couldn't approach in an artistic manner, but we're now living
in a world where like almost the boundaries of that are a little expanded.
Uh, and so you can approach it as if you were.
You know, sort of a music producer or just kind of like vibing with it.
Chris responses.
I saw you went off mute here.
Yeah, I love it actually.
'cause I, I do think programming is, is an art form.
I, I know we want it to be a science, but I do think it is art.
So, and I, and I do love the idea of
exploration and being able just to kind of figure things out.
So I don't think we always need to take an engineering approach.
And, and, and again, I, if I think of architecture, I, I don't mean
computer architecture like we do.
I'm meaning as in people with pencils and beards and flip flops
and things like that, right?
I don't, I don't know.
Um, but, but.
You know, if somebody came in and went, I want to design a new house, and then they start and sort of drew a picture, and now there's your new house. You'd be like, huh, should I give that to the builder? And they'd be like, sure. And you'd be like, I don't think, how is this gonna work? But that's fine.
But if you then got the technical schematic architect who just builds that, then you know they're gonna be following the process.
Uh, you know, this, joist needs to connect to this.
I don't know anything about building terms. Joist, I think, is the only one I know.
And that's like a thing that houses have.
Exactly.
And then I know roof and stuff. But where's the creativity? That's not gonna create you, you know, um, uh, the Guggenheim or something like that.
There you go.
I was trying to think of something that was a fancy building.
It's an art thing.
Yes.
Yeah, exactly.
So I think you've gotta have that mix, and I think it's almost the same as, like, music production, right?
So in Rick Rubin's case, right?
It's just like, I think vibe coding allows you to break things down into their individual
elements and then recompose them, right?
And then I think that's okay to then take that to an engineered state,
but I, I think that whole process of creativity is a good thing.
So I'm, I'm a big fan of vibe coding because you can test out ideas really
quickly and explore it, and then you can go and engineer the parts that you
need to engineer and, and get a little bit more, uh, process oriented about it.
But why kill the creativity?
So I love it.
I'm a huge vibe coder.
Uh, and I love the collaboration.
Kate, uh, this makes me think a little bit about like how vibe
coding is gonna evolve within an organization or within an enterprise.
You know, in all the companies I've worked for, there's always been like a little
bit of like, uh, a class system, right?
Between like the designers and the engineers and the designers were like,
here's a mockup that you should build.
And the engineers are like, we have to build it.
And like, ah, like all these people with their crazy designs.
And it kind of feels like what vibe coding is gonna allow, like what will change, is that designers can suddenly build workable prototypes.
And so like there's a whole degree to which, like this allows a, a group
of people within a company to kind of like seize the means of production in
a way that I think might be like deeply disruptive to kind of like the, the
natural state of affairs that has kind of presided, uh, over these companies. And that feels like it's gonna be really interesting to watch.
Yeah, I don't know.
I think it can go both ways though, because I think designers, or whoever is trying to test the waters, will always say, oh, go build this.
You know, it should be easy.
Just put this button over there and then you'll be fine.
And that button should do all these other things, by the way, and oh,
it also needs to be compliant and X, Y, Z. And so they'll try it, and undoubtedly it will fail if they just kind of vibe code and throw it out into the world, uh, when it hits real production, and they'll learn some pretty nasty lessons that it's actually really complicated and there's a lot of important work that, you know, developers are doing behind the scenes.
So, you know, I, I think it's just gonna probably be really important
though as a communication tool, uh, to help better articulate
vision, to help better explain what you're looking for or what the target goal is, to help iterate faster on proofs of concept and, you know, experiment faster.
So I think it definitely will disrupt from those perspectives.
Yeah.
And it actually occurs to me as you're talking that like the
annoyance will work both ways, right?
Because suddenly engineers can be like, oh, I generated this picture of the website I wanted you to create. It's like everybody's gonna be in everybody else's business, it seems like. Aaron?
Yeah, I mean, this whole notion of vibe coding to me is very similar to inventing, right?
Because it's, you know, you get lots of people together, um, and
you need different perspectives.
You need the artfulness of creating novelty, but you also need the
engineering to make sure it's implementable and, and it can be used
in some kind of embodiment, right?
And vibe coding, to me, is very similar, where you get, you know, the creatives together, and it becomes more of a blur, where the scientist and the creative now become one, right? Because you're vibing to sort of do, like, this vibe science or vibe engineering, you know, to have these alternative, um, hypotheses.
You know, it, it's, it's like exploring different branches very quickly.
Um, and then when you need to get into an embodiment, then you build right?
And then implement, you know. So I think some of the white space here would be how do we connect vibe coding, um, to the actual build implementation, right?
And deployment of something that's practical, that's usable, that
can handle high scale and load.
Um, some, some of the really hard challenges, right, that
we face every day, right?
So that, so, so I'm pretty excited about that area, which I think is
just beginning to emerge a bit.
So for a third segment, we're actually gonna do another design
and AI story in some ways.
Um, the biggest business story of really the last week or two in AI has been this
enormous $6 billion plus acquisition of Jony Ive's secretive startup io.
Um, and Jony Ive, if you don't know, was most famously the chief architect of the iPhone and kind of the sort of design mind for Apple during a whole era of its history.
Um, and the announcement is that Jony Ive himself is gonna go collaborate with OpenAI on hardware, uh, through a design collective, um, that he owns.
And so this is a, a huge transaction, right?
Billions of dollars.
Um, and, you know, I guess, Chris, maybe to turn it to you. Like, is it worth it? There's not even a product here, uh, and they're putting $6 billion down. How do you think about why OpenAI would do this, and if it really is gonna pay out for them in the end?
I hope for $6 billion he does more than collaborate for them. That seems a huge bill for collaborations.
You know what I mean?
I'm collaborating with you guys just now, and I, I'm not paying $6 billion.
Sorry about that, Chris.
Yeah, so I would be more worried if they paid $6 billion and Jony Ive went:
"You can have my company, but I'm outta here, you're hearing nothing from me."
You'd be like, what am I buying at this point?
You know?
So I, I, I think the whole thing, I, I mean, Jony Ive's incredible.
I really do think, and, and therefore you're buying his talent,
you're buying his brand, et cetera.
So I, I, I do think it's gonna
go beyond collaboration, and I think it's really gonna be about
shaping the ideas that form what the future of AI is gonna look like.
Because if we, if we actually truly think about where we are, we're now
in this sort of multimodal world.
We've got AI becoming cheaper, you know, being able to run on device.
So there's new form factors that need to be discovered to, you know,
to have AI in the right place.
All right. How do I want to interact in that world?
How is the world of agents?
Yeah, I said it.
How is the world of agents gonna behave?
What does the future of the web look like for that? What about the future of mobile devices?
I think there's a lot of things to really work out and discover. And does that mean that how we interact today is gonna change? And I think it will change.
So actually being able to bring AI companies and design companies together to go and figure out what that future looks like and experiment, I really think that is a smart move. And to have somebody like Jony Ive, who's, uh, been through those transformations before, um, I think it's a very sensible thing.
Um, so I think it's an exciting collaboration and I kind of look
forward to what this kind of next wave of experience design
for AI is gonna look like.
Yeah.
And Kate actually, I mean, so I mean, to give them a little more credit,
like this is more than just like a vibe acquisition in some ways.
Uh, I was like curious.
Uh, so there have been some details kind of leaked or rumored about what it is that they're working on.
And as far as we can tell, it's a kind of like AI device with no screens.
That's kind of their, their pitch.
Um, and uh, that's
pretty interesting, right?
We've really built, you know, a whole digital paradigm on screens, and so the idea that we'd go completely no-screen in the future thanks to AI is pretty surprising, don't you think?
Yeah, I think it's very surprising, but you know, I think it also kind of gives vibes of some of the AI companion type things we've seen. And nobody wants to be accused of making a Tamagotchi, where you've got a tiny little screen companion that, you know, you have to feed, otherwise it dies.
So, you know, I think they're probably gonna lean into some sort of, like, life assistant route that doesn't need eyes, you know, or if it doesn't need eyes, it doesn't need a screen to communicate with you.
Right?
We've got better tools now, um, that they're working on, but it'll be
interesting to see what they come up with.
You know, I struggle to see that there won't be some sort of, like, phone app experience as well that connects to whatever device they're also working on.
Yeah.
It's hard to untether from that completely.
Um, yeah.
Aaron, how do you size it up?
I mean, so the most obvious precedent for something like this is the Humane Pin, which I think we were talking about a year ago, right? Which is, like, a screenless device that you wear, that's always on, that is kind of like an AI assistant in your life.
Um, and one point of view is, like, no one wants that, and that's why it didn't work.
There's another point of view, which is the technology kind of wasn't
there, and we might finally be there.
I don't know if a year later is enough time, but obviously things
are changing very quickly in AI.
Yeah, I mean, I'm a bit stuck on, you know, that this is one of the largest deals for 55 employees, right? That's what it is, at least that we know of. And if I do the math right, that's about $118 million-ish per employee that you're paying for.
I, I mean, yeah, that, that's, you know, pretty good.
It's a high stakes bet on this, on this talent, basically because,
uh, the valuation right, is very speculative because I don't think
that this company has created a user base or any devices at all.
Right?
So it's, it's basically a high stakes bet on design talent, right?
For these 55 employees, right?
But if it goes right, um, to creating these AI companions, you know. So I saw that, uh, Sam Altman, they wanted to release, what, a hundred million of these AI companions, right? I mean, roughly about that, you know. And if they can pull it off, I mean, they can sell these very cheaply to get back, you know, their, what, $6.5 billion, uh, investment here.
Right.
But, I mean, again, you know, I just wanna see something tangible, right, very quickly. Um, and I think that they can pull it off.
Right?
Right.
I think their mission is in the right place and, and I would just say, you
know, Apple, you know, watch out, you know, Apple Intelligence, you know,
you need to get that going quickly.
Right?
Because I think if OpenAI you know, works with Ive here, then you
know, then these AI companions could really, you know, be a nice bet to
understand what's happening in one's life without having a screen perhaps, or,
you know, maybe you're going to extend to an already existing screen, right?
That's already there.
But these different form factors, I think it's gonna be really, really interesting, and combining, um, these cutting-edge AI experiences, right, is gonna be fascinating to watch as the field emerges.
One thing though, Aaron. I think you're right that Apple, obviously, they definitely need to catch up. But as we've talked about why, in the past, assistants have failed, what I think OpenAI will struggle with is still this notion of privacy and trust with data.
Like, I think another reason why the AI virtual assistant companions failed, I mean, there were plenty of things that were done for on-edge, you know, type learning, but it is just still this, like, shadiness factor of, why is my life now being recorded and being beamed up to, you know, some machine AI intelligence? And I don't know that OpenAI is best suited to crack that.
So it'll be interesting to see if the new design team can help and think
through new ways to design for trust.
I think that's something Apple does have as a better starting position
if they can figure out, you know, some of their Apple intelligence
work, what they're doing there.
Yeah, for sure.
Yeah.
I think the, the paradigm shift that will, that is implied for OpenAI to
get this right, I think is really hard.
Um, 'cause I think it's more than just devices, it's more consumer trust, and
how do you ensure that from a technical standpoint, and it's like a whole
nother way of thinking about this stuff.
I don't know, I think we overthink trust sometimes.
You know, I, I mean, I know we want trust, et cetera, but it's a trade, isn't it?
It is like, here is the functionality that I'm gonna get; how much better is my life gonna be? Think of the hundreds of millions of people who are using ChatGPT every day, right?
And, and everybody knows that you're giving away your data,
but you know what it is.
You're getting utility from that.
So everybody's kind of prepared to make that payment or not.
And some things you're not gonna make that payment for and
say, okay, I don't trust that.
And you'll lean into something and say, wow, you know, the Apple approach in this case is gonna be important, and you might lean into that direction. But I think everybody sort of takes that utility, and we understand that we're giving a bunch of data away.
Um, personally I find it very unlikely I'm gonna give up my iPhone.
I love my iPhone, I love my iPad.
I, everything is connected, all of my movies and, and this
thing doesn't have a screen.
What am I gonna play my movie on?
So, I just, you know, I think there is a whole point about ecosystems; things don't exist within islands, actually. And the thing that Apple does very well is they have a very good ecosystem of platforms and devices, right?
Where everything connects well, and therefore, if they're making a
move into that space, and I think they'll do very well, is you have to
bring the ecosystem along with you.
Because actually, back to the point about that pin thing, right?
That didn't connect into anything, so it was, it sat on an island.
So I think that's really gonna be the problem OpenAI has to think about. What ecosystem are you gonna plug into?
And guess what, the, the only two choices in this case are Apple and Google.
So, you know, um, you gotta start figuring this out, because if you can't plug into that ecosystem, you're gonna have a problem.
Alright, so Chris already beat me to it by saying the word agent, but we'd be remiss
if we didn't do a story about agents.
Uh, so I'm gonna close up today with our last segment.
Um, super interesting Verge interview that popped up with Jay Parikh, who was the
former, uh, engineering lead over at Meta.
And is now over at, uh, Microsoft working on all things agents for them.
Um, and we haven't heard from Jay in a little while.
I think we talked about him on the show when he first joined Microsoft.
And so, you know, I thought it'd be useful to kind of check back in on what he's
been working on and to talk a little bit about Microsoft's strategy in the space.
Um, I think the most interesting part of the interview, he had this quote. He said, "I want our platform," meaning the Microsoft platform, "for any enterprise or any organization to be able to be the thing they turn into their own agent factory."
So the idea is, like, whatever you're building, you're gonna be able to turn it into an agent using Microsoft tools.
Um, and we've talked about this; this came up on last week's episode as well, which is that, you know, it's a little bit of a joke that agent means everything.
And I think one way of thinking about what these companies are doing is that they're all battling for, like, what an agent even is.
So you know, for Google I/O, everybody was saying, oh well, their version of an agent is, like, search. It's, like, not that surprising,
'cause they're a search company. And I guess Microsoft is kind of articulating a new vision, or their own vision, if you will, of how agents should work, which is very much kind of, like, you know, not really a platform, but, like, every enterprise being its own manufacturing kind of facility for agents, I guess.
Kate, maybe I'll turn to you, as, like, it assumes a world where these things become really commodified and really democratized.
Um, do you see that happening, right, like soon?
Is that a realistic way to think about where the market is going?
Yeah, I mean, I think we see also a lot of other industry players that are putting
some pressure on Microsoft to do similar.
Like we've got, uh, Agentforce from Salesforce, you know; everyone's coming up with a suite of pre-canned agents. watsonx announced a bunch of agents at Think, just this past conference.
And so I think Microsoft's trying, just to better speak the language of what all of our enterprise users and customers have been trained to speak, which is: I need agents.
I need agents now, everything I can build can be built as an agent and
trying to make sure that they're hyper targeted towards this
kind of modality for how people are trying to build and starting to build.
And I think it is very much being democratized as we start to see a lot of performance on useful enterprise tasks converge.
Any model can do a lot of the things that, um, drive, you know, 80%
of the value for these companies.
So.
The ability to build your own, to swap out parts, to customize, I think is gonna
be critical as people continue to look to how to avoid getting locked into just
kind of one, one endpoint and, you know, ultimately continuing to innovate within
their own four walls of their company and how to use their data to to create value.
Aaron, there's almost a question here.
I think about like, almost like the ceiling on commodified agents.
Uh, we talk a lot about, I think, on this show about like how complex it is to
like orchestrate agents to work properly.
You know, you need like the right protocols and you need tasks to
be done in the right way, and it needs to be fine tuned and evals.
The skepticism I've always had is like, well, it just seems like not
every enterprise just has people who know how to do that outta the box.
But I guess Kate's, I don't know Kate, I don't wanna put words in your mouth,
but you're almost kind of arguing that there's enough kind of common tasks
that, like, the sort of out-of-the-box agent will be something that most enterprises will be able to play with.
How do you think that market's gonna evolve?
It sort of feels like it's like gonna go in two directions almost over time.
You know, whenever I think about agents, the first thought that pops into my mind is James Bond, 007, right? He's the ultimate agent, right? And we need to watch out for double agents and make sure that we can ensure that they don't go rogue, right?
Um, and I was, you know, looking at this, and, you know, what this Agent Factory has, it's like it has this as a service. It uses agent identity and governance, you know, where I can provide identification for each of the agents, such that you can't go get a fake ID and, you know, maybe doppelgang, you know, another agent to go do something else, right?
Um, you know, it's got observability management, low-code, no-code tools. But, um, I mean, you know, I think everybody in industry is trying to, you know, get in the game of AI agents and what they should be.
Uh, but I think for Microsoft, one of the biggest differentiators that I see,
um, I happened to look, uh, two weeks ago. You know, I look every now and then at what's called TOP500.org. It's this website that tells you the fastest supercomputers in the world.
And, um, I was curious, was cloud on there, right? And I think the number four, the fourth-ranked one, was called Eagle, right? And it runs on and is built on Azure, right?
Right?
So it's a cloud-based, um, supercomputer, which, you know, I didn't think I would see happen so quickly. Wow, okay. So this isn't, like, a Blue Gene, you know, a particular piece of hardware.
Right.
Um, so, so to me the compute power that, um, Azure, that, that Microsoft
has on Azure, I think really can give them, you know, a nice opportunity here.
They have data sources.
They can integrate with Windows.
They already have, what, the Azure AI and Copilot pieces they can expand into consumer markets, you know, with, like, Windows Copilot.
So I think they have sort of the bread and butter elements, right, to
make this AI agent factory happen.
It's just hopefully they can, uh, release some of these features to map
to their vision of how they're gonna do it so we can avoid these double agents.
Yeah, for sure.
And Chris, it looks like you're about to jump in.
I mean, if I can kind of, maybe, prompt you with a question.
You know, we've talked a lot about like who's gonna win in the agent market.
There's almost a part of me that kind of thinks about Aaron's comment and is like, maybe actually over time the agent market is gonna divide up, right?
That it'll just turn out that like if you have a task that really requires
search, you'll be using Google's agents, but you may not really need
all those capabilities and so you'll like, maybe you really are more married
to the, the Azure infrastructure and so you use Microsoft, like it may not
be winner take all in this market.
I don't know if that was what you're gonna address, but.
I don't think it will be winner take all, and I'm happy to kind of say that, because one of the big things that's really happening in the market at the moment is the commoditization. So if we really think about what's going on here: all of the major providers have hooked onto the Model Context Protocol, MCP, as the, um, standard for remote tool calling.
And I think that's a good thing, right?
Because we're gonna move into this, uh, this world where we want to
be built on composition.
So if everybody's at least standardizing on tools, then there can be a
marketplace of tools, and it also means the models can be trained to
work with those tools very well.
And therefore, if you want to shift to a different agent for whatever reason, then guess what? You can bring your tools along with that.
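As a rough illustration of that portability, here's a minimal sketch of defining a tool once and adapting it to a provider-specific function-calling format. The field names follow the general shape of MCP-style tool definitions but are simplified, and `to_openai_function` is a hypothetical adapter, not taken from any specific SDK:

```python
import json

# One tool, described once: a name, a description, and a JSON Schema for its
# inputs. This is the general shape of an MCP-style tool definition.
weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def to_openai_function(tool: dict) -> dict:
    """Adapt the shared definition to an OpenAI-style function-calling entry.

    Because the tool is just declarative schema, swapping providers only
    means swapping this adapter; the tool definition travels with you.
    """
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
    }

print(json.dumps(to_openai_function(weather_tool), indent=2))
```

The point of the sketch is that once everyone standardizes on the declarative tool shape, the marketplace of tools becomes model-agnostic.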
And I think in the factory context, this makes sense as well, right?
So from a factory perspective, I'm gonna want to build something, right?
But actually maybe 80% of the tools already exist and maybe
80% of the agents that can work with those tools already exist.
And I really need to do the 20%.
Whereas in the previous world, I would've had to do all of that.
So I, I think that becomes important.
And then with things like A2A and ACP, so having agent protocols where you can have a standardized way of having agents be able to talk to each other. And again, whether it's Salesforce, Microsoft, et cetera, they're all landing on protocols, sort of, for that interoperability.
So I think that moves us into a marketplace again.
So I think as soon as you start to get in this world of marketplaces and you have this area of standardization, then I hope that that means we get away from this
winner takes all market and then folks can specialize on the things that they're
really good at and their differentiation.
The good news, and probably the bad news at the same time, is actually, I think this brings us back into the discussion we had at the beginning, which is about vibe coding. Because actually, if I've got the engineering of agents that do tasks really well, and I've got tools that do things really well, and models have done that, and then agents know how to talk to each other, and we all know how to talk to models, et cetera, then actually vibe coding becomes quite interesting in the world of factories. Because then I can sort of vibe up what I want and then I can hand it across to some agents who are gonna do a productionized version and use productionized tools, and it completes that circle.
So I know we were talking about vibe coding being a toy, but actually, I want you to think about that factory model for a second that Microsoft's, uh, discussing. And I think those two worlds blend over time.
I can also envision a world where we have these AI agent skills marketplaces, you know, if we use these new approaches. Um, so we just released what's called aLoRA, I think it's activated low-rank adaptation, where, um, you have these weights that can influence the attention. So your weight matrices that project and create, whether it's your keys, your queries, your values, right? Um, they can be fine-tuned to whatever kind of skill you would like, right? And then you save those weights and you can dynamically, on the fly, import that skill, so that now that same model, the same model topology with which you created your LoRA weights, now has a different behavior, right?
So this decentralization of skills is there, and you could do some vibe skilling, right? To create whatever kind of skill vibes with you, and then put it up on a marketplace, right, to share with your friends, or, you know, create these emergent skills. But I think that might be where it's going.
Um, and then last thing, I could talk about this for a while, but model distillations could play into that as well.
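For a sense of the mechanism being described, here's a toy, pure-Python sketch of the general low-rank adaptation idea: a frozen base projection plus a swappable low-rank update. This illustrates LoRA-style skill swapping in general, not IBM's actual aLoRA implementation, and all the names here (`skill_translate`, `project`) are invented for the example:

```python
# A frozen base weight W gets a low-rank update B @ A (a "skill") that can
# be saved, shared on a marketplace, and loaded at runtime without touching
# the base model weights.

def matmul(X, Y):
    # Plain-Python matrix multiply, enough for this toy example.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1, 0], [0, 1]]  # frozen base weight (think: a query/key/value projection)

# One "skill" = a pair of low-rank matrices (here rank 1) you can save and load.
skill_translate = ([[1], [0]], [[0, 2]])  # B is 2x1, A is 1x2

def project(x, skill=None):
    """Apply the base projection, plus the active skill's low-rank delta."""
    y = matmul(x, W)
    if skill is not None:
        B, A = skill
        y = madd(y, matmul(x, matmul(B, A)))  # same topology, new behavior
    return y

print(project([[1, 1]]))                   # base model output: [[1, 1]]
print(project([[1, 1]], skill_translate))  # with the skill loaded: [[1, 3]]
```

Swapping `skill_translate` for another saved (B, A) pair changes the model's behavior on the fly, which is the decentralized-skills idea in miniature.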
Okay, I'll let you have the last word here. I know you, in the first conversation, were maybe the most strong on: look, you're not gonna use vibe coding to, like, build a bridge.
I think Chris is maybe ending on a note of optimism.
It's like maybe agents are the, the bridge that gets you there.
Um, do you buy that story, or are you still a little bit skeptical about, you know, how far Rick Rubin can get?
Uh, I think humans are gonna have to be in the loop more than just in the vibe coding step.
So I completely agree.
I think vibe coding to create something, kicking it over to an
agent to iterate, build it out a little more detail, like all fair
game and is gonna be pretty exciting.
But I'm not ready to totally just kick out the human-in-the-loop part of the process there, where they start at the beginning and then, uh, you just see what bridge pops out on the other end and walk across it, uh, blindly.
Seems like a fine bridge.
All my agents are telling me it's the best.
Yes, every agent agrees.
Yeah, exactly.
All right, well, that's all the time that we have for today. Kate, Chris, Aaron, always great to have you on the show, and, um, thanks to all our listeners for joining us.
If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify,
and podcast platforms everywhere.
And we will see you next week on Mixture of Experts.
Great job, everyone.