# Prompt Engineering: Here to Stay

**Source:** [https://www.youtube.com/watch?v=iSJenVM7KnQ](https://www.youtube.com/watch?v=iSJenVM7KnQ)
**Duration:** 00:38:04

## Key Points

- Prompt engineering is considered a lasting discipline, even as tools emerge to automate prompt creation.
- The panelists disagree about the future of prompt engineers: some say the role will disappear; others say it will evolve into something different.
- Major AI firms (Anthropic, Cohere, Google) are releasing or acquiring technologies that generate or tune prompts automatically, aiming to take the human out of the loop.
- Guests discuss broader AI impacts, from robots handling household tasks to potential job displacement for scientists and the hope that AI will become a collaborative "we."
- The episode is part of the "Mixture of Experts" series, featuring a recurring lineup of engineers, researchers, and product leaders who dissect the week's AI news.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=0s) **Future of Prompt Engineering** - A panel of AI experts debates the longevity of prompt engineering, the expanding role of AI in everyday tasks, and its potential impact on future jobs.
- [00:03:04](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=184s) **Evolving Role of Prompt Engineering** - The speakers critique the tediousness of manual prompt crafting, advocate for automated methods that explore broader natural-language spaces to boost accuracy and productivity, and predict a shift in prompt engineering from low-level token tweaking toward higher-level model interaction.
- [00:06:13](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=373s) **Prompt Engineering Evolves to Supervisory Role** - The speakers argue that prompt engineers will transition from hands-on tasks to overseeing automated systems, requiring broader skills like model training, data curation, pipeline integration, and hyper-personalized context handling.
- [00:09:20](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=560s) **Future of Obscure Prompt Encoding** - The speakers debate whether prompts will grow increasingly unintelligible as models are optimized, highlighting a trade-off between efficiency and readability while noting that advances in structured outputs aim to keep human-model interaction understandable.
- [00:12:32](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=752s) **Humanoid Robots: Promise and Hurdles** - The speaker praises a recent demo and argues that while humanoid robots could integrate into human environments, obstacles like human-level mobility and energy efficiency still keep them out of everyday use.
- [00:15:40](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=940s) **Home Robot Adoption Timeline Debate** - Participants discuss when advanced home robots will become practical, highlighting the current functionality gap, the potential decades-long timeline, and the trade-off between massive generalist AI models and smaller, task-specific solutions.
- [00:18:44](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1124s) **Specialized Machines vs Human Workflow** - The speakers argue that purpose-built devices, like an optimized dishwasher and pool skimmer, can outperform traditional human routines; anticipate future humanoid robots that unify cost, flexibility, and dexterity; but warn that perceived creepiness may hinder widespread adoption.
- [00:21:53](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1313s) **AI-Driven Automated Scientific Discovery Debate** - The speakers examine a paper proposing fully automated AI scientists that could accelerate breakthroughs, while expressing skepticism about the claims' realism and pondering the future role of human researchers.
- [00:24:59](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1499s) **Ethics and Potential of LLM Literature Review** - Researchers debate the ethical implications of automated AI reviews while highlighting how large language models could vastly outperform humans in scanning and synthesizing existing scientific literature.
- [00:28:03](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1683s) **AI-Augmented Collaborative Research Future** - The speaker envisions AI as a supportive partner that augments human researchers, handling tasks like knowledge-graph synthesis and even acting as a representative in international collaborations, while noting authorship dilemmas.
- [00:31:06](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1866s) **AI Augmentation and In-House Chip Strategy** - The speakers explore whether AI will become a fully autonomous researcher or serve as a specialized tool to boost human workflows, and discuss OpenAI's rumored push to develop its own expensive semiconductor chips, in partnership with Apple, to support this vision.
- [00:34:10](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=2050s) **Building a Proprietary Semiconductor Future** - The panel debates the high-risk prospect of creating an in-house semiconductor supply chain to enable AI-hardware co-design and its potential benefits for enterprises.
- [00:37:24](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=2244s) **Future of Compute Costs & OpenAI Strategy** - The panel notes that declining GPU expenses will lower AI costs over time, advises OpenAI to focus on solving key problems before pursuing full vertical integration, and promises to revisit the discussion in future episodes.

## Full Transcript
Tim Hwang: My opinion is that prompt engineering is never going to die.
It's a forever thing.
Kate Soule: Anyone who's worked with large language models has experienced some of
the pain, dark art, black magic of...
if I shout loudly enough at my model, maybe like literally if I
type in all caps, maybe this time it will do what I'm asking it to do.
Tim Hwang: The creepy factor is big, but these robots are also pretty
cool if you can get them to work.
Kaoutar El Maghraoui: I would love to have one actually in my
home, cleaning dishes and cooking.
Tim Hwang: How many scientists are going to be out of a job
in the next 10 to 15 years?
Shobhit Varshney: I'm just looking forward to a world where we start using the
word "we" when AI is actually starting to do something meaningful for us.
Tim Hwang: All that and more on today's episode of Mixture of Experts.
I'm Tim Hwang and I'm joined today, as I am every Friday, by a world class
panel of engineers, researchers, product leaders, and more to
hash out the week's news in AI.
On the panel today: Kate Soule, a Program Director of Generative AI
Research; Shobhit Varshney, a Senior Partner consulting on AI for the
U.S., Canada, and Latin America; and Kaoutar El Maghraoui, Principal
Research Scientist, AI Engineering and AI Hardware Center.
So as always on Mixture of Experts, we're going to start with a round the horn
question, and that question is, will prompt engineers even exist in five years?
Kate, yes or no?
Kate Soule: No,
Tim Hwang: Shobhit, yes or no?
Shobhit Varshney: Not at all, man.
Tim Hwang: Uh, okay, alright, and how about you, Kaoutar?
Kaoutar El Maghraoui: I think it's gonna evolve to a different role.
Tim Hwang: Okay, alright, well let's get right into it.
The prompt for this first story that we want to cover today is that we've just
had kind of a slew of sort of subplot, you know, sub B kind of announcements
coming out from all the companies.
They haven't been the most kind of prominent things they've been
announcing, but it has really kind of created a little bit of a pattern.
I think, Kate, you flagged this for us.
Which is that a lot of the companies have all been working
on prompt automation, right?
So Anthropic announced a Meta Prompt system that helps
generate prompts for you.
Cohere is launching a prompt tuning feature, which takes a prompt that you
have and improves it automatically.
And then Google recently acquired a company called Prompt Poet, which is
very much in the same functionality.
Um, and so this is a big deal, right?
If you're familiar with LLMs in the past, right, a lot of the work
has gone into making a good prompt.
Um, and, uh, I think the big thing about this is the future of basically taking
the human out of the loop, the idea that you won't need prompting anymore.
Um, and I guess, Kate, as someone who kind of threw this topic to us, do you
want to just explain for our listeners, like why, why is that important?
Right, like why, what changes when that happens?
Kate Soule: Yeah.
Uh, and I, I like what you did there, Tim, the prompt for today.
Uh, so look, I think anyone who's worked with large language models
has experienced some of the...
pain, dark art, black magic of...
if I shout loudly enough at my model, maybe...
like literally if I type in all caps, maybe this time it will do
what I'm asking it to do, right?
Uh, which can be a really frustrating process and doesn't make like, logical
sense, like I think we're all rational beings and ideally there would be
a really rational and structured way to try and prompt these models.
So I'm really excited to see a lot of work come out, which is trying to...
not take a human entirely out of the loop, but take a human out of
the loop of finding these phrases and tokens and words and patterns...
that seemed to be more effective for one given model, uh, to
perform a task that's in question.
So, you know, being able to, for example, search a broader space of natural
language and try and identify, okay, if I frame my question this way, um, now I
can get an improved level of accuracy.
I think that is going to be really powerful, um, overall just to improve
productivity and, and reduce some of the stress when working with models.
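The automated search Kate describes, trying many phrasings of the same question and keeping the one that scores best, can be sketched in a few lines. This is an illustrative sketch only, not any vendor's actual tuner; `call_model` is a hypothetical stand-in for a real LLM API call, faked here so the example runs.

```python
# Sketch of automated prompt search: score candidate phrasings of a task
# prompt against a small labeled set and keep the best one.
# `call_model` is a hypothetical stand-in for a real LLM API call.

def call_model(prompt: str, text: str) -> str:
    # Placeholder: a real implementation would send `prompt` to an LLM.
    # Here we fake a sentiment classifier so the demo is runnable.
    return "positive" if "love" in text or "great" in text else "negative"

def score_prompt(template: str, labeled: list[tuple[str, str]]) -> float:
    """Fraction of examples answered correctly under this prompt template."""
    hits = sum(
        call_model(template.format(text=text), text) == label
        for text, label in labeled
    )
    return hits / len(labeled)

def best_prompt(candidates: list[str], labeled: list[tuple[str, str]]) -> str:
    # Exhaustive search over a handful of phrasings; real tuners explore a
    # much broader space of paraphrases and generated variants.
    return max(candidates, key=lambda t: score_prompt(t, labeled))

candidates = [
    "Classify the sentiment of: {text}",
    "Is the following review positive or negative? {text}",
    "SENTIMENT({text}) =",
]
labeled = [("I love this!", "positive"), ("Terrible product.", "negative")]
winner = best_prompt(candidates, labeled)
```

The human stays out of the inner loop of token-fiddling; they only supply the task, the candidates, and the labeled examples.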
Tim Hwang: Yeah, for sure.
And now Kaoutar, you said in your response that you agreed
with everybody that, well, maybe prompt engineering is kind of not long
for this world, but you did say that you feel the role will shift.
Um, do you want to tell us a little bit more about what you're thinking there?
Kaoutar El Maghraoui: Yeah, sure.
So there has been a lot of recent developments in prompt engineering
that is leading to significant changes, particularly in how prompt
engineers interact with large language models like Kate mentioned.
Take, for example, meta prompting from Anthropic.
And the development here, it shifts the focus of the prompt engineers
from crafting these individual prompts to designing systems that guide
the AI to adjust its own behavior.
So prompt engineers may increasingly hear focus on creating
frameworks for meta prompting...
or refining the logic that underpins it.
And this creates a more robust, uh, role where engineers manage
how prompts evolve in real time.
And if you look, for example, at prompt tuning from Cohere,
the Prompt Tuner.
So here, the Prompt Tuner from Cohere enables users to fine-tune
and optimize prompts specifically for different applications.
And, you know, the implication here is that prompt engineers may
transition from manually crafting prompts to overseeing or curating
automated tuning systems.
So this kind of democratizes prompt creation, and it could reduce some
of the technical barriers to entry, pushing prompt engineers to focus
on more complex or high-impact tasks...
where deep expertise is still required, such as, you know,
designing industry specific models or optimizations of scales.
And there is also, if you look at the Prompt
Poet acquisition by Google.
So here, you know, this acquisition emphasizes automation
here in the generation and the optimization of prompts.
And the implication here, this kind of further blurs the line between
AI systems and prompt engineers.
So, as AI systems like Prompt Poet evolve, the role of the
engineer here may shift towards a more supervisory role, uh, where
you're supervising these AI systems that continuously optimize themselves.
So human prompt engineers might focus more on edge cases or creative tasks
or model specific customizations.
So I think the implications overall here is kind of shifting from manual
to kind of a supervisory role.
I don't like to say that, you know, we're going to completely remove human out of
the loop here, but more increased focus on optimizations, expansions of the
skill sets here for the prompt engineers.
They will need a broader set of skills, including model training,
data set curation, the integration of the LLMs into broader AI pipelines
and also some niche specializations.
So I think, to sum up, prompt engineering is likely
evolving from a hands-on, manual role into a more, you know, supervisory
role where engineers focus on higher level design optimization and
supervision of these automated systems.
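The meta-prompting pattern Kaoutar describes, where the engineer designs the framework that generates prompts rather than the prompts themselves, can be illustrated with a minimal sketch. This is not Anthropic's actual Meta Prompt system; `generate` is a hypothetical placeholder standing in for an LLM call, and the template wording is invented for illustration.

```python
# Minimal sketch of meta prompting: the engineer designs a template that
# asks a model to write the task prompt, then supervises the result.
# `generate` is a hypothetical placeholder for a real LLM call.

META_PROMPT = (
    "You are a prompt engineer. Write a clear, step-by-step prompt that "
    "instructs a language model to perform this task.\n"
    "Task: {task}\n"
    "Constraints: {constraints}\n"
    "Return only the prompt text."
)

def build_meta_prompt(task: str, constraints: str) -> str:
    """The engineer's work shifts to designing this template, not the prompt."""
    return META_PROMPT.format(task=task, constraints=constraints)

def generate(meta_prompt: str) -> str:
    # Placeholder: a real system would send meta_prompt to an LLM and
    # return a generated task prompt for the engineer to review.
    return "Summarize the document in three bullet points, citing sources."

candidate = generate(build_meta_prompt(
    task="summarize legal documents",
    constraints="three bullets, cite sources",
))
# The supervisory step: a human (or an automated checker) approves the
# generated prompt or sends it back for another round of refinement.
approved = "bullet" in candidate
```

The engineer's leverage is in `META_PROMPT` and the approval step, matching the shift from manual crafting to supervision discussed above.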
Tim Hwang: Yeah, that makes a lot of sense.
And it's sort of interesting that kind of like the process that's happening
in the movement to, like, AI agents...
will also sort of happen in the prompt space, right? Which is, rather
than kind of, like, you know, doing everything, you're just sort of
monitoring the system as it goes and keeping it together.
Shobhit Varshney: Yes, I think the prompts will get more and more
personalized to that particular person.
And over time, there will be a lot more context that will automatically
be pulled in.
So the center of gravity is going to keep moving towards more hyper
personalization to Shobhit as an individual.
Uh, so the way the prompt, when I say something to a model, the way it expands
it out and makes a meta prompt out of it, that'll be super hyper personalized
to the context, the memory of everything that I've done in the past, right?
Uh, like I, I feel like being a good prompter...
to these LLMs at work has made me a much better parent...
talking to my eight year old daughter.
Uh, she just
Tim Hwang: Explain it clearly think through it step by step, you know.
Shobhit Varshney: Yes, I have to talk to my daughter saying that Anya,
you are, uh, you just turned nine.
You are a big girl now, and then I walk her through a chain of
reasoning and I get the answer I'm expecting her to say: that no, I
should not have ice cream before I sleep.
Tim Hwang: Got it, right?
Exactly.
That's the desired outcome.
Shobhit Varshney: Absolutely, and there's a lot, and that's a
two way feedback training, right?
And now we're at a point where, say it's, um, it's 8 p.m.
at night, and if I say, Anya, her response is going to be,
"Papa, I'm almost done eating."
Because she understands that there's a pattern that when she's eating and
she's taking more, more time, I'm going to probably be checking in and seeing
if she's eating properly or not, right?
So she has a lot more context on how to respond to Shobhit himself, right?
But if my wife is the one calling her name, her response is
going to be slightly different.
So I think the hyper personalization of these meta prompts, that's a direction
that we will be looking at going forward.
Tim Hwang: Yeah, for sure.
And I guess, Kate, maybe to turn it to you before we move to the next topic,
I think this exact point was one thing that I did want to bring up is, you
know, when we think about prompting with humans, we encode in language, right?
What's sort of interesting is that, you know, the prompting that we've
done is both to kind of, like, help us understand how we're interfacing with the
system, and then also direct the system.
I think...
I don't know if you buy this which is like, many of the optimizations
may use tokens that don't even look like, you know, normal grammar, right?
Like it could just be like a random string of numbers and letters that actually
get the best results out of the system.
And so I got some kind of curious like do you feel like prompts over
time will become like more and more kind of obscure to us, right?
Because it turns out like the optimal encoding for the language
model may actually not be something that's particularly human readable
or easily understandable at all.
And so there's almost this very interesting trade off of like
optimization and readability.
Just wanted to kind of get your thoughts on that.
Kate Soule: Yeah, well, I think to answer that question, it's important to recognize
that there's really kind of two different sides of innovation that are happening
on, uh, happening around this area.
So one is improving our ability to prompt the models, but the other is
improving the model's ability to take a structured and more
reasonable prompt.
So, you know, instead of talking to Shobhit's eight-year-old daughter,
like, can I talk to a software developer that understands, you know,
structured inputs and can provide very structured responses?
So if we only innovated on the prompt optimization side, where we're trying
to create new tokens and, you know, keep the model frozen, then yes, I think we
could get to a point where we're starting to see a non human readable prompts.
But I think we're also seeing like with OpenAI structured outputs,
like more and more structure being baked into these models to make it
more standardized and systematic and how we work with these models.
And ultimately, you know, I think that's where the real value would
get unlocked and where a lot of, um, really exciting workflows could
develop, especially in agentic patterns.
If we can really start to focus more on having very structured, formulaic
interactions, maybe not perfectly human readable, and it's not, you know,
like storytelling when I read what the model is doing, but a very formulaic
way to work with these models, I think that is ultimately where we're going to end up.
Tim Hwang: Yeah, it'll be so funny because what you're describing is we're
reconverging towards like code, right?
Like structured language as a way of getting systems to
do what we want them to do.
Kate Soule: Yeah, we started structured, created a bunch of unstructured,
and now we're like, wait, that was actually, there was some good things
there that we should maybe bring back.
Tim Hwang: So I'm going to move us on to our next topic.
Um, uh, we spend a lot of time on Mixture of Experts talking about software,
we talk a lot about enterprise, but I think one of the most kind of, uh,
interesting things viral, if you will, AI moments of the last few weeks was the
launch of a humanoid robot called NEO from a company called 1X Technologies.
Um, and specifically, they're working on humanoid robots that
are designed to be at-home assistants.
So, you know, this demo, basically if you've seen it and if you haven't, it's
worth kind of looking up on YouTube or whatever, um, is a humanoid robot
helping out around the home, right?
Cleaning dishes, helping to clean up and otherwise kind of assist on, on tasks.
Um, and.
You know, again, I kind of wanted to kind of ask the question, and I think
it's always an important question to ask in the world of AI, which is how
much of this is going to be a reality?
How much of this is like a really cool demo?
Um, and maybe most importantly, would you buy one for your own home?
But we can address that at a certain point.
Um, Kaoutar, I'm kind of curious about your thoughts, if you saw the
demo, what you thought about it.
And, uh, you know, if you, if you think something like this is really
going to be a reality, and I think in part, right, I think the question is
like, whether or not this is like, a real affordable thing from a hardware
standpoint, there's like a bunch of really kind of very practical, you know,
bits and atoms kind of questions here that I would love to get your take on.
Kaoutar El Maghraoui: I would love to have one actually in my home,
cleaning dishes and cooking, as someone who spends, like, an hour on one
of the tasks I hate the most. Of course, the demo was very impressive from 1X.
And I think 1X is one of the most prominent companies in the
emerging field of humanoid robots.
But will humanoid robots become a reality or still a pipe dream?
So I think, you know, humanoid robots have been the focus of
science fiction for a long time.
And transitioning from dream to reality comes with significant challenges.
So the argument for humanoid robots is that they can fit into environments
designed for humans, use existing tools and interact more naturally with people.
However, I think there are still several challenges that need to be fixed.
You know, first, I think there is the mobility aspect, building a robot
with these human level dexterity or mobility has proven very difficult.
Uh, while there are some progress, I think there is still
a lot that needs to be done.
Technologies like soft robotics and advanced actuators are making strides
here but are far from a robot that can perform all human tasks autonomously.
The other, you know, challenge is the energy efficiency.
I think these robots require significant power to function, which
limits, you know, their practical use.
NEO, for example, and other similar projects are working to make these
robots more energy efficient, but issues around battery life and energy
consumption are still bottlenecks.
Uh, the other thing is the cognitive and social interactions
here, beyond just the physical tasks.
You know, these robots must navigate, um, the complexities of human life,
interactions, and perceptions, and developing a robot with AI that is
capable of interpreting these social cues, responding appropriately, and
making decisions in real time is still an ongoing research area.
There, there is still a lot of work around AI and reasoning.
So I think it's going to take time for us to get there.
And another challenge, I think, is the economics of this.
Building something that is affordable and, uh, versatile and
reliable is still a major hurdle.
And for many industrial and service applications, simpler robots or
specialized machines are more efficient and cost effective than having this
general, uh, purpose humanoid robot.
So the complexity and the costs of these humanoid robots, I think,
especially in their design, still limit adoption, especially to niche markets.
So I think there are, you know, challenges, you know, what's the
reality versus the long term vision?
You know, at present, uh, it is a transitional phase.
Existing prototypes, they are far from ubiquitous, but there are really nice
demos and it shows a lot of promise.
But I think we are still not there in terms of mass-market tools
and adoption. Um, but it's not just, you know, a technological pipe dream.
I think, um, it's gonna happen.
That's my, my thinking.
But it's, you know, for the full realization, it's gonna take years,
if not maybe decades away before they really become a reality.
Tim Hwang: Yeah, that functionality gap is very interesting to think about, like,
I love the idea that for a period of time people are purchasing these, but it
turns out there's like not a whole lot you can do around the home with them, so
that they end up just like being like all the lonely Pelotons you see in people's
houses, or it's like this really expensive piece of hardware that just kind of sits
around, but it's, it's just funny because it's like a humanoid guy, basically, um.
I guess, uh, I don't know, Kate, Shobhit, if you've got a kind of view on this,
if you're a little bit more skeptical or if you kind of agree that like,
yeah, maybe, I don't know, Kaoutar, you didn't put a date on it, but like
in our lifetime, you know, we'll see this become like a practical reality.
Shobhit Varshney: Yeah.
So, um, I'm a big geek and I will, I will go and buy stuff that I
think is, is, is awesome, right.
So I'm
Tim Hwang: You're going to have the Peloton, uh, NEO robot in your house.
Shobhit Varshney: So, um, I feel the same argument applies as with one
massive model that's just absolutely stunning and can do everything, like
a GPT-4o model or the Claude models, right?
Versus the argument that a smaller set of models, in a niche for specific
use cases, is a lot more efficient and targeted for a particular use case, right?
I'm in the camp of: I would rather have a device that is helping me
with a particular task and doing an incredibly good job at that task.
As an example, uh, I use the Roborock S8 MaxV Ultra, whatever the highest end
of their robot that does vacuuming and mopping and goes back and cleans itself
up and dries itself up, comes back again and finishes off that last little bit
of scrubbing that it missed somewhere.
More specialized tools helping us augment what humans aren't good at...
I think that's the future direction in the short run.
It'll take a while for us to get to something that solves for
all the constraints that we just discussed before you get to a point
where a humanoid replica of you can actually start doing things.
So I think in the short next five years, specialized tools that do
a particular task incredibly well, are cost optimized, it's repetitive,
they nail that particular use case.
I'm more, I'm more in that camp.
Kate, do you think the same?
Kate Soule: I completely agree.
If you think about how model specialization has progressed,
uh, you know, we see the same exact trends as you pulled out.
So, I'm 100 percent in the same camp.
It also reminds me of, you know, the common story that you hear,
where if you asked someone back in the horse-and-buggy days what they
wanted, they always said they wanted a faster horse, and then, you know,
Ford came along and released the first cars.
And I think we're in a bit of that scenario right now where it's like,
I just want more human time to do the things that I don't want to do as a human.
So sure, create some humanoid robot, but really, like, can we rethink
what the right way is to make, um, humans more superpowered, not just
create more humans that we don't have to worry about feeding, or other
potential, uh, labor issues.
Shobhit Varshney: Okay, that sounds more like, say, uh, how we solve
the dishwasher paradigm, right?
Kate Soule: Yeah.
Shobhit Varshney: We figured out that there's an optimal way of
washing dishes and it does an incredibly good job at a very low
price point and it nails it, right?
So we have changed the way human workflow used to work, right?
Earlier, as a human, I would take a dish, rinse it, and keep it somewhere else.
We did not try to optimize that particular workflow.
We said there's a better way of solving this particular niche use case.
It's very custom optimized, and we'll nail it.
So, I'm in that camp with you; I think we'll get to a point where
smaller machines that do a particular task really well will win. Like,
for example, in our pool, we have, uh, we have a skimmer that just skims
and removes all the dirt from the top.
Now, a human will take a net and try to clean up each one of them one by one.
That's not the optimal way of solving for that problem.
So I'm with you that the workflow, the human workflow has got to change.
And then we optimize, until the time we get to a point where we have
a humanoid that can solve for all the problems that we discussed around
cost, flexibility, dexterity, and things of that nature.
Tim Hwang: Yeah, and I think you, for what it's worth, I think also
just, like, you can't discount, like, the creep factor, right?
Like, I do feel like it's, like, a little bit, it's a little bit
spooky to have, like, a, you know, a large human in my house.
Um, and, uh, and I do think that will be part of the adoption, almost,
like, leans in favor of these more specialized applications, because
they kind of don't raise that fear.
I don't know. We'll have to see in practice whether or not 1X is able to pull this off.
Kaoutar El Maghraoui: Yeah, I think it's an interesting development, and it all comes down to what people are able to consume, and to the capabilities. Specialization versus generalization is always going to be a concern, but of course, if we can combine both, that would be great.
It's like what these LLMs are doing: we still need specialized models, but the evolution of LLMs is still important, having these large models that can do a variety of things and then specializing them for certain tasks.
Can we make the same argument for these humanoid robots? They can do a variety of tasks, but maybe you can press a button and tell it, now I want you to focus just on the dishes or the pool, so it takes a subset of the model that's specialized for that task within the humanoid.
I think that would be cool to have.
Tim Hwang: Yeah, ultimately the humanoid robot is going to be the one that does the maintenance for all the other smaller robots.
It's just going to be robots all the way down.
Kaoutar El Maghraoui: It's like a hierarchy over here.
Tim Hwang: Yeah, exactly.
Shobhit Varshney: Kaoutar, just the way you framed it, I think you're describing a Transformer robot...
Kaoutar El Maghraoui: Exactly.
Something...
Shobhit Varshney: ...that turns into a vacuum cleaner so it can do that one job really, really well.
That'll be the world we live in.
Kaoutar El Maghraoui: That would be cool.
Yeah.
Tim Hwang: So I'm going to move us on to our next topic. There's a fascinating paper that was shared by a friend of the pod, Kush Varshney, who, if you're a listener, you'll know has been a recurring guest on this show.
What I love about some of these machine learning papers is that they pick the most dramatic names. This one is called The AI Scientist; it has a long full title about, effectively, using AI to automate science end to end. It's a proposed system that tries to push the limits of whether large language models can really help with scientific discovery in a fully automated way.
And this is a big deal. Think about how societal progress happens: these technological breakthroughs are really critical. One way of thinking about it is that we have a bottleneck in the researchers, the brilliant minds that we have, and so the hope is basically, can we augment that process? Accelerating it with AI has been a real focus.
What I always worry about with these papers is that the results look almost too good and the ambition is too great. But Kaoutar, I know you looked at this paper in some detail. I'm curious whether you came away feeling like they really hit upon something here, something that could be the kernel of something new, or whether you feel like the way AI fits into science is ultimately going to look a little different from what they're proposing.
Kaoutar El Maghraoui: Yeah, I enjoyed reading the paper. I think it puts forward a very nice way of thinking about this automated AI scientist, which also made me worry about what's going to happen to scientists in the future. It presents a framework where large language models generate research ideas, write code, run experiments, visualize results, and even write papers.
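[Editor's note: the pipeline described here, where an LLM carries a project from idea all the way to a reviewed paper, can be caricatured as a simple loop. This is only a sketch of the general shape, not the actual AI Scientist implementation; `llm` stands in for any text-generation function, and all prompts and thresholds below are invented.]

```python
def run_ai_scientist(llm, topic, max_ideas=3, accept_threshold=6):
    """Toy idea -> code -> results -> paper -> automated-review loop."""
    accepted = []
    for _ in range(max_ideas):
        idea = llm(f"Propose a novel research idea about {topic}.")
        code = llm(f"Write experiment code to test this idea: {idea}")
        results = llm(f"Summarize the results of running: {code}")
        paper = llm(f"Write a short paper.\nIdea: {idea}\nResults: {results}")
        # The automated-review step is the part the panel flags as controversial:
        # the same kind of model that wrote the paper also scores it.
        score = int(llm(f"Review this paper; reply with only a 1-10 score: {paper}"))
        if score >= accept_threshold:
            accepted.append((paper, score))
    return accepted
```

Even in this toy form, the loop makes the concern discussed below concrete: acceptance hinges entirely on a model-generated score.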
They also showed some very interesting papers that were generated by this AI scientist.
Uh, one thing that...
Tim Hwang: Yeah, it just needs to do the poster session at the conference.
Kaoutar El Maghraoui: It makes you worry about what's going to happen to conferences in the future, and about some of the papers: are they really written by real scientists, or is it all LLM generated?
These advancements could significantly impact scientific discovery, reducing the cost and increasing the speed of research. So there could be real benefits, especially if you look at it as an augmentation of human research.
The thing is, the controversy surrounding this paper largely comes from methodological concerns, especially the reliance on automated review systems to evaluate scientific quality. That raised some concerns for me. The question is whether such reviews can truly assess the novelty, creativity, and rigor of the work.
I'm also skeptical about whether AI could really replace human intuition in scientific discovery, especially in more abstract or interdisciplinary fields. I think AI is not there yet when it comes to looking across multiple fields and mimicking that human intuition.
Another thing is the broader ethical and social implications of automating scientific research. So there are a lot of concerns here, but from a scientific perspective, I think it's a very nice piece of work, with real implications around ethics and around the automated review process they use. So...
Tim Hwang: That's right. Kate, as a researcher yourself, how do you feel about all this? It's been interesting, for example, to see engineers say, well, they're never going to learn to code as well as I do. So I know there's a tendency to push back on this, but I'm curious how you think about these types of experiments. Are they fun toys? Would you use them? Would you read the papers produced by these AIs?
Kate Soule: Yeah, well, I'm honored you call me a researcher. I certainly work with a lot of amazing researchers here at IBM Research, even if I'm not one directly.
But, and as a non-researcher this might be a naive opinion, I actually wonder whether there isn't something LLMs can do well here: understanding what's been done in the past, across related literature, on a much broader scale than is humanly possible to read and analyze, and finding similar methods or approaches to apply to a new, related problem. I don't know, Kaoutar, if you have any thoughts on that, or if that's maybe a jump too far.
Kaoutar El Maghraoui: No, I think I agree; you have a point there. There might be things they discover that scientists can't, because they're pulling from such a wide variety of sources. But I think we still need a human in the loop to validate and verify these experiments, and then take them to the real world, try them, and see the results. We can't just take the results from these LLMs and apply them directly. There still needs to be some verification, though these systems will probably get better and better as we use them more for scientific discovery.
Tim Hwang: Yeah, I think one of the interesting things here is that some of the people I know who research this space think about the burden of knowledge: there's just more and more knowledge, more and more papers. Part of the hope with some of these systems is simply that a lot of findings could exist purely in the connections between papers that people just aren't making. That reduces it more to a search problem, right?
What's interesting here is the idea that you then want the AI to run the experiment, to do the empirical work. I think there's a question of how far beyond search you really need to go.
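[Editor's note: the "search problem" framing, surfacing findings that exist only in unmade connections between papers, has a classic toy form in literature-based discovery: two papers that share no concept may be bridged by a concept in a third. A minimal sketch, with invented paper IDs and concepts:]

```python
from itertools import combinations

# Toy corpus: paper id -> concepts from its abstract (invented data; in
# practice these would come from entity extraction over a real corpus).
papers = {
    "A": {"fish oil", "blood viscosity"},
    "B": {"blood viscosity", "raynaud's syndrome"},
    "C": {"fish oil", "omega-3"},
}

def bridge_candidates(papers):
    """Pairs of papers with no shared concept, linked through a third paper."""
    links = []
    for (p1, c1), (p2, c2) in combinations(papers.items(), 2):
        if c1 & c2:
            continue  # directly connected already; nothing hidden here
        for p3, c3 in papers.items():
            if p3 not in (p1, p2) and c1 & c3 and c2 & c3:
                links.append((p1, p3, p2))  # p3 bridges p1 and p2
    return links

print(bridge_candidates(papers))  # B and C are bridged by A
```

A real system would replace the toy dictionary with entities extracted from thousands of abstracts, but the discovery step itself reduces to set intersections, which is to say, search.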
Shobhit Varshney: Yes. Just like any other workflow, from an enterprise perspective, we help a lot of clients with their R&D: coming up with a new formulation for a food item or a perfume, product research for the next car, battery research, and so on. Across all of them, just like any other workflow in an organization, you figure out all the steps that are needed.
When you hire somebody brilliant from MIT to join your team as an intern, you give them a specific task to augment what a senior researcher, someone who's been in the field for a decade, has been doing, right? So, Kaoutar, you'd plan it out: here's the task I'm going to give you, go research this particular topic.
I think we'll incrementally see more and more AI helping out on specific tasks across the research spectrum, end to end. But just like any other workflow, I don't think it will be completely taken over by AI. It's augmenting intelligence rather than replacing it.
The tandem between humans and AI will also get better at knowing what to ask for help with. For example, you might want to build a knowledge graph across a whole bunch of different research papers to figure out whether somebody overseas, in a different country, had a novel idea that you just didn't think about, right?
What I'm really interested in is getting to a conference where each one of us has an AI representative going and talking to the others, right? Just imagine a collaboration between a team of researchers and their AI counterparts in Israel talking to their counterparts in the U.S., exchanging ideas, and coming up with a new theorem: hey, I think we came up with this new idea, we should do X. I'm just looking forward to a world where we start using the word "we" when AI is actually doing something for us.
Tim Hwang: Well, one of the big dramas in academia, of course, is who gets to be first author. I wonder if in the future you'll get into a big struggle with some LLM collaborator that's trying to take all the credit from you. We'll have that drama play out, but it will be funny, because it will be between humans and AIs.
Kaoutar El Maghraoui: I think it will be a competition between models over who writes the best paper: an AI conference completely generated by AI and reviewed by AI.
Tim Hwang: That's right. Yeah, exactly. And you'll be angry that your paper was unjustly turned down by reviewer number two, you know.
Shobhit Varshney: I would say there are certain parts of the research spectrum we don't think about quite yet, because we're so focused on the actual novel research. Take peer review, for example.
I'll give you an example of what we're doing with some of our utility companies. When utilities want to increase the price of electricity in a particular state, they have to file a case and argue: here's why I think I should be allowed to increase it by X cents, right? Say five cents.
We're helping these utilities create that whole submission package. We look at everything they and their competitors have submitted, which is all openly available online, and use that research to help create the first draft of the package.
Then, once you know who's going to be on the panel assessing it, we can go look at every question those people have ever asked. It's the same in peer review: we know that when Shobhit gets to be the reviewer, I typically ask more about the ethical concerns of a particular paper, and so on, right? Each one of us has a pattern in how we ask questions. So we reverse engineer what the judges on the panel would ask, and then we change the documentation so that the submission itself addresses those questions proactively.
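[Editor's note: the reverse-engineering step described here, profiling each panelist's past questions so the submission can answer them in advance, could be sketched as simple keyword profiling. All reviewer names, questions, and topic keywords below are hypothetical.]

```python
from collections import Counter

# Hypothetical history: panelist -> questions they asked in past cases.
past_questions = {
    "reviewer_a": [
        "What are the ethical implications of this rate change?",
        "Was there an ethical review of the impact on low-income customers?",
    ],
    "reviewer_b": [
        "What is the projected cost over five years?",
        "How was the cost model validated?",
    ],
}

TOPICS = ["ethical", "cost", "safety"]  # themes to scan for

def reviewer_profiles(history):
    """Count topic keywords per reviewer to estimate what each asks about."""
    profiles = {}
    for reviewer, questions in history.items():
        text = " ".join(questions).lower()
        profiles[reviewer] = Counter({t: text.count(t) for t in TOPICS})
    return profiles

def topics_to_address(panel, history):
    """Each panelist's dominant theme: cover these proactively in the filing."""
    profiles = reviewer_profiles(history)
    return {profiles[r].most_common(1)[0][0] for r in panel}
```

In practice the profiling would use something richer than substring counts, but the workflow is the same: predict the panel's likely questions, then edit the submission so it answers them before they are asked.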
Then, when you actually have to go present your case in person, that's effectively an interview. So we prepare the witness based on the kinds of questions that person has asked everywhere else, and on the right chain of thought to follow.
So there are aspects of research that researchers don't want to do, and I think AI will be really helpful in augmenting those. Do you think that'll be helpful, Kaoutar?
Kaoutar El Maghraoui: I think so, definitely. As humans we're limited, and if we're augmented by AI, we're going to be superhumans, hopefully in the right direction. So...
Kate Soule: Well, I think it gets back to what we were just talking about, right? Are we going to have AI literally try to become its own researcher and just replicate what a human can do? Or are we going to have AI specialize in parts of the process, run those parts faster and better, and support humans in new, more efficient workflows? It's the same question we had with the robots, now focused on the scientific method.
Tim Hwang: The news story of the week was that a report finally came out that OpenAI is going to invest in producing its own in-house chips to support its work. Part of this is its integration and collaboration with Apple, but more generally, this is something that's been rumored for some time and that now looks much closer to certainty: they really are investing in this in a big way.
Kaoutar, you're the most natural person to ask about this. Why would OpenAI want to do this? Semiconductors are wildly expensive and very hard to pull off. My understanding is that China, the whole country, has been trying to reproduce the Taiwanese semiconductor industry and has been only moderately successful at it. So why is OpenAI making such a big bet on hardware?
Kaoutar El Maghraoui: I think OpenAI's CEO, Sam Altman, has made the acquisition of more AI chips a top priority for his company; he has even complained publicly about the scarcity of these AI chips. Given the rising chip costs, the supply chain challenges, and the need for specialized hardware, especially hardware optimized for OpenAI's models, this seems to me like a strategic move. Designing their own chips could enable OpenAI to tailor hardware to their specific workloads, improving performance, efficiency, and scaling potential.
Of course, there are technical and financial challenges, given the complexity of semiconductor design and manufacturing. But by creating in-house chips, OpenAI can reduce its reliance on third-party manufacturers like NVIDIA, which controls a significant portion of the AI hardware market, almost 80%. That gives them more control over the supply chain and lets them optimize for their unique workloads, potentially improving efficiency, performance, and scalability.
While semiconductor development is a challenging and costly endeavor, I think this move could let OpenAI differentiate its hardware and scale its operations effectively. They've clearly thought a lot about this; it's a strategic move for them, but also a way to diversify.
Tim Hwang: Totally. What you're saying is basically, what's cheaper than trying to get H100s? Literally building your own semiconductor supply chain, which is a wild thing to say.
Kate, Shobhit, if you've got thoughts on this, one big question is: do we think it's going to be successful? I can almost see the argument for it, but man, if it isn't a high-risk sort of thing, right?
Kate Soule: It's certainly high risk. But I really want to emphasize one point Kaoutar brought up: there are tremendous opportunities when we look at the next generation of AI and hardware co-design, developing these models and the hardware that runs them in tandem to unlock new performance levels, new efficiencies, and lower costs. So I think it makes a lot of sense to put some skin in the game, so to speak, given that there are a ton of ways they could continue to innovate once they have better control over hardware design.
Tim Hwang: Yeah, for sure. And Shobhit, maybe you're the ideal person to wrap up this section and close out the episode, since you think about what all this means for business, for the enterprise. Can you paint a picture? The semiconductor stuff is often very abstract, but as Kate is saying, there are some very practical implications for our experience of these technologies and systems. What does the everyday look like if OpenAI is really successful here?
Shobhit Varshney: NVIDIA is a great partner of ours; we have joint clients and do a significant amount of work together. Yesterday I spent the entire day with NVIDIA, on where they can work with enterprises beyond the hyperscalers themselves. They went into quite a bit of detail behind the covers, explaining the intellectual property they've built and their differentiation.
They have a significant moat today, not just at the chip level but in the way they architect the entire end-to-end flow. On total cost of ownership, you're going from a massive data center down to one box; just the wiring in existing data centers is more expensive than that one box from NVIDIA. And Jensen made this famous statement that even if their competitors, who are also their customers, made free chips, the total cost would still be lower on NVIDIA. So they've done an incredibly good job driving higher efficiencies and more throughput, 5x, 10x on the same footprint.
So I think it will take a while for a company like OpenAI to build everything that goes around the chip, just like it took Tesla a while to figure out how to productionize end to end when they came to market. Creating the car itself, the core of it, that piece was great; the researchers could solve for that. But the manufacturing, the supply chain, the total cost, how do you actually get to a $30,000 car that people want to buy? It will take a while for OpenAI to get there. And in my view, that's going to distract them a little bit from their core business.
In my view, they should be focusing more on how we get to more intelligence: what Ilya just did with SSI, raising a billion dollars, or what the Claude models are doing around more responsible AI. There's still a lot more focus needed on solving that side of the problem for enterprises. The cost will come down over time; that's just how the economics work, the way the cost of computing on NVIDIA has plummeted in the last decade. So I think OpenAI's focus should still be on the problems that need to be solved before they start vertically integrating end to end.
Tim Hwang: Yeah, it'll be fascinating to see. And as I said, I don't think this will be the last time we talk about this issue, so I'm not overly sad that we ran out of time today; we'll pick it up in the future.
That's all we have time for today. Shobhit, Kate, Kaoutar, thanks for joining us on the show. And for all you listeners out there, if you enjoyed what you heard, as always you can get Mixture of Experts on Apple Podcasts, Spotify, and podcast platforms everywhere.
And we'll see you next week.