Prompt Engineering: Here to Stay

Key Points

  • Prompt engineering is considered a lasting discipline, even as tools emerge to automate prompt creation.
  • The panelists disagree on the future of prompt engineers: some say the role will disappear, others say it will evolve into something different.
  • Major AI firms (Anthropic, Cohere, Google) are releasing or acquiring technologies that generate or tune prompts automatically, aiming to remove the human from the loop.
  • Guests discuss broader AI impacts, from robots handling household tasks to potential job displacement for scientists and the hope for AI to become a collaborative “we.”
  • The episode is part of the “Mixture of Experts” series, featuring a recurring lineup of engineers, researchers, and product leaders who dissect weekly AI news.

**Source:** [https://www.youtube.com/watch?v=iSJenVM7KnQ](https://www.youtube.com/watch?v=iSJenVM7KnQ)
**Duration:** 00:38:04

Sections

- [00:00:00](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=0s) **Future of Prompt Engineering** - A panel of AI experts debates the longevity of prompt engineering, the expanding role of AI in everyday tasks, and its potential impact on future jobs.
- [00:03:04](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=184s) **Evolving Role of Prompt Engineering** - The speakers critique the tediousness of manual prompt crafting, advocate for automated methods that explore broader natural‑language spaces to boost accuracy and productivity, and predict a shift in prompt engineering from low‑level token tweaking toward higher‑level model interaction.
- [00:06:13](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=373s) **Prompt Engineering Evolves to Supervisory Role** - The speakers argue that prompt engineers will transition from hands‑on tasks to overseeing automated systems, requiring broader skills like model training, data curation, pipeline integration, and hyper‑personalized context handling.
- [00:09:20](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=560s) **Future of Obscure Prompt Encoding** - The speakers debate whether prompts will grow increasingly unintelligible as models are optimized, highlighting a trade‑off between efficiency and readability while noting that advances in structured outputs aim to keep human‑model interaction understandable.
- [00:12:32](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=752s) **Humanoid Robots: Promise and Hurdles** - The speaker praises a recent demo and argues that while humanoid robots could integrate into human environments, obstacles like human‑level mobility and energy efficiency still keep them out of everyday use.
- [00:15:40](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=940s) **Home Robot Adoption Timeline Debate** - Participants discuss when advanced home robots will become practical, highlighting the current functionality gap, potential decades‑long timeline, and the trade‑off between massive generalist AI models versus smaller, task‑specific solutions.
- [00:18:44](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1124s) **Specialized Machines vs Human Workflow** - The speakers argue that purpose-built devices, like an optimized dishwasher and pool skimmer, can outperform traditional human routines, anticipate future humanoid robots that unify cost, flexibility, and dexterity, but warn that perceived creepiness may hinder widespread adoption.
- [00:21:53](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1313s) **AI-Driven Automated Scientific Discovery Debate** - The speakers examine a paper proposing fully automated AI scientists that could accelerate breakthroughs, while expressing skepticism about the claims' realism and pondering the future role of human researchers.
- [00:24:59](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1499s) **Ethics and Potential of LLM Literature Review** - Researchers debate the ethical implications of automated AI reviews while highlighting how large language models could vastly outperform humans in scanning and synthesizing existing scientific literature.
- [00:28:03](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1683s) **AI-Augmented Collaborative Research Future** - The speaker envisions AI as a supportive partner that augments human researchers, handling tasks like knowledge‑graph synthesis and even acting as representatives in international collaborations, while noting authorship dilemmas.
- [00:31:06](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=1866s) **AI Augmentation and In‑House Chip Strategy** - The speakers explore whether AI will become a fully autonomous researcher or serve as a specialized tool to boost human workflows, and discuss OpenAI's rumored push to develop its own expensive semiconductor chips in partnership with Apple to support this vision.
- [00:34:10](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=2050s) **Building a Proprietary Semiconductor Future** - The panel debates the high‑risk prospect of creating an in‑house semiconductor supply chain to enable AI‑hardware co‑design and its potential benefits for enterprises.
- [00:37:24](https://www.youtube.com/watch?v=iSJenVM7KnQ&t=2244s) **Future of Compute Costs & OpenAI Strategy** - The panel notes that declining GPU expenses will lower AI costs over time, advises OpenAI to focus on solving key problems before pursuing full vertical integration, and promises to revisit the discussion in future episodes.

Full Transcript
0:00 Tim Hwang: My opinion is that prompt engineering is never going to die. It's a forever thing.

0:04 Kate Soule: Anyone who's worked with large language models has experienced some of the pain, dark art, black magic of... if I shout loudly enough at my model, maybe, like literally if I type in all caps, maybe this time it will do what I'm asking it to do.

0:19 Tim Hwang: The creepy factor is big, but these robots are also pretty cool if you can get them to work.

0:24 Kaoutar El Maghraoui: I would love to have one actually in my home, cleaning dishes and cooking.

0:29 Tim Hwang: How many scientists are going to be out of a job in the next 10 to 15 years?

0:33 Shobhit Varshney: I'm just looking forward to a world where we start using the word "we" when AI is actually starting to do something meaningful for us.

0:40 Tim Hwang: All that and more on today's episode of Mixture of Experts. I'm Tim Hwang, and I'm joined today, as I am every Friday, by a world-class panel of engineers, researchers, product leaders, and more to hash out the week's news in AI. On the panel today: Kate Soule is a Program Director of Generative AI Research; Shobhit Varshney, a Senior Partner consulting on AI for the U.S., Canada, and Latin America; and Kaoutar El Maghraoui, Principal Research Scientist, AI Engineering and AI Hardware Center. So as always on Mixture of Experts, we're going to start with a round-the-horn question, and that question is: will prompt engineers even exist in five years? Kate, yes or no?

1:27 Kate Soule: No.

1:28 Tim Hwang: Shobhit, yes or no?

1:30 Shobhit Varshney: Not at all, man.

1:31 Tim Hwang: Uh, okay, alright, and how about you, Kaoutar?

1:33 Kaoutar El Maghraoui: I think it's gonna evolve to a different role.

1:36 Tim Hwang: Okay, alright, well let's get right into it.
1:38 The prompt for this first story that we want to cover today is that we've just had a slew of sort of subplot, you know, sub-B kind of announcements coming out from all the companies. They haven't been the most prominent things they've been announcing, but it has really created a little bit of a pattern. I think, Kate, you flagged this for us. Which is that a lot of the companies have all been working on prompt automation, right? So Anthropic announced a Metaprompt system that helps generate prompts for you. Cohere is launching a prompt-tuning feature, which takes a prompt that you have and improves it automatically. And then Google recently acquired a company called Prompt Poet, which offers very much the same functionality. Um, and so this is a big deal, right? If you're familiar with LLMs, in the past a lot of the work has gone into making a good prompt. And I think the big thing about this is the future of basically taking the human out of the loop, the idea that you won't need prompting anymore. And I guess, Kate, as someone who threw this topic to us, do you want to just explain for our listeners why that's important? Like, what changes when that happens?

2:41 Kate Soule: Yeah. And I like what you did there, Tim, the prompt for today. So look, I think anyone who's worked with large language models has experienced some of the... pain, dark art, black magic of... if I shout loudly enough at my model, maybe... like literally if I type in all caps, maybe this time it will do what I'm asking it to do, right? Which can be a really frustrating process and doesn't make logical sense. Like, I think we're all rational beings, and ideally there would be a really rational and structured way to try and prompt these models.
3:17 So I'm really excited to see a lot of work come out which is trying to... not take a human entirely out of the loop, but take a human out of the loop of finding these phrases and tokens and words and patterns... that seem to be more effective for one given model to perform the task that's in question. So, you know, being able to, for example, search a broader space of natural language and try and identify: okay, if I frame my question this way, now I can get an improved level of accuracy. I think that is going to be really powerful overall, just to improve productivity and reduce some of the stress when working with models.

3:55 Tim Hwang: Yeah, for sure. And now, Kaoutar, in your response you agreed with everybody that maybe prompt engineering is not long for this world, but you did say that you feel like the role will shift. Do you want to tell us a little bit more about what you're thinking there?

4:08 Kaoutar El Maghraoui: Yeah, sure. So there have been a lot of recent developments in prompt engineering that are leading to significant changes, particularly in how prompt engineers interact with large language models, like Kate mentioned. Things like, for example, the Metaprompt from Anthropic, uh, meta prompting. And the development here shifts the focus of prompt engineers from crafting these individual prompts to designing systems that guide the AI to adjust its own behavior. So prompt engineers may increasingly focus here on creating frameworks for meta prompting... or refining the logic that underpins it. And this creates a more robust role where engineers manage how prompts evolve in real time. And then if you look, for example, at prompt tuning from Cohere, the Prompt Tuner.
4:54 So here, the Prompt Tuner from Cohere enables users to fine-tune and optimize prompts specifically for different applications. And the implication here is that prompt engineers may transition from manually crafting prompts to overseeing or curating automated tuning systems. So this kind of democratizes prompt creation, and this could reduce some of the technical barriers to entry, pushing prompt engineers to focus on more complex or high-impact tasks... where deep expertise is still required, such as, you know, designing industry-specific models or optimizations at scale. And there is also, if you look at the Prompt Poet acquisition by Google: this acquisition emphasizes automation in the generation and the optimization of prompts. And the implication here, this kind of further blurs the line between AI systems and prompt engineers. So as AI systems like Prompt Poet evolve, the role of the engineer may shift towards more of a supervising role, where you're supervising these AI systems that continuously optimize themselves. So human prompt engineers might focus more on edge cases or creative tasks or model-specific customizations. 6:13 So I think the implication overall here is a shift from manual work to a kind of supervisory role. I don't like to say that we're going to completely remove the human from the loop here, but there will be an increased focus on optimization and an expansion of the skill sets of prompt engineers. They will need a broader set of skills, including model training, dataset curation, the integration of LLMs into broader AI pipelines, and also some niche specializations.
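The kind of automated prompt tuning the panel describes can be sketched as a simple search loop. This is a minimal illustration, not any vendor's actual API: `stub_model`, the candidate templates, and the tiny eval set are all invented stand-ins, with the stub wired so the search has something to find.

```python
# A minimal sketch of automated prompt tuning: instead of a human
# hand-tweaking wording, a loop tries candidate phrasings of the same task
# and keeps whichever scores best on a small eval set. `stub_model` is a
# toy stand-in for a real LLM call.

CANDIDATE_TEMPLATES = [
    "Answer the question: {q}",
    "You are an expert. {q}",
    "Think step by step, then answer: {q}",
    "ANSWER THE QUESTION: {q}",  # the "shouting in all caps" strategy
]

EVAL_SET = [("What is 2 + 2?", "4"), ("What is 3 * 3?", "9")]

def stub_model(prompt: str) -> str:
    # This stub only gets the multiplication right when the prompt asks it
    # to reason step by step, so the search has something to discover.
    if "2 + 2" in prompt:
        return "4"
    return "9" if "step by step" in prompt.lower() else "6"

def accuracy(template: str) -> float:
    # Fraction of eval questions the model answers correctly under
    # this phrasing of the task.
    hits = sum(
        stub_model(template.format(q=question)) == expected
        for question, expected in EVAL_SET
    )
    return hits / len(EVAL_SET)

def tune_prompt(templates: list[str]) -> str:
    # Keep the phrasing with the highest eval accuracy (ties: first wins).
    return max(templates, key=accuracy)

print(tune_prompt(CANDIDATE_TEMPLATES))
# → Think step by step, then answer: {q}
```

A real tuner would propose candidates with a model rather than a fixed list, but the supervisory shape is the same: the human curates the eval set and the search space, and the loop does the token-level fiddling.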
6:43 So I think, to sum up, prompt engineering is likely evolving from a hands-on, manual role into a more supervisory role, where engineers focus on higher-level design, optimization, and supervision of these automated systems.

7:01 Tim Hwang: Yeah, that makes a lot of sense. And it's sort of interesting that the process that's happening in the movement to AI agents... will also sort of happen in the prompt space, right? Which is, rather than doing everything yourself, you're just sort of monitoring the system as it goes and keeping it together.

7:15 Shobhit Varshney: Yes, I think the prompts will get more and more personalized to that particular person. And over time, there will be a lot more context that will automatically get pulled in. So the center of gravity is going to keep moving towards more hyper-personalization for you as an individual. So when I say something to a model, the way it expands it out and makes a meta prompt out of it, that'll be super hyper-personalized to the context, the memory of everything that I've done in the past, right? Uh, like, I feel like being a good prompter... to these LLMs at work has made me a much better parent... talking to my eight-year-old daughter. Uh, she just...

7:54 Tim Hwang: Explain it clearly, think through it step by step, you know.

7:56 Shobhit Varshney: Yes, I have to talk to my daughter saying, "Anya, you just turned nine. You are a big girl now," and then I walk her to a chair and start reasoning, and I get the answer I'm expecting her to say: that no, I should not have ice cream before I sleep.

8:09 Tim Hwang: Got it, right? Exactly. That's the desired outcome.

8:12 Shobhit Varshney: Absolutely. And that's a two-way feedback training, right? And now we're at a point where, say it's 8 p.m. at night, and if I say, "Anya," her response is going to be, "Papa, I'm almost done eating." Because she understands that there's a pattern: when she's eating and she's taking more time, I'm probably going to be checking in and seeing if she's eating properly or not, right? So she has a lot more context on how to respond to Shobhit himself, right? But if my wife is the one calling her name, her response is going to be slightly different. So I think the hyper-personalization of these meta prompts, that's a direction that we will be looking at going forward.

8:48 Tim Hwang: Yeah, for sure. And I guess, Kate, maybe to turn it to you before we move to the next topic, this exact point was one thing that I did want to bring up, which is, you know, when we think about prompting with humans, we encode in language, right? What's sort of interesting is that the prompting that we've done is both to help us understand how we're interfacing with the system, and then also to direct the system. I think... I don't know if you buy this, which is that many of the optimizations may use tokens that don't even look like normal grammar, right? Like, it could just be a random string of numbers and letters that actually gets the best results out of the system. And so I got kind of curious: do you feel like prompts over time will become more and more obscure to us? Because it turns out the optimal encoding for the language model may actually not be something that's particularly human-readable or easily understandable at all. And so there's almost this very interesting trade-off of optimization and readability. Just wanted to get your thoughts on that.
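Tim's worry can be made concrete with a toy search. Everything here is synthetic: the scorer is an invented stand-in (it just rewards digits), not a real model objective, but it shows why a prompt prefix optimized purely for a score has no reason to stay human-readable.

```python
import random

# Toy illustration of the optimization-vs-readability trade-off: a random
# search over character strings, keeping whichever prefix scores highest.
# Nothing in the loop favors readable English, so the winner reads as noise.

random.seed(0)
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789"

def stub_score(prefix: str) -> float:
    # Invented stand-in scorer that happens to reward digits, which no
    # natural English phrasing is dense in.
    return sum(ch.isdigit() for ch in prefix) / max(len(prefix), 1)

def random_search(n_trials: int = 200, length: int = 8) -> str:
    # Keep the best-scoring candidate seen across n_trials random strings.
    best, best_score = "", -1.0
    for _ in range(n_trials):
        cand = "".join(random.choice(VOCAB) for _ in range(length))
        if stub_score(cand) > best_score:
            best, best_score = cand, stub_score(cand)
    return best

print(repr(random_search()))  # an 8-character string, not readable English
```

Token-level prompt optimization against a real model exhibits the same drift: the search lives in the model's preference space, not in the space of grammatical sentences.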
9:39 Kate Soule: Yeah, well, I think to answer that question, it's important to recognize that there are really two different sides of innovation happening around this area. So one is improving our ability to prompt the models, but the other is improving the model's ability to take a structured and more reasonable prompt. So, you know, instead of talking to Shobhit's eight-year-old daughter, can I talk to a software developer that understands structured inputs and can provide very structured responses? So if we only innovated on the prompt-optimization side, where we're trying to create new tokens and keep the model frozen, then yes, I think we could get to a point where we're starting to see non-human-readable prompts. But I think we're also seeing, like with OpenAI's structured outputs, more and more structure being baked into these models, to make it more standardized and systematic how we work with them. And ultimately, I think that's where the real value will get unlocked and where a lot of really exciting workflows could develop, especially in agentic patterns. If we can really start to focus on having very structured, formulaic interactions, maybe not perfectly human-readable, in that it's not like storytelling when I read what the model is doing, but a very formulaic way to work with these models, I think that is ultimately where we're going to end up.

11:00 Tim Hwang: Yeah, it'll be so funny, because what you're describing is that we're reconverging towards code, right? Like, structured language as a way of getting systems to do what we want them to do.

11:07 Kate Soule: Yeah, we started structured, created a bunch of unstructured, and now we're like, wait, there were actually some good things there that we should maybe bring back.

11:20 Tim Hwang: So I'm going to move us on to our next topic. We spend a lot of time on Mixture of Experts talking about software, we talk a lot about enterprise, but I think one of the most interesting, viral, if you will, AI moments of the last few weeks was the launch of a humanoid robot called NEO from a company called 1X Technologies. And specifically, they're working on humanoid robots that are designed to be at-home assistants. So this demo, basically, if you've seen it, and if you haven't, it's worth looking up on YouTube or whatever, is a humanoid robot helping out around the home, right? Cleaning dishes, helping to clean up, and otherwise assisting with tasks. And, you know, I wanted to ask the question, and I think it's always an important question to ask in the world of AI, which is: how much of this is going to be a reality? How much of this is just a really cool demo? And maybe most importantly, would you buy one for your own home? But we can address that at a certain point. Kaoutar, I'm kind of curious about your thoughts, if you saw the demo, what you thought about it. And whether you think something like this is really going to be a reality. And I think in part the question is whether or not this is a real, affordable thing from a hardware standpoint; there's a bunch of really practical, you know, bits-and-atoms kind of questions here that I would love to get your take on.
12:33 Kaoutar El Maghraoui: I would love to have one actually in my home, cleaning dishes and cooking; as someone who spends like an hour on it, it's one of the tasks I hate the most. Of course, the demo was very impressive from 1X. And I think 1X is among the most prominent companies in the emerging field of humanoid robots. But will humanoid robots become a reality, or are they still a pipe dream? You know, humanoid robots have been the focus of science fiction for a long time, and transitioning from dream to reality comes with significant challenges. So the argument for humanoid robots is that they can fit into environments designed for humans, use existing tools, and interact more naturally with people. However, I think there are still several challenges that need to be fixed. First, I think there is the mobility aspect: building a robot with human-level dexterity or mobility has proven very difficult. While there has been some progress, I think there is still a lot that needs to be done. Technologies like soft robotics and advanced actuators are making strides here, but are far from a robot that can perform all human tasks autonomously. The other challenge is energy efficiency. These robots require significant power to function, which limits their practical use. NEO, for example, and other similar projects are working to make these robots more energy-efficient, but the issues around battery life and energy consumption are still bottlenecks. The other thing is the cognitive and social interactions here, beyond just the physical tasks. You know, these robots must navigate the complexities of all of human life, interactions, perceptions. And developing an AI-capable robot, one that is capable of interpreting these social cues, responding appropriately, and making decisions in real time, is still an ongoing research area. There is still a lot of work around AI and reasoning. So I think it's going to take time for us to get there. And another challenge, I think, is the economics of this. Building something that is affordable, versatile, and reliable is still a major hurdle. And for many industrial and service applications, simpler robots or specialized machines are more efficient and cost-effective than this general-purpose humanoid robot. So the complexity and the costs of these humanoid robots, especially in their design, still limit adoption to niche markets. So I think there are challenges, you know; what's the reality versus the long-term vision? At present, it is a transitional phase. Existing prototypes are far from ubiquitous, but there are really nice demos, and it shows a lot of promise. But I think we are still not there in terms of mass-market tools and adoption. But it's not just, you know, a technological pipe dream. I think it's gonna happen. That's my thinking. But for the full realization, it's gonna take years, if not maybe decades, before they really become a reality.
15:52 Tim Hwang: Yeah, that functionality gap is very interesting to think about. Like, I love the idea that for a period of time people are purchasing these, but it turns out there's not a whole lot you can do around the home with them, so they end up just being like all the lonely Pelotons you see in people's houses; it's this really expensive piece of hardware that just kind of sits around, but it's just funny because it's like a humanoid guy, basically. I guess, I don't know, Kate, Shobhit, if you've got a view on this, if you're a little bit more skeptical, or if you kind of agree that, like, yeah, maybe, I don't know, Kaoutar, you didn't put a date on it, but in our lifetime we'll see this become a practical reality.

16:25 Shobhit Varshney: Yeah. So, um, I'm a big geek and I will go and buy stuff that I think is awesome, right? So I'm...

16:32 Tim Hwang: You're going to have the Peloton, uh, NEO robot in your house.

16:35 Shobhit Varshney: So, um, I feel it's the same argument as one massive model that's just absolutely stunning and can do everything, like a GPT-4o model or Claude models, right? Versus the argument that a smaller set of niche models for specific use cases is a lot more efficient and targeted for a particular use case, right? I'm in the camp of: I would rather have a device that is helping me with a particular task and doing an incredibly good job at that task. As an example, I use the Roborock S8 MaxV Ultra, whatever the highest end of their robots is, that does vacuuming and mopping and goes back and cleans itself up and dries itself up, comes back again and finishes off that last little bit of scrubbing that it missed somewhere. More specialized tools helping us augment what humans aren't good at...
17:26 I think that's the future direction in the short run. It'll take a while for us to get to something that solves for all the constraints that we just discussed, before you get to a point where a humanoid replica of you can actually start doing things. So I think in the short run, the next five years, it's specialized tools that do a particular task incredibly well, are cost-optimized, handle what's repetitive, and nail that particular use case. I'm more in that camp. Kate, do you think the same?

17:52 Kate Soule: I completely agree. If you think about how model specialization has progressed, we see the same exact trends as you pulled out. So I'm 100 percent in the same camp. It also reminds me of the common story that you hear: if you asked someone back in the horse-and-buggy days what they wanted, they always said they wanted a faster horse, and then, you know, Ford came along and released the first cars. And I think we're in a bit of that scenario right now, where it's like, I just want more human time to do the things that I don't want to do as a human. So, create some humanoid robot; but really, can we rethink what the right way is to make humans more superpowered, not just create more humans that we don't have to worry about feeding, or other potential labor issues.

18:39 Shobhit Varshney: Okay, that sounds more like, say, how we solved the dishwasher paradigm, right?

18:44 Kate Soule: Yeah.

18:45 Shobhit Varshney: We figured out that there's an optimal way of washing dishes, and it does an incredibly good job at a very low price point, and it nails it, right? So we have changed the way the human workflow used to work, right? Earlier, as a human, I would take a dish, rinse it, and keep it somewhere else. We did not try to optimize that particular workflow.
19:03 We said there's a better way of solving this particular niche use case; it's very custom-optimized, and we'll nail it. So I'm in that camp with you. I think we'll get to a point where smaller machines that do a particular task really well will win out. Like, for example, in our pool, we have a skimmer that just skims and removes all the dirt from the top. Now, a human would take a net and try to clean it up piece by piece. That's not the optimal way of solving that problem. So I'm with you that the human workflow has got to change, and then we optimize. By then we'll get to a point where we have a humanoid that can solve for all the problems that we discussed around cost and flexibility, dexterity, and things of that nature.

19:40 Tim Hwang: Yeah, and for what it's worth, I think you also just can't discount the creep factor, right? Like, I do feel like it's a little bit spooky to have, like, a large human in my house. And I do think that will be part of the adoption; it almost leans in favor of these more specialized applications, because they don't raise that fear. I don't know. We'll have to see in practice whether or not X1 is able to pull this off, or 1X, sorry, is able to pull this off.

20:06 Kaoutar El Maghraoui: Yeah, I think it's an interesting development, you know, and it all comes down to what people are able to consume, and the capabilities. Of course, specialization versus generalization is always going to be a concern. But of course, if we can combine both, that would be great. It's like what these LLMs are doing; we still need special models, but, you know, the evolution of LLMs is still important.
20:31 Having these large models that can do a variety of things, but then specializing them for certain tasks. Can we have the same argument for these humanoid robots? You know, they can do a variety of tasks, but maybe you can press a button and tell it: now I want you just to be focused on cleaning the dishwasher, or the pool, or something like that; maybe take a subset of that model that is specialized within that humanoid. I think that would be cool to have.

20:59 Tim Hwang: Yeah, I mean, ultimately the humanoid robot is going to be the one that does the maintenance for all the other smaller robots. It's just going to be robots all the way down.

21:06 Kaoutar El Maghraoui: It's like a hierarchy over here.

21:07 Tim Hwang: Yeah, exactly.

21:08 Shobhit Varshney: I think, Kaoutar, just the way you framed that, I think you're looking at a Transformer robot...

21:14 Kaoutar El Maghraoui: Exactly. Something...

21:16 Shobhit Varshney: ...a vacuum cleaner, so it can do that one job really, really well. That'll be the world we live in.

21:21 Kaoutar El Maghraoui: That would be cool. Yeah.

21:28 Tim Hwang: Um, so I'm gonna move us on to our next topic. So there's a fascinating paper that was shared by a friend of the pod, Kush Varshney, who, if you're a listener, has been a recurring guest on this show. And what I love about some of these papers in machine learning is that they pick the most dramatic names. And so the name of the paper is "The AI Scientist." It has a long subtitle about, you know, effectively using AI to automate end-to-end science. And it's a proposed system that tries to really push the limits of whether large language models can help out with scientific discovery in a fully automated way.
22:05And this is a big deal. 22:06I mean, you know, you think about how societal progress happens, 22:10right? Like, these technological breakthroughs are really critical. 22:12And so, you know, one way of thinking about it is that we've got this kind 22:15of bottleneck for the researchers, the brilliant minds that we have. 22:19And so, you know, the hope is basically, can we augment that process? 22:22Can we accelerate that process with AI? That has been kind of a real focus. 22:26Um, you know, what I always worry about with these papers is that the 22:29results look almost too good and, like, the ambition is too great. 22:33Um, but I mean, Kaoutar, I know you looked at this paper in some detail. 22:37I'm curious if you're coming away from this feeling like, yeah, they really 22:40kind of hit upon something here that, that really could be the kernel of 22:43something, um, new, or if you feel like, you know, ultimately 22:48the way AI fits in science is going to look a little bit different 22:51from the way they're proposing here. 22:53Kaoutar El Maghraoui: Yeah, I, I enjoyed reading the paper. 22:55I think it really, um, uh, puts forward a very nice way of, you know, 23:00kind of thinking about this automated AI scientist, which made me also worry, 23:05you know, what's going to happen to the scientists in the future? 23:08Um, so, so it presents, you know, this very nice framework where large 23:12language models generate research ideas, write code, run experiments, 23:16visualize results, and even write papers. 23:19And they also showed some very interesting papers that were, uh, you 23:22know, generated by this AI scientist. 23:26Uh, one thing that... 23:27Tim Hwang: Yeah, it just needed to do the, uh, the paper session at the conference, 23:29uh, the poster session at the conference.
23:31Kaoutar El Maghraoui: It makes you even worry, you know, what's going to happen 23:33to the conferences in the future, and some of the papers: are they really 23:37generated by real scientists, or is this all, you know, LLM-generated? 23:41Um, so these advancements could significantly impact scientific 23:45discovery, reducing the cost and also increasing the speed of research. 23:48So there, there could be some benefits to this, especially if you look at it as 23:51an augmentation for human, uh, research. 23:55The thing is, the controversy surrounding this paper is largely, you 23:59know, coming from concerns about the methodology they're using, 24:03especially when you look at, uh, you know, the, the reliance 24:08on automated review systems 24:10to evaluate the scientific quality. 24:12And that kind of raised some concerns for me. 24:15Uh, you know, the question here is whether, you know, such 24:17reviews can truly assess the novelty, creativity, and rigor of the work. 24:22Uh, and also, one thing I'm skeptical about is whether the AI could 24:26really fully replace human intuition in scientific discovery, especially 24:31when you're dealing with more abstract, uh, or interdisciplinary fields. 24:36So I think AI is still not there yet 24:39when you're really looking across multiple fields and, you know, kind 24:43of mimicking that human intuition. 24:46And I think another thing, uh, is also the broader ethical and social implications 24:51of automating scientific research. 24:53So there are a lot of concerns here, but I think from a scientific perspective, 24:57it's a very nice piece of work. 24:59Um, but it has a lot of implications, of course, ethical ones, and also the automated 25:07review process that they have. 25:08So... 25:09Tim Hwang: That's right. 25:09Yeah, I'm curious. 25:10Kate, I mean, as a researcher yourself, how do you feel about all this?
25:13You know, I feel like it's very interesting, for example, seeing, 25:16like, engineers be like, well, they're never going to learn to code as good 25:19as I am, so I know there's kind of, like, a tendency to kind of push back 25:22on it, but I'm curious how you think about these types of experiments. 25:25Are they, like, fun toys? 25:26Like, would you use these? 25:27Like, would you read the papers produced by these AIs? 25:30Kate Soule: Yeah, well, I'm honored you call me a researcher, but I 25:33certainly work with a lot of amazing researchers here at IBM Research, 25:37even if I'm not one directly. 25:39But, you know, I, I actually question, and as a non-researcher this might 25:46be a naive opinion, whether there isn't something that, uh, LLMs can do well in 25:52terms of understanding what's been done in the past with related literature on a much 25:56broader scale than what's humanly possible to go through and analyze and read and 26:01try and find similar methods or approaches to apply to a new problem that's related. 26:07Um, I don't know if, Kaoutar, you have any, any thoughts on that, 26:10if that's maybe a jump too far. 26:12Kaoutar El Maghraoui: No, I think I agree. 26:13You have a point there. 26:14So there might be stuff that they're discovering that scientists are not 26:17able to discover because they're pulling from a wide variety of sources. 26:22Uh, but I think we still need a human in the loop here to validate, verify, 26:28uh, you know, these experiments, and then take them to the real world 26:31and try them and see the results. 26:34So we cannot just take the results from, you know, these LLMs and then 26:38just apply them directly. So I think there still needs to be some 26:41verification, and probably these systems will get better and better as, you know, 26:46we use them more for scientific discovery.
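The literature scan Kate describes, surveying far more related work than a human could read, can be sketched as a toy program. Everything here is invented for illustration: the paper titles, the abstracts, and the bag-of-words cosine scorer, which a real system would replace with LLM-based or dense-embedding retrieval.

```python
# Toy sketch of ranking past abstracts by similarity to a new research idea.
# All titles/abstracts are placeholders; the scorer is deliberately simple.
import math
from collections import Counter

def tokenize(text):
    # Lowercase, whitespace-split, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def cosine(a, b):
    # Cosine similarity between two token lists via term counts.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def related_work(idea, corpus, top_k=2):
    """Return the titles of the top_k most similar past abstracts."""
    scored = [(cosine(tokenize(idea), tokenize(abstract)), title)
              for title, abstract in corpus.items()]
    return [title for score, title in sorted(scored, reverse=True)[:top_k]
            if score > 0]

corpus = {
    "Paper A": "automated prompt tuning for large language models",
    "Paper B": "protein folding with graph neural networks",
    "Paper C": "prompt optimization methods for language model accuracy",
}
print(related_work("tuning prompts to improve language model accuracy", corpus))
# prints ['Paper C', 'Paper A']
```

This reduces the "broader than humanly possible" claim to a ranking problem, which is why swapping the toy scorer for a stronger model is the only change a real pipeline would need.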
26:48Tim Hwang: Yeah, I think one of the interesting things here is that, uh, 26:51you know, some of the people I know who research this space think a little 26:53bit about, like, the burden of knowledge, which is, like, there's just more and 26:56more knowledge and more and more papers. 26:59And, you know, part of the hope with some of these systems is 27:01simply that, like, there's a lot of findings that could exist purely in, 27:05like, finding connections between papers that people are just not 27:08making. 27:09And so that ends up kind of reducing it more to, like, a search problem, right? 27:13I think what's kind of interesting here is the idea that, like, then 27:15you want them to run the experiment. 27:17Then you want the AI to do the empirical stuff. 27:19You know, I think there's a question about how far kind of beyond just the 27:22question of search you need to go. 27:24Shobhit Varshney: Yes. 27:24I think just like any workflow from an enterprise perspective, we help a lot 27:28of, uh, clients with their R&D research and things of that nature, right? 27:32Coming up with a new formulation for a, for a new food item or a perfume, or, like, 27:37product research for the next, uh, car, you know, so on and so forth, right? 27:41Battery research, whatnot. 27:42So across all of them, just like any other workflow in an organization, 27:46you figure out that here are all the steps that are needed. 27:48When you are hiring somebody brilliant from MIT to come join your team as an 27:51intern, you're giving them a specific task to augment what a senior researcher in the 27:56field for a decade has been doing, right? 27:58So, Kaoutar, you will plan out saying that, hey, here's a task 28:00that I'm going to go give you, go research this particular topic. 28:03I think we'll start to incrementally see more and more AI helping out on specific 28:07tasks in the research spectrum end to end.
28:10Just like any other workflow, I don't think it'll 28:13completely be taken over by AI. 28:14I think it's 28:15augmenting intelligence rather than replacing it. 28:18So I think that the good tandem between humans and AI will also start getting 28:23better at what to request for help. 28:25So for example, you want to just, say, build a knowledge graph across a whole 28:28bunch of different research papers to figure out if somebody overseas in a 28:32different country had some novel idea that you just didn't think about, right? 28:36So I think we'll get to a point where this research, what I'm really 28:39interested in, is a conference where each one of us would 28:43have our AI representatives going and talking to each other, right? 28:46Just imagine if you have a collaboration between a team of researchers 28:50with their AI counterparts in, uh, in Israel talking to, 28:56like, their counterparts in the U.S., 28:57and they're exchanging ideas, and you come up with a new theorem and 29:00say, hey, I think we came up with this new idea that we should do X. 29:04I'm just looking forward to a world where we start using the word "we" when AI is 29:08actually starting to do something for us. 29:11Tim Hwang: Well, and, like, one of the big dramas in academia, of course, 29:14is, like, who's the first author? 29:15Like, I wonder if in the future it'll be like, you'll get into 29:17this big struggle with some LLM collaborator you have that's trying 29:20to take all the credit from you now. 29:21And, you know, we'll have that drama play out, but it would just be funny because 29:24it'll be, you know, humans and AIs. 29:25Kaoutar El Maghraoui: So I think it'll be a competition between models 29:28over who's writing the best paper, and, uh, an AI conference completely 29:33generated by AI and reviewed by AI. 29:36Tim Hwang: That's right. 29:37Yeah, exactly. 29:38Angry that you're unjustly turned down for your paper.
29:40Reviewer number two, you know. 29:41Shobhit Varshney: I would say that there are certain things that 29:43we don't think about quite yet in the whole research spectrum, 29:46when we are so focused on doing our, our actual novel research, 29:50when it comes to, say, peer reviews. 29:53I'll give you an example of what we're doing with some of our utility companies. 29:56Utilities, when they have to go file for increasing the price of their 29:59electricity in a particular state, they have to go file a case. 30:03And they have to make a case and say, here's why I think I should 30:05increase it by X cents, right? 30:06Five cents. 30:08We're helping these utilities create that whole submission package. 30:11So we're looking at everything that they have submitted, and all the competition. 30:14It's all openly available online. 30:16So you research and help create the first package itself. 30:19Then once you know who's going to be on the panel, who's going to be 30:21assessing it, we can then go look at every question that they've ever asked. 30:25So in this case, in a peer review, we know when Shobhit gets to be the 30:29reviewer, I typically ask more about ethical concerns about a particular 30:32paper and so on and so forth, right? 30:33Each one of us has a pattern on how we ask questions, right? 30:37So now we reverse-engineer what the judges would ask on the panel, and 30:40then we change the documentation so that the submission itself is 30:43going to address those proactively. 30:45Then when you actually go and have to present your case in person, 30:48that's an interview that's happening. 30:50So then we are preparing the witness based on the kind of questions that 30:53the person has asked everywhere else and what's the right 30:55chain of thought to go on to that. 30:57So I think there are aspects of research that researchers don't 31:01want to do that I think AI will be really helpful in augmenting. 31:04Do you think that'll be helpful, Kaoutar?
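The reviewer-profiling step Shobhit sketches, mining a panelist's past questions to anticipate the themes they ask about, could look very roughly like this. The reviewer names, topic keywords, and question log are all invented; a real system would classify questions with an LLM rather than a keyword table.

```python
# Toy sketch of profiling reviewers by the topics of their past questions,
# so a submission can address likely themes proactively. All data is invented.
from collections import Counter, defaultdict

# Hypothetical keyword table standing in for a real question classifier.
TOPIC_KEYWORDS = {
    "ethics": {"consent", "bias", "fairness", "privacy"},
    "cost": {"budget", "price", "rate", "cents"},
    "reliability": {"outage", "failure", "maintenance"},
}

def classify(question):
    # Return every topic whose keywords overlap the question's words.
    words = set(question.lower().split())
    return [topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws]

def reviewer_profiles(question_log):
    """Map each reviewer to a Counter of the topics they ask about."""
    profiles = defaultdict(Counter)
    for reviewer, question in question_log:
        profiles[reviewer].update(classify(question))
    return profiles

log = [
    ("Reviewer 1", "What is the bias impact on low-income customers?"),
    ("Reviewer 1", "How do you handle customer privacy in the data?"),
    ("Reviewer 2", "Why raise the rate by five cents?"),
]
profiles = reviewer_profiles(log)
print(profiles["Reviewer 1"].most_common(1))
# prints [('ethics', 2)]
```

Each reviewer's top topics then become a checklist of points for the submission to pre-empt, which is the "address those proactively" step in the workflow above.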
31:06Kaoutar El Maghraoui: I think so, definitely. 31:08Yeah, of course, as humans we're limited, and if we're augmented by, you know, AI, 31:14we're, we're going to be superhumans and, uh, hopefully in the right direction. 31:19So... 31:20Kate Soule: Well, and I think it gets back to what we were just talking about, right? 31:23Like, are we going to have AI, like, literally try and become 31:26its own researcher and just replicate what a human can do? 31:30Or are we going to have AI specialize in parts of the process and run that 31:34process faster and better and support humans in new, more efficient workflows? 31:39It's just, you know, now without the robots, focused on the scientific method. 31:48Tim Hwang: The news story of the week was that a story finally kind of 31:51came out that OpenAI is going to be investing 31:55in trying to produce its own in-house chips to support its work. 31:59And part of this is, you know, 32:00its integration and collaboration with Apple, but more generally, 32:03you know, it's been something that's been rumored for some 32:05time that now looks like it's more in the realm of certainty: 32:08they really are kind of investing in this in a really, really big way. 32:12Um, you know, Kaoutar, you're the most natural person to ask about this, but, 32:15like, why would OpenAI want to do this? 32:17Like, semiconductors are, like, wildly expensive, very hard to pull off. 32:21You know, my understanding is basically, like, you know, China, the whole country, 32:24has been trying to, like, reproduce the Taiwanese semiconductor industry, 32:28and, like, is only moderately successful at it. 32:31Like, why is OpenAI kind of making such a big bet on hardware? 32:35Kaoutar El Maghraoui: I think, um, the CEO of OpenAI, Sam Altman, has 32:40made the acquisition of more AI chips a top priority for his company.
32:43And he even publicly complained about the 32:46scarcity of, uh, of these AI chips. 32:49So given, I think, all the rising costs, uh, chip costs, the supply 32:56chain challenges, and the need for specialized hardware, uh, 32:59especially specialized hardware that's optimized for OpenAI's models, 33:03it seems to me that this is a strategic move. 33:06So designing their own chips could enable OpenAI to tailor hardware for their 33:10specific workloads, improving performance, efficiency, and scaling potential. 33:15However, of course, there are challenges here, financial challenges, given 33:19the complexity of semiconductor design and manufacturing. 33:23Um, so by creating these in-house chips, OpenAI can reduce its reliance on 33:28third-party manufacturers like NVIDIA, which controls a significant portion 33:32of the AI hardware market, almost 80%. 33:35So it's going to give them more control over the supply chain and allow them to 33:39specialize and optimize for their unique workloads, potentially improving their 33:43efficiency, performance, and scalability. 33:46While semiconductor development is a challenging and costly endeavor, I 33:50think this move could enable OpenAI to differentiate its hardware and 33:54scale its operations effectively. 33:58I think they've thought a lot about this, and I think it's 34:00a strategic move for them, 34:02but also to diversify. 34:04Tim Hwang: Totally. 34:04I mean, as wild as it is, what you're saying is basically, like, you know, what's 34:08cheaper than trying to get H100s? 34:10It's, like, literally building your own semiconductor supply chain, 34:13which is a really crazy thing to say. 34:15Um, I guess, uh, I don't know, Kate, Shobhit, if you've got 34:18kind of thoughts on this, I mean, one big question is, like, do we 34:21think it's going to be successful? 34:22Like, I can almost see the argument for it, but man, if it isn't a 34:25high-risk sort of thing, right?
34:26Kate Soule: I mean, certainly high risk. 34:27I really want to emphasize one point that Kaoutar brought up, which is 34:30there's tremendous opportunity 34:32when we look at kind of this next generation of AI and what's going to come 34:36next in AI and hardware co-design. 34:39So making sure that we're developing these models and the hardware 34:43that runs them in tandem to really unlock kind of new performance 34:48levels, new efficiencies, and costs. 34:50Um, there's, there's tremendous opportunity there. 34:53So, you know, I think it makes sense. 34:55It makes a lot of sense to start to put some skin in the game, so to speak, 34:58um, given that, you know, there's just a ton of ways that they could 35:02continue to innovate, um, once they have better control over hardware design. 35:06Tim Hwang: Yeah, for sure. 35:07And Shobhit, I guess maybe you're kind of the ideal person to wrap up this section and 35:10close this out for the episode: you know, you think a little bit about 35:13what this all means for business, right? 35:15What this all means for enterprise. 35:17Like, can you paint a picture a little bit more, right? 35:18Because I think the semiconductor stuff is often very abstract. 35:21But as Kate is saying, there's some very practical implications for, you 35:24know, our experience of these kinds of technologies and systems. 35:27But, like, I'm kind of curious, like, what does the everyday look like if OpenAI 35:30is really successful here, you think? 35:32Shobhit Varshney: NVIDIA is a great partner of ours. 35:34We do a lot of work, uh, we have joint clients and whatnot, right? 35:37So we do a significant amount of work. 35:39Yesterday, I spent the entire day with NVIDIA. 35:42We're doing a lot of work around where, where they can go and 35:45work with enterprises beyond the hyperscalers themselves. 35:48So they got into quite a bit of detail, uh, behind the covers, 35:51explaining to us the intellectual property they've built, the differentiation.
35:55They have a significant moat today, 35:58not just at the chip level but in the way you architect 36:01the entire end-to-end flow. 36:03The total cost of ownership: you're going down from a massive 36:06data center to one box. 36:08Just the wiring in the existing data centers is more expensive 36:11than that one box from NVIDIA. 36:13So on the total cost of ownership, Jensen made this, uh, this famous statement 36:16saying even if their competitors, who are their customers as well, 36:20made free chips, the total cost would still be lower on NVIDIA. 36:24So they've done an incredibly good job on driving higher 36:27efficiencies, more throughput, 5x, 10x on the same kind of footprint. 36:32So I think it'll take a while for a company like OpenAI to 36:35do everything that's around it. 36:37It'll take them a while, just like when Tesla came to market, it took them a 36:40while to figure out how to actually productionalize this end to end. 36:43Creating a car, the actual, the core of it, that piece was great. 36:49The researchers could solve for that. 36:50But the whole manufacturing and the supply chain and the total cost, how 36:54do you get a car to actually be a $30,000 car that people want to buy? 36:58It'll take a while for OpenAI to get there. 37:00And I think that, in my view, is going to distract them a little bit... 37:04from their core business. 37:06They should, in my view, be focusing more on how do we get to 37:10adding more intelligence, what Ilya just did with SSI, raising a billion 37:14dollars, uh, what Claude, uh, models are doing with more responsible AI and 37:19stuff. I think there's still a lot more focus that's needed on solving that 37:23side of the problem for enterprises. 37:24The cost will come down over time, just the way the economics work; 37:28the cost of computing on NVIDIA has plummeted in the last decade.
37:33So I think that the focus of OpenAI should still be the problems that need 37:36to be resolved before they start to go vertically integrating end to end. 37:40Tim Hwang: Yeah, it'll be fascinating to see. 37:41And as I said, I think this will not be the last time 37:43that we talk about this issue. 37:44So I'm not overly sad that we ran out of time today, but 37:48we will pick it up in the future. 37:49Um, so that's what we have time for today. 37:51So Shobhit, Kate, Kaoutar, thanks for joining us on the show. 37:54Um, and for all you listeners out there, if you enjoyed what you heard, 37:57uh, as always, you can get Mixture of Experts on Apple Podcasts, Spotify, 38:01and podcast platforms everywhere. 38:03And we'll see you next week.