# White House AI Plan Meets IMO Milestone

**Source:** [https://www.youtube.com/watch?v=6RYrUyxXsYU](https://www.youtube.com/watch?v=6RYrUyxXsYU)
**Duration:** 00:43:22

## Summary

- The White House unveiled an AI action plan that serves as a national strategy for artificial intelligence and a “starter pistol” for future congressional legislation.
- Tim Hwang’s “Mixture of Experts” podcast gathers leading AI thinkers—including Kate Soule, Gabe Goodhart, Mihai Criveti, and policy expert Ryan Hagemann—to unpack the week’s most important AI news.
- DeepMind and OpenAI have demonstrated AI systems that can achieve gold‑medal performance on the International Math Olympiad, placing them among the top 8–10% of high‑school mathematicians worldwide.
- The panel likens this IMO breakthrough to AlphaGo’s historic impact—recognizing it as a major benchmark shift but noting it may not yet translate into immediate real‑world applications.
- Upcoming segments of the show dive into the ChatGPT Agent, Mihai’s MCP gateway project, and a deeper discussion of the newly released AI action plan with Ryan Hagemann.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=0s) **White House AI Action Plan Overview** - The segment introduces the administration’s AI action plan as a national strategy and a catalyst for future legislation, while previewing the Mixture of Experts podcast discussion with AI experts.
- [00:03:22](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=202s) **AI Tooling, Agentic Techniques, and IMO Context** - The speakers explain how modern AI augments language models with external tools such as calculators, illustrate the growing precision of agentic AI methods, and then shift to describing the prestige and difficulty of the International Math Olympiad, noting the guest’s modest mathematics background.
- [00:06:33](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=393s) **Evolving AI from Prompts to Toolchains** - The speaker explains how AI development has shifted from simple, token‑cheap prompting toward pre‑built toolkits that define, verify, and execute tasks in parallel, dramatically improving result quality.
- [00:10:01](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=601s) **AI Solves Olympiad Math in Real Time** - The speaker highlights how Olympiad problems that once took the specialized AlphaProof model days apiece are now all solved within the contest’s 4½‑hour limit, demonstrating AI’s shift toward practical, time‑constrained problem solving.
- [00:13:07](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=787s) **Evaluating Niche AI Performance** - The speaker argues that AI systems addressing extremely specialized problems cannot be reliably assessed with traditional statistical methods, likening this shift in evaluation to the gradual acceptance of Wikipedia as a trustworthy source despite earlier skepticism.
- [00:16:22](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=982s) **OpenAI Unveils ChatGPT Agent, Cost Concerns** - The speaker introduces OpenAI’s new ChatGPT Agent—a Pro‑tier, browser‑like, agentic tool—and asks a colleague for quick impressions while highlighting the growing expense of the service, especially in Europe.
- [00:20:08](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=1208s) **From PC Recipes to AI Agents** - The speakers liken early PC experimentation to today’s AI agents, emphasizing the minimalist UX, public trust concerns, and security hurdles that keep such tools consumer‑focused rather than enterprise‑ready.
- [00:23:24](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=1404s) **Cautious Adoption and UX Leap** - The speaker explains limited trust in AI without final control, favors slow, experimental adoption, and emphasizes that improved user experience—rather than new technology—transformed existing tools like GPT‑3 into a major breakthrough, highlighting the need for simple, tool‑agnostic entry points.
- [00:27:49](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=1669s) **Unified MCP Gateway for Multi‑Server Federation** - The speakers explain how the open‑source MCP gateway and registry let you combine disparate servers into a single virtual endpoint with centralized authentication, authorization, observability, plugin hooks, and protocol conversion, providing a lever to manage the complexity of real‑world MCP deployments.
- [00:33:04](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=1984s) **Scaling Trust in AI Agents** - The speakers examine how UI/UX design must manage agent failures—stemming from model limits, noisy internet data, and poor implementations—and advocate for middleware solutions to enhance the scalability, maintenance, and security of emerging AI agents.
- [00:36:18](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=2178s) **Trump AI Policy Blueprint Overview** - The speaker summarizes the administration’s newly released AI agenda—including three executive orders, roughly 135 agency actions across three pillars, and a forthcoming legislative push—framed as a “policy Super Bowl” for AI‑focused officials.
- [00:39:38](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=2378s) **Executive Order Streamlines Data Center Expansion** - The participants describe how a recent executive order simplifies regulatory approvals to accelerate energy grid capacity and data‑center construction, indirectly benefiting IBM by supporting the increased power needs of future large‑language‑model workloads.
- [00:42:51](https://www.youtube.com/watch?v=6RYrUyxXsYU&t=2571s) **Ryan Returns for Ongoing Coverage** - The host invites Ryan back to discuss unfolding DC developments, thanks him for his participation, and concludes the episode with the usual podcast sign‑off.

## Full Transcript
The White House has released its AI
action plan, which is sort of a national
strategy for artificial intelligence.
The short version is what this document
basically does is it lays out the Trump
administration's policy agenda as it
relates to artificial intelligence. And
part of the reason I think you haven't
seen so much action uh in Congress is
this is also the starter pistol for
legislative action in the future.
>> All that and more on today's Mixture of
Experts.
[Music]
I'm Tim Hwang and welcome to Mixture of
Experts. Each week, MoE brings together
a sharp group of thinkers working at the
very cutting edge of artificial
intelligence to discuss, debate, and
distill the week's news. Today I'm
joined by Kate Soule, director of
technical product management for
Granite, Gabe Goodhart, chief architect
AI open innovation, and Mihai Criveti,
distinguished engineer, Agentic AI. Later
in the show, we're going to be joined by
Ryan Hagemann, who's the global AI policy
issue lead. Per usual, we're going to
talk about ChatGPT Agent, uh, Mihai's MCP
gateway project. Uh, and we're going to
have Ryan on to talk about this newly
released AI action plan. But first, I
really want to start by talking about
IMO, which is the International Math
Olympiad.
[Music]
First, I think I wanted to start today by
talking about IMO. Um, IMO stands for
the International Math Olympiad. It's
the world's foremost annual mathematics
competition for high school students.
And we're talking about it today because
both DeepMind and OpenAI have claimed
sort of uh their systems achieving a
gold standard in competing in that
competition which would basically put
their technology within kind of a
comparable performance to the top 8 to
10% of high school mathematicians. Um
now this is a big competition. Uh over
110 countries send teams to it each
year. And I think some mathematicians
actually took a step back on this news
breaking and said, you know, we think
that this is a Lee Sedol moment.
This is as big as AlphaGo uh from a
number of years back. And I think that's
the first kind of round the horn
question I wanted to kind of prompt you
guys with is, you know, is this a Lee
Sedol moment? Is this a really big deal or
is this kind of just another benchmark?
Um Gabe, maybe I'll start with you.
Well, as a former mathematician who
discovered that computers can do math a
lot better than I can, uh, and then
turned to become a computer scientist,
uh, this is both extremely comforting
and not surprising. Um, but, uh, I think
seeing the depth of the logic and
reading some of the techniques they
used, um, it's a really cool piece of
technology change. I don't know that
it's going to flip any tables over
today.
>> Cool. Got it. Kate, what do you think? I
mean, I think it is a similar moment to
AlphaGo in that we've, you know, cracked
a new benchmark just like, you know,
AlphaGo cracked the game of Go.
Similar to AlphaGo though, I don't think
it's going to have a tremendous kind of
real world tangible impact in the next
couple of years. Like I don't think uh
we're all, you know, we saw AlphaGo beat
Go and then all of a sudden we were
impacted in AI in our daily lives every
day with all these different value
drivers and applications. And I think
this is kind of similar in that this is
an impressive challenge that was beaten.
Um I don't think this is like, oh, we've
now unlocked AGI and all these other
applications are going to fall into
place.
>> Got it. Mihai, what do you think?
>> I think it's really cool. I think it's
cool because it demonstrates a lot of
the techniques we use in Agentic AI as
well. Things like computer use, building
calculators, and using those functions
to solve the problems is not just
relying on what inherently the large
language models had been trained with.
And I think from that perspective, it
just demonstrates growth where agentic
techniques are becoming more and more
fine-tuned and more specific to
the type of task or workload they're
being applied to.
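The tool-use pattern Mihai describes, where the model delegates arithmetic to a calculator function instead of relying on its trained weights, can be sketched in a few lines. This is an illustrative toy, not any vendor's API; the tool registry and dispatch format are invented for the example.

```python
import ast
import operator

# Whitelisted operators so we can evaluate arithmetic without eval().
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Safely evaluate the arithmetic expression a model asked for."""
    def ev(node):
        if isinstance(node, ast.Constant):   # a number literal
            return node.value
        if isinstance(node, ast.BinOp):      # e.g. 2 + 3, 2 ** 10
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):    # e.g. -5
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

# The registry an agent loop would consult when the model emits a tool call.
TOOLS = {"calculator": calculator}

def run_tool_call(tool_name: str, argument: str):
    """Dispatch a model-emitted tool call to the matching function."""
    return TOOLS[tool_name](argument)

print(run_tool_call("calculator", "2**10 + 3*7"))  # 1045
```

The point of the pattern is that the answer comes from the tool, which is exact, while the model only has to decide which tool to call and with what arguments.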
>> Yeah, that's great. And I definitely
want to get into that, but I think first
maybe Gabe, I'll go back to you because
I hadn't realized you had kind of a
mathematics background. Um, do you want
to just give our listeners a flavor of
like how big of a deal is the
International Math Olympiad? Is this
like pretty significant? How difficult
is this test? Like could I do this test?
like what what is this? What what are we
talking about?
>> To be fair, I don't think I have a good
answer to that one because frankly I
wasn't that kind of mathematician. Uh I
was a uh a liberal arts student that was
good at math and needed to pick a major.
So I started with math uh and then I
discovered computer science and said
this is much better. Um but no, I I mean
I went far enough in in math to really
get to the point where there were some
hard problems. Um, but frankly, uh, you
know, the beautiful thing about math is
that it is a well-defined ecosystem with
strict rules. That's kind of the whole
point of it. Um, and the higher you go
in math, the more you're exploring the
boundaries, the edge cases of those
rules. Um, and sort of the the esoterica
of rules that you might not have thought
of when you're looking at sort of a
simple uh arithmetic based or geometry
based space. So when you get into the
chaotic dynamics or uh you know
multivariate calculus and the like uh
and things that I never even got to um
you're really starting to take those
rules and go as far as you can and I
think that's one thing that's really
fascinating about this is that it's
basically showing that with these
additional techniques that Mihai
referenced um and with some uh inference
time compute that Kate I know you've
talked a lot about um they were really
able to push the model's reasoning
capabilities a lot further down the
boundaries of this very nicely defined
space of mathematics to explore portions
of it that uh typically you have to go
pretty deep and you have to actually as
a human uh have some good intuition
about where you're exploring within
those boundaries of mathematics. So I
think that's the the really interesting
thing here and I think Mihai you said it
really well. It's basically getting to
the point where we're no longer just
sort of throwing uh you know guess and
hope type of depth where every step in
the chain is going to have some errors.
They're going to compound and you might
get lucky and prove that it's possible
to get to a good answer. But the fact
that it could do it consistently enough
to get a good score on this many
difficult problems shows that the
techniques are really starting to reach
a point where they have uh sort of real
world, well, maybe not even real
world, but consistent applicability, such
that you could imagine applying them to
a more difficult challenge in the real
world and actually relying on the
results without needing to check every
step of the way.
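Gabe's point, that errors compound in an unchecked chain but verified sampling can be relied on, can be sketched as a best-of-n loop with an independent checker. The propose/verify functions below are toy stand-ins for a model call and a proof checker; all names and probabilities are invented.

```python
import random

random.seed(0)  # deterministic toy run

def propose() -> int:
    """Stand-in for one model attempt; right only ~30% of the time."""
    return 42 if random.random() < 0.3 else random.randint(0, 100)

def verify(answer: int) -> bool:
    """Independent check, e.g. substituting a candidate back into the problem."""
    return answer == 42

def best_of_n(n: int):
    """Fire off n attempts in parallel and keep only a verified one."""
    candidates = [propose() for _ in range(n)]
    verified = [c for c in candidates if verify(c)]
    return verified[0] if verified else None

print(best_of_n(1))   # a single unchecked attempt often fails
print(best_of_n(50))  # 50 verified attempts almost never do
```

Because every candidate is filtered through the verifier, the chance of returning a wrong answer is zero here; the only failure mode is returning nothing, and that probability shrinks exponentially with n.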
>> Yeah, for sure. And that's something I
really want to get to. I think Mihai, to
kind of pick up on a comment that you
had in the response to the opening
question. You know, I think one of the
the most interesting things about AI is
we say like, oh, AI does this, AI does
that, but like we don't often talk about
the fact that like AI itself is sort of
changing as we go. And you highlighted
that like there's a bunch of different
techniques here. Um, and so do you want
to give us a little bit of flavor of
like what's new, right? Versus like how
we were maybe attacking these problems a
few years back when I guess we were more
in like stochastic parrot land. It seems
like you're almost suggesting that like,
you know, there's there's there's more
more tools being used to get get these
types of results.
>> I think what's changed is how we
approach these problems. Like to your
point, Gabe, we're no longer just
throwing a simple prompt or even a
simple chain of thought at a problem.
We're spending time beforehand to build
tools, to define how those tools will be
used, either to generate the answer to a
mathematical equation or to verify those
results. And we're executing these tools
in parallel massively. So, think about
the problems you used to try to solve
with AI before. It cost you a nickel and
a couple of tokens to go to ask a
question. You're going to get a result.
That result was probably terrible and
you'd say, "I'm going to work with it
and try again and try again and try
again." Uh, the approach here is
massively parallel. You're firing off
millions, tens of millions of tokens at
hundreds of different tools, verifying
them every step of the way. It's similar
to how the deep research tools work, and you're
going to get really good results and the
question even as a business you have to
ask yourself is it more cost effective
to ask a question 10 times and get the
wrong answer, or to ask it a million
times in parallel, which is going to be more
expensive, but you're going to get the
correct answer. So finding that balance
is going to be key. I don't think we can
use the same approach for every problem
because that approach requires you to
first engineer your tools. It requires
you to design a system and execute a
very expensive query because I don't
suspect this costs a dollar or $10 or
100. It probably costs tens of thousands
to ask all of the questions to solve
this problem. But if we can get to a
reasonable level within software
development use cases, for example,
where for one or two dollars you can do
massive refactoring of a codebase of a
class of a function with the same level
of tools, I think we'll be in good
shape. Yeah. And it leads quite nicely
actually. I guess Kate, you know, I
think you had a good comment which was
um look, I think almost like the Lee
Sedol moment is actually like a great
parallel because you're like, okay, you
you cracked Go, you know, like a thing
that we didn't think we could do, but
it's also just like, okay, then then
what do what do we do? Like I guess, you
know, this is obviously like an
impressive technical achievement. is
kind of what you were saying a little
bit earlier that you feel like the
actual practical impact is going to be
limited in the near term like you know
that like what we're kind of seeing here
to Mihai's point is like it's not
actually kind of like a practical
approach to most of the problems that
people are trying to use AI for today.
>> Yeah, I'm skeptical that this is going
to all of a sudden just totally change
the calculus and how we approach all
math problems and unsolved problems. I
think it's going to be a really helpful
tool. I think it's going to be
incremental change though and not step
change in where we see things going. The
thing that I think is most interesting is
that if you look, this isn't the first
time Gemini or Google at least has been
in the news around the math Olympiad.
Like over maybe a little over a year
ago. Um if we look at AlphaProof which
is their specialized model for math it
achieved silver performance uh on the
the Olympiad but what was different I
mean they used a specialized model but
their model took days sometimes to solve
a single question and what's really
exciting and what I think is a bit of a
breakthrough is that they are able to
solve these problems now all of them
within the time limit that all of the
other competitors have to observe which
I think is something like 4 and 1/2
hours total for six questions. So, I
think that as we talk about kind of
these general tools as they're advancing
uh and techniques to Mihai's point, I
think this is a really great
demonstration of how those tools are
starting to enable much bigger changes.
I don't think it's something like,
okay, math specific, all of a sudden we're
going to have this crazy breakthrough
in the field and all of these
>> we're like not going to solve math or
whatever. Math is still going to be
really complicated and there's going to
be a lot of really thorny cool research
problems to explore hopefully faster
with the help of AI. Um but I think it's
really exciting the uh as we talk about
how these techniques and capabilities
are evolving to be much more practical
and be able to run in a much more real
time.
>> I'd really like to see one of these
benchmarks have a time limit like this
one, but also a budget and a time
limit on the prep work you
do ahead of time. So what can you do
with three engineers, four hours to prepare?
Kind of like Robot Wars, right? You
know, you only have four hours to write
your tools. You need to get it done. You
have a budget of 10 million tokens or
however many dollars with whatever
platform. Now, let's see who's the best.
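Mihai's earlier cost question, whether it is more cost-effective to ask cheaply ten times or expensively once in a massively parallel verified run, can be made concrete with back-of-envelope arithmetic. All prices and success rates below are made up for illustration, not measured numbers.

```python
def expected_cost_per_success(cost_per_attempt: float,
                              success_probability: float,
                              attempts: int) -> float:
    """Total spend on the attempts divided by the probability that at
    least one of them succeeds (attempts assumed independent)."""
    p_any = 1 - (1 - success_probability) ** attempts
    return cost_per_attempt * attempts / p_any

# A nickel per question with a 1% hit rate, retried 10 times:
cheap = expected_cost_per_success(0.05, 0.01, 10)
# One expensive, massively parallel, tool-verified run at a 95% hit rate:
parallel = expected_cost_per_success(100.0, 0.95, 1)

print(f"cheap retries: ${cheap:.2f} per correct answer")
print(f"parallel run:  ${parallel:.2f} per correct answer")
```

Which strategy wins depends on what a correct answer is worth and on the clock: the cheap loop can cost less per eventual success, but under a 4½-hour contest limit you may never get that success at all.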
>> Yeah. I want to do like a cooking
reality show where it's like the secret
ingredient today is this data set and
you have four hours to create a
math-solving robot. I don't know. My
tastes are pretty specific. I guess I'd
watch that. I think
the last thing I want to address before
we move on to the next topic is you know
I was thinking about IMO and you know
one way I think about IMO is it is
an eval, right? Um, and one thing that
occurred to me is like a lot of people
spend a lot of time getting the
International Math Olympiad test together
each year. Um, and it's a really expensive
eval to build, but it is, as a result, kind
of like a gold standard in some ways, right?
Like I think the reason DeepMind and
OpenAI are here is that it seems to be a
really strong test of the technology.
Um, and I was talking with a friend
recently about this and just wanted to
kind of test this group on it is it
feels like as capabilities expand like
trying to get good evals is going to get
more and more expensive, right? Like you
want to know whether or not it can do
expert graduate level math. Well, you
know, it's a little bit different from
getting like, you know, a bunch of
simple arithmetic problems together. Um,
and so I guess I'm curious if if
anyone's kind of seeing that in their
work, like seeing basically like eval
become like more and more expensive for
us to produce. Um, in a way that I think
introduces some new problems, right? Is
like how much time do you need in order
to kind of get like an expert level
evaluation, you know, the the number of
humans that you need to put together
that kind of eval gets more and more
limited. I guess Gabe, you're smiling.
Do you want to I don't know if you want
to respond to that?
>> Well, I mean, I think it's a really
interesting point you're you're you're
pointing at here, uh, is just
that the more the the AIs try to tackle
problems that are already specialized to
a very small subset of humans, the
harder it is to actually have a rigorous
evaluation because statistics stop
really applying in standard, like, Gaussian
distributions of people that would solve
this thing, right? Like if there's a
small handful of humans that can solve
this, that's not a very well formulated
statistical distribution to say, hey,
look, you've got this score with this
variance. Like that math just doesn't
apply. And so um you know I think one
thing that, and this is gonna be a very
nonscientific answer, but I feel like
we are going to hit a point with these
models in general sort of uh I've made
this analogy before but there was a time
when we all probably were in our
learning phases where it was verboten to
cite uh a Wikipedia article in a
research paper, like you can't cite
Wikipedia because you just cannot trust
its validity and eventually that just
kind of eroded, right? Like everyone
just was like, "Well, yes, technically
there's there's some error bars in my
mind if I see a Wikipedia citation." But
I have pretty darn high confidence that
I can go over to Wikipedia, read the
article, and then maybe click through to
their citations and say, "Yep, the
article is accurate. Okay, we're fine. I
can just skip that second part. I can
probably trust the Wikipedia article."
And I think we're going to probably
start hitting that, you know, the higher
up we go. And in some ways, to your
question, the more specific and narrow
the the population that could solve this
as a human becomes, the more we're going
to have to start relying on the just I'm
going to choose to trust the model here,
right? Like it's it's built enough cred
on other things that I trust that in my
mind that translates to probably
believing that it's good at this thing.
Um because it just becomes much much
harder to apply rigorous statistics and
evaluations to a much smaller population
of data.
>> Yeah, for sure. There'll almost be like
a hypothetical math or like big if true
math where you're like, well,
>> the model has produced all these proofs
and no one's really verified whether or
not it's right, but if it were, then
this is next step. And there are I mean
there are plenty of techniques in math
and again it's been a long time since my
mathematician days so I'm not going to
use the right words here but uh you know
there are a lot of techniques where you
do like validation against one another
where neither is a source of truth
but you have a way of cross validating
uh and I imagine we'll get to that point
right where you've got a model that is
trying to tackle problems that only a
small handful of folks can do and so
rather than trying to produce a rigorous
benchmark for those tasks you instead
say hey, expert that could do this
on your own, evaluate what the AI did, and
now we've got a small sample size and in
some ways that is what the uh Olympiad
is doing here. You've got probably a
small subsample of expert judges that
are able to actually qualify what these
mathematicians are doing. Um, and so
you're, and this is already kind of an
example of that where you're not using
sort of a standard benchmarking
approach, but instead you're using
expert judges to evaluate what the model
has done. And those judges are
theoretically fallible as well. But, uh,
especially as you push further into the
frontier, you just kind of have to trust
the humans and then the humans have to
work with the AIs to trust each other.
>> Um, I'm going to move us on to our next
topic. A lot to discuss here. I'm sure
now that they've gotten gold, you know,
now they have to find a new eval to
do. Uh so we'll keep tracking this
into the next year. Uh second topic I
want to get to is uh maybe what is the
big kind of product announcement uh of
the week uh which is that after a long
period of speculation and rumors, OpenAI
has finally released ChatGPT Agent. Um,
and so this is a feature whereby you can
ask things into the model and it has
kind of full-on agentic behavior as its
own little browser. It can do all sorts
of things for you. Um, it's only I think
my understanding it's only available at
like the Pro tier. Um, and uh, I guess
Mihai maybe I'll toss it to you because I
know you work with agents day in day
out. Have you played with it? Any
impressions, strengths, weaknesses? Just
kind of curious about your quick kind of
off-the cuff review. And you probably
also know I have the Pro tier for um
both ChatGPT and Claude and all these
things and now I'm in trouble probably
>> because when I see the bill at the end
of the month uh it's getting quite
expensive especially in Europe. Uh that
said um I don't think this is
necessarily a new thing. I think ChatGPT
has had these kinds of agents
internally and many of the tools that it
was using were agentic in nature. So for
example it used the code execution
sandbox. So if you say, for example, write
me, build me a diagram, it internally
generated some Python code inside an
execution sandbox. It used matplotlib, it
generated, uh, that kind of thing, and it
then gave you that diagram. Uh this just
increases the number of tools it makes
available. It makes them a lot more
customizable. It also gives the same
kind of tooling support that their deep
research feature supports, with
internet browsing and web browsing and
that virtual uh computer use idea. So I
think from that perspective it's building
on the same concepts it used to have
before. Um, but the market is pushing
towards agentic, especially with
Anthropic releasing the Model Context
Protocol and everybody building agents in
the open source community. I think
they're seeing the need to give their
platform the same first-tier experience
for uh for an agentic system.
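The code-execution-sandbox tool Mihai describes can be sketched as follows: the model emits Python source as text, and the platform runs it in a separate, restricted namespace and reads back the result. Real sandboxes isolate at the process or container level; this namespace trick is only a toy, and the variable names are invented.

```python
# Source text standing in for what a model would generate on request.
model_generated_code = """
result = sum(i * i for i in range(10))
"""

def run_in_sandbox(source: str):
    """Execute untrusted source in its own namespace with only the
    builtins we explicitly grant, then read back `result`."""
    namespace = {"__builtins__": {"range": range, "sum": sum}}
    exec(source, namespace)
    return namespace.get("result")

print(run_in_sandbox(model_generated_code))  # 285
```

The platform then hands the returned value (or a rendered artifact such as a matplotlib figure) back to the model as the tool's output.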
>> Yeah. Kate, do you agree with that?
I mean, I guess a cynical view, Mihai, of
what you just said is, well, this is
kind of an incremental improvement. Uh,
really we should see this as marketing
more than anything else. Is is that the
right way of thinking about this? I
definitely agree with that from a
technology perspective, like an
algorithmic perspective of what's going
on. But I do think this is a tremendous
leap forward in a UI and user
interaction with AI perspective because
it at least from what I can tell this
seems to be the first major like
asynchronous workflow enablement with
agents and uh users for OpenAI. So, a
lot of their their marketing materials,
I haven't played with it, but just, you
know, reading about it and watching some
demos really heavily focus on start a
task and then close your laptop, walk
away, the agent's going to run, do
different things for you, and you can
come back, you know, whenever it's done,
having made more productive use of your
time elsewhere. So, they really seem
to be heavily indexing on this kind of
asynchronous deal, which is kind of the
first I've seen, and I think it's long
overdue. Like, we've been waiting for
this for a while. So really excited to
see uh some of that start to come to
fruition. Um I do wonder a lot like they
talk about all this, you know, value of
being able to walk away, but then they
also say, don't worry, if the agent's going to do anything, it's going to ask you for permission; you're going to have to enter your credentials yourself. And so it's kind of like, okay, well, how much can it do on its own if I'm also not
trusting it to go and do a bunch of
stuff without me giving the approval
every step of the way. So I'm curious to
see how some of that plays out. But I do think, from a UX and interaction perspective, it is a very interesting leap forward.
>> Yeah. It reminds me actually of these kinds of stories from the early days of PCs, where people had these PCs and said, what do we do with this? And so for a period of time it was, oh, you can use it to keep recipes. They were clearly trying to invent uses, and there was an effort to teach people what these things do.
>> That's right. Yeah. Yeah, like Kate, you're
almost kind of describing like a very
similar situation where it's like: agents. Okay, that's really cool. What now? How am I supposed to do this? And it is kind of funny to me that
like the the big kind of UX part of this
is actually you just can walk away from
your computer. Like it's actually like
the UX is no UX, I guess, in some sense.
Um, yeah. How far do you think this is going to go? Do you think the public is ready to trust these models in this way?
>> I think it's going to be a really engaging and interesting consumer-facing tool. I do not see this being
ready for any sort of enterprise
deployment. I mean, I mentioned already the security issues, and that's just for trusting the agent with my OpenTable login to be able to make reservations where there's a non-refundable deposit. Those are very small stakes, depending on the restaurant, I guess. But, you know,
I think we're still really far off and
there's so much to work out from a
security perspective uh to get this into
enterprise use.
>> I mean, it feels a little bit delicate, because, and I don't know if you'd agree with this, Kate: the two worlds are connected, right? You can imagine that not just OpenAI but a number of these companies don't get the consumer experience right, and I think everybody's general impression becomes, well, if it can't even book a restaurant reservation, I'm definitely not going to use it for this. Do you think there's actually some risk that if we don't do this agent thing well on the consumer side, it almost closes off the path on the enterprise side?
>> I think the pattern that we've seen emerge so far, and that will continue to hold, is: iterate and work on getting the workflow right for consumers, flush out all the bugs and all the kinks, then bring it to enterprise.
And I think OpenAI is following that
exact same playbook here. So they're
going to figure out the kinks, they're
going to iterate and evolve while it's
low, relatively low stakes, you know,
planning an itinerary, minor purchases
at the grocery store, that type of
thing. Um, and hopefully that will give them some experience to learn what could go wrong, and when, and then be able to address it for when the risks are far greater. I don't know if their incentives will perfectly align with where they're trying to go; again, they've always strayed a little bit more towards AGI at all costs versus focusing on making really, really great enterprise-specific tools. So I think there's going to be a little bit of friction between those two priorities with OpenAI that other companies might not have; they may have a clearer alignment around getting to enterprise readiness faster, even if it means going a little bit slower on the more general-purpose intelligence frontier. Um, so I think
that's going to be a big question is can
other providers get to enterprise ready
faster?
>> Gabe, maybe I'll end with you. What's
your trust level with agents? I don't know if you've played with ChatGPT agent, or any agent. What's the most important thing you would trust it with?
>> Yeah, great question. I am very trusting as long as there are no stakes, which is to say I'm not very trusting at all. So, no, I am very happy to
uh experiment, try things out and
generally use uh an agentic system
anytime it could accelerate what I'm
doing as long as I am the final arbiter
of the output. Um and that's just sort
of my comfort level with using these
tools for my own personal use. I think eventually, if and when they keep getting better at those things, I'll gradually step it up. But I'm a fairly slow adopter of things that just have magic behind them, and I suspect that's true of a lot of people who want to know how things work. But I do want to second what both Mihi and Kate said about this: on the one hand, this is an incremental technology change, and on the other hand, it is a major step function in UX. To me it brings to mind the difference between GPT-3 and ChatGPT. Fundamentally, the technology was all there in GPT-3, and it was literally just the UX of the instruction tuning that went into ChatGPT that made it explode. So even though
you know, these agentic patterns of tool usage, and even long-term inference scaling with deep research, all of these things have existed, and in fact the individual tools, the building blocks, have all been there. I think the real change is the idea of a single entry point that doesn't require the user to know what tools are appropriate for the task, and the idea of something that interacts more the way you would with a colleague: you delegate a task, or just hand something off and sort of wait for feedback, with potentially a mechanism for interactive updating. You know, Kate, to your point, I can imagine my phone pinging me to say, hey, my agent needs permission to do this, do you want to give permission? And I'm happy to say, oh cool, I'll be interrupted for a quick context switch to say, "Yeah, this looks good. Check. Go ahead. Keep going."
I think the UX pattern is really going to change with this, toward a central agent entry point, which hopefully will make this much more accessible to folks and help build trust in these systems.
>> All right, I'm going to move us on to our third segment of today. Mihi, since we've got you on the show, we want to give you the opportunity to plug your project, but just to quickly set up the context: we've talked a lot about MCP over the course of many shows of late, and I think one of the reasons I like the topic is that there's this really fascinating question of how new standards are going to emerge in the space and how adoption occurs in open technologies. And so, Mihi, you've been working on a specific project, I understand, which is called MCP Gateway. Do you want to give our listeners a quick overview of what it is and why you think it's important?
>> Yeah, sure. So first let me give you a bit of an idea of how the project started. We like to treat AI agents as insider threats. Every time an AI agent interacts with a system through a tool, we believe that's a potential insider threat, because it is an input: you're giving it text, that text goes to your tool, and if that tool just happily executes the input from the user, it can drop your database, delete a database record, or delete bits of your code. So we wanted a way to provide observability, guardrails, monitoring, security, authentication, authorization, even things like user impersonation, for saying, hey, do you want to access this, and then it goes on your behalf. And we'd been building a similar system for the last year, year and a half, but we hadn't done something very important: we didn't go open source with it and say, hey, we believe this is the standard for how an agent should interact with tools. Then Anthropic came along, and they did a great job with the Model Context Protocol, releasing it as a standard way to decouple your AI agent from your tools. However, there are a couple of interesting things in the mix here. The protocol came out; there's already a fourth version of it, there was a draft, and there are multiple implementations. There are something like 15,000 open source servers out in the community, all implementing different standards, different versions, incomplete implementations of the MCP protocol. Some of them don't have things like authorization and authentication and accounting and all the other things.
So we've created MCP Gateway and Registry as an open source project, which first gives you the ability to federate multiple servers behind the same gateway. If you have, for example, resources, prompts, and tools from multiple servers, you can combine them into a virtual server with its own authentication and authorization, retry mechanisms, observability, monitoring, rate limits, health checks, and a plug-in system. You can plug in, for example, pre- and post-hooks for every operation: before a user input you could trigger Open Policy Agent or a PII filter, and after a specific operation you can do the same thing. So it's meant to be a centralized point that gives you control over your context, whether that's tools, resources, or prompts, but also a mechanism that lets you convert between different protocols. If your tools aren't already written as an MCP server, maybe you have a REST API, you can connect that REST API to the gateway; it will turn it into an MCP server and give you the same control over it.
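The pre- and post-hook pattern described here can be sketched in a few lines of Python. This is only a toy illustration of the idea, not the real MCP Context Forge API; every class, function, and guardrail below is a made-up example:

```python
import re
from typing import Callable

class GatewaySketch:
    """Toy gateway: every tool call passes through pre-hooks (guardrails
    on the input) and post-hooks (filters on the output)."""

    def __init__(self) -> None:
        self.pre_hooks: list[Callable[[str], str]] = []
        self.post_hooks: list[Callable[[str], str]] = []
        self.tools: dict[str, Callable[[str], str]] = {}

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def call(self, tool: str, payload: str) -> str:
        # Pre-hooks can rewrite or reject the input before the tool sees it.
        for hook in self.pre_hooks:
            payload = hook(payload)
        result = self.tools[tool](payload)
        # Post-hooks filter the output before it returns to the agent.
        for hook in self.post_hooks:
            result = hook(result)
        return result

# Example guardrails: block obviously destructive SQL, redact email PII.
def block_destructive_sql(text: str) -> str:
    if re.search(r"\b(DROP|DELETE|TRUNCATE)\b", text, re.IGNORECASE):
        raise PermissionError("destructive statement blocked by gateway")
    return text

def redact_pii(text: str) -> str:
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[redacted]", text)

gw = GatewaySketch()
gw.pre_hooks.append(block_destructive_sql)
gw.post_hooks.append(redact_pii)
gw.register_tool("lookup", lambda q: f"record for {q}: contact alice@example.com")

print(gw.call("lookup", "order 42"))  # → record for order 42: contact [redacted]
```

The point of the centralization is that the agent and the tool both stay unchanged; the guardrails live in one place and apply uniformly to every federated server.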
>> And so, is it right to say, and I don't know if this is putting it too simply, but it almost feels like MCP is a standard, but we know the world is going to be really messy when we actually put MCP into action. And so what you're attempting to do is give people a lever for controlling that craziness, right? There's going to be so much variance that you will want checks at every step. Is that the right way of thinking about it?
>> I think to some extent, yes. We also want to give you a point where you can plug in your own plugins. You can add your own spin to control, or even change, the input and output going to these MCP servers.
>> So, Gabe, you think a lot about open protocols, is my understanding. Do you want to respond to Mihi's project? I'm interested in your take against the backdrop of everything we know about open source and open tech. Is it just the right time? Do we see MCP Gateway-type projects in other domains? How does this all look situated in history?
>> Yeah. I mean, I think this is an excellent time for a project like this, because we're at this blossoming of a new open-source standard for a novel interaction pattern, which in this case is fetching context for your AI models. And think about how other protocols have historically emerged and the implementations that have arisen around them: how much bad HTML is out there on the internet, right? That put the onus on the browsers to be wildly robust to all the things that could go wrong. Or how many HTTP servers out there occasionally just don't return, randomly spit out a 500, or even return a malformed stream of packets that everything on the client side has to be robust to. And so, to your point,
Mihi, we are very much in the early days, and I imagine what we'll see, just like with HTTP servers, is that for every favorite programming language, the de facto standard community edition, and potentially enterprise edition, of the MCP server and client library will eventually emerge: probably one or two, a small handful of them, each with their passionate followers and their differentiators relative to one another. And there will start to be some coalescence, because people will stop being interested in implementing the server layer of MCP and start being interested in what's sitting behind it. But I think for a while we're going to be in the wild west of actually getting the bits to flow correctly and staying on
top of the spec as it evolves. Because with anything like this, there's sort of an exponential decay in the volatility of the spec itself. I think, Mihi, you pointed out in the blog post you wrote that they've already deprecated one of the primary transport protocols for MCP: SSE, that's server-sent events, is going away, replaced by streamable HTTP. But so many people have already implemented their streaming MCP servers on top of SSE. Like, huh, what do we do
with that? So I think having a piece of, for lack of a better word, middleware in the MCP domain, where you can actually coerce standards and basically manage the chaos in a single central place, is excellent: both an implementation tool for engineers trying to build in this ecosystem themselves, and a good way to shake out some of these inconsistencies across implementations. We can start to clearly identify: hey, look, every time somebody attaches an MCP server written with this random MCP library in Elixir, it turns out we have to enable all this glue in the gateway to make it work. So maybe we shouldn't use that one, or we should get that community to step up their game, etc. So I think there could be some really strong benefits, both at the leaves of this graph, where the developers are trying to build things, and at the connective tissue of the graph, to start isolating patterns.
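The transport churn mentioned here (SSE deprecated in favor of streamable HTTP, with thousands of SSE-only servers already deployed) is exactly the kind of mismatch a gateway can absorb. A toy sketch of preference-ordered transport negotiation; the function name and logic are illustrative assumptions, not real MCP SDK code:

```python
def negotiate_transport(server_transports: set[str]) -> str:
    """Pick the newest transport the server supports, preferring the
    current spec's streamable HTTP over the deprecated SSE."""
    preference = ["streamable-http", "sse"]  # newest first
    for transport in preference:
        if transport in server_transports:
            return transport
    raise ValueError("no supported transport; server predates both options")

# A gateway can present streamable HTTP to clients while bridging to a
# legacy SSE-only backend behind the scenes.
print(negotiate_transport({"sse"}))                     # → sse
print(negotiate_transport({"sse", "streamable-http"}))  # → streamable-http
```

In practice the clients then never need to know which transport a given backend actually speaks; the compatibility shim lives once, in the middleware, instead of in every agent.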
>> Okay, I'll give you the last word here. It strikes me that this topic actually, weirdly, goes back to what we were talking about a moment ago with ChatGPT agent: sometimes the fault when an agent fails is that the model wasn't smart enough, or the agent wasn't smart enough; sometimes it's just that the internet's really messy and the technical implementation is really bad. I'm curious how you think about that from a UI/UX standpoint. We're talking a lot about trust, and it's almost like the agent is going to take the blame, I think, for all of this messiness if we don't find a way around it.
>> Yeah. No, absolutely. I think this speaks to a much bigger trend, which is: obviously, 2025 is the year of the agents, and everyone's really excited about what agents can do, but we're finding that maintaining these agents and these systems is really difficult, especially given how quickly everything's moving. And there's so much performance tied up, as you said, Tim, in how this is all architected and built, beyond the pure LLM weights; that's obviously going to dictate performance too. And so I think, Mihi, this is a great example of an emerging class of middleware that augments existing agent frameworks and protocols in order to improve how we scalably build agents, how we maintain them, and how we build them in a secure manner. And I think this is just the tip of the iceberg for the class of projects that's going to have to emerge if we're really going to deploy and maintain these for the future.
>> Well, Mihi, any final thoughts? And if people want to learn more about the project, where should they go?
>> Well, look, if you want to learn more about the project, go to github.com/IBM/mcp-context-forge, and you'll be able to get started with our MCP Gateway implementation. If you want to contribute, we also have a detailed roadmap as well as an issues page where you can bring in new features, or say, I'm missing this feature, or, I'm having this issue. And I think, Kate, to your point, decoupling this logic from the agent and letting a piece of AI middleware handle things like your retry logic and all of the filtering is going to simplify your agentic framework as well. There's going to be a lot of duplication between all of these agentic frameworks, LangChain, LangGraph, AutoGen, the ones from IBM; all of them have their own specific way of doing things. But if you manage to decouple the logic, then you're going to be able to consolidate that work into one library.
>> All right. Well, I think that's the time that we have for this panel today. Kate, Gabe, always good to see both of you. And Mihi, we hope to have you back on the show sometime.
>> Anytime. Thank you all.
>> All right. Thanks everybody. We're going
to go ahead and move to Ryan.
>> So, today we've got Ryan Hagaman joining
us. He's the global AI policy issue
lead. Uh, Ryan, welcome to MoE.
>> Great to be here. Thanks for having me, Tim.
>> So, I know we wanted to scramble this segment because this news just broke yesterday: the White House has released its AI Action Plan, which is sort of a national strategy for artificial intelligence. And I know there's a lot going on; I was looking at the document this morning, and there are a lot of different recommendations. Do you want to walk us through what this is exactly, and is it important? Should we be paying attention to it?
>> Yeah, sure. I mean, I'll walk you
backwards. The answer to the second
question is absolutely. Yes. Very. Uh
this is something that not only IBM but
pretty much everyone in industry and
here in DC and frankly around the world
has been anticipating and looking
forward to for basically the last 6
months uh since it was originally
announced. In a way, this is the policy Super Bowl for policy nerds and policy wonks. Um, but the short
version is what this document basically
does is it lays out the Trump
administration's policy agenda as it
relates to artificial intelligence. And
part of the reason I think you haven't
seen so much action uh in Congress is
this is also the starter pistol for
legislative action in the future.
Probably not by the end of this year but
probably moving into next year. Uh but
what is the plan? Uh I mean in short the
plan is basically how the administration
wants this Congress and the existing
agency apparatus in DC to approach
thinking about AI. So there are something on the order of 134 or 135 individual actions that agencies are recommended to take. This plan was also accompanied by three executive orders from the president, which he signed yesterday, and which provide a little bit more clarity
on what exactly some of the agencies are
supposed to do with respect to some of
the more important features of the plan.
But basically the plan as outlined goes
a little bit something like this. There are three pillars: there's accelerating AI innovation, there's
building out American AI infrastructure.
And then there's leading in
international AI diplomacy and security.
And what this basically boils down to
for IBM is a lot of positive momentum.
Uh, I think the big thing for us is there was a specific call-out on the value and importance of open-source and open-weight model development and deployment, which is frankly something we've been asking for from the administration and from Congress. It's one of our major pillars that we advocate for here in DC: the need for policymakers to keep a hands-off approach to the open space and the open community.
>> That's like a big shift, right? It kind of feels like for a
little bit there, there was sort of a
debate over, well, we've got these
powerful new technologies. Is it kind of
unsafe for them to be open? And it feels
like here they're kind of very much
affirmatively saying no, like we want
open. We actually think that's like a
really important thing to happen. So it
feels like almost like there's there's
kind of a a shift in that discussion and
you know it's landing very firmly on the
side of open which I I think I'm
personally very excited about.
>> Yeah. I mean me too and I think a lot of
us at IBM would view that as a very
positive development. I would say where
it kind of moved us from was a space of
uncertainty because no one really knew
how the administration was going to come
down on this. The last administration
never really took a super strong stand.
There was a great report from the
Department of Commerce that looked at
open model weight development. Um,
didn't make any, you know, super strong
statements, but it also didn't say open
source bad, right? Which was kind of a win at the time, because of the uncertainty. So it's a little bit of a sea change, if only from moving from neutral to positive. Um, but it's a
big positive and it's a big you know
signal for us in industry that kind of
this direction that we've been taking is
now essentially getting a little bit of
kudos from policy makers which will be
good for the next few years of
administrative action.
>> And so you mentioned these executive orders that were signed. I think one of them had to do with energy. I know we're frequently talking on this show about the compute buildout and infrastructure side of this. It sounds like there's now a pretty clear path to really build a lot more.
>> Yeah, I mean, that was the gist of not only the section on energy and data center buildout in the plan, but of the executive order as well. It doesn't touch IBM so much, because we're not really in the data center buildout game, but we definitely benefit from more data centers and more energy capacity on the grid. That executive order basically just streamlines a lot of regulatory approval processes. It gets into the weeds, referencing a lot of different statutory authorities and existing legislation. But the short version of it is
basically to the extent that there are
opportunities for us to be building more
capacity on the grid, to be building
more data centers, the federal
government shouldn't be standing in the
way of that, they should be finding
opportunities to help expedite that and
make it happen quicker because, you
know, America's going to need a lot more
energy if we're going to be doing a lot
more LLMs in the future.
>> Yeah, for sure. Anything else that you were pleased to see? It sounds like you focused a lot on the open side. Is there anything else in this action plan that folks should focus on, or that would make it worth picking up the PDF and reading?
>> Yeah, I mean, there's too much to mention in a couple of minutes, but I do think the one other thing that really struck a chord with me was under the AI diplomacy bucket. There's essentially a dictate for the Department of Commerce to figure out, with partners in industry, how to create a larger export package of a full American AI tech stack, right? So everything from data center buildout to model developers and model deployers: get everyone together in an industry consortium, package that all up, so that the Department of Commerce can then use those export packages to push American tech out to the rest of the world. And something we've said a lot is that this administration can really promote the idea of exporting American AI and American technology. So, you know, that's an opportunity for IBM. We'll see what comes of it. But that, I would say, is the other really big call-out here that IBM, I think and I hope, stands to benefit from over the next year or two.
>> That's great. Well, I know we just have
a few minutes left. Um, action plan is
out. Executive orders have been signed.
What comes next? I assume you're not saying, "Well, we're all done with this AI thing here in DC."
>> Yeah, I mean, I kind of wish I could take vacation for the rest of the year, but I think the reality is there are a lot of requests for information coming. The AI Action Plan basically says, here's what we're going to do; now they actually have to do it, which means we've got a lot of responses to provide. We've got a lot more education and engagement to do on the Hill as they think about legislative packages to help make some of this a reality. So, like I said at the outset, this is really just the starter pistol for the start of the race. And I think now it's heads down, books open, pens and paper in hand. We've just got to get to work doing it.
>> Well, Ryan, look, I know there's a lot
going on. Uh, we'll have to have you
back on the show as this all kind of
unfolds. Um, it'd be really good to have
your voice in here as we kind of track
what's happening uh in DC on all this
stuff.
>> Always happy to stop by for a chat.
>> Cool. Thanks, Ryan.
>> Yeah, thank you.
>> Thanks for joining us, listeners. If you
enjoyed what you heard, you can get us
on Apple Podcasts, Spotify, and podcast
platforms everywhere. And we're going to
see you next week on Mixture of Experts.