MACE Framework: Assessing Agentic AI Tools
Key Points
- Manis AI launched in March 2025 with hype that outpaced its early performance, leading to reliability, cost, and token‑usage complaints until the platform began stabilizing around June‑July.
- The speaker highlights a broader challenge in AI: naming and categorising capabilities is difficult because the technology is highly general‑purpose, yet clear terminology is essential for practical work.
- To address this, a new “MACE” framework is proposed for evaluating agentic AI tools, comprising four dimensions—Modality, Autonomy, Complexity, and Environment.
- The framework’s first two dimensions are explained: Modality (e.g., text, coding, workflow, research, multimodal—Manis falls in the last category) and Autonomy (ranging from reactive prompt‑responses to fully autonomous agents), providing a structured way to discuss and compare AI agents.
Sections
- Proposing a Framework for Agentic AI - After recounting Manis AI’s rocky launch and reliability issues, the speaker introduces a new framework to name, categorize, and assess agentic AI tools.
- Classifying Modern AI Agent Capabilities - The passage contrasts basic step‑by‑step models like Claude and ChatGPT with more advanced, multi‑agent systems that support sequential, branching, and dynamic replanning tasks across cloud, IDE, and platform runtimes, proposing a framework based on complexity, execution environment, autonomy, and modality to map current AI agents.
- Balancing Autonomous AI and Human Collaboration - It highlights the rise of fully autonomous AI agents like Manis, the challenges of cost and task selection, and the importance of hybrid workflows where humans intervene.
- Enterprise AI Agent Challenges - The speaker outlines the major technical hurdles for scaling AI agents in enterprises—including tool selection and fallback, memory and long‑context management, cross‑modal context handling, token‑cost optimization, and designing robust error‑recovery decision trees.
- Scaling Multi-Agent Orchestration Amid Ambiguous Intent - The speaker discusses the difficulty of interpreting vague user requests while ensuring privacy, compliance, and scalability in enterprise‑level multi‑agent orchestration platforms like Manis.
- Manis: Unique Enterprise Agent Platform - The speaker explains that Manis stands apart from other AI agents—such as ChatGPT, Claude, and Google offerings—by tackling hard engineering challenges to deliver reliable, scalable, enterprise‑focused intelligence through its MACE framework.
- AI-Driven Process Mapping & Prototyping - The speaker explains how the AI tool Manis enables operations teams and consultants to rapidly map workflows, generate documentation, and build technical proof‑of‑concepts, delivering fast, low‑cost first drafts that save weeks of manual effort and cut expenses by up to 90%.
- Manis AI Agent Launch Forecast - The speaker predicts major AI companies will soon release a Manis‑type agent, highlighting its ability to dramatically cut costs on high‑price specialized tasks and generate new revenue streams for model providers.
Full Transcript
Source: https://www.youtube.com/watch?v=8m2-WKhidYk
Duration: 00:27:02
Section timestamps: 00:00:00 Proposing a Framework for Agentic AI · 00:03:30 Classifying Modern AI Agent Capabilities · 00:06:52 Balancing Autonomous AI and Human Collaboration · 00:10:50 Enterprise AI Agent Challenges · 00:14:10 Scaling Multi-Agent Orchestration Amid Ambiguous Intent · 00:17:37 Manis: Unique Enterprise Agent Platform · 00:21:31 AI-Driven Process Mapping & Prototyping · 00:25:17 Manis AI Agent Launch Forecast
Manis AI launched in March of 2025 and
I didn't talk about it very much and the
reason why is that it was another one of
those cases like Devin where the hype
video ran way ahead of what people in
practice were actually able to do. And
so Reddit forums filled up and Twitter
complaint threads started. And the
long and the short of it was that after
launch in March through about June or
July, there were a lot of issues with
reliability, with cost, and with the opacity
of token consumption. That is starting to
shift. It is shifting enough and the
platform is stabilizing enough that I
think it's worth having a wider
conversation. But before we do that, I
want to talk about what I think is
actually one of the key challenges when
we think and talk about AI and agents in
particular: naming things. It is really
hard to name an AI capability because AI
is such a slippery technology. It's
general purpose. It can do anything. And
so naming and categorizing what these
different things do becomes both really
important to get work done and also not
at all obvious. And
so before we dive into the capabilities
of Manis itself and kind of why I think
the platform is stabilizing and the use
cases it supports, I want to take a
second to talk about a proposed
framework for how we assess Agentic AI
tools. As far as I know, we haven't
really had a good framework for this.
That's why I'm proposing one. I want to
go through it. Tell me where it's wrong.
Tell me where it's better. Let's dive
in. I'm calling this the MACE framework.
MACE stands for modality, autonomy,
complexity, and environment. I think
those four things are all dimensions
that we need to assess agentic AI tools
on and that we've really lacked the
language for assessing them on
previously. Let's dive into each of
these. Number one, what is the primary
modality of this tool? And there's at
least five different things you can look
at there. Text agents. Examples of that
would be Claude, ChatGPT, Gemini. They
generate, they analyze text. Number two,
coding agents. Cursor, GitHub Copilot,
Claude Artifacts. Number three, workflow
agents. n8n, Zapier, Make, LangChain,
etc. Number four, research agents like
Deep Research or Perplexity.
And number five, multimodal agents.
Manis falls into that category.
There are probably other primary
modalities, but you get the idea, right?
It's basically asking: what is the primary
mode of this agent? That becomes a relevant thing.
Number two, autonomy. What is the
degree of proactive autonomy that this
agent brings to the table? It can be
reactive, so it responds to individual
prompts, again like Claude or ChatGPT in
the text window. It can be interactive,
so it might be multi-turn with human
guidance. You have that sometimes when
Deep Research comes back and asks you
a question. It can be semi-autonomous, so
it might execute plans with checkpoints;
an example of that is GitHub Copilot
Workspace, which will come back
and ask you along the way. Or it could be
completely autonomous: end-to-end
execution with very minimal intervention.
Manis and Devin are both in that
category. All right. What is the C in
MACE? Complexity handling.
It can handle simple tasks step by step.
Some of the non-reasoning models in
Claude and ChatGPT fall in this
category. It can handle sequential
multistep. I would argue that Claude
Code is a good example of sequential
multistep. It might handle branching
which is more complex. Good n8n
workflows will do that. Or it might do
dynamic replanning based on the results
of what it sees. Manis does that and
more advanced agent configurations can
do that as well. You can set up Claude
with multiple agents to do that in Claude
Code, for example.
What is the E? The execution
environment. Is it cloud contained, so
it runs in the provider's sandbox?
Claude and ChatGPT both do that in their application
interfaces. Is it integrated into your
IDE? So it works within the development
environment inside cursor for example.
Is it platform hosted with dedicated
agent runtime? n8n can be that. It
doesn't have to be that depending on how
you configure it. Is it infrastructure
spanning? Can it deploy or access
different external systems and use
complex tools? Manis can do that. You
can configure Claude Code to do that as
well. And when you look across
this, it's easy for you to say, well
Nate, you just said a bunch of things,
right? You said it needs to have
complexity defined, execution
environment defined, autonomy defined,
the modality defined. That's all well
and good, but what do we know about the
current generation of AI agents and
where they would fit? I want to suggest
that we have at least six different
categories, practical categories of AI
agents out there today that sort of fit
within this broader spectrum of use
cases.
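The four MACE dimensions described above can be written down as a small taxonomy. A minimal sketch in Python: the enum values follow the descriptions in this video, but the class and field names are my own, and the Manis classification at the end is just the example given here, not an official rating.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    TEXT = "text"             # Claude, ChatGPT, Gemini
    CODING = "coding"         # Cursor, Claude Artifacts
    WORKFLOW = "workflow"     # n8n, Zapier, Make
    RESEARCH = "research"     # Deep Research, Perplexity
    MULTIMODAL = "multimodal" # Manis

class Autonomy(Enum):
    REACTIVE = 1        # responds to individual prompts
    INTERACTIVE = 2     # multi-turn with human guidance
    SEMI_AUTONOMOUS = 3 # executes plans with checkpoints
    AUTONOMOUS = 4      # end-to-end, minimal intervention

class Complexity(Enum):
    SIMPLE = 1       # step-by-step tasks
    SEQUENTIAL = 2   # linear multi-step
    BRANCHING = 3    # conditional paths
    DYNAMIC = 4      # replans based on results

class Environment(Enum):
    CLOUD_CONTAINED = "cloud"
    IDE_INTEGRATED = "ide"
    PLATFORM_HOSTED = "platform"
    INFRA_SPANNING = "infrastructure"

@dataclass
class MaceProfile:
    name: str
    modality: Modality
    autonomy: Autonomy
    complexity: Complexity
    environment: Environment

# Example classification based on the descriptions in the video
manis = MaceProfile("Manis", Modality.MULTIMODAL, Autonomy.AUTONOMOUS,
                    Complexity.DYNAMIC, Environment.INFRA_SPANNING)
print(manis)
```

Writing the framework down this way makes comparisons concrete: two agents that share a modality but differ on autonomy or environment are not really competitors.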
The first one, it's the simplest, right?
The conversational generators. ChatGPT,
Claude, Gemini all come to mind.
DeepSeek comes to mind. You use them
when you fundamentally need high-quality
text generation back.
The second class is coding assistants.
When you need to write code and you have
a feedback loop for it, depending on how
you configure it, Claude Code is a great
example here. Cursor does this, Windsurf
does this, etc.
You can't use these when you need sort
of broader system orchestration unless
you're going to configure them
specially. And so I think that the
exception to a lot of this is Claude
Code because it's such a malleable tool.
And so that's why it appears in a couple
of these. But I think that code
assistant is the good vanilla or generic
use case for Claude Code. Third class of
agent categories, workflow orchestrators
and Zapier and Make all fall in here. So
you're connecting known systems, you've
got predictable data flows. You may have
trouble with ambiguous inputs. These
systems tend to be somewhat brittle.
Number four, research synthesizer
agents. So Deep Research works here.
Perplexity has a deep research function
and you can also use Claude in its deep
research function. You put Opus 4.1 on
there. You have it search the web and
think hard.
You need current information compiled.
You need it analyzed. You need to act on
findings or integrate them with systems.
Typically I find that the acting part is
a problem with these, right? Like if you
need to actually take an action rather
than just read, don't use these. But if
you just need to develop the
information, it needs to be very high
quality. Research synthesizers are
really, really good and people are using
them for those use cases. Now, number
five, autonomous execution agents.
Manis and Devin obviously go in here,
and there are custom agents that also
work this way. There are people who are
running Claude Code continuously, for
example, and so it's been configured
specially to be an autonomous execution
agent.
More and more energy is going into this
category number five. That's part of why
I've called out Manis: because I think
it is a flagship toward a wider future
of autonomous AI execution, and it is
worth paying attention to on that basis
because the world is going to look more
like Manis in the future.
The challenge is managing the cost and
managing the complexity. You have to
know what kind of tasks you want to
entrust to an agent that is that
complex. The sixth category, hybrid
collaboration. And so there are a lot of
these where you want it to come
back and talk to you. You want it to
engage with you. I would say Cursor
Composer is a great example of this,
where there's some degree of human
judgment. There's AI capability. I feel
like Andrej Karpathy has done a great job
talking about that nuanced human
collaboration piece that happens with
good agent workflows. One of the things
he emphasized in a tweet a few weeks
ago is that as we build these AI agents,
probably too much focus right now is
going into bucket five with autonomous
execution. And we are sometimes missing
the realization that we need to have the
right time for the human to touch the
model or for the human to touch the work
because humans can bring tremendous
value especially seasoned experienced
humans who have domain knowledge. And it
is critical to give humans space to do
that. Well, all right, those are six
examples. You've got that mace framework
in your head. We've talked about how
these different agents all bucket
together. I hope you've gotten a better
sense of the landscape. I think we need
to have more of these conversations
around how we bucket these
intelligences. To me, one of the things
that really needs to happen is that we
need to have some degree of tagging
that goes with these names because
Claude Code is a great example. It
doesn't just code. It does a lot more
than code, but it was named Claude Code.
Manis happens to write code. It also
runs it. It also continues the workflow.
Calling it a general purpose agent is
fine, but it would be more precise to
talk about it as a multi-agent
orchestrator. I know that's a bit of a
handful, but the precise wording helps
us to name what agents to compare things
to because otherwise we end up making
inappropriate comparisons. I would not
right now compare the agent mode that
ChatGPT shipped with Manis. Those
are different architectures. They
have different capabilities. I would say
Manis is a whole lot better than the
agent mode that ChatGPT shipped, and
it's not close. And I'm not even sure
they're playing in the same ballpark
even though they're both called agents.
So if I were to think about this, the
first thing I would do is I would say
how do we talk about
the challenges associated with not only
naming, which I think we've done the
naming thing. We won't do any more on
the naming. But the challenges
associated with stabilizing these
technologies into reliable flavors that
companies can go and access because part
of why I am doing the work on naming and
talking about naming and part of why
I've talked a lot about the whole
ecosystem before getting into Manis
specifically is I think that
organizations need some predictability
to purchase and delivering that
predictability with a technology like AI
is actually quite challenging. You have
to solve complexity of orchestration,
right? You have to solve state
management across modalities. So if you
have different sub-agents and you're
trying to sell this as a bundle the way
Manis is, you have to be able to show
that each sub agent can maintain its own
state, but the orchestrator needs to be
able to have global coherence because
the enterprise will expect that. You
have to be able to show that state
complexity can continue to be maintained
despite task length and modality
extending.
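A loose sketch of that state split, assuming a design where each sub-agent keeps its own local history and the orchestrator promotes selected results into a shared layer for global coherence. All names here are hypothetical, not Manis's actual architecture.

```python
from dataclasses import dataclass, field

@dataclass
class SubAgentState:
    """Each sub-agent maintains its own local state."""
    name: str
    history: list = field(default_factory=list)

class Orchestrator:
    """Keeps a shared fact store so sub-agent results stay globally coherent."""
    def __init__(self):
        self.agents = {}        # name -> SubAgentState (local state)
        self.shared_facts = {}  # global coherence layer

    def record(self, agent, step, fact_key=None, fact_val=None):
        state = self.agents.setdefault(agent, SubAgentState(agent))
        state.history.append(step)                  # local state grows per agent
        if fact_key is not None:
            self.shared_facts[fact_key] = fact_val  # promoted to the global view

orch = Orchestrator()
orch.record("coder", "generated schema", "db_schema", "users(id, email)")
orch.record("writer", "drafted docs using the shared schema")
# The writer can read the coder's result from the shared layer:
print(orch.shared_facts["db_schema"])  # users(id, email)
```

The point of the split is auditable coherence: local histories can grow or be compacted per agent, while the shared layer is the small, consistent view the enterprise can inspect.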
Another example is tool selection. When
it's uncertain, how can you show the
enterprise what tool choice the agent
will execute on when it's not sure
what's going to happen? What does the
fallback look like? What does the error
handling look like?
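One way to make that auditable is an explicit fallback chain that records every tool decision. A hypothetical sketch with stub tools, not any vendor's actual API:

```python
def call_with_fallback(tools, query):
    """Try an ordered list of tools; log each outcome so an
    enterprise can audit which fallback fired and why."""
    audit_log = []
    for tool in tools:
        try:
            result = tool(query)
            audit_log.append((tool.__name__, "ok"))
            return result, audit_log
        except Exception as exc:
            audit_log.append((tool.__name__, f"failed: {exc}"))
    raise RuntimeError(f"all tools failed: {audit_log}")

def web_search(q):     # primary tool (stub that simulates an outage)
    raise TimeoutError("search API unavailable")

def cached_search(q):  # fallback tool (stub)
    return f"cached results for {q!r}"

result, log = call_with_fallback([web_search, cached_search], "Q3 revenue")
print(result)  # cached results for 'Q3 revenue'
```

The audit log is the answer to the enterprise question: when the agent wasn't sure, here is exactly which tool it tried, which failed, and which it fell back to.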
Memory management and context is another
big piece that these kinds of companies
including Manis have to solve
effectively. How do you handle long
workflows that accumulate huge
context? One of the biggest challenges
in AI right now is that enterprise
businesses bring enterprise-scale context
and it's very very difficult to bring
that to AI in a way that's reliable and
scalable.
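One pattern people reach for here is rolling summarization: compact the oldest history into a summary entry instead of dropping it. A minimal sketch; in a real system a model would write the summary, and the string slicing below is only a stand-in.

```python
def compact_context(messages, max_items=6, keep_recent=3):
    """If history exceeds the budget, fold the oldest messages into a
    summary entry so dependencies survive, and keep recent turns verbatim."""
    if len(messages) <= max_items:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = "SUMMARY: " + "; ".join(m[:20] for m in old)  # stand-in for an LLM-written summary
    return [summary] + recent

history = [f"step {i}: details of step {i}" for i in range(10)]
compacted = compact_context(history)
print(len(compacted))  # 4: one summary entry plus the three most recent steps
```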
You can't just truncate. You might lose
dependencies. You have to figure out how
you handle external memory, how you
handle summarization, and it's not
entirely intuitive how to do that at
enterprise scale. Another example of
challenges that these kinds of agents
need to solve: cross-modal context, and
how you avoid context bleed. So code
outputs might need to inform text
generation for a complex task. But you
have to account for the fact that they have
different context requirements and
different token economics, so you're not
spending code-priced tokens on text if
code is more expensive and so you're not
leaking requirements back and forth
between the two. Errors are another
challenge. How do you avoid a situation
where you go into an error loop when one
sub agent fails? That's a really hard
challenge. What does an error recovery
decision tree look like that an
enterprise can audit and understand?
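Sketched as code, an auditable error-recovery policy might bound retries for transient failures, refuse to retry unrecoverable ones, and escalate rather than loop. Everything here is hypothetical:

```python
def run_with_recovery(task, max_retries=2):
    """Explicit, auditable decision tree: retry transient errors up to a
    bound, never retry bad-input errors, escalate instead of looping."""
    decisions = []
    for attempt in range(max_retries + 1):
        try:
            return task(), decisions
        except TimeoutError:
            decisions.append(f"attempt {attempt}: transient error, retrying")
        except ValueError as exc:
            decisions.append(f"attempt {attempt}: unrecoverable input error: {exc}")
            break  # do not retry errors a retry cannot fix
    decisions.append("escalated to human review")
    return None, decisions

calls = {"n": 0}
def flaky_subagent():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("tool timed out")
    return "done"

result, audit = run_with_recovery(flaky_subagent)
print(result, audit)  # done ['attempt 0: transient error, retrying']
```

The retry bound is what prevents the error loop, and the decision list is what an enterprise reviewer would read afterward.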
Resource predictability is another big
one. This has been one of the chief
issues that people have complained about
with Manis.
How do you predict what it's going to
cost if you're paying in credits? When a
credit is burned, is it the same value
for every action or not? People have
complained that some days it seems like
Manis burns more credits and some days
it seems like it burns fewer credits and
it's not predictable. Overall, it has
gotten much better since March and
that's part of why I'm talking about it
now. But it isn't at the degree of
enterprise predictability that it needs
to be yet. QA is another massive
challenge. How do you validate code and
not just code but engineering
configurations when the LLM designs all
of it with multi-agent orchestration?
That's really hard to do. That is one of
the reasons why Manis is more popular
right now with consultants, more
popular with independent builders than
it is with enterprises. Last but not
least, I want to talk a little bit about
user intent and model coordination.
It is really really difficult to handle
different model results consistently
over time if you have different sub-agents
working in the same way and some
of them are from different models. That
is not an easy and obvious task to do
but it's a task that many of these
builders are trying to handle underneath
the covers because of the unit economics
associated with token burn for models.
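That unit-economics pressure is why builders route work across models of different cost. A toy sketch of cost-based routing; the prices and task labels are made up for illustration.

```python
# Hypothetical per-1K-token prices; real prices vary by provider.
PRICES = {"small-model": 0.0005, "large-model": 0.015}

def route(task_kind):
    """Send routine steps to a small model; reserve the expensive
    model for hard reasoning, to protect margins."""
    return "large-model" if task_kind in {"planning", "code-review"} else "small-model"

def cost(task_kind, tokens):
    return PRICES[route(task_kind)] * tokens / 1000

# Routing 80% of traffic to the small model cuts the blended cost sharply:
blended = 0.8 * cost("summarize", 1000) + 0.2 * cost("planning", 1000)
print(round(blended, 4))  # 0.0034, versus 0.015 if everything went to the large model
```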
And so there was an article, I think
it mentioned Notion, where it said
basically 10 percentage points of
Notion's margin have been eaten up in
the last year just because Notion is
using AI models, and so AI model makers
are starting to eat SaaS margins. Well, if
you want to combat that you have to have
a multi-agent configuration, but your
multi-agent configuration needs to
actually work. And that's really hard.
And it gets harder when you think about
the second part of what I said, user
intent. How do you handle user intent
when users are not intentful? When they
aren't clear about what they want. And
you have to assume at the enterprise
level that on the one hand, you're going
to have engineers that are really clear.
And on the other hand, you're going to
have people who just say, "Make it good."
Good luck with that, right? Or, "Make a
dashboard." Well, good luck with that.
How do you interpret that? How do you
interpret that for the enterprise in a
way that is compliant with privacy,
that is able to handle all of the
challenges that come with building a
fully-fledged product, and in line with
the user's presumed intent? How do you
handle that with pushback and questions,
etc.? Everything I just described
covers the scaling challenges
associated with multi-agent
orchestrators like Manis, and it is why
they are hard to scale to the enterprise.
I didn't even get to the technical
scaling part. Scaling out the actual
system so it serves the enterprise. That's
another challenge. Why am I going over all the
hard things? This all explains the
challenge that Manis is trying to solve.
Why I believe it's important that we
talk about it and why I believe Manis's
current position makes sense. At the end
of the day, Manis is trying to get to a
point where they can scale multi-agent
orchestration for the enterprise. But to
do that, they're running the classic
startup playbook where they're starting
with indie builders, they're starting
with small startups, and then they're
going to gain the experience they need
to move into the enterprise space.
They are trying to solve all of these
problems in ways that are transparent to
the user and in ways that enable the
user to deliver value specifically where
Manis is good. I'm going to talk about
those use cases toward the end of this
video, but Manis' current position is
pretty simple. They've chosen to
optimize for reliability
and they've chosen to optimize for
capability.
And that explains the issue with cost.
The old engineering dilemma is you can't
optimize for reliability, capability,
and cost all at once. You've got
to pick two out of three, right? You can
be reliable and capable, but you're not
going to be cheap. You can be reliable
and cheap, but you're not going to be
capable. You can't have all
three. And in a sense, I think Manis
has one of the most transparent pricing
systems in the business because when the
tokens run out, you just buy more tokens
and they can allocate the compute to the
people who are willing to pay. They're
also following a very typical platform
evolution pattern. You have a demo phase
that happened in March. You have early
access which happened sort of roughly
April to June. People found edge cases.
People had reliability issues right on
schedule. They're now stabilizing.
They've fixed a number of those
problems. It's not perfect, but it's
good enough to start talking about. And
then going from there, they're going to
be optimizing and scaling into the
second half of this year.
The fundamental tension
that they are operating within is as
follows. Users want ChatGPT
simplicity. They want autonomous
execution and they want predictable
costs. They can't have it all. And what
you're getting is complicated workflows.
You are getting autonomous execution and
you're getting variable costs. And this
core tension explains why Manis
remains in the expensive specialist tool
category rather than a mainstream app
because at the end of the day solving
for the engineering challenges that
would enable ChatGPT simplicity and
predictable cost and autonomous
execution is non-trivial. Do you want to
know an example of why it's non-trivial?
Nobody else has launched a competitor
that really matches Manis from one of
the major model makers. The agent mode
from ChatGPT doesn't do it. Claude Code
is just in a separate category. I would
argue it's not the same thing.
Google hasn't launched something. Manis
is its own thing. Part of why is because
the engineering challenges they're
solving are really, really tough. I've
been through a few of those. Okay. So,
we've spent a lot of time talking about
a framework. We talked about the MACE
framework for intelligence and how you
talk about agentic intelligence. We
talked about some of the practical
categories with agents, conversational
generators, code assistants. We came all
the way to autonomous executors where I
would argue Manis sits. We talked a little
bit about the challenge that comes from
scaling intelligence for the enterprise
and sort of where we are in this moment.
I talked about state management. I
talked about context and how you manage
that, about error propagation, etc. All of
this is top of mind, and I wanted to
contextualize it, because now you
understand where Manis is: why Manis is
optimizing for reliability and
capability (quality has been an issue
with these agents, and they know that, and
they know people won't come back to a
challenger brand if they don't optimize
for it); where they are in their platform
evolution; why they are really positioned
as a specialist tool right now, though they
are getting better; and why they've had
trouble getting to the enterprise stage,
though I think they are poised to get
there if they continue to optimize. All
of that being said, in the present state,
in September of 2025,
what are the practical use cases where Manis
is likely to be useful today? I think
there are several sweet spots. I've
already seen this: I know people who
are using Manis for these things who are very
happy. And I think there's a pattern
we can see that suggests how these tools
tend to evolve, which we can apply in
other cases as well, because remember, the
whole purpose of this discussion, it's
not just Manis, is that we're trying to
understand how Manis exemplifies where
agentic tooling is going. So, use case
number one, high-value research and
analysis: monthly or quarterly industry
analysis for execs, competitive
intelligence briefings, due diligence
research packages, that kind of thing.
Manis wins here because the
cost is justifiable. If it costs a
hundred bucks to develop that report,
it's a lot cheaper than 2,000 bucks for
the consultant.
It combines web research. You have to
have a nicely formatted output. You have
to analyze the data. And human review is
expected anyway before you make
strategic decisions. So, it's not too
risky and the time savings can be huge.
If it takes two hours to make that
report, it saves you days and days of
work. Use case number two, content
marketing production pipelines. So, if
you're managing multiple clients as a
small agency, if you're a SaaS company
with regular content needs, Manis can
win here because they can scale content
production without a linear cost
increase. They can handle research,
analysis, and creation and formatting of
the content. The quality bar tends to be
good first draft versus publication
ready. And the ROI is really clear
because otherwise you're hiring content
writers. Use case number three, data
analysis and visualization for non-technical
teams. So, business analysts that don't
have coding skills. Are there any of
those left? Marketing teams analyzing
campaign performance. Small businesses
that just need ad hoc analysis. You kind
of get the idea. Manis eliminates the
need to learn Python, right? Or R or
hire a data scientist. It handles messy
data. It handles the analysis and it
handles the visualization piece. I can
think of other tools that do other parts
of that, but I don't know of any tools
that do all of those parts besides
Manis. Output quality will often exceed
Excel-based manual analysis and time to
insight is reduced. Now, with truly
large enterprise data sets, this is not
going to work and I'm not going to
pretend it will. And that's why I
emphasize sort of the small business use
case. Use case number four, process
documentation.
So operations teams that document
workflows, consultants that are
analyzing client process, creating
training materials, Manis wins because
you can map existing processes and
identify opportunities and create
documentation on the fly, right? And it
and it's very fast. It can save weeks of
manual process scraping and it can
provide immediately actionable
recommendations in a nice visualized
format. Again, not a super risky use
case. It saves a ton of time. Technical
proof of concept development is the
fifth one. So if you want to validate a
product idea, if you want to explore a
new integration, if you want to create
technical specs as a PM,
this goes beyond what you would get with
Lovable, because Manis can create the
prototype, the documentation and the
deployment in one big workflow. It can
handle multiple technical domains and at
the end of the day, you want speed
versus production ready code. Overall,
again, you see the same pattern. And I
want to call out the success patterns
here because I think that there's
something there's something to this that
we should understand when we see where
agents are at in the fall of 2025. One,
there needs to be economic
justification. All of the tasks that
I've described are $500 to $5,000 if
done manually, often in the thousands.
The Manis cost is going to be a
fraction of that, a tenth of that or
less. The time savings typically stretches
into days, and the quality
expectation tends to be make a fantastic
first draft not a perfect final product.
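The economic justification is simple arithmetic. A worked example using figures in the ranges quoted above; the specific $2,000 and $100 numbers are illustrative, not measured.

```python
manual_cost = 2000   # e.g. a consultant-produced research report
agent_cost = 100     # illustrative cost of an agent run, credits included
savings = manual_cost - agent_cost
savings_pct = 100 * savings / manual_cost
print(savings, f"{savings_pct:.0f}%")  # 1900 95%
```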
final product. Those are the sweet spots
agents in the fall of 2025.
Especially, if you notice, because these
are technically complex workflows. They
have five to 15 to 20 to 25 distinct
actions. They're combining research with
creation with formatting. They have
human review and refinement and they
have very clear deliverables. Now,
I think the thing to call out is that if
you have an agent that excels at complex
multi-domain workflows where the
alternative is hiring expensive
specialists,
what you have is still a premium
automation tool rather than a general
productivity app. We keep coming back to
this idea that there are certain buckets
in the agentic landscape that are more
specialized than others and where you're
going to be paying more as a result. And
one of the things I want you to take
away from this is that AI agents should
not be viewed as a singular bucket
anymore. You have agents that are going
to be positioned as general productivity
tools and you have agents that are going
to be positioned as specialist tools for
specialist tasks. Manis as it stabilizes
is looking more and more like a
specialist tool for a specialist task.
It's looking like a surgeon's scalpel
versus a Swiss Army knife. I'm sure they
would like to be a Swiss Army knife from
an economics perspective, but because of
the engineering challenges I've
identified earlier in this video, it's a
really hard position for a multi-agent
orchestrator to be in. That being said,
I want to end my video by talking about
it because I think that is where the
market is going and that is why it's
worth talking about Manis. Manis is like
the canary in the coal mine. They're the
ones that are showing us the way forward
on multi-agent orchestration, what it
looks like independently. They've shown
how you can start to stabilize a product
even for smaller businesses and
independent builders or for teams on
larger businesses that don't have too
heavy data needs, but
they don't have the scale that the major
model makers have and they haven't been
able to build out the kind of footprint
that would enable them to really harvest
unit economic gains and bring down the
cost curve.
And that's where I think we're going. I
think it is likely that we will see a
version of Manis from a major model
maker in the next few months. Maybe from
Google, maybe from Claude, maybe from
OpenAI, but the value that people see
with these complex use cases is
very high. If you were spending three,
four, $5,000 on this, yeah, you're going
to be willing to pay whatever it
costs to get this done with an AI agent
because it's so much cheaper. A tenth, a
fifth of the price, it's going to be so
much cheaper. Which, if you are looking
as a major model maker to recover some
of the cost associated with these
models, you want to have more reasons
for people to pay you 200 bucks. Maybe
this is, you know, the reason why you
have a 200 buck ad hoc task on top of
your 200-buck subscription. And some
people will pay for that because it's so
good. And you're going to see a lot more
of the economists at these major model
makers (and yes, they have economists)
looking for these kinds of specialist
tasks that enable them to scale margin
and that's where Manis is showing the
way. So if you have specialized tasks, if
you have the stuff that I have talked about,
where it's, you know, a $500 to $5,000 task,
and you know that you need to get that
task done no matter what, maybe try
Manis. It will probably save you a fair
bit of money and you will be willing to
pay the cost because it's so much
cheaper and the ROI is so clear,
especially if you're looking for an
excellent first draft. So, that's my
verdict on Manis. I waited to talk about
it till it started to stabilize. I feel
excited to talk about it now. I think
it's a great example of how AI agents
are developing specialized use cases in
fall 2025. And I'm excited to see where
Manis goes next. Cheers.