Build an MCP Server for LLM Tools
Key Points
- The Model Context Protocol (MCP), released by Anthropic in November 2024, standardizes how LLM agents communicate with external tools, eliminating the need for duplicated integrations across different frameworks.
- Building an MCP server lets you expose any existing API (e.g., a FastAPI employee churn predictor) as a universal tool that any LLM agent can call without custom wrappers.
- The tutorial demonstrates that a functional MCP server can be created in under 10 minutes using familiar Python tooling, showing the step‑by‑step process from project setup to endpoint exposure.
- MCP works with both paid and open‑source LLMs, and includes built‑in observability features so you can track which agents are invoking which tools.
- Once the MCP server is running, the same tool definition can be reused across any client or agent, dramatically simplifying integration and scaling of AI‑driven workflows.
Sections
- Rapid MCP Server Setup - A concise walkthrough showing how to create a Model Context Protocol (MCP) server in under ten minutes to standardize LLM tool integration while addressing paid model compatibility, observability, and deployment details.
- Setting Up Virtual Environment and Server - The speaker creates a Python virtual environment, installs the MCP CLI and requests packages using uv, generates a server.py file, and starts importing the necessary modules to build a FastMCP server.
- Calling Employee Churn Prediction API - The speaker walks through constructing a payload, extracting data from a list, and making a POST request with appropriate JSON headers to invoke an API that predicts whether an employee will churn.
- Choosing Transport Types for Inspector - The speaker walks through connecting to the inspector, copying its URL, and explains the difference between STDIO and server‑sent events transport options, guiding users on selecting the appropriate transport for their tool.
- Running LLM for Employee Churn - The speaker demonstrates configuring and executing an Ollama Granite 3.1 LLM to predict whether an employee will churn, linking sample data, setting server parameters, and running the Python agent within a timed demo.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=EyYJI8TPIj8](https://www.youtube.com/watch?v=EyYJI8TPIj8)
**Duration:** 00:14:53
Section timestamps:
- [00:00:00](https://www.youtube.com/watch?v=EyYJI8TPIj8&t=0s) Rapid MCP Server Setup
- [00:03:04](https://www.youtube.com/watch?v=EyYJI8TPIj8&t=184s) Setting Up Virtual Environment and Server
- [00:06:06](https://www.youtube.com/watch?v=EyYJI8TPIj8&t=366s) Calling Employee Churn Prediction API
- [00:09:08](https://www.youtube.com/watch?v=EyYJI8TPIj8&t=548s) Choosing Transport Types for Inspector
- [00:12:14](https://www.youtube.com/watch?v=EyYJI8TPIj8&t=734s) Running LLM for Employee Churn
This is how to build an MCP server so you can connect your LLM agents into just about anything.
The Model Context Protocol was released by Anthropic in November 2024.
It addresses a lot of the issues that have been popping up around agents.
How?
Well, in order for agents to exist, they need tools, right?
But every framework or app or client tends to bring its own way of declaring these tools.
Now, this becomes a pain because
you might find yourself creating integrations repeatedly every time you want to use an AI capability.
This is where MCP comes in.
It standardizes how LLMs talk to tools.
So you can define your tool server once and use it everywhere.
I'm going to show you how to build your own in under 10 minutes,
But does it only work with paid LLMs? And how hard is it actually to build?
And what about observability?
Can I track what's using a specific tool?
We'll get to that.
I'm recording this after a big bowl of carbs, and without using Cursor, Copilot, or my
old mate Stack Overflow, I'm going to break it down into three straightforward steps.
Phase one, build the server.
Alrighty, so we are gonna go on ahead and build our very own MCP server.
And as usual, we're gonna set a bit of a timer.
So 10 minutes on the clock, let's kick this thing off.
So first, a little bit of background as to what we're going to be building an MCP server for.
So I built a machine learning API in this video, where we went and deployed it using FastAPI.
And you can see here that I've currently got it running locally via this specific endpoint.
Now, if I go and send this particular body: so it's predicting employee churn.
Years at company dictates how many years that particular employee has been at the company,
and then there's their satisfaction, their position, whether they're a manager or non-manager, and their salary,
which I've split up into an ordinal representation between one and five.
So if I send this off, it's gonna predict whether or not the employee is
likely to churn, so you can see down here that we've got a prediction of zero.
If we went and changed their employee satisfaction, you can see they're still not gonna churn.
What if their salary sucked?
So if we send that through, they're not gonna churn.
So maybe if they had fewer years, so maybe they're not ultra loyal.
So if I change their years at the company, take a look.
So we've now got a one.
So that represents the fact that they are gonna churn.
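The request body from the Postman demo can be sketched as a plain Python dict. The exact field names aren't spelled out in the video, so the keys below are assumptions; only the four attributes and the ordinal salary scale come from the demo.

```python
import json

# Hypothetical request body for the churn API; field names are assumed.
sample = {
    "years_at_company": 1,        # fewer years pushed the prediction to churn
    "employee_satisfaction": 0.1,
    "position": "Non-Manager",
    "salary": 2,                  # ordinal representation between one and five
}

# The MCP tool built later wraps the dict in a list before sending it.
body = json.dumps([sample])
```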
But how would we convert this into an MCP server so that we can expose it to all of our amazing AI agents?
Well, that's exactly what we're gonna do with our MCP servers.
So.
You can see that I've got my API running here.
So that's just what I've shown using fast API and I'll include a link to that as well.
But for now, we are going to focus on getting our MCP server up and running.
Okay, so we wanna go on ahead and do this.
So, all right, focus.
So we're gonna go uv init and we're going to create an employee project.
So this is going to now create a folder called employee.
It's got my pyproject.toml file, so on and so forth.
Then we need to go ahead and cd into that folder.
So we're now inside it and we want to go ahead and create a virtual environment.
So I'm gonna go uv venv.
So we've now got a virtual environment, and then we're actually gonna copy this command to activate it.
Boom, that is our virtual environment now created.
So if we jump in, you can see that we've got our virtual environment.
All of our project files are looking good.
Okay, what do we wanna go ahead and do now?
We need to install our dependencies.
So we're gonna go uv add, and we want the mcp CLI package.
And we also want requests.
Let's make sure I'm not covering that.
So we're actually going to be using the model context protocol.
So this is our big library over here, which is going to allow us to do all of this amazing stuff.
And there's a whole bunch of information about how this actually works.
We're mainly going to be using the Python SDK.
Okay, so that is, let's run that install.
Perfect, we're now installed.
All right, we can clear that.
Jumping back in, we also want to create a server file.
So I'm gonna go touch server.py, beautiful.
Okay, so if we jump in here now, we should have a server file.
Okay, that is the beginnings of our server.
Very basic at the moment, but we need to go on ahead and build this out.
So the first thing that we're gonna do is we're going to import our dependencies.
Oh God, the time.
How's that time?
Oh my Lord, seven minutes.
Okay, we are gonna need to punch it.
So we're gonna go from mcp.server.fastmcp, we are going to import FastMCP.
And then we need to import a bunch of other stuff.
So we're gonna import json.
We're gonna use this.
So this FastMCP class is going to be like the crux of our entire server.
So you'll see when I instantiate that in a second.
Then we want JSON, we're going to use that for some parsing later on.
We're going to import requests to actually make a request out to this API.
And then what do we want?
We need a little bit of typing assistance.
So we're gonna go from typing import List, because that's how we're gonna pass the input from that agent.
Okay, those are our dependencies now done.
We're then gonna create our server.
So I'm gonna say mcp is equal to FastMCP, and we're gonna call it churn and burn,
sort of in alignment with my desktop, right?
Okay, so that's our server created.
And then we wanna create a tool.
So create the tool,
and there's different resource types, right, or different capabilities that you are able to build inside of your MCP server.
So let me jump back; you've got the abilities down here.
So you can build resources, prompts, tools, you can handle sampling, transports.
We'll talk about that a little bit later.
Okay, so we are going to create a decorator.
So we're going to go @mcp.tool().
So this is going to wrap our function that's going to call out to our end point.
So then we're gonna create that specific tool.
So I'm gonna call it predict_churn, and it's gonna return a string, and we now need to handle the data that we're gonna take in.
So we're gonna take in an argument called data.
It's gonna be a list of dictionaries.
Okay, that's beautiful.
So then what we actually need to do is define a docstring.
Now I've gone and written this a little bit earlier.
So this is what we're gonna paste in.
So I'm gonna copy that and then let's read it for a sec, time permitting.
Okay, so this tool predicts whether an employee will churn or not; pass through the input as a list of samples.
So the arguments: data is the employee attributes used for inference.
Then there's the example payload, and you can see I've got a dictionary wrapped in a list.
It takes in the exact same variables that we had inside of Postman:
years at company, employee satisfaction, position, and salary.
And it's gonna return either one (churn) or zero (no churn).
Okay, so that's a doc string now created.
Now we wanna go on ahead and handle this.
So we're gonna create a variable for our payload, and we're just gonna grab the first
value that we have inside of our list, right?
So we're just accessing the dictionary, excluding the list.
Okay.
Then we need to make a call out to our API. Four minutes.
Okay.
This is not looking good.
So we're gonna go requests.post, 'cause remember, over here we're making a POST
request, and then we are gonna send it out to that URL.
But if you had an external URL, you'd be going out to that instead; let's paste it in there.
We need some headers.
I should have practiced some typing this morning.
We're going to set Accept to application/json.
And we are going to specify the content type.
Should have toggled word wrap.
There we go. All right.
The Content-Type is going to be application/json,
and then we want to pass through our data, which is going to be a JSON-dumped payload.
Okay, beautiful.
All right, so that's our response now set up.
Then what we're gonna do is we're going to return response.json() once the user calls out to this.
Okay, so, that is our tool now created.
Now, we just basically need to specify what we do when it gets run.
So if __name__ equals "__main__", we are going to run our MCP server.
So, mcp.run, and then our transport is going to equal stdio, so standard input/output.
All right, I'm pausing the timer.
All right. So we've got
two minutes and 47 seconds left, but that is our server now created.
So we're good to go.
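Put together, the server.py assembled above looks roughly like the following. It's a sketch rather than the video's exact file: the endpoint URL and field names are assumptions, the video's requests call is swapped for stdlib urllib so the sketch is dependency-free, and a minimal stand-in for FastMCP is defined if the mcp package isn't installed.

```python
import json
import urllib.request
from typing import Dict, List

try:
    from mcp.server.fastmcp import FastMCP  # installed via: uv add "mcp[cli]"
except ImportError:
    class FastMCP:
        """Minimal stand-in so the sketch runs without the MCP SDK."""
        def __init__(self, name: str):
            self.name = name
        def tool(self):
            def decorator(fn):
                return fn
            return decorator
        def run(self, transport: str = "stdio") -> None:
            pass

# Assumed local endpoint for the FastAPI churn model.
API_URL = "http://localhost:8000/predict"

# Create the server.
mcp = FastMCP("churn and burn")

@mcp.tool()
def predict_churn(data: List[Dict]) -> str:
    """Predict whether an employee will churn or not.

    Args:
        data: employee attributes used for inference, a dict wrapped in a
            list, e.g. [{"years_at_company": 1, "employee_satisfaction": 0.1,
            "position": "Non-Manager", "salary": 2}] (field names assumed).

    Returns:
        "1" for churn or "0" for no churn.
    """
    payload = data[0]  # grab the first value, excluding the list
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.dumps(json.load(response))
```

In the real file you'd finish with `if __name__ == "__main__": mcp.run(transport="stdio")`, matching the STDIO transport chosen above.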
All right, so we've knocked off phase one and we've got the server up and running, but how do we actually test that it's working?
Can we actually get to our tool?
Well, this brings us to phase two, testing out the server.
Okay, we're back.
So we've gone and created that server, but we haven't really tested it out yet.
So how do we go about doing this?
Well, I've got two minutes, 47 seconds left on the timer.
So let's keep this up.
So let me show you how to do this.
Okay, so we are currently inside of the employee folder.
We want to start off the dev server.
So this is going to give us access to the MCP inspector, where we can actually test out our tools.
We can go uv run mcp dev server.py.
This should start off the inspector.
I'm gonna pause it until we successfully get the inspector up.
Okay, that's our inspector up and running.
I can copy this URL here.
I'm going to go to a browser.
Nope, no, go back.
I'm not going to copy this again.
I'm now going to paste that in, beautiful.
All right, that's the inspector.
So if I connect down here, then if I go to tools up here, then if I go to list tools, that is our tool now running.
I'm gonna pause it, all right.
We're good.
We've got two minutes left.
All right, but let me sort of show you, right?
So over here, we've got our transport type.
So there's two different transport types available.
So there's standard input/output.
There is also SSE, which is server-sent events.
So this is important.
You'd probably use standard input/output when you're just connecting with local files or local tools.
When you're doing something more client-server related, you'd probably default over to server-sent events.
We're using STDIO, standard input/output, as dictated by the fact that right down here, under transport, we are specifying that.
Tools like Cursor can handle SSE and STDIO; I think Claude Desktop uses STDIO only.
And the capability that we're gonna use in a second, when it comes to using our agent, is STDIO.
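That rule of thumb can be captured in a tiny helper. This is purely illustrative, not code from the video; the string values match the transport names used above.

```python
def pick_transport(local_only: bool) -> str:
    """Choose an MCP transport: STDIO when connecting with local files or
    local tools, server-sent events (SSE) for client-server setups."""
    return "stdio" if local_only else "sse"

# The demo server runs locally, so it uses STDIO.
print(pick_transport(local_only=True))  # -> stdio
```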
Okay, enough of me blabbing.
So let's jump over to our inspector.
So to use our inspector, you just make sure you specify the right transport type, the right command, and the right argument.
And if you hit connect, you can see that we're connected.
Remember how I said there are different capabilities that you can expose in your MCP server?
They're all up here.
We're just interested in our tool.
And if we hit predict churn, we can actually go and test this out.
So if I switch to JSON over here, we can go and pass through an object.
So again, I've got one nicely formatted.
So we're gonna copy this over, chuck that in here.
All right, drum roll, please.
So now if we go and hit run tool, take a look.
We've got our prediction.
So we've successfully gone and determined that this particular employee with these particular values will not churn.
Now, if we went and changed it up: I'm just gonna change it in here, because you get this weird thing in the inspector, not ideal.
When I go and delete a value, so, let's say I deleted this, you can see we're getting errors,
because it's running syntax validation while I'm trying to edit.
So we'll just do that over here.
So let's say they had not that many years at the company, they weren't
all that satisfied, and their salary was in the lower quintile.
So if I paste that in now, take a look, that particular person will churn.
Okay, that is our tool now successfully running and our MCP server successfully working.
Right, so we've now established that our server actually works.
We've made a call to it and we're able to get a prediction back as
to whether or not somebody's likely to churn. So how do we bring this into an agent?
Well, that brings us to phase three, adding it into an agent.
Alrighty, last thing we've got to do, so I've got two minutes.
Let me bring that back up.
Two minutes left on the timer.
All right, so what we now need to go ahead and do is integrate this into our agent.
So I've gone and pre-written an agent using the BeeAI framework,
and I'll make this available via GitHub so you'll be able to test that out.
And this is all built using, let me show you the LLM.
So we're using Ollama, specifically the Granite 3.1 dense eight-billion-parameter
model, which was trained by an amazing research team.
So we are going to go on ahead and use this.
Now, right down the bottom, you can see that I've got this question.
So will this particular employee churn?
And I've got my employee sample over here.
So I've got the years at the company, the employee satisfaction, the position, and their salary.
So hopefully, fingers crossed, we can send this over.
So we're running out of time.
Got a minute 14 left.
Okay, let's punch this out.
So we need to go over to here.
We've got our standard input/output server params.
So we need to pass through our command.
So our command is going to be UV.
And then we are going to run it; we need to pass through the directory to specify where our server actually is.
And then I just want to go and grab the file path to our server.
So I'm going to copy this, copy path, boom, and then back into our agent.
And then, I'm gonna paste that in here
and then we need to go and set our run command: we are going to run
server.py, and just over here, I just want to get rid of server from the end of the path.
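The STDIO server params described here can be sketched like so. StdioServerParameters comes from the MCP Python SDK (a dataclass stand-in is used if it's absent), and the project path is a placeholder, not the presenter's actual path.

```python
from dataclasses import dataclass, field
from typing import List, Optional

try:
    from mcp import StdioServerParameters  # MCP Python SDK
except ImportError:
    @dataclass
    class StdioServerParameters:
        """Dataclass stand-in mirroring the SDK's fields."""
        command: str
        args: List[str] = field(default_factory=list)
        env: Optional[dict] = None

# Launch the server with uv from its project directory.
# The directory below is a placeholder; point it at your own employee project.
server_params = StdioServerParameters(
    command="uv",
    args=["--directory", "/path/to/employee", "run", "server.py"],
)
```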
Okay, so now if I go and run this, let me just go to another terminal, beautiful.
And if I go and run the Python single-flow agent, okay. We've got 30 seconds left.
Let's see how we go.
Drum roll, please.
Take a look.
There we go, I'm going to pause it.
We had 22 seconds left, not too bad.
Okay, sorry.
And let's quickly run through this.
So right down here, you can see that we've got a thought from our agent.
So the user wants to know if the employee will churn based on their attributes.
I need to use the predict churn tool.
And then right down, here, we've managed to get a prediction.
So over here, we've got a prediction of one, indicating that the employee should churn.
So what has our agent said?
This employee is predicted to churn, so we've successfully gone
and built out our MCP server and integrated it into an agent.
Almost forgot: observability.
All you need to do is import logging and add in this line here.
And this will give you the ability to see every tool call in your server logs.
And in the interest of interoperability, just to prove that this MCP server could be used elsewhere,
I can add it to, for example, Cursor, by using the following command,
which is effectively just the same command that we passed through for our agent. Then I can open up a chat,
paste in an example asking whether or not this particular person will churn, then switch to agent mode and hit send.
Then I should be able to run the tool by opening up the example and hitting run tool,
and if I scroll on down, you'll see that
I've got a prediction of one, which indicates that that particular employee is going to churn.
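For the Cursor side, registration typically goes through an mcp.json entry along these lines. The file layout follows Cursor's MCP configuration convention, and the server name and directory path are placeholders:

```json
{
  "mcpServers": {
    "churn-and-burn": {
      "command": "uv",
      "args": ["--directory", "/path/to/employee", "run", "server.py"]
    }
  }
}
```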
Same server, MCP everywhere.