Model Context Protocol

MCP servers are basically glorified wrappers, with the “function documentation” acting as a prompt for the model to use. One of those really simple but effective solutions that just work.

You have 3 parts to it:

  • LLM
  • MCP client
  • MCP server

The protocol “spec” has some features that both the client and the server need to implement. There is an initialisation step where capabilities are gossiped across so both sides can adjust accordingly.

Server Capabilities

Tool

This is what is used most often when talking about an MCP. A tool is basically a function that the model can call. In the initialisation step, the server provides details about the function: what it does, what parameters it takes and what it returns. These can be derived cleverly by the MCP SDK from docstrings and type hints ( as in Python ) or be declared a bit more explicitly ( as in Node.js ).
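
A rough sketch of what that looks like with the Python SDK’s FastMCP interface ( the server name and tool are made up for illustration ):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("youtube-mcp")

@mcp.tool()
def video_url(video_id: str) -> str:
    """Return the watch URL for a YouTube video id."""
    # the SDK derives the tool description and parameter schema from the
    # docstring and type hints and sends them to the client during initialisation
    return f"https://www.youtube.com/watch?v={video_id}"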

The client basically hands every tool’s description to the LLM, which ranks their relevance to the current prompt. If it finds one that is relevant enough, it asks for it to be called with the parameters it thinks are best. Note that actually invoking the tool is an aspect of the MCP client. Effectively, selecting the tool and calling it with the right parameters is a function of both the MCP client and the LLM it uses.

Prompts

These are entirely separate from tools and are basically pre-defined templates suggested to the user, both as a way to demonstrate how to use the tools the MCP server has and to make it easier to give an elaborate prompt without having to write it all out or save it somewhere. Basically a UI around sprintf but supplied via the MCP server. Note that the “UI” part is again an aspect of the MCP client.

The prompt itself is not meant to call any tool; all it does is string formatting to substitute parameters into the prompt template. When the prompt does end up getting executed, that is a flow entirely separate from the prompt itself. The model still decides if it wants to call any tools or not, though if the prompt is well designed it should guide the model to use the tools effectively.
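
Continuing the FastMCP sketch from above, a prompt is just a function that returns the filled-in template ( the names are again invented ):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("youtube-mcp")

@mcp.prompt()
def summarise_video(video_id: str, style: str = "bullet points") -> str:
    """Prompt template for summarising a video."""
    # plain string formatting, no tool calls happen here; the client
    # shows this as a selectable template in its UI
    return (
        f"Fetch the transcript for video {video_id} using the available "
        f"tools and summarise it as {style}."
    )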

Resources

These are meant to be static data like files, documents, images or other assets that the model can refer to. The line between tool and resource can seem a bit blurred, since you can always have a tool that just returns the content of a resource, and resources can take parameters as well. But the main difference is that a tool call is an action that is actively decided on by the LLM, whereas a resource is always passively passed to the LLM as context.

So the user can just add the “dynamic resource” as context via the client UI, for example, attach the records for the user with id “3532”, and this will make the client fetch the resource from the server and add it to the context for the LLM to use. Note that this is the client’s action based on user input.

It need not be explicit user input though; the client can also decide to add resources based on smart non-LLM heuristics, like workspace instructions/configurations etc. Again no LLM decision is involved here, just client logic.
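
A parameterised resource for the “records for user 3532” example could look something like this ( the URI scheme and names are invented for illustration ):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("youtube-mcp")

@mcp.resource("records://users/{user_id}")
def user_records(user_id: str) -> str:
    """Records for a given user, attachable as context from the client UI."""
    # the client fetches this when the user attaches it ( e.g. id "3532" );
    # no LLM decision is involved in the fetch
    return f"records for user {user_id} would be loaded and returned here"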

Client Capabilities

Roots

Roots are an idea of dynamic scope restriction hinted by the client to the server. It’s up to the server to honour them. Note that these are basically hints to the MCP server only and do not involve the LLM at all. The client, based on some workspace config or user interactions, sends the server a message about which roots are active. The server then uses these roots to restrict what tools, prompts and resources are visible to the client.
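
On the server side, the Python SDK should let you ask the session for the client’s roots; a rough sketch, assuming list_roots() is the right call ( names are from memory ):

from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("youtube-mcp")

@mcp.tool()
async def show_roots(ctx: Context) -> list[str]:
    """Echo back whatever roots the client has declared."""
    # assumption: the server session exposes the roots/list request;
    # a real server would use these to filter what it exposes
    result = await ctx.session.list_roots()
    return [str(root.uri) for root in result.roots]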

I guess the idea makes sense for a global MCP server or as a way of effectively filtering tools dynamically ( so as not to confuse the LLM with too many options ). I don’t grasp the full use case for it, if there is any at all.

Sampling

This allows for multi-step or sequential tool calls guided by the server as well. Without sampling, the client can still decide to call multiple tools in sequence, but that’s entirely up to the LLM and the client. Sampling lets the server use the LLM’s capabilities without having to build LLM support internally: it asks the client to run a completion on its behalf.

A simple use case is a GitHub issue tagging bot that uses the LLM to classify issues and then tag them accordingly. Without sampling, the client would have to call the classify tool, get the result, then call the tag tool with the result. With sampling, the server can do the classification ( by asking the client’s LLM for a completion ) and then apply the tag, all in one go. The client just has to send the initial prompt and then wait for the server to finish. It basically reduces the “intelligence” needed on the client side for orchestrating multiple tool calls.
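
A rough sketch of the server side of that with the Python SDK; the create_message call and parameter names are written from memory, so treat the exact API as an assumption:

from mcp.server.fastmcp import FastMCP, Context
from mcp import types

mcp = FastMCP("issue-tagger")

@mcp.tool()
async def tag_issue(issue_text: str, ctx: Context) -> str:
    """Classify a GitHub issue via sampling and return the tag to apply."""
    # the server asks the client's LLM for a completion instead of
    # shipping its own model
    result = await ctx.session.create_message(
        messages=[
            types.SamplingMessage(
                role="user",
                content=types.TextContent(
                    type="text",
                    text=f"Classify this issue as bug, feature or question:\n{issue_text}",
                ),
            )
        ],
        max_tokens=20,
    )
    # assuming the client returned text content
    tag = result.content.text.strip()
    # a real server would now run its own tagging logic with `tag`
    return tag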

Elicitation

Elicitation allows for human-in-the-loop interaction for tool calls. Basically, let’s say I make a dangerous tool call that, with the right args ( which are controlled entirely by the LLM ), can delete all my data. For such a tool call, I would want the user to “confirm” the tool call with the args before proceeding. Elicitation allows for that.

The server basically sends back a “form” that the client can render and the user completes. This also allows for cases where the tool’s non-LLM logic determines that some parameters are needed or that the tool call was under-parameterised. The server can then send back a form to the client to get the missing parameters from the user.
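
The confirmation case might look something like this; I haven’t verified the exact API, so take the elicit() call and the result shape as assumptions about the Python SDK:

from pydantic import BaseModel
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("danger-zone")

class ConfirmDelete(BaseModel):
    confirm: bool

@mcp.tool()
async def delete_all_data(ctx: Context) -> str:
    """Dangerous tool that asks the user to confirm before doing anything."""
    # assumption: the SDK lets a tool request structured input from the
    # user mid-call; the client renders this as a small form
    result = await ctx.elicit(
        message="This will delete all your data. Are you sure?",
        schema=ConfirmDelete,
    )
    if result.action == "accept" and result.data.confirm:
        return "deleted"  # the actual deletion would happen here
    return "aborted"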

barebones youtube-mcp

Link to it: https://github.com/ruinivist/

Some learnings:

  • Use uv as it lets you just run uv run ... instead of having to source a virtualenv first
  • The client does the gossiping as part of the initialisation step, but it must also know what servers exist, how to start them if needed, etc. So some sort of registry is needed. This is where servers get registered in a simple json file.
  • If you use stdio, logging needs to go to stderr, otherwise it will mess up the MCP protocol communication on stdout ( see the sketch after this list ).

An example of such a json file:

{
  "servers": {
    "youtube-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "mcp", "run", "src/server.py"]
    }
  }
}
  • mcp dev runs a development mode that renders a simple UI to show capabilities, test tool calls, etc.
  • Such a JSON can almost always be generated by the client itself.
    • For Copilot, you can do “MCP > Add Server” and then give it a command like uv run mcp run src/server.py and it will parse it and add the server to the .vscode/mcp.json file.
    • Gemini has a similar command. gemini mcp add youtube-mcp uv run mcp run src/server.py which generates a ~/.gemini/settings.json file.
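
For the stdio logging point above, a minimal setup that keeps stdout clean for the protocol messages:

import logging
import sys

# send all logging to stderr so stdout stays free for MCP messages
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.getLogger("youtube-mcp").info("server starting")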

Other stuff that I haven’t tried yet

  • SSE as the transport layer instead of stdio. Stdio is basically the simplest and fastest but local only; SSE allows for remote servers as well as better streaming support.
  • Using Context to report progress on long-running tasks, and the use case of client and server sessions ( I guess this is just state management on both sides ).