
Tool Use & MCP

Tool use is the mechanism by which LLMs call external functions — search, code execution, database queries, APIs. The Model Context Protocol (MCP) standardizes how tools are exposed to models, enabling reusable tool servers that work across any MCP-compatible host. This lesson covers the mechanics of tool calling (schema design, parallel calls, error handling), then MCP's architecture and when it's worth the additional complexity.

Theory

Tool Call Flow

User Message ("What is 42 × 73?") → LLM reads the schemas and decides to use a tool → Tool Schema (inputSchema: {a, b: number}) → Tool Call emitted: multiply({a: 42, b: 73}) → Execution dispatched (tool runs outside the model) → Observation returned ("result: 3066") → Final Response ("42 × 73 = 3066")

The LLM decides whether to respond directly or invoke a tool based on the message content and available schemas.

Tool execution happens outside the model. The model cannot fake results — it only sees the observation the host injects back into context.

Tool calling is the model doing what it always does — generating tokens — except the token distribution is constrained so the output is always valid JSON matching a schema. You define the schema, the model picks which tool to call and fills in the arguments, and the execution environment runs it and returns the result. MCP standardizes this interface so a single tool server can be used by any agent that speaks the protocol.

Tool Calling as Constrained Generation

When the model generates a tool call, it is performing structured output generation subject to the tool's JSON schema. The API constrains the token distribution so that the generated JSON is always valid against the schema — this is the same constrained decoding mechanism as structured output mode.

A tool call with $K$ tools available can be modeled as selecting a tool $k \in [K]$ and generating arguments $a \sim p_\theta(a \mid x, k)$. The model selects $k$ based on the instruction and available tool descriptions — tool descriptions are part of the prompt and consume input tokens:

$\text{input cost} = n_{\text{prompt}} + n_{\text{tools}} + n_{\text{history}}$

Tool descriptions must be part of the prompt because the model has no other way to know what tools are available — it has no persistent memory between requests, and tool availability can change per-session. This means tool descriptions consume input tokens on every request, not just the first. The formula makes explicit that tool proliferation has direct cost consequences: 10 tools at 100 tokens each adds 1,000 tokens to every request, which compounds across a multi-turn agent session. This is why MCP's design separates tool servers from tool selection — clients can selectively expose only the tools relevant to the current task.

For $K = 10$ tools with average description length 100 tokens: $n_{\text{tools}} \approx 1{,}000$ tokens added per request. Tool proliferation has direct cost implications.
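As a sanity check, the cost formula can be computed directly. The token counts below are illustrative assumptions, not measured values:

```python
def input_cost(n_prompt: int, tool_desc_tokens: list[int], n_history: int) -> int:
    """Total input tokens per request: prompt + all tool descriptions + history."""
    return n_prompt + sum(tool_desc_tokens) + n_history

# 10 tools at ~100 tokens each, sent on *every* request
tools = [100] * 10
print(input_cost(n_prompt=200, tool_desc_tokens=tools, n_history=0))       # 1200
# The overhead compounds: after history has grown to ~10k tokens mid-session
print(input_cost(n_prompt=200, tool_desc_tokens=tools, n_history=10_000))  # 11200
```

Note the 1,000 tool-description tokens are paid on every turn, whether or not a tool is actually called.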

Parallel Tool Calls

When a task requires multiple independent tool calls, they can be batched in a single model response. The model returns multiple tool_use blocks simultaneously, and the host runs them in parallel:

Serial execution time: $T_{\text{serial}} = \sum_{k} T_k$

Parallel execution time: $T_{\text{parallel}} = \max_k T_k$

For 4 tools taking 300 ms each: serial = 1,200 ms, parallel = 300 ms — a 4× speedup. Design tasks so independent tools can be called in one turn.
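The speedup is easy to demonstrate with simulated tools — each stand-in blocks for its latency, and a thread pool dispatches all of them concurrently (the latencies are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_tool(latency_s: float) -> float:
    """Stand-in for an external tool call: block for latency_s, then return."""
    time.sleep(latency_s)
    return latency_s

latencies = [0.3, 0.3, 0.3, 0.3]  # four independent tool calls

start = time.perf_counter()
for lat in latencies:                 # serial: T = sum(T_k)
    fake_tool(lat)
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor() as pool:    # parallel: T = max(T_k)
    list(pool.map(fake_tool, latencies))
parallel = time.perf_counter() - start

print(f"serial {serial:.2f}s, parallel {parallel:.2f}s")  # ~1.20s vs ~0.30s
```

The same pattern applies when dispatching multiple `tool_use` blocks from a single model turn: run them concurrently and return all `tool_result` blocks together.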

MCP Architecture

MCP (Anthropic, 2024) defines a client-server protocol where:

  • MCP Server: exposes tools, resources, and prompts over a standardized interface
  • MCP Client (host): an application (Claude Desktop, IDE, agent framework) that connects to servers and forwards tool schemas to the model
  • Transport: stdio (for local servers) or HTTP+SSE (for remote servers)

The protocol separates tool definition (what a tool does, its schema) from tool execution (the server that actually runs it). A single MCP server can serve multiple hosts without modification.
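On the wire, MCP is JSON-RPC 2.0. The two core tool operations are `tools/list` (discovery) and `tools/call` (invocation). A rough sketch of the request envelopes — field values are illustrative, and the exact shapes are defined by the MCP specification:

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
 "params": {"name": "multiply", "arguments": {"a": 42, "b": 73}}}
```

The server answers `tools/list` with its tool schemas and `tools/call` with content blocks — the host forwards the former to the model and injects the latter back as the observation.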

Walkthrough

Designing Tool Schemas

Good tool schemas reduce model errors. Key principles:

1. One clear purpose per tool. Don't create a data_tool that can both read and write. Split into read_data and write_data — models make fewer mistakes when tool intent is unambiguous.

2. Descriptive names and descriptions. The model selects tools based on descriptions. "search" is ambiguous; "web_search — retrieves current information from the web given a natural language query" is not.

3. Enum fields over freeform strings when possible. If a field takes one of 5 values, use "enum": ["value1", "value2", ...]. Prevents hallucinated values.

python
import anthropic
 
client = anthropic.Anthropic()
 
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get current weather conditions for a city. Returns temperature, conditions, and humidity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco, CA'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city", "unit"]
        }
    },
    {
        "name": "get_forecast",
        "description": "Get 5-day weather forecast for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city", "unit"]
        }
    }
]
 
# Parallel tool calls — model may call both tools in one turn
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=TOOLS,
    messages=[{
        "role": "user",
        "content": "What's the weather in Tokyo and the 5-day forecast? Use Celsius."
    }]
)
 
# Process all tool calls from this turn
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        # Dispatch to your own tool implementation (call_weather_api is a
        # placeholder) — in practice, run these in parallel (asyncio, ThreadPoolExecutor)
        result = call_weather_api(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result
        })
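To complete the loop, the tool results go back to the model as a user turn containing the `tool_result` blocks, immediately after the assistant turn that requested them. A minimal helper for building that message list (the helper name is illustrative; the message shapes follow the Anthropic Messages API used above):

```python
def build_followup(original_messages: list, assistant_content, tool_results: list) -> list:
    """Messages for the next request: prior turns, the assistant's tool_use
    turn, then a user turn carrying the matching tool_result blocks."""
    return original_messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": tool_results},
    ]

# followup = build_followup(messages, response.content, tool_results)
# final = client.messages.create(model=..., max_tokens=1024, tools=TOOLS,
#                                messages=followup)
```

Each `tool_result` must reference the `tool_use_id` it answers, or the API rejects the request.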

Setting Up an MCP Server

python
# server.py — MCP server exposing a database query tool
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp import types
import sqlite3
 
app = Server("database-server")
 
@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="query_db",
            description="Run a read-only SQL query against the product database.",
            inputSchema={
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "SELECT query to execute. INSERT/UPDATE/DELETE are rejected."
                    }
                },
                "required": ["sql"]
            }
        )
    ]
 
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name != "query_db":
        raise ValueError(f"Unknown tool: {name}")
 
    sql = arguments["sql"].strip()
    if not sql.upper().startswith("SELECT"):
        return [types.TextContent(type="text", text="Error: only SELECT queries are allowed")]
 
    conn = sqlite3.connect("products.db")
    try:
        cursor = conn.execute(sql)
        rows = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
    finally:
        conn.close()
    result = [dict(zip(columns, row)) for row in rows]
    return [types.TextContent(type="text", text=str(result))]
 
async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())
 
if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Register the server in the host's MCP configuration — for Claude Desktop this is claude_desktop_config.json (its location varies by platform):

json
{
  "mcpServers": {
    "database": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}

Analysis & Evaluation

Where Your Intuition Breaks

Misconception: "MCP is just a naming convention for how tools are defined." In fact, MCP is a network protocol with client-server separation: the tool server runs independently of the agent, exposes tools over a standardized transport (stdio or HTTP+SSE), and can serve multiple hosts without modification. This separation means a single MCP server — say, a GitHub tool server — can be connected to Claude Desktop, a custom agent framework, and a CI pipeline simultaneously, with identical tool definitions. Direct tool calling (passing schemas in the API request) tightly couples tool definitions to the agent code; MCP decouples them. The right choice depends on whether the tools need to be reused across multiple agents or hosts.

Tool Calling vs MCP

             Native Tool Calling              MCP
Best for     App-specific tools, prototypes   Reusable tool servers, multi-host setups
Setup        Define JSON schema in code       Build MCP server with list/call protocol
Reuse        Per-application                  Any MCP-compatible host
Debugging    Inspect API request/response     MCP inspector, server logs
Overhead     None                             Protocol + transport overhead

Use native tool calling when building an application with tools specific to that app. Use MCP when you want the same tool (e.g., a company database query tool) to be available in multiple contexts: Claude Desktop, an agent framework, a CI system.

Tool Error Handling Patterns

Return errors as content, not exceptions. When a tool call fails, return a structured error string rather than throwing. Let the model decide whether to retry, try a different tool, or report failure to the user.

python
def safe_tool_call(name: str, inputs: dict) -> str:
    try:
        return call_tool(name, inputs)
    except TimeoutError:
        return f"Error: {name} timed out after 10s. Try a more specific query."
    except PermissionError:
        return f"Error: {name} requires elevated permissions for this operation."
    except Exception as e:
        return f"Error: {name} failed with: {str(e)[:200]}"

Validate inputs before calling. For expensive or side-effecting tools, validate inputs with schema checks before dispatching. Return a clear error if inputs are malformed — don't let malformed inputs reach the external system.

🚀Production

Tool use in production:

  • Keep the tool list short. Each tool consumes ~100 tokens of context. More than 10–15 tools starts to degrade model selection accuracy — the model gets confused about which tool to use. Group related operations into fewer, more flexible tools if needed.
  • Tool names are semantic. The model uses the tool name and description as a signal. If your tool name doesn't match what the model would guess ("process_record" vs "update_customer_record"), expect more tool-selection errors.
  • Rate limit tool calls independently. Tool calls can make external API requests. Set per-session tool call budgets and rate limits to prevent a runaway agent from exhausting API quotas.
  • MCP server security: MCP servers run as local processes with filesystem and network access. Validate all inputs, restrict SQL to reads-only, and sandbox execution environments. An LLM can be manipulated into calling a tool with malicious inputs via prompt injection in tool results.
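A per-session tool-call budget can be as simple as a counter checked before each dispatch. A minimal sketch (the class name and limits are illustrative):

```python
import time

class ToolBudget:
    """Caps total tool calls per session and enforces a minimum interval
    between calls, so a looping agent can't exhaust external API quotas."""

    def __init__(self, max_calls: int = 50, min_interval_s: float = 0.0):
        self.max_calls = max_calls
        self.min_interval_s = min_interval_s
        self.calls = 0
        self.last_call = 0.0

    def allow(self) -> bool:
        now = time.monotonic()
        if self.calls >= self.max_calls:
            return False                      # session budget exhausted
        if now - self.last_call < self.min_interval_s:
            return False                      # rate limit: too soon
        self.calls += 1
        self.last_call = now
        return True

budget = ToolBudget(max_calls=2)
print(budget.allow(), budget.allow(), budget.allow())  # True True False
```

When `allow()` returns False, return an error observation to the model ("tool budget exhausted") rather than silently dropping the call, so it can wrap up instead of retrying.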
