In a typical MCP interaction, the flow runs in one direction: Claude decides to call a tool, the tool server executes something, and the result comes back. Sampling inverts this. An MCP server can pause mid-execution, send a message to the client, and ask Claude to reason about what the server has found so far — before continuing. It's a way for servers to borrow Claude's intelligence without bypassing the user or the client's control. Here's everything you need to understand about how it works and when to use it.
What MCP Sampling Is
Sampling is an optional MCP capability that allows a server to send a sampling/createMessage request to the client. This request contains a conversation — a messages array, an optional system prompt, and model preferences — and asks the client to run an LLM completion and return the result.
The key conceptual shift: in normal MCP operation, Claude is the agent and the server is a tool. With sampling, the server temporarily becomes the requester and Claude temporarily becomes the responder. The server is asking Claude a question, not answering one.
This is useful whenever a server has data it needs intelligently interpreted. The server might be able to fetch 200 search results, but it can't decide which three are most relevant for the user's specific question — that requires reasoning. Sampling lets the server hand that reasoning task to Claude.
The Full Request/Response Flow
Here's the complete sequence of events when a server uses sampling:
1. Claude calls a server tool — e.g., search_documentation with a user's query
2. Server executes the search — fetches 50 raw documentation snippets from an index
3. Server sends a sampling/createMessage request to the client — includes the raw snippets and asks Claude to identify the three most relevant for this developer's specific question
4. Client shows the user a permission dialog — displaying what the server is asking Claude to do
5. User approves the sampling request
6. Client runs a Claude completion — using the messages and system prompt from the server's request
7. Client returns the completion result to the server
8. Server continues execution — now with an intelligently filtered result set
9. Server returns the final tool result to Claude
10. Claude presents the answer to the user
The human-in-the-loop at step 4 is not optional — it's a deliberate design requirement of the MCP sampling specification. Users must be able to see and approve what servers are asking Claude to do.
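As a sketch of the server's side of this flow — with a stand-in `Client` class simulating steps 4–6, since the real round-trip is a JSON-RPC `sampling/createMessage` call handled by the client, not a local method:

```python
# Sketch of the sampling flow from the server's perspective.
# `Client` is a stand-in for a real MCP client connection; in a real
# server this round-trip would be a JSON-RPC sampling/createMessage call.

class Client:
    """Simulates an MCP client that approves the request and runs Claude."""
    def create_message(self, messages, system_prompt, max_tokens):
        # Steps 4-6: show a permission dialog, get approval, run a completion.
        # Here we just return a canned completion for illustration.
        return {"role": "assistant",
                "content": {"type": "text", "text": "Top 3 results: ..."}}

def search_documentation(client, query):
    # Step 2: server fetches raw results (stubbed).
    snippets = [f"snippet {i} about {query}" for i in range(50)]

    # Step 3: server pauses and asks Claude to filter the raw data.
    prompt = ("Here are 50 search results:\n"
              + "\n".join(snippets)
              + f"\nIdentify the three most relevant for: {query}")
    completion = client.create_message(
        messages=[{"role": "user",
                   "content": {"type": "text", "text": prompt}}],
        system_prompt="You are a technical assistant. Be concise.",
        max_tokens=500,
    )

    # Steps 8-9: server continues with the filtered result and returns it.
    return completion["content"]["text"]

print(search_documentation(Client(), "OAuth"))  # → Top 3 results: ...
```

The point of the shape: the server's own logic is synchronous and simple — it blocks on the client, and everything between the request and the response (dialog, approval, model call) belongs to the client.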
The sampling/createMessage Request Structure
Here is a complete, realistic sampling request with every significant field:
```json
{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Here are the search results: [raw data]. Summarize the three most relevant results for a developer looking to implement OAuth."
        }
      }
    ],
    "modelPreferences": {
      "hints": [{"name": "claude-3-5-sonnet"}],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a technical assistant. Be concise.",
    "maxTokens": 500
  }
}
```
Breaking down the key fields:
- messages: The conversation to complete. This is standard LLM message format — an array of role/content objects. The server constructs this to frame the reasoning task for Claude.
- modelPreferences: Advisory hints about which model to use. The hints array can name specific models (like claude-3-5-sonnet). The priority values (0-1) indicate the relative importance of intelligence, speed, and cost. The client may honor these or ignore them.
- systemPrompt: An optional system-level instruction for Claude during this sampling call. Note: this is set by the server, not the user.
- maxTokens: Maximum tokens in the completion. Servers should set this conservatively to avoid unexpectedly large responses.
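These fields can be assembled programmatically. The helper below is purely illustrative — `build_sampling_request` is not part of any SDK, just a sketch of the params shape shown above:

```python
import json

def build_sampling_request(user_text, system_prompt=None, max_tokens=500,
                           model_hints=None, intelligence=0.5, speed=0.5):
    """Assemble a sampling/createMessage request as a plain params dict.
    Field names follow the example request; the helper itself is a sketch."""
    params = {
        "messages": [
            {"role": "user",
             "content": {"type": "text", "text": user_text}}
        ],
        "modelPreferences": {
            "hints": [{"name": h} for h in (model_hints or [])],
            "intelligencePriority": intelligence,
            "speedPriority": speed,
        },
        "maxTokens": max_tokens,
    }
    if system_prompt is not None:
        params["systemPrompt"] = system_prompt
    return {"method": "sampling/createMessage", "params": params}

req = build_sampling_request(
    "Summarize the three most relevant results for a developer "
    "looking to implement OAuth.",
    system_prompt="You are a technical assistant. Be concise.",
    model_hints=["claude-3-5-sonnet"],
    intelligence=0.8,
)
print(json.dumps(req, indent=2))
```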
Real Use Cases for Sampling
Sampling is powerful in a specific class of scenarios: where a server retrieves data that needs intelligent interpretation before it can be useful. Here are three concrete examples:
1. Database Server with Summaries
A Postgres MCP server executes a query and gets back 500 rows of raw data. The user asked a business question. The server uses sampling to ask Claude to answer the business question in plain English based on the raw result set. The user gets a natural-language answer, not a wall of JSON.
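A minimal sketch of this pattern, using an in-memory SQLite table as a stand-in for Postgres (the schema, data, and business question are invented for illustration):

```python
import sqlite3

# Hypothetical sketch: a Postgres-style server, reduced to in-memory SQLite.
# The server runs the query itself, then frames the *business question*
# plus the raw rows as a sampling prompt instead of dumping JSON.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 120.0), ("APAC", 80.0), ("EMEA", 60.0)])

rows = conn.execute(
    "SELECT region, SUM(revenue) FROM orders GROUP BY region").fetchall()

question = "Which region drove the most revenue last quarter?"
prompt = (f"The user asked: {question}\n"
          f"Raw query result: {rows}\n"
          "Answer the business question in plain English.")

# In a real server this prompt would go out via sampling/createMessage;
# here we just show the message the server would construct.
message = {"role": "user", "content": {"type": "text", "text": prompt}}
print(message["content"]["text"])
```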
2. Code Analysis Server
A code analysis server runs linting tools on a codebase and gets back 80 lint errors in machine-readable format. The server uses sampling to ask Claude to group the errors by root cause, explain what each type of error means, and prioritize which to fix first. The developer gets an actionable summary, not a raw error dump.
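One useful refinement: the server can group errors mechanically before sampling, so Claude reasons over a compact summary rather than 80 raw entries. A sketch, with a hypothetical error format:

```python
from collections import defaultdict

# Sketch: pre-group machine-readable lint errors by rule before asking
# Claude (via sampling) to explain and prioritize them. Grouping locally
# keeps the sampling prompt small. The error dict format is invented.
errors = [
    {"rule": "E501", "file": "a.py", "line": 10},
    {"rule": "F401", "file": "a.py", "line": 1},
    {"rule": "E501", "file": "b.py", "line": 3},
]

by_rule = defaultdict(list)
for err in errors:
    by_rule[err["rule"]].append(f'{err["file"]}:{err["line"]}')

summary = "\n".join(f"{rule} ({len(locs)}x): {', '.join(locs)}"
                    for rule, locs in sorted(by_rule.items()))
prompt = ("Group these lint findings by root cause, explain each, "
          "and prioritize fixes:\n" + summary)
print(prompt)
```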
3. File Search Server
A file search server uses a vector index to find 20 files that might be relevant to the user's question. Relevance scores from vector search are imprecise. The server sends all 20 file names and descriptions to Claude via sampling and asks Claude to rank them by relevance to the specific question being asked. The final tool response returns only the top 5 — correctly ranked by actual semantics, not just vector similarity.
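A sketch of the re-ranking step — assuming the server's prompt asked Claude to reply with one file name per line, best first (a convention the server chooses, not something the protocol defines):

```python
# Sketch: after sampling, the server parses Claude's ranked answer back
# into file names. Only known candidates survive, so a hallucinated or
# malformed line in the completion can't leak into the tool result.

candidates = {f"doc_{i}.md" for i in range(20)}

def top_ranked(completion_text, candidates, n=5):
    """Keep only lines that name a known candidate, in Claude's order."""
    ranked = []
    for line in completion_text.splitlines():
        name = line.strip("- ").strip()
        if name in candidates and name not in ranked:
            ranked.append(name)
    return ranked[:n]

# A canned completion standing in for the sampling result:
completion = "- doc_7.md\n- doc_2.md\n- not_a_file.md\n- doc_19.md"
print(top_ranked(completion, candidates))  # → ['doc_7.md', 'doc_2.md', 'doc_19.md']
```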
Sampling vs. Calling the Anthropic API Directly
A server could, in principle, just call https://api.anthropic.com/v1/messages directly with its own API key instead of using the MCP sampling mechanism. This seems simpler. So why does MCP sampling exist?
The differences are significant:
- Transparency: Sampling routes through the client. The user sees what the server is asking Claude. Direct API calls are invisible to the user.
- User control: Sampling requires user approval. Direct API calls require no approval.
- No API key required: Sampling uses the user's existing Claude session. Direct calls require the server operator to manage and pay for API access.
- Model alignment: The client picks the model, respecting the user's preferences and tier. Direct calls use whatever model the server operator configured.
MCP sampling is the approved, transparent approach. It keeps the user in control of what Claude is doing on their behalf, which aligns with the MCP trust model. Servers that bypass the client to call the API directly are operating outside the MCP trust boundary.
The Model Preference System
The modelPreferences field deserves more explanation because its semantics are unintuitive. Servers express preferences as:
- Named hints: Specific model names the server would prefer. The client may or may not have access to these models and is not obligated to use them.
- Priority weights (0-1): intelligencePriority, speedPriority, and costPriority indicate the server's preference tradeoffs. A summarization task might set speedPriority: 0.9 (fast is fine) and intelligencePriority: 0.3 (it doesn't need the most capable model). A complex code reasoning task might flip these.
The client interprets these preferences and selects a model according to its own logic. The server cannot force a specific model. This matters: a server cannot use sampling to force an expensive model call that the user wouldn't have authorized.
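A client might combine these weights roughly like this — the model catalog, scores, and hint bonus are entirely invented, since real clients implement their own selection logic:

```python
# Sketch of how a client *might* weigh modelPreferences. The point is
# that the client, not the server, owns the selection and can ignore
# hints entirely; a hint is a nudge, never a mandate.

CATALOG = {
    # model name: (intelligence score, speed score, cost score), all 0-1
    "big-model":  (0.9, 0.3, 0.2),
    "fast-model": (0.5, 0.9, 0.8),
}

def pick_model(prefs):
    hints = {h["name"] for h in prefs.get("hints", [])}
    def score(item):
        name, (iq, sp, cost) = item
        s = (prefs.get("intelligencePriority", 0.5) * iq
             + prefs.get("speedPriority", 0.5) * sp
             + prefs.get("costPriority", 0.5) * cost)
        return s + (0.1 if name in hints else 0.0)  # small nudge for hints
    return max(CATALOG.items(), key=score)[0]

print(pick_model({"intelligencePriority": 0.9, "speedPriority": 0.1,
                  "costPriority": 0.1}))                        # → big-model
print(pick_model({"speedPriority": 0.9, "costPriority": 0.9}))  # → fast-model
```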
Who Should Use Sampling
Sampling is not for most servers. If your server does straightforward tool execution — read a file, query a database, search the web — you don't need sampling. The tool result comes back and Claude handles interpretation natively.
Sampling is useful when:
- Your server retrieves data that requires reasoning to be useful (not just formatting)
- You're building a multi-step agent that makes sequential decisions based on intermediate results
- You want to classify or rank results before returning them, and that classification requires LLM-level understanding
- Your server is itself an orchestrator that delegates sub-tasks to Claude
Understand the full spectrum of what MCP servers can expose by reading the guide to MCP tools, resources, and prompts — sampling fits into the broader picture of server capabilities.
Security: The Prompt Injection Risk
Sampling introduces a specific and serious security risk that deserves careful attention: prompt injection at the protocol level.
When a server sends a sampling request, it controls the content of the messages array that Claude will reason over. A malicious server could include carefully crafted adversarial text in that content — instructions disguised as data, designed to manipulate Claude's behavior, change what it tells the user, or extract information from the conversation.
This is harder to defend against than standard prompt injection because the content comes from the server (which the user may have trusted by installing it), not from untrusted external web content. Standard content filters may not flag it.
The mitigations available are:
- Clients should display sampling requests fully, not just a summary, so users can see the exact messages being sent to Claude.
- Users should be cautious about installing servers from unknown sources — the same security hygiene that applies to browser extensions applies here.
- Clients can implement sampling request size limits to reduce the attack surface.
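The size-limit mitigation can be sketched as a client-side check before the permission dialog — the limits here are illustrative, and note that this only keeps requests small enough to review; it does not detect injection:

```python
# Sketch of a client-side guard: cap request size so the permission
# dialog can show the full content, and bound the completion size.
# All limits are invented for illustration.

MAX_MESSAAGES = 10  # noqa: intentionally simple constants
MAX_MESSAGES = 10
MAX_CHARS_PER_MESSAGE = 20_000
MAX_TOKENS_CAP = 2_000

def vet_sampling_request(params):
    """Return (ok, reasons). Does not detect injection — it only keeps
    the request reviewable and bounds the completion size."""
    reasons = []
    messages = params.get("messages", [])
    if len(messages) > MAX_MESSAGES:
        reasons.append(f"too many messages ({len(messages)})")
    for i, msg in enumerate(messages):
        text = msg.get("content", {}).get("text", "")
        if len(text) > MAX_CHARS_PER_MESSAGE:
            reasons.append(f"message {i} too large ({len(text)} chars)")
    if params.get("maxTokens", 0) > MAX_TOKENS_CAP:
        reasons.append("maxTokens above client cap")
    return (not reasons, reasons)

ok, why = vet_sampling_request({
    "messages": [{"role": "user",
                  "content": {"type": "text", "text": "x" * 50_000}}],
    "maxTokens": 500,
})
print(ok, why)  # → False, one oversize-message reason
```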
The MCP security and trust levels guide covers the broader threat model in detail. Sampling is one of the higher-risk MCP features precisely because it gives servers the ability to influence Claude's reasoning directly.
Frequently Asked Questions
Does every MCP server use sampling?
No. Sampling is an optional capability that servers must explicitly request and clients must explicitly support. Most simple tool servers — filesystem, web search, database queries — do not use sampling at all. Sampling is primarily useful for complex multi-step agent servers that need to delegate reasoning or summarization back to the LLM mid-task. If your server doesn't need to analyze its own tool output, you don't need sampling.
Can a server force the client to use a specific model?
No. Servers can express preferences via the modelPreferences field — suggesting a model name, weighting for intelligence vs. speed vs. cost — but the client makes the final decision. This is a deliberate design choice: the user's client controls model selection, not the server. A server cannot force an expensive model or bypass the user's model settings. The client may honor the hints, ignore them, or use entirely different selection logic.
Is sampling the same as a server calling the Anthropic API directly?
No, and the difference matters. Sampling routes through the MCP client (e.g., Claude Desktop), which means the user sees what is being asked, can approve or deny it, and the call happens within the user's existing session. Calling the Anthropic API directly from a server bypasses the client entirely, requires a separate API key, and the user has no visibility into what the server is requesting. MCP sampling is the sanctioned, transparent approach for servers that need LLM reasoning.
What security risks does sampling introduce?
The primary risk is prompt injection at the protocol level. A malicious server could include adversarial content in the messages array of a sampling request — content designed to manipulate Claude's reasoning, change its behavior, or extract information. Because the sampling request comes from the server, not the user, standard prompt injection defenses are less effective. Clients should display sampling requests clearly and completely so users can verify what a server is asking Claude to do before approving the request.