Sampling

Modern AI applications often need to generate new content (whether that's text, images, or more) on demand. This process is called sampling: asking a language model (or other generative model) to produce a completion or response based on a prompt and some context.
This concept can be tricky to understand. Think of it as "borrowing" the user's LLM to generate content for them. Read about this idea here.
Here's a simple example of a sampling request and response using MCP:
// Request
{
	"jsonrpc": "2.0",
	"id": 1,
	"method": "sampling/createMessage",
	"params": {
		"messages": [
			{
				"role": "user",
				"content": {
					"type": "text",
					"text": "Hello, world!"
				}
			}
		],
		"systemPrompt": "You are a helpful assistant.",
		"maxTokens": 20
	}
}
// Response
{
	"jsonrpc": "2.0",
	"id": 1,
	"result": {
		"role": "assistant",
		"content": {
			"type": "text",
			"text": "Hello! How can I help you today?"
		},
		"model": "claude-3-sonnet-20240307",
		"stopReason": "endTurn"
	}
}
MCP standardizes how servers request these generations from clients. Instead of requiring every server to manage its own API keys and model integrations, MCP lets servers request completions through a client, which handles model selection, permissions, and user controls. This approach enables powerful agentic behaviors, such as having an LLM suggest tags for a journal entry or generate a summary for a document, while keeping the user in control. It also lets you take advantage of the model your user is already paying for.
In this exercise, you'll extend your MCP server to leverage the sampling capability. You'll see how to:
  • Request a model completion from the client, including setting a system prompt, user messages, and token limits.
  • Parse and validate the model's response.
  • Use sampling to automate tasks in your application, such as suggesting tags for new journal entries.
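To make the first two steps concrete, here is a minimal sketch in plain TypeScript that builds a `sampling/createMessage` request and validates the response. It only mirrors the JSON shapes from the example above. The helper names `buildSamplingRequest` and `extractText` are hypothetical, and the sketch does not use any SDK, so the helpers your own server gets from its MCP library will likely look different.

```typescript
// Shapes mirror the JSON example above; they are not taken from an SDK.
type TextContent = { type: "text"; text: string };

interface SamplingMessage {
  role: "user" | "assistant";
  content: TextContent;
}

interface CreateMessageRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    systemPrompt?: string;
    maxTokens: number;
  };
}

// Hypothetical helper: assembles a request with one user message,
// a system prompt, and a token limit.
function buildSamplingRequest(
  id: number,
  systemPrompt: string,
  userText: string,
  maxTokens: number,
): CreateMessageRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [{ role: "user", content: { type: "text", text: userText } }],
      systemPrompt,
      maxTokens,
    },
  };
}

// Hypothetical helper: checks that an untrusted response carries text
// content and returns that text, throwing otherwise.
function extractText(response: unknown): string {
  if (typeof response !== "object" || response === null) {
    throw new Error("Response is not an object");
  }
  const result = (response as { result?: unknown }).result;
  if (typeof result !== "object" || result === null) {
    throw new Error("Response has no result");
  }
  const content = (result as { content?: unknown }).content;
  if (typeof content !== "object" || content === null) {
    throw new Error("Result has no content");
  }
  const { type, text } = content as { type?: unknown; text?: unknown };
  if (type !== "text" || typeof text !== "string") {
    throw new Error("Expected text content");
  }
  return text;
}
```

Treating the response as `unknown` and narrowing it step by step keeps a malformed or non-text result from reaching the rest of your application.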
You'll also explore how to craft effective prompts for the model, and how to structure your requests and responses for reliability and safety.
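As one illustration of prompting for reliability, the sketch below asks the model to reply with nothing but a JSON array of tag names, then parses that reply defensively. The prompt wording, the `JournalEntry` shape, and the helper names `buildTagPrompt` and `parseSuggestedTags` are all assumptions made for this example, not part of MCP or of the exercise's codebase.

```typescript
// Hypothetical entry shape, used only for this sketch.
interface JournalEntry {
  title: string;
  content: string;
}

// Assumed system prompt: constrain the output format so it can be parsed.
const TAG_SYSTEM_PROMPT = [
  "You suggest tags for journal entries.",
  "Respond with ONLY a JSON array of short tag names, for example:",
  '["travel", "family"]',
  "Prefer tags from the existing list when they fit.",
].join("\n");

// Sends the entry and existing tags as JSON so the model sees clear structure.
function buildTagPrompt(entry: JournalEntry, existingTags: string[]): string {
  return JSON.stringify({ entry, existingTags });
}

// Parses the model's text, keeping only non-empty strings, normalizing
// case and whitespace, dropping duplicates, and capping the count.
// Returns an empty list when the text is not a JSON array.
function parseSuggestedTags(modelText: string, maxTags = 5): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelText);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const tags = parsed
    .filter((t): t is string => typeof t === "string")
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
  return Array.from(new Set(tags)).slice(0, maxTags);
}
```

Returning an empty list on bad output, rather than throwing, is a design choice for this sketch: tag suggestions are optional, so a failed generation should not block saving the entry.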