MCP Agent Orchestration Patterns: Designing AI Workflows
Design patterns for orchestrating AI agents with MCP -- sequential chains, parallel execution, conditional routing, retries, and composition.
MCP agent orchestration patterns define how AI agents chain, parallelize, and compose tool calls to accomplish complex tasks -- the most important patterns are sequential chains for dependent operations, parallel fan-out for independent operations, conditional routing for branching logic, and retry with fallback for resilient execution. Understanding these patterns is the difference between an agent that fumbles through tool calls and one that executes workflows with precision and efficiency.
Every time an AI agent uses MCP tools, it is implicitly applying an orchestration pattern. The agent might call one tool, read the result, then call another (sequential chain). It might call three tools simultaneously (parallel fan-out). It might choose between two tools based on a condition (conditional routing). By making these patterns explicit, you can design agent prompts and server configurations that guide agents toward optimal execution strategies.
This guide builds on the foundational concepts in MCP for AI Agents: Building Autonomous Workflows.
Pattern Overview
| Pattern | When to Use | Example |
|---|---|---|
| Sequential chain | Each step depends on the previous result | Read file, then analyze, then write summary |
| Parallel fan-out | Multiple independent operations | Search three databases simultaneously |
| Conditional routing | Different actions based on data | Route to Postgres or MongoDB based on data type |
| Retry with backoff | Transient failures expected | API calls to rate-limited services |
| Fallback chain | Primary tool might fail | Try primary API, fall back to cached data |
| Map-reduce | Same operation on multiple items | Analyze each file in a directory |
| Pipeline | Stream of transformations | Extract data, transform, load |
| Supervisor loop | Long-running task with checkpoints | Multi-step project with progress tracking |
Sequential Tool Chains
The most fundamental pattern: tools are called one after another, with each step using results from the previous step.
Structure
Tool A --> result A --> Tool B (uses result A) --> result B --> Tool C (uses result B)
When Agents Use This
An agent building a feature might execute:
1. github_get_issue -- read the issue details
2. filesystem_read_file -- read the relevant source file
3. filesystem_write_file -- write the modified code
4. shell_execute -- run the tests
5. github_create_pull_request -- submit the changes
Each step depends on information from the previous step. The agent cannot write code without reading the issue first, cannot run tests without writing code first, and so on.
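The chain can be sketched as a short driver function. The tool functions here (read_issue, read_file, write_file) are hypothetical stand-ins for real MCP tool calls, not part of any actual server:

```python
import asyncio

# Hypothetical async tool wrappers standing in for real MCP tool calls.
async def read_issue(issue_id):
    return {"id": issue_id, "file": "src/app.py", "request": "add logging"}

async def read_file(path):
    return f"# contents of {path}"

async def write_file(path, content):
    return {"path": path, "bytes": len(content)}

async def run_chain(issue_id):
    """Each step consumes the previous step's result."""
    issue = await read_issue(issue_id)               # step 1
    source = await read_file(issue["file"])          # step 2 needs step 1's result
    patched = source + "\n# TODO: add logging"       # agent reasoning between calls
    return await write_file(issue["file"], patched)  # step 3 needs step 2's result

result = asyncio.run(run_chain(42))
```

The important property is that no step can be reordered or skipped: each awaits data produced by the one before it.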
Optimizing Sequential Chains
Minimize chain length. Every tool call adds latency and consumes context window space. If two operations can be merged into one tool call, design the tool to support it.
Provide context at each step. When an agent sends tool results back in the conversation, the AI model uses that context for the next decision. Ensure tool responses include enough information for the agent to plan the next step correctly.
Handle partial failures. If step 3 of a 5-step chain fails, the agent needs enough context to decide whether to retry step 3, roll back steps 1-2, or abort the entire chain.
| Chain Length | Reliability Impact | Recommendation |
|---|---|---|
| 2-3 steps | High reliability | Simple, let the agent handle naturally |
| 4-6 steps | Moderate reliability | Add explicit checkpoints |
| 7-10 steps | Lower reliability | Break into sub-tasks with clear handoffs |
| 10+ steps | Risky | Use supervisor pattern with sub-agents |
Parallel Tool Execution
When multiple operations are independent, they should execute in parallel rather than sequentially. MCP supports this: each tool call is an independent JSON-RPC request with its own ID, so a client can keep several calls in flight at once.
Structure
/--> Tool A --> result A --\
Start --+--> Tool B --> result B --+--> Combine results
\--> Tool C --> result C --/
How MCP Clients Handle Parallelism
Modern MCP clients (Claude, Cursor, and others) can issue multiple tool calls in a single response turn. The client detects when the AI model requests multiple tools simultaneously and executes them in parallel against their respective MCP servers.
Example: an agent analyzing a codebase might request three tools at once:
- filesystem_search for all Python files
- github_list_pull_requests for recent changes
- shell_execute to check test coverage
These operations are independent and can run concurrently, reducing total latency from the sum of all three operations to the time of the slowest one.
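A minimal sketch of this fan-out using asyncio.gather; the three tools are hypothetical, with latency simulated by sleep:

```python
import asyncio
import time

# Hypothetical tools with different simulated latencies.
async def search_files(pattern):
    await asyncio.sleep(0.3)
    return ["a.py", "b.py"]

async def list_pull_requests(repo):
    await asyncio.sleep(0.2)
    return [{"number": 7}]

async def check_coverage():
    await asyncio.sleep(0.1)
    return 84.0

async def fan_out():
    # All three start together; results come back in argument order.
    return await asyncio.gather(
        search_files("*.py"),
        list_pull_requests("org/repo"),
        check_coverage(),
    )

start = time.monotonic()
files, prs, coverage = asyncio.run(fan_out())
elapsed = time.monotonic() - start  # ~0.3s (slowest call), not 0.6s (the sum)
```

Sequentially these calls would take 0.6 seconds; in parallel the total tracks the slowest call at roughly 0.3 seconds.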
Designing Tools for Parallelism
Tools that support parallel execution should be:
Stateless. Each call is independent and does not rely on side effects from other concurrent calls.
Idempotent. Calling the tool multiple times with the same arguments produces the same result. This matters for retry scenarios where a parallel batch might be partially re-executed.
Non-conflicting. Parallel write operations to the same resource create race conditions. If two tools might modify the same file, they should not be called in parallel.
| Tool Type | Safe for Parallel? | Notes |
|---|---|---|
| Read operations | Yes | Multiple reads never conflict |
| Search operations | Yes | Independent queries |
| API GET requests | Yes | Read-only external calls |
| File write operations | Only to different files | Same-file writes create races |
| Database writes | Depends on isolation level | May need transactions |
| State mutations | No | Sequential execution required |
Conditional Tool Routing
Agents frequently need to choose between different tools or different parameters based on runtime conditions.
Structure
/-- condition A --> Tool X
Evaluate condition --+-- condition B --> Tool Y
\-- condition C --> Tool Z
Pattern: Data-Driven Routing
The agent reads data first, then decides which tool to call based on the content:
Step 1: read_config --> config says database_type: "postgres"
Step 2: (if postgres) query_postgres
(if mongodb) query_mongodb
(if sqlite) query_sqlite
Pattern: Capability-Based Routing
The agent checks what tools are available and routes accordingly:
Step 1: Check available tools
Step 2: (if browser_tool available) scrape_web_page
(if fetch_tool available) fetch_url_content
(if neither) return "Cannot access web content"
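Data-driven routing can be sketched as a dispatch table; the query_* functions and the config shape below are hypothetical illustrations:

```python
# Hypothetical per-backend query functions; the router dispatches on config.
def query_postgres(q):
    return f"postgres:{q}"

def query_mongodb(q):
    return f"mongodb:{q}"

def query_sqlite(q):
    return f"sqlite:{q}"

ROUTES = {
    "postgres": query_postgres,
    "mongodb": query_mongodb,
    "sqlite": query_sqlite,
}

def route_query(config, q):
    """Step 1: read the config. Step 2: dispatch to the matching tool."""
    backend = config.get("database_type")
    handler = ROUTES.get(backend)
    if handler is None:
        return f"error: no tool for backend {backend!r}"
    return handler(q)
```

An unknown backend falls through to an explicit error rather than a crash, mirroring the "(if neither)" branch of capability-based routing.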
Designing for Conditional Routing
Use tool annotations to help agents make routing decisions:
| Annotation | Purpose | Agent Behavior |
|---|---|---|
| readOnlyHint: true | Tool only reads data | Agent selects for safe exploration |
| destructiveHint: true | Tool modifies or deletes | Agent adds confirmation step |
| openWorldHint: true | Tool accesses external systems | Agent considers network availability |
| idempotentHint: true | Safe to retry | Agent retries on failure |
These annotations are part of the MCP specification and help agents make informed decisions about which tools to use and when to apply safety measures.
Retry Patterns
Transient failures are common when MCP tools interact with external services. Well-designed retry patterns handle these gracefully.
Exponential Backoff
When a tool call fails with a transient error, retry with increasing delays:
Attempt 1: immediate
Attempt 2: wait 1 second
Attempt 3: wait 2 seconds
Attempt 4: wait 4 seconds
Attempt 5: wait 8 seconds
Give up after 5 attempts
Server-Side Retry Implementation
MCP servers that wrap external APIs should implement retries internally rather than relying on the agent to retry:
import asyncio
import random

class TransientError(Exception):
    """Raised for retryable failures such as timeouts and rate limits."""

async def call_with_retry(func, max_retries=3, base_delay=1.0):
    """
    Call a function with exponential backoff and jitter.
    """
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except TransientError:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
Agent-Level Retry
For tool calls that fail at the MCP protocol level (not just the wrapped API), the agent itself needs retry logic. This is typically handled in the system prompt:
When a tool call returns an error:
1. If the error is transient (timeout, rate limit, connection error),
wait briefly and retry the same call up to 3 times.
2. If the error is permanent (not found, permission denied, invalid input),
do not retry. Adjust your approach or report the failure.
3. If you have retried 3 times and the tool still fails, try an
alternative approach or inform the user.
Retry Decision Matrix
| Error Type | Retry? | Strategy |
|---|---|---|
| Connection timeout | Yes | Exponential backoff |
| Rate limit (429) | Yes | Wait for Retry-After header value |
| Server error (500) | Yes | Exponential backoff, max 3 attempts |
| Not found (404) | No | Check tool arguments |
| Permission denied (403) | No | Check credentials or scope |
| Invalid input (400) | No | Fix the input parameters |
| Parse error | No | Fix the request format |
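The matrix above can be encoded as a small decision function. Treating 502-504 as transient alongside 500 is an assumption here, not part of the matrix:

```python
# Status codes the matrix marks as retryable; 502-504 added by assumption.
RETRYABLE = {429, 500, 502, 503, 504}
# Permanent failures: fix the input or credentials instead of retrying.
PERMANENT = {400, 403, 404}

def should_retry(status_code, attempt, max_attempts=3):
    """Return True only when the matrix says retry and attempts remain."""
    if attempt >= max_attempts:
        return False
    if status_code in PERMANENT:
        return False
    return status_code in RETRYABLE
```

Note that connection timeouts and parse errors surface as exceptions rather than status codes, so a real implementation would classify those separately.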
Fallback Strategies
When a primary tool fails, a fallback provides an alternative path to complete the task.
Structure
Try Tool A --> success? --> use result
\-> failed? --> Try Tool B --> success? --> use result
\-> failed? --> Try Tool C or report error
Common Fallback Chains
Data retrieval fallbacks:
| Priority | Tool | Scenario |
|---|---|---|
| 1 | Database query | Fast, structured data |
| 2 | Cache lookup | Database unavailable |
| 3 | File system read | Cache miss, read from export |
| 4 | Return default value | All sources unavailable |
Code execution fallbacks:
| Priority | Tool | Scenario |
|---|---|---|
| 1 | Shell execute | Run the command directly |
| 2 | Docker execute | Shell restricted, use container |
| 3 | Remote execution | Local execution unavailable |
| 4 | Dry-run simulation | All execution paths blocked |
Implementing Fallback in MCP Servers
A single MCP tool can implement fallback logic internally, presenting a unified interface to the agent:
async def handle_search(query):
"""
Search with automatic fallback across providers.
"""
# Try primary search provider
try:
results = await primary_search_api(query)
if results:
return ToolResult(
content=format_results(results, source="primary")
)
except Exception:
pass # Fall through to next provider
# Fallback to secondary provider
try:
results = await secondary_search_api(query)
if results:
return ToolResult(
content=format_results(results, source="fallback")
)
except Exception:
pass
# Final fallback: local cache
cached = local_cache.search(query)
if cached:
return ToolResult(
content=format_results(cached, source="cache (may be stale)")
)
return ToolResult(
content="No results found from any source",
is_error=True
)
Tool Composition Patterns
Composition creates higher-level operations from combinations of lower-level tools.
Macro Tools
A macro tool encapsulates a common multi-step workflow as a single tool call:
# Instead of requiring the agent to call 4 separate tools:
# 1. git_checkout -b feature-branch
# 2. filesystem_write_file
# 3. git_add
# 4. git_commit
# Provide a composite tool:
async def handle_commit_feature(branch_name, file_path, content, message):
"""
Create a feature branch, write a file, and commit -- all in one step.
"""
await git.checkout(b=branch_name)
await filesystem.write(file_path, content)
await git.add(file_path)
await git.commit(message=message)
return ToolResult(
content=f"Committed to branch {branch_name}: {message}"
)
Macro tools reduce the number of agent reasoning steps, lowering latency and cost while improving reliability.
Adapter Tools
An adapter tool transforms the output of one tool into the format expected by another:
| Source Tool Output | Adapter | Target Tool Input |
|---|---|---|
| CSV data | CSV-to-JSON converter | JSON API endpoint |
| Raw HTML | HTML-to-Markdown parser | Text analysis tool |
| Binary file | Base64 encoder | API that accepts base64 |
| Database rows | Row-to-report formatter | Document writer |
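The first row of the table, a CSV-to-JSON adapter, might look like this sketch using only the standard library:

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Adapt CSV output from one tool into JSON input for the next."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

payload = csv_to_json("name,stars\nmcp,100\nagent,42\n")
```

Exposing this as its own MCP tool lets the agent bridge two otherwise incompatible tools without reformatting data in its own context window.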
Stateful vs Stateless Workflows
MCP tool calls are inherently stateless -- each call is independent. But many workflows need state. Here is how to handle both approaches.
Stateless Workflows
Each tool call is self-contained. The agent passes all necessary context with every call.
Advantages:
- Simple to implement and debug
- No cleanup required
- Easy to retry or restart
- Scales horizontally
Works for: Search, data retrieval, text transformation, file reads.
Stateful Workflows
The workflow maintains state across multiple tool calls, either in the MCP server or in an external store.
from datetime import datetime

# Server-side session state
sessions = {}
async def handle_start_transaction(session_id):
sessions[session_id] = {
"changes": [],
"started_at": datetime.now().isoformat()
}
return ToolResult(content="Transaction started")
async def handle_add_change(session_id, change):
if session_id not in sessions:
return ToolResult(content="No active transaction", is_error=True)
sessions[session_id]["changes"].append(change)
return ToolResult(content=f"Change added. Total: {len(sessions[session_id]['changes'])}")
async def handle_commit_transaction(session_id):
if session_id not in sessions:
return ToolResult(content="No active transaction", is_error=True)
changes = sessions[session_id]["changes"]
# Apply all changes atomically
await apply_changes(changes)
del sessions[session_id]
return ToolResult(content=f"Committed {len(changes)} changes")
Advantages:
- Supports transactions and rollback
- Reduces data passed per call
- Enables progressive operations
Complications:
- Session cleanup on disconnect
- Memory management for long sessions
- State recovery after server restart
Choosing the Right Approach
| Workflow Type | Recommendation |
|---|---|
| Read-only queries | Stateless |
| File modifications | Stateless (agent tracks state in context) |
| Database transactions | Stateful (server-managed transactions) |
| Multi-step wizards | Stateful (server tracks progress) |
| Batch operations | Stateful (server accumulates items) |
| Idempotent operations | Stateless |
Supervisor Patterns
The supervisor pattern wraps an entire orchestration workflow in a control loop that monitors progress, handles failures, and ensures completion.
Basic Supervisor Loop
while task not complete:
1. Assess current state
2. Determine next action
3. Execute action (tool call)
4. Evaluate result
5. Update state
6. Check for completion or failure
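The loop can be sketched in Python; the supervise function, task shape, and step executor below are all hypothetical illustrations of the control flow, not a real agent runtime:

```python
def supervise(task, execute_step, max_calls=20):
    """Run assess -> act -> evaluate until done or the call budget runs out."""
    state = {"done": False, "calls": 0, "log": []}
    while not state["done"] and state["calls"] < max_calls:
        action = task["steps"][state["calls"]]        # 1-2: assess state, pick action
        result = execute_step(action)                 # 3: execute the tool call
        state["log"].append((action, result))         # 4-5: evaluate result, update
        state["calls"] += 1
        state["done"] = state["calls"] >= len(task["steps"])  # 6: completion check
    return state

# Hypothetical task and executor for illustration.
demo = {"steps": ["read", "edit", "test"]}
final = supervise(demo, lambda action: f"{action}: ok")
```

The hard max_calls bound guarantees termination even if the completion check never fires, which is the same safety role the tool call limit plays in the system prompt.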
Implementing with MCP
The supervisor is an AI agent whose system prompt defines the workflow rules:
You are a project supervisor agent. Your job is to complete the
assigned task by coordinating tool calls.
Workflow rules:
- Always check the current project state before taking action
- Execute one step at a time and verify the result
- If a step fails, attempt recovery before escalating
- Log progress after each step using the log_progress tool
- Stop and ask the user if you encounter an ambiguous situation
- Maximum 20 tool calls per task; if not done, summarize
progress and request continuation
The tool call limit is an important safety mechanism. Without it, a supervisor agent could enter an infinite loop, consuming API credits and time without making progress.
Checkpoint and Resume
For long-running workflows, implement checkpoints so the workflow can resume after interruption:
import json
from datetime import datetime

async def handle_save_checkpoint(task_id, state):
    """Save workflow state to persistent storage."""
    await checkpoint_store.save(task_id, {
        "state": state,
        "timestamp": datetime.now().isoformat(),
        "completed_steps": state.get("completed_steps", []),
        "next_step": state.get("next_step")
    })
    return ToolResult(content=f"Checkpoint saved for task {task_id}")
async def handle_load_checkpoint(task_id):
"""Load the most recent checkpoint for a task."""
checkpoint = await checkpoint_store.load(task_id)
if not checkpoint:
return ToolResult(content="No checkpoint found", is_error=True)
return ToolResult(content=json.dumps(checkpoint))
Pattern Selection Guide
Choosing the right orchestration pattern depends on your workflow characteristics:
| Characteristic | Recommended Pattern |
|---|---|
| Steps depend on each other | Sequential chain |
| Steps are independent | Parallel fan-out |
| Different paths for different data | Conditional routing |
| External services may be unreliable | Retry with backoff |
| Primary approach may not work | Fallback chain |
| Same operation on many items | Map-reduce |
| Long-running with many steps | Supervisor loop |
| Common multi-step operation | Macro tool composition |
Most real-world agent workflows combine multiple patterns. A supervisor loop might contain sequential chains that include parallel fan-outs with retry logic at each step. The key is recognizing which pattern applies at each level of the workflow.
What to Read Next
- MCP for AI Agents: Building Autonomous Workflows -- the parent guide covering agent fundamentals
- Building Multi-Agent Systems with MCP -- coordinating multiple specialized agents
- Testing and Debugging MCP Servers -- testing the tools that power your orchestration patterns
- Composability in MCP -- how tool composition is built into the protocol
- Browse MCP Servers -- find servers to build your agent workflows