MCP Server Production Troubleshooting: Common Errors and Fixes

The most common MCP server production errors are connection refused (wrong path or process not running), JSON-RPC parse errors (stdout corruption from print statements), tool execution timeouts (missing async handling), and transport disconnections (unhandled exceptions crashing the server). Each has a specific cause and a specific fix, and this guide covers them all.

When an MCP server works in development but fails in production, or works with the Inspector but breaks in Claude Desktop or Cursor, the problem almost always falls into one of the categories below. This guide is organized as a reference you can jump into by error type.

For the foundational testing and debugging workflow, see the parent guide: Testing and Debugging MCP Servers.

Error Reference Table

Use this table to jump to the relevant section:

Error	Likely Cause	Section
Connection refused	Server not running, wrong path, port conflict	Connection Errors
ENOENT / spawn error	Binary not found, wrong command	Connection Errors
JSON parse error	stdout pollution, malformed response	JSON-RPC Errors
Method not found	Client/server version mismatch	JSON-RPC Errors
Tool execution timeout	Long-running operation without progress	Timeout Errors
Request timeout	Server unresponsive, deadlock	Timeout Errors
Out of memory	Unbounded data loading, memory leak	Memory Issues
Transport disconnected	Server crash, unhandled exception	Transport Disconnections
Permission denied	File/network access restrictions	Permission Errors
Rate limit exceeded	Too many requests to wrapped API	External Service Errors

Connection Errors

Connection Refused

Symptoms: The MCP client reports "connection refused" or "failed to connect" when trying to start a server.

For stdio servers, "connection refused" usually means the server process failed to start:

Wrong executable path. The command specified in the client config does not exist or is not executable.

{
  "mcpServers": {
    "my-server": {
      "command": "python3",
      "args": ["/path/to/server.py"]
    }
  }
}

Verify the path exists and is executable:

# Check if the file exists
ls -la /path/to/server.py

# Check if python3 is available at the expected location
which python3

# Try running the server directly
python3 /path/to/server.py

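Virtual environment not activated. If your server depends on packages installed in a virtual environment, the MCP client needs to use that environment's Python. The client does not activate the environment for you, and its working directory is not guaranteed, so use absolute paths for both the interpreter and the script:

{
  "mcpServers": {
    "my-server": {
      "command": "/home/user/projects/my-server/.venv/bin/python",
      "args": ["/home/user/projects/my-server/server.py"]
    }
  }
}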

Missing dependencies. The server starts but crashes immediately because an import fails. Check stderr output or run the server manually to see the traceback.
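
One way to catch a missing dependency before the client launches the server is to ask the target interpreter whether each required module can be found. The sketch below uses the standard library's importlib.util.find_spec; the function name missing_modules is an assumption made for illustration, not part of any MCP SDK.

```python
import importlib.util


def missing_modules(required):
    """Return the names in `required` that this interpreter cannot locate."""
    missing = []
    for name in required:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec
            found = False
        if not found:
            missing.append(name)
    return missing
```

Run it with the same interpreter that appears in the client config (for example the .venv/bin/python shown above) and pass the list of packages your server imports. An empty list means every import can at least be located.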

For HTTP/SSE servers, connection refused means the server is not listening on the expected host and port:

# Check if anything is listening on the expected port
lsof -i :3000

# Check if the server is bound to localhost vs 0.0.0.0
netstat -tlnp | grep 3000

A common mistake is binding to 127.0.0.1 when the client connects from a different host or from inside a container. Bind to 0.0.0.0 only when the server must be reachable over the network, and restrict access with a firewall or authentication when you do.
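
The same check can be run from the client's side of the network, which is useful inside containers where lsof and netstat may not be installed. This is a minimal sketch using only the standard library; the helper name can_connect is hypothetical.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False
```

If can_connect("127.0.0.1", 3000) succeeds on the server host but fails from the client's host or container, the bind address or a firewall rule is the likely culprit rather than the server process itself.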

ENOENT / Spawn Errors

Symptoms: Error messages containing "ENOENT", "spawn failed", or "command not found."

This means the MCP client cannot find the executable. Common causes:

Cause	Fix
npx not in PATH	Use full path: /usr/local/bin/npx
Node not installed	Install Node.js and verify with node --version
Python not in PATH	Use full path: /usr/bin/python3
uv not in PATH	Use the full path printed by which uvx (often /home/user/.local/bin/uvx or /home/user/.cargo/bin/uvx, depending on how uv was installed)
Wrong working directory	Set cwd in server config
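
To find the absolute paths to put in the client config, you can resolve each command the way a process lookup would. The sketch below relies on shutil.which from the standard library; the function name resolve_commands is an assumption for illustration.

```python
import shutil


def resolve_commands(names):
    """Map each command name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}
```

Calling resolve_commands(["npx", "node", "python3", "uvx"]) from your terminal shows what your shell can see. Desktop clients are often launched with a shorter PATH than an interactive shell, so a command that resolves here can still fail in the client; copying the absolute path into the config sidesteps that difference.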

JSON-RPC Errors

stdout Pollution

Symptoms: "JSON parse error", "unexpected token", or the server connects but every tool call fails.

This is the single most common MCP server bug. Any output written to stdout that is not a valid JSON-RPC message will break the protocol. The stdio transport uses stdout exclusively for JSON-RPC communication.

Common sources of stdout pollution:

# WRONG: print() writes to stdout, corrupting the JSON-RPC stream
print("Server starting...")
print(f"Processing request for tool: {name}")

# CORRECT: Use stderr for all logging
import sys
print("Server starting...", file=sys.stderr)

# BEST: Use the logging module configured for stderr
import logging
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("my-mcp-server")
logger.info("Server starting...")

In TypeScript:

// WRONG: console.log writes to stdout
console.log("Processing request");

// CORRECT: console.error writes to stderr
console.error("Processing request");

How to detect stdout pollution:

Run the server manually and check if anything appears on stdout before a client connects:

# Run the server and separate stdout from stderr
python3 server.py 2>/tmp/stderr.log

# If you see ANY output in the terminal (stdout), that's the problem
# All output should go to /tmp/stderr.log (stderr)
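
If you capture the server's stdout to a file during a test session, a small checker can point at the exact offending lines. This sketch assumes the stdio framing of one JSON object per line and does not handle batched messages; find_non_jsonrpc_lines is a hypothetical helper name.

```python
import json


def find_non_jsonrpc_lines(stdout_text):
    """Return (line_number, line) pairs that are not JSON-RPC 2.0 objects."""
    offenders = []
    for number, line in enumerate(stdout_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            offenders.append((number, line))
            continue
        # Valid JSON is not enough: every message must declare jsonrpc "2.0"
        if not isinstance(message, dict) or message.get("jsonrpc") != "2.0":
            offenders.append((number, line))
    return offenders
```

Any entry in the returned list is output that a client would fail to parse. The text of the offending line usually identifies the stray print or console.log call.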

Malformed JSON-RPC Responses

Symptoms: Parse errors from the client side, even though the server is not printing to stdout.

Check that your tool handlers return properly structured results. A common mistake is returning raw Python objects instead of serializable data:

from datetime import datetime

# WRONG: datetime is not JSON-serializable
async def handle_tool(name, arguments):
    return {"timestamp": datetime.now()}

# CORRECT: Convert to an ISO 8601 string
async def handle_tool(name, arguments):
    return {"timestamp": datetime.now().isoformat()}
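
Converting values one by one is easy to forget, so it can help to serialize results through a single fallback function. The sketch below uses the default hook of json.dumps; the names to_jsonable and dumps_safe, and the choice of types handled, are assumptions for illustration.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from pathlib import Path


def to_jsonable(value):
    """Fallback for json.dumps: convert common non-serializable types."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)  # keep exact digits rather than rounding through float
    if isinstance(value, (set, frozenset)):
        return sorted(value, key=repr)
    if isinstance(value, Path):
        return str(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def dumps_safe(payload):
    """Serialize `payload`, raising TypeError if a value has no known conversion."""
    return json.dumps(payload, default=to_jsonable)
```

Running tool results through dumps_safe in a unit test surfaces unsupported types as a TypeError at test time instead of as a parse error in the client.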

Method Not Found (-32601)

Symptoms: The client sends a request and gets back a "method not found" error.

This usually means the client and server disagree on the protocol version, or the client is calling a method for a capability the server never registered. Check that both are using compatible MCP SDK versions. The method names changed between early drafts and the stable specification.
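
To see where the error comes from, it helps to picture the server's dispatch step. The sketch below is a simplified stand-in for what an SDK does internally, not the SDK's actual code; the HANDLERS table and dispatch function are hypothetical, while -32601 is the code JSON-RPC 2.0 reserves for this case.

```python
METHOD_NOT_FOUND = -32601  # reserved by JSON-RPC 2.0

# Hypothetical handler table; a real server registers these through its SDK.
HANDLERS = {
    "ping": lambda params: {},
    "tools/list": lambda params: {"tools": []},
}


def dispatch(request):
    """Route a JSON-RPC request dict to a handler, or build a -32601 error."""
    method = request.get("method")
    handler = HANDLERS.get(method)
    if handler is None:
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": METHOD_NOT_FOUND, "message": f"Method not found: {method}"},
        }
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": handler(request.get("params", {})),
    }
```

Logging the method name from each -32601 response on stderr shows which request the server did not recognize, which in turn tells you which side is out of date.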

Timeout Errors

Tool Execution Timeouts

Symptoms: A tool call starts but never returns, or the client reports a timeout after 30-60 seconds.

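MCP clients impose timeouts on tool calls. If your tool performs a long-running operation, it must either complete within the timeout or send progress notifications. Whether a progress notification resets the timeout depends on the client, so treat progress reporting as a complement to bounded work, not a replacement for it.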

Common timeout causes and fixes:

Cause	Fix
HTTP request to slow API	Set request timeout, add retries with backoff
Large file processing	Stream results, send progress notifications
Database query on large dataset	Add query timeout, limit result set
Infinite loop in tool logic	Add iteration limits, deadline checks
Blocking I/O in async server	Use async I/O libraries (aiohttp, aiofiles)

Sending progress notifications to keep the connection alive during long operations:

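# Illustrative sketch: server.send_progress and ToolResult are placeholders
# for the progress and result APIs your MCP SDK provides.
async def handle_long_tool(arguments, progress_token=None):
    total_steps = 100
    for i in range(total_steps):
        # Do a chunk of work
        await process_chunk(i)

        # Report progress; compare against None because 0 is a valid token
        if progress_token is not None:
            await server.send_progress(
                progress_token=progress_token,
                progress=i + 1,
                total=total_steps
            )

    return ToolResult(content="Processing complete")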

Request Timeouts (Server Unresponsive)

Symptoms: The client cannot get any response from the server, not just from tool calls.

This indicates the server's event loop is blocked:

import aiohttp
import requests

# WRONG: Blocking call in an async server stops all processing
async def handle_tool(name, arguments):
    result = requests.get("https://slow-api.com/data")  # Blocks the event loop
    return result.text

# CORRECT: Use an async HTTP client with an explicit timeout
async def handle_tool(name, arguments):
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get("https://slow-api.com/data") as resp:
            return await resp.text()

If you must call synchronous code from an async server, run it in a thread pool:

import asyncio

async def handle_tool(name, arguments):
    # get_running_loop() is the supported way to reach the loop from a coroutine
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, sync_heavy_function, arguments)
    return result
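
When it is unclear whether something is blocking the loop, you can measure it. The sketch below runs a background coroutine that sleeps for a fixed interval and records how late it wakes up; a large value means some handler held the loop. The name measure_loop_lag and the default values are assumptions for illustration.

```python
import asyncio
import time


async def measure_loop_lag(duration=1.0, interval=0.05):
    """Return the worst delay in seconds between a scheduled and an actual wake-up."""
    worst = 0.0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        start = time.monotonic()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the loop spent stuck elsewhere
        lag = time.monotonic() - start - interval
        worst = max(worst, lag)
    return worst
```

Start it with asyncio.create_task alongside a suspect tool call and log the result to stderr. A lag of a few milliseconds is normal scheduling noise, while lag close to the duration of the tool call points at blocking code inside that handler.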

Memory Issues

Memory Leaks

Symptoms: Server memory usage grows over time until the process is killed by the OS or crashes with an out-of-memory error.

Common memory leak patterns in MCP servers:

Unbounded caches:

# WRONG: Cache grows forever
cache = {}

async def handle_tool(name, arguments):
    key = str(arguments)
    if key not in cache:
        cache[key] = await expensive_computation(arguments)
    return cache[key]

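# CORRECT: Use an LRU cache with a size limit
# Note: lru_cache needs hashable arguments and a synchronous function;
# on a coroutine function it would cache the coroutine object, not its result.
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_computation(arg_tuple):
    return expensive_computation_sync(arg_tuple)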
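
Because functools.lru_cache does not understand coroutines, an async handler needs a different bounded cache. The following is a minimal sketch built on collections.OrderedDict; the class name AsyncLRUCache and its interface are assumptions, and it makes no attempt to deduplicate concurrent misses for the same key.

```python
import asyncio
from collections import OrderedDict


class AsyncLRUCache:
    """Small LRU cache for the results of async computations."""

    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self._data = OrderedDict()

    async def get_or_compute(self, key, compute):
        """Return the cached value for `key`, awaiting compute() on a miss."""
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = await compute()
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self):
        return len(self._data)
```

A stable key matters as much as the size limit. Something like json.dumps(arguments, sort_keys=True) avoids treating the same arguments in a different order as a new entry, which str(arguments) can do.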

Accumulating session state:

# WRONG: Session data never cleaned up
sessions = {}

async def handle_connection(session_id):
    sessions[session_id] = {"history": [], "data": {}}
    # ... session is never removed from the dict

# CORRECT: Clean up on disconnect
async def handle_disconnect(session_id):
    sessions.pop(session_id, None)
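
Disconnect callbacks do not always fire, for example when a client vanishes without closing the connection, so a time-based sweep is a useful backstop. This sketch is one possible shape for it; the SessionStore class, its method names, and the 30-minute default are assumptions for illustration.

```python
import time


class SessionStore:
    """Session dict that can drop entries idle for longer than `ttl` seconds."""

    def __init__(self, ttl=1800.0, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._sessions = {}  # session_id -> (last_seen, state)

    def touch(self, session_id):
        """Create the session if needed and refresh its last-seen time."""
        _, state = self._sessions.get(session_id, (None, {"history": [], "data": {}}))
        self._sessions[session_id] = (self._clock(), state)
        return state

    def remove(self, session_id):
        """Drop a session explicitly, e.g. from a disconnect handler."""
        self._sessions.pop(session_id, None)

    def sweep(self):
        """Delete idle sessions and return how many were removed."""
        cutoff = self._clock() - self.ttl
        stale = [sid for sid, (seen, _) in self._sessions.items() if seen < cutoff]
        for sid in stale:
            del self._sessions[sid]
        return len(stale)

    def __len__(self):
        return len(self._sessions)
```

Call touch on every request, remove on disconnect, and sweep from a periodic background task. The injectable clock exists only to make the expiry logic testable without waiting.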

Large file reads held in memory:

# WRONG: Reads entire file into memory
async def handle_read_file(path):
    with open(path, "r") as f:
        content = f.read()  # Could be gigabytes
    return content

# CORRECT: Limit read size
import os

async def handle_read_file(path):
    max_size = 10 * 1024 * 1024  # 10 MB
    file_size = os.path.getsize(path)
    if file_size > max_size:
        return f"File too large: {file_size} bytes (limit: {max_size})"
    with open(path, "r") as f:
        return f.read()
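
An alternative to rejecting large files outright is to return a bounded prefix and tell the caller it was truncated. The sketch below reads at most a fixed number of bytes, so memory use stays capped even if the file grows between the size check and the read. The helper name read_head is hypothetical.

```python
def read_head(path, max_bytes=10 * 1024 * 1024):
    """Read at most `max_bytes` from `path`; return (text, truncated)."""
    with open(path, "rb") as f:
        data = f.read(max_bytes + 1)  # one extra byte reveals whether more remains
    truncated = len(data) > max_bytes
    # Undecodable bytes (including a character cut at the boundary) become U+FFFD
    text = data[:max_bytes].decode("utf-8", errors="replace")
    return text, truncated
```

This function is synchronous. Inside an async handler, call it through asyncio.to_thread or run_in_executor so a slow disk does not stall other requests.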

Monitoring Memory

Add memory tracking to your server's health endpoint or logging:

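import logging
import resource  # Unix only
import sys

logger = logging.getLogger("mcp.health")

def log_memory_usage():
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    logger.info(f"Peak memory RSS: {usage.ru_maxrss / divisor:.1f} MB")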
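
Peak RSS tells you that memory grew, not where. The standard library's tracemalloc module can attribute growth to source lines by comparing two snapshots. The wrapper below is a sketch for use in a test or a debugging session; the function name report_allocation_growth is an assumption.

```python
import tracemalloc


def report_allocation_growth(workload, limit=5):
    """Run workload() and return the source lines whose allocations grew the most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        keep_alive = workload()  # hold the result so its memory is still allocated
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    stats = after.compare_to(before, "lineno")
    return [str(stat) for stat in stats[:limit] if stat.size_diff > 0]
```

In a long-running server the same idea applies with snapshots taken minutes apart instead of around a single call. Tracing adds overhead, so it is usually enabled only while investigating a suspected leak.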

Transport Disconnections

Unhandled Exceptions

Symptoms: The server disconnects mid-session, tools stop working, and the client reports the server is no longer available.

If any exception escapes your tool handler without being caught, the MCP server process may crash, terminating the stdio pipe or SSE connection.

import logging

# WRONG: Unhandled exceptions can crash the server
async def handle_tool(name, arguments):
    result = 1 / arguments["divisor"]  # ZeroDivisionError escapes the handler
    return str(result)

# CORRECT: Catch exceptions and return error results
# (ToolResult is a placeholder for your SDK's result type)
async def handle_tool(name, arguments):
    try:
        result = 1 / arguments["divisor"]
        return ToolResult(content=str(result))
    except ZeroDivisionError:
        return ToolResult(
            content="Error: division by zero",
            is_error=True
        )
    except Exception as e:
        logging.error(f"Tool execution failed: {e}", exc_info=True)
        return ToolResult(
            content=f"Internal error: {type(e).__name__}",
            is_error=True
        )
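
Repeating the same try/except in every handler is easy to get wrong, so the pattern can be factored into a decorator. This is a minimal sketch: the name safe_tool is hypothetical, and the returned dict is a plain stand-in for whatever result type your SDK expects.

```python
import functools
import logging

logger = logging.getLogger("my-mcp-server")


def safe_tool(handler):
    """Wrap an async tool handler so exceptions become error results."""

    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return await handler(*args, **kwargs)
        except Exception as exc:
            # Full traceback goes to the log; the client sees only the error type
            logger.error("Tool %s failed", handler.__name__, exc_info=True)
            return {"content": f"Internal error: {type(exc).__name__}", "isError": True}

    return wrapper
```

Catching Exception rather than BaseException is deliberate: task cancellation and interpreter shutdown still propagate, so the server can stop cleanly.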

SSE Connection Drops

For SSE-based remote servers, connection drops can happen due to:

Cause	Fix
Proxy timeout (nginx, Cloudflare)	Send SSE keepalive comments every 15-30s
Load balancer idle timeout	Configure sticky sessions, increase timeout
Client network change	Implement automatic reconnection with backoff
Server restart during deployment	Use graceful shutdown, drain connections

SSE keepalive to prevent proxy timeouts:

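import asyncio

async def sse_keepalive(response):
    """Send periodic SSE comments to prevent proxy timeouts."""
    while True:
        await asyncio.sleep(15)
        # Lines starting with ":" are SSE comments that clients ignore.
        # aiohttp's StreamResponse.write expects bytes; other frameworks may differ.
        await response.write(b": keepalive\n\n")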
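
A keepalive loop that runs forever also needs to be stopped when the stream ends, otherwise each dropped connection leaves a task behind. The sketch below ties the keepalive task to the lifetime of the event stream. It assumes only that response has an async write method accepting bytes; stream_with_keepalive is a hypothetical name, not a framework API.

```python
import asyncio
import contextlib


async def stream_with_keepalive(response, events, interval=15.0):
    """Write events from an async iterator while a background task sends keepalives."""

    async def keepalive():
        while True:
            await asyncio.sleep(interval)
            await response.write(b": keepalive\n\n")

    task = asyncio.create_task(keepalive())
    try:
        async for event in events:
            await response.write(f"data: {event}\n\n".encode("utf-8"))
    finally:
        # Stop the keepalive loop when the stream ends or the client disconnects
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

The finally block runs on normal completion, on errors raised by write, and on cancellation of the surrounding request, so the background task does not outlive the connection.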

Permission Errors

File System Access

Symptoms: Tools that read or write files return "permission denied" errors.

When running as a systemd service, Docker container, or different user, the MCP server may not have the same file permissions as your development user:

# Check what user the server is running as
ps aux | grep mcp-server

# Check file permissions
ls -la /path/to/target/file

# Fix: run the server as the correct user, or adjust file permissions
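
The shell checks above run as your own user, which may not be the user the server runs as. Logging the same information from inside the server process removes that ambiguity. This sketch uses os.access from the standard library; the helper name describe_access is an assumption.

```python
import os


def describe_access(path):
    """Report what the current process user may do with `path`."""
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        "executable": os.access(path, os.X_OK),
    }
```

Logging describe_access for each directory the server depends on at startup, to stderr, shows permission problems in the client's log files before the first tool call fails.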

Network Access

Symptoms: Tools that call external APIs fail with connection errors.

Firewall rules, container network policies, or security groups may block outbound requests from the server:

# Test connectivity from the server's environment
curl -v https://api.example.com/health

# Check iptables rules (Linux, requires root)
sudo iptables -L -n

# Check Docker network
docker network inspect bridge

Debugging with stderr

Since stdout is reserved for JSON-RPC in stdio servers, all debugging output must go through stderr. Here is how to set up structured logging:

import logging
import sys
import json

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            log_entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

# Configure root logger to stderr with JSON formatting
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JSONFormatter())
logging.root.addHandler(handler)
logging.root.setLevel(logging.INFO)

For Claude Desktop, server stderr is captured in the log files at:

macOS: ~/Library/Logs/Claude/
Windows: %APPDATA%\Claude\logs\

For Cursor, check the Output panel and select the MCP server from the dropdown.

Quick Diagnosis Flowchart

When an MCP server is not working, follow this sequence:

1. Can the client start the server process? If no, check the command path, executable permissions, and dependencies.
2. Does the server start without errors? Run it manually and check stderr for import errors or configuration problems.
3. Does stdout stay clean? Run the server and verify nothing appears on stdout before a client connects.
4. Does the Inspector work? If the server works in the Inspector but not in your client, the issue is client configuration.
5. Do tool calls succeed? If connection works but tools fail, check tool handler error handling and external service connectivity.
6. Does it work initially but fail over time? Suspect memory leaks, connection pool exhaustion, or accumulating state.