
Deploying Remote MCP Servers: Docker, Cloud Hosting & Scaling

Production deployment guide for remote MCP servers — Docker containerization, cloud hosting (AWS, GCP, Azure), scaling, and monitoring.

Updated February 25, 2026
By MCP Server Spot

Moving an MCP server from local development to production deployment involves three key transitions: switching from stdio to HTTP transport, containerizing for consistent environments, and adding the operational infrastructure (monitoring, scaling, secrets management) that production workloads require. This guide walks you through each step.

Local MCP servers use stdio transport and run as child processes of the client application. Remote MCP servers use HTTP-based transports (SSE or Streamable HTTP) and run as standalone services accessible over the network. For a detailed comparison, see Local vs Remote MCP Servers.

Choosing a Remote Transport

Remote MCP servers use HTTP-based transports instead of stdio. There are two main options:

  • SSE (Server-Sent Events): GET for server-to-client events, POST for client-to-server messages. Persistent connection. Best for real-time updates and long-lived sessions.
  • Streamable HTTP: standard HTTP POST with optional SSE streaming. Request-response model. Best for stateless deployments, serverless, and simpler infrastructure.

SSE Transport Implementation (TypeScript)

import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import { createServer } from "./server.js";

const app = express();
app.use(express.json());

// Store active transports
const transports = new Map<string, SSEServerTransport>();

// SSE endpoint — client connects here for server-to-client events
app.get("/sse", async (req, res) => {
  const server = createServer();
  const transport = new SSEServerTransport("/messages", res);

  transports.set(transport.sessionId, transport);

  res.on("close", () => {
    transports.delete(transport.sessionId);
  });

  await server.connect(transport);
});

// Messages endpoint — client sends requests here
app.post("/messages", async (req, res) => {
  const sessionId = req.query.sessionId as string;
  const transport = transports.get(sessionId);

  if (!transport) {
    res.status(404).json({ error: "Session not found" });
    return;
  }

  await transport.handlePostMessage(req, res);
});

// Health check
app.get("/health", (req, res) => {
  res.json({ status: "healthy", sessions: transports.size });
});

const PORT = process.env.PORT || 3001;
app.listen(PORT, () => {
  console.error(`MCP SSE server listening on port ${PORT}`);
});

SSE Transport Implementation (Python)

from mcp.server.fastmcp import FastMCP
from starlette.applications import Starlette
from starlette.routing import Route, Mount
from starlette.responses import JSONResponse
import uvicorn

mcp = FastMCP("Remote Weather Server")

# ... define tools, resources, prompts ...

# Create the Starlette app with SSE transport
app = Starlette(
    routes=[
        Route("/health", endpoint=lambda r: JSONResponse({"status": "healthy"})),
        Mount("/", app=mcp.sse_app()),
    ]
)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=3001)
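The Streamable HTTP transport is, by contrast, plain request-response: each POST carries one JSON-RPC message and receives one reply, with no session to pin. To make that shape concrete, here is a dependency-free sketch using only the standard library. It illustrates the stateless pattern, not the SDK's actual transport class, and the echo tool registry is hypothetical:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical tool registry standing in for a real MCP server's tools
TOOLS = {"echo": lambda params: params.get("text", "")}

class StreamableHTTPHandler(BaseHTTPRequestHandler):
    """Each POST is an independent JSON-RPC exchange; no session state."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        tool = TOOLS.get(request.get("method"))
        if tool is None:
            response = {"jsonrpc": "2.0", "id": request.get("id"),
                        "error": {"code": -32601, "message": "Method not found"}}
        else:
            response = {"jsonrpc": "2.0", "id": request.get("id"),
                        "result": tool(request.get("params", {}))}
        body = json.dumps(response).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet; a real server would log structurally
```

To run it standalone: HTTPServer(("0.0.0.0", 3001), StreamableHTTPHandler).serve_forever(). A production server would use the official SDK's Streamable HTTP transport rather than hand-rolling JSON-RPC, but because every request is self-contained, this pattern deploys cleanly to load-balanced and serverless environments.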

Docker Containerization

Docker is the recommended deployment method for MCP servers. It ensures consistent environments across development, staging, and production.

Dockerfile for Python MCP Server

# Multi-stage build for smaller final image
FROM python:3.12-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

# Copy dependency files first (for better caching)
COPY pyproject.toml uv.lock ./

# Install dependencies
RUN uv sync --frozen --no-dev

# Copy application code
COPY src/ ./src/
COPY server.py ./

# --- Production image ---
FROM python:3.12-slim

WORKDIR /app

# Copy virtual environment and app from builder
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/src /app/src
COPY --from=builder /app/server.py /app/

# Set the virtual environment path
ENV PATH="/app/.venv/bin:$PATH"

# Expose the SSE port
EXPOSE 3001

# Health check (standard library only, so no extra HTTP client dependency)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:3001/health')"

# Run the server
CMD ["python", "server.py"]

Dockerfile for TypeScript MCP Server

FROM node:20-slim AS builder

WORKDIR /app

# Copy package files first for caching
COPY package.json package-lock.json ./
RUN npm ci

# Copy source and build
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build

# --- Production image ---
FROM node:20-slim

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci --omit=dev

COPY --from=builder /app/dist ./dist

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD node -e "fetch('http://localhost:3001/health').then(r => r.ok ? process.exit(0) : process.exit(1))"

CMD ["node", "dist/index.js"]

Dockerfile for Go MCP Server

FROM golang:1.22-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o mcp-server .

# --- Minimal production image ---
FROM alpine:3.19

RUN apk --no-cache add ca-certificates
COPY --from=builder /app/mcp-server /usr/local/bin/

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget -qO- http://localhost:3001/health || exit 1

CMD ["mcp-server"]

Docker Compose for Local Testing

# docker-compose.yml
version: "3.8"

services:
  mcp-server:
    build: .
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://user:pass@db:5432/myapp
      - API_KEY=${API_KEY}
    env_file:
      - .env.production
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 10s
      timeout: 5s

volumes:
  pgdata:

Cloud Platform Deployments

AWS (ECS Fargate)

AWS ECS Fargate runs your Docker container without requiring you to provision or manage servers.

Task definition (simplified):

{
  "family": "mcp-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "mcp-server",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest",
      "portMappings": [
        {
          "containerPort": 3001,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "NODE_ENV", "value": "production" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:mcp/database-url"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:3001/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mcp-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "mcp"
        }
      }
    }
  ]
}

Put an Application Load Balancer (ALB) in front of ECS with:

  • HTTPS listener on port 443
  • Target group pointing to port 3001
  • Sticky sessions enabled (for SSE transport)
  • Health check path: /health

Google Cloud Run

Cloud Run is excellent for MCP servers -- it handles HTTPS, scaling, and container management automatically.

# Build and push the image
gcloud builds submit --tag gcr.io/PROJECT_ID/mcp-server

# Deploy to Cloud Run
gcloud run deploy mcp-server \
  --image gcr.io/PROJECT_ID/mcp-server:latest \
  --platform managed \
  --region us-central1 \
  --port 3001 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 1 \
  --max-instances 10 \
  --set-env-vars "NODE_ENV=production" \
  --set-secrets "DATABASE_URL=mcp-db-url:latest" \
  --allow-unauthenticated

Cloud Run automatically provides:

  • HTTPS with managed certificates
  • Automatic scaling from 1 to N instances
  • Built-in monitoring and logging
  • Request-based billing

Note on SSE with Cloud Run: Cloud Run supports HTTP streaming (SSE) with a maximum request timeout of 60 minutes. For long-lived SSE connections, set --timeout 3600 and implement client-side reconnection logic.
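That reconnection logic is simple to implement. A client-side sketch follows; the delay values are arbitrary defaults, and connect_with_retry is a hypothetical helper rather than part of the MCP SDK:

```python
import random
import time

def backoff_delays(base=1.0, cap=60.0, jitter=0.5):
    """Yield exponentially increasing reconnect delays with random jitter."""
    delay = base
    while True:
        yield delay + random.uniform(0, jitter * delay)
        delay = min(delay * 2, cap)

def connect_with_retry(connect, max_attempts=8, delays=None):
    """Call connect() until it succeeds, sleeping between failed attempts."""
    delays = delays if delays is not None else backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(next(delays))
```

The jitter matters at scale: when a Cloud Run instance recycles, every client loses its connection at once, and jittered backoff spreads the reconnect storm out instead of hammering the new instance simultaneously.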

Azure Container Apps

# Create the container app
az containerapp create \
  --name mcp-server \
  --resource-group my-rg \
  --environment my-env \
  --image myregistry.azurecr.io/mcp-server:latest \
  --target-port 3001 \
  --ingress external \
  --min-replicas 1 \
  --max-replicas 10 \
  --cpu 0.5 \
  --memory 1Gi \
  --secrets dburl=mcp-database-url \
  --env-vars "DATABASE_URL=secretref:dburl"

Railway

Railway offers the simplest deployment path for small to medium MCP servers:

# Install Railway CLI
npm install -g @railway/cli

# Login and initialize
railway login
railway init

# Deploy (auto-detects Dockerfile)
railway up

# Set environment variables
railway variables set DATABASE_URL=postgres://...
railway variables set API_KEY=your-key

Railway automatically:

  • Builds from your Dockerfile
  • Assigns a public HTTPS URL
  • Manages environment variables
  • Provides logging and monitoring

Fly.io

# Initialize
fly launch

# This generates a fly.toml configuration:
# fly.toml
app = "mcp-weather-server"
primary_region = "ord"

[build]

[http_service]
  internal_port = 3001
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

[checks]
  [checks.health]
    port = 3001
    type = "http"
    interval = "30s"
    timeout = "5s"
    path = "/health"

# Set secrets
fly secrets set DATABASE_URL=postgres://...
fly secrets set API_KEY=your-key

# Deploy
fly deploy

Environment Variables and Secrets Management

The Configuration Hierarchy

1. Cloud secrets manager (highest priority, most secure)
   └── AWS Secrets Manager, GCP Secret Manager, Azure Key Vault

2. Platform environment variables
   └── ECS task definition, Cloud Run --set-env-vars, Railway variables

3. .env files (development only, NEVER in production images)
   └── .env.local, .env.development

4. Hardcoded defaults (lowest priority, non-sensitive values only)
   └── Default ports, feature flags, pagination limits
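The hierarchy can be collapsed into a single lookup helper at startup. A minimal sketch, where the secrets dict stands in for values already fetched from a cloud secrets manager:

```python
import os

def resolve_config(key, secrets=None, defaults=None):
    """Return the first value found, walking the hierarchy top-down:
    secrets manager -> environment variable -> hardcoded default."""
    secrets = secrets or {}
    defaults = defaults or {}
    if key in secrets:        # e.g. fetched from AWS Secrets Manager
        return secrets[key]
    if key in os.environ:     # e.g. set in the ECS task definition
        return os.environ[key]
    if key in defaults:       # non-sensitive fallbacks only
        return defaults[key]
    raise KeyError(f"No value configured for {key}")
```

Raising on a completely unconfigured key keeps misconfiguration loud: a typo in a variable name fails at boot rather than silently falling through to a wrong value.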

Secrets Management Best Practices

import os

class Config:
    """Server configuration with environment variable loading."""

    # Required secrets (validate() below reports any that are missing)
    DATABASE_URL: str = os.environ.get("DATABASE_URL", "")
    API_KEY: str = os.environ.get("API_KEY", "")

    # Optional with defaults
    PORT: int = int(os.environ.get("PORT", "3001"))
    LOG_LEVEL: str = os.environ.get("LOG_LEVEL", "INFO")
    MAX_CONNECTIONS: int = int(os.environ.get("MAX_CONNECTIONS", "10"))

    @classmethod
    def validate(cls):
        """Validate configuration at startup."""
        required = ["DATABASE_URL", "API_KEY"]
        missing = [var for var in required if not os.environ.get(var)]
        if missing:
            raise EnvironmentError(
                f"Missing required environment variables: {', '.join(missing)}"
            )

Never Commit Secrets

Add these to your .gitignore:

.env
.env.local
.env.production
*.pem
*.key
secrets/

And use .env.example to document required variables:

# .env.example — Copy to .env and fill in values
DATABASE_URL=postgres://user:password@host:5432/dbname
API_KEY=your-api-key-here
PORT=3001
LOG_LEVEL=INFO
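It is also worth failing loudly at startup when the live environment is missing something .env.example documents. A small sketch, assuming the KEY=value format shown above:

```python
import os

def missing_env_vars(example_text):
    """Return the keys documented in .env.example that are absent
    from os.environ. Comment lines and blanks are skipped."""
    required = []
    for line in example_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        required.append(line.split("=", 1)[0])
    return [key for key in required if key not in os.environ]
```

Calling this at boot and raising if the list is non-empty turns a misconfigured deploy into an immediate, descriptive crash instead of a confusing runtime failure.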

Scaling Strategies

Horizontal Scaling

MCP servers are stateless at the protocol level, making horizontal scaling straightforward:

                    ┌─────────────────┐
                    │  Load Balancer   │
                    │  (HTTPS + SSL)   │
                    └───────┬─────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ MCP      │ │ MCP      │ │ MCP      │
        │ Server 1 │ │ Server 2 │ │ Server 3 │
        └────┬─────┘ └────┬─────┘ └────┬─────┘
             │             │             │
             └─────────────┼─────────────┘
                           ▼
                    ┌──────────────┐
                    │   Database   │
                    │   / Redis    │
                    └──────────────┘

Key considerations for SSE transport:

  • Enable sticky sessions (session affinity) on the load balancer so SSE connections stay on the same instance
  • Use Redis for shared state if your server maintains in-memory data
  • Implement graceful shutdown to drain connections before scaling down
For example, in the TypeScript SSE server shown earlier (this assumes the return value of app.listen() was captured as httpServer):

// Graceful shutdown
process.on("SIGTERM", async () => {
  console.error("SIGTERM received, shutting down gracefully...");

  // Stop accepting new connections
  httpServer.close();

  // Close all active MCP sessions
  for (const [id, transport] of transports) {
    await transport.close();
    transports.delete(id);
  }

  // Wait for in-flight requests to complete
  await new Promise((resolve) => setTimeout(resolve, 5000));

  process.exit(0);
});

Auto-Scaling Configuration

AWS ECS:

{
  "targetTrackingScalingPolicies": [
    {
      "targetValue": 70,
      "predefinedMetricSpecification": {
        "predefinedMetricType": "ECSServiceAverageCPUUtilization"
      },
      "scaleInCooldown": 60,
      "scaleOutCooldown": 30
    }
  ]
}

Kubernetes HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Monitoring and Health Checks

Health Check Endpoint

Every production MCP server needs a health check:

app.get("/health", async (req, res) => {
  const checks: Record<string, unknown> = {
    server: "healthy",
    uptime: process.uptime(),
    activeSessions: transports.size,
    memoryUsage: process.memoryUsage(),
    timestamp: new Date().toISOString(),
  };

  // Check database connectivity
  try {
    await db.query("SELECT 1");
    checks.database = "healthy";
  } catch (error) {
    checks.database = "unhealthy";
    res.status(503);
  }

  res.json(checks);
});
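The same idea in Python, kept framework-agnostic so the checks are easy to unit test: each dependency probe is a callable, and any failure degrades the response to 503. The probe names here are illustrative:

```python
import time

START_TIME = time.monotonic()

def build_health_report(probes):
    """Run each named probe callable; return (http_status, report_dict).

    A probe signals health by returning normally and failure by raising.
    """
    report = {"server": "healthy", "uptime": time.monotonic() - START_TIME}
    status = 200
    for name, probe in probes.items():
        try:
            probe()                     # e.g. lambda: db.execute("SELECT 1")
            report[name] = "healthy"
        except Exception:
            report[name] = "unhealthy"
            status = 503                # any failed dependency degrades status
    return status, report
```

A route handler then just serializes the report with the returned status code, and the probe logic stays testable without spinning up the web server.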

Metrics Collection

Export metrics for your monitoring platform:

from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Metrics
tool_calls_total = Counter(
    "mcp_tool_calls_total",
    "Total number of MCP tool calls",
    ["tool_name", "status"],
)

tool_call_duration = Histogram(
    "mcp_tool_call_duration_seconds",
    "Duration of MCP tool calls",
    ["tool_name"],
)

active_sessions = Gauge(
    "mcp_active_sessions",
    "Number of active MCP sessions",
)

# Start Prometheus metrics server on a separate port
start_http_server(9090)

# Use in tool handlers
@mcp.tool()
async def my_tool(query: str) -> str:
    with tool_call_duration.labels(tool_name="my_tool").time():
        try:
            result = await do_work(query)
            tool_calls_total.labels(tool_name="my_tool", status="success").inc()
            return result
        except Exception:
            tool_calls_total.labels(tool_name="my_tool", status="error").inc()
            raise

Alerting Rules

Set up alerts for critical conditions:

# Prometheus alerting rules
groups:
  - name: mcp-server
    rules:
      - alert: MCPServerHighErrorRate
        expr: rate(mcp_tool_calls_total{status="error"}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on MCP server"

      - alert: MCPServerDown
        expr: up{job="mcp-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MCP server is down"

      - alert: MCPServerHighLatency
        expr: histogram_quantile(0.95, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le)) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP server P95 latency exceeds 5 seconds"

Reverse Proxy Configuration

Nginx Configuration for MCP SSE

upstream mcp_backend {
    ip_hash;  # Sticky sessions for SSE
    server mcp-server-1:3001;
    server mcp-server-2:3001;
    server mcp-server-3:3001;
}

server {
    listen 443 ssl http2;
    server_name mcp.example.com;

    ssl_certificate /etc/ssl/certs/mcp.example.com.pem;
    ssl_certificate_key /etc/ssl/private/mcp.example.com.key;

    # SSE endpoint — disable buffering
    location /sse {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 86400s;  # 24 hours for SSE
    }

    # Messages endpoint
    location /messages {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Health check
    location /health {
        proxy_pass http://mcp_backend;
    }
}

Caddy (Automatic HTTPS)

mcp.example.com {
    reverse_proxy /sse mcp-server:3001 {
        flush_interval -1
        transport http {
            read_timeout 0
        }
    }

    reverse_proxy /messages mcp-server:3001
    reverse_proxy /health mcp-server:3001
}

Caddy automatically obtains and renews SSL certificates from Let's Encrypt, making it the simplest option for HTTPS.

Summary

Deploying MCP servers to production follows a well-established pattern: switch to HTTP transport (SSE or Streamable HTTP), containerize with Docker, deploy to a managed platform, and add operational infrastructure (health checks, monitoring, auto-scaling). The MCP protocol's stateless request-response model makes horizontal scaling straightforward, and the rich ecosystem of cloud platforms means you can go from a working Docker container to a production deployment in minutes.

Start with the simplest deployment that meets your needs -- Railway or Fly.io for small projects, Cloud Run or ECS Fargate for production workloads -- and add complexity (Kubernetes, custom metrics, multi-region) only when your scale demands it.

Frequently Asked Questions

What transport should I use for remote MCP servers?

Use the Streamable HTTP transport (the modern standard) or SSE (Server-Sent Events) for remote deployments. Streamable HTTP uses standard HTTP requests with optional SSE streaming for server-to-client notifications. SSE uses a persistent HTTP connection for server-to-client events and a separate POST endpoint for client-to-server messages. Both work well behind load balancers and proxies.

Can I deploy an MCP server as a Docker container?

Yes, Docker is one of the best deployment strategies for MCP servers. Create a Dockerfile that installs your dependencies, copies your server code, and sets the appropriate entry point. The container exposes an HTTP port for SSE/Streamable HTTP transport. This works on any container platform: Docker Compose, Kubernetes, AWS ECS, Google Cloud Run, etc.

How do I handle secrets and API keys in production MCP servers?

Never hardcode secrets. Use environment variables injected at deployment time. On cloud platforms, use their native secrets management: AWS Secrets Manager, Google Secret Manager, Azure Key Vault. For Kubernetes, use Secrets resources. For Docker Compose, use the secrets configuration or .env files that are not committed to version control.

Can I deploy an MCP server to a serverless platform like AWS Lambda?

Serverless platforms work with the Streamable HTTP transport pattern since each request is independent. However, SSE transport requires persistent connections, which conflict with serverless invocation models. For serverless, implement a stateless request-response pattern or use a platform like AWS Fargate or Cloud Run that supports long-running connections.

How do I scale MCP servers horizontally?

MCP servers are stateless at the protocol level (each request is independent), making them naturally scalable. Put multiple server instances behind a load balancer. If your server maintains in-memory state (like caches), use an external data store (Redis, database) so all instances share state. Use sticky sessions or session affinity for SSE connections.

Do I need SSL/TLS for remote MCP servers?

Yes, always use HTTPS for remote MCP servers. This protects the JSON-RPC messages (which may contain sensitive data) in transit. Use a reverse proxy like Nginx or Caddy for TLS termination, or leverage cloud platform load balancers that handle SSL automatically.

How do I monitor a production MCP server?

Implement health check endpoints (/health or /ready), export metrics (request count, latency, error rate) to Prometheus or your monitoring platform, use structured logging to stderr, and set up alerts for high error rates or latency. Most cloud platforms provide built-in monitoring dashboards.

What is the recommended architecture for enterprise MCP deployments?

Use a containerized server behind a load balancer with TLS termination. Implement OAuth 2.1 for authentication. Deploy to a managed container service (ECS, Cloud Run, AKS). Use a centralized logging system. Implement rate limiting at the reverse proxy level. Use separate environments (staging, production) with identical configurations.

How do I deploy an MCP server to Railway or Fly.io?

Both platforms support Docker-based deployments. For Railway, connect your Git repository and it auto-deploys on push. For Fly.io, use 'fly launch' with a Dockerfile. Both handle HTTPS, scaling, and environment variables natively. They are excellent choices for small to medium MCP server deployments.

How do I handle database connections in a containerized MCP server?

Use connection pooling to manage database connections efficiently. Pass the database connection string as an environment variable. For cloud databases, use IAM-based authentication when possible. Implement connection retry logic with exponential backoff. In Kubernetes, use init containers to wait for database readiness before starting your MCP server.
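The pooling half of that advice can be sketched with the standard library. This is a deliberately minimal fixed-size pool; a real deployment would rely on the driver's own pooling (psycopg_pool, SQLAlchemy's engine pool, and so on):

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: borrow with acquire(), return with release()."""

    def __init__(self, factory, size=10):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())   # eagerly open `size` connections

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout,
        # which caps how long a request waits instead of piling up forever.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)
```

The bounded queue is the important property: it caps concurrent database connections per container, so horizontal scaling multiplies load on the database predictably (instances x pool size) rather than unboundedly.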
