Configure fine-grained rate limits for individual MCP servers and their tools to protect expensive operations while maintaining high throughput for lightweight queries.

Configure rate limits for individual MCP servers:

```toml
[mcp.servers.my_api]
url = "https://api.example.com/mcp"

[mcp.servers.my_api.rate_limits]
limit = 50
interval = "60s"
```

Set rate limits on individual tools within an MCP server:

```toml
[mcp.servers.my_api.rate_limits.tools]
expensive_operation = { limit = 10, interval = "60s" }
bulk_process = { limit = 5, interval = "300s" }
standard_query = { limit = 100, interval = "60s" }
```

Rate limits are evaluated in the following order, from most to least specific:

  1. Tool-specific limits - Most granular control
  2. MCP server limits - Per-server restrictions
  3. Per-IP limits - Configured at server level
  4. Global limits - Overall system limits

All applicable limits are enforced - a request must pass all rate limit checks to succeed.
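Conceptually, each layer behaves like an independent counter, and a request is admitted only if every applicable counter still has capacity. The following sketch models this with simple fixed-window counters; the class, layer names, and limit values are illustrative, not part of the actual implementation:

```python
import time

class FixedWindow:
    """Minimal fixed-window limiter: at most `limit` events per `interval` seconds."""
    def __init__(self, limit, interval):
        self.limit, self.interval = limit, interval
        self.window_start, self.count = 0.0, 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.interval:
            self.window_start, self.count = now, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# A request must pass every applicable layer, most to least specific.
layers = {
    "tool":   FixedWindow(limit=10,   interval=60),  # e.g. expensive_operation
    "server": FixedWindow(limit=50,   interval=60),  # e.g. my_api
    "ip":     FixedWindow(limit=500,  interval=60),
    "global": FixedWindow(limit=5000, interval=60),
}

def admit(now):
    # Denied as soon as any layer rejects.
    return all(layer.allow(now) for layer in layers.values())
```

Here the tool-level limit of 10 per minute is hit first, so the 11th request in a window is rejected even though the server, IP, and global counters still have room.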

Different limits for different operation costs:

```toml
[mcp.servers.data_api]
url = "https://data.example.com/mcp"

# Overall server limit
[mcp.servers.data_api.rate_limits]
limit = 1000
interval = "60s"

# Tool-specific limits
[mcp.servers.data_api.rate_limits.tools]
# Expensive data exports
export_full_dataset = { limit = 5, interval = "3600s" }  # 5 per hour
generate_report = { limit = 20, interval = "3600s" }     # 20 per hour

# Moderate operations
bulk_update = { limit = 50, interval = "600s" }          # 50 per 10 min
complex_search = { limit = 100, interval = "300s" }      # 100 per 5 min

# Lightweight queries
simple_lookup = { limit = 1000, interval = "60s" }       # 1000 per minute
get_status = { limit = 2000, interval = "60s" }          # 2000 per minute
```

Protect database resources:

```toml
[mcp.servers.database]
cmd = ["psql-mcp"]
env = { PGHOST = "{{ env.DB_HOST }}" }

# Conservative server limit
[mcp.servers.database.rate_limits]
limit = 100
interval = "60s"

[mcp.servers.database.rate_limits.tools]
# DDL operations - very restricted
create_table = { limit = 2, interval = "3600s" }
drop_table = { limit = 1, interval = "3600s" }
alter_schema = { limit = 5, interval = "3600s" }

# Write operations - moderate limits
insert_batch = { limit = 50, interval = "60s" }
update_records = { limit = 100, interval = "60s" }
delete_records = { limit = 20, interval = "60s" }

# Read operations - generous limits
select_query = { limit = 500, interval = "60s" }
count_records = { limit = 1000, interval = "60s" }
```

Manage compute-intensive operations:

```toml
[mcp.servers.ml_service]
url = "https://ml.example.com/mcp"

[mcp.servers.ml_service.rate_limits]
limit = 200
interval = "60s"

[mcp.servers.ml_service.rate_limits.tools]
# Training operations - very limited
train_model = { limit = 2, interval = "86400s" }      # 2 per day
fine_tune = { limit = 5, interval = "86400s" }        # 5 per day

# Inference operations - moderate
batch_inference = { limit = 20, interval = "3600s" }  # 20 per hour
single_inference = { limit = 100, interval = "60s" }  # 100 per minute

# Preprocessing - generous
tokenize_text = { limit = 1000, interval = "60s" }
validate_input = { limit = 2000, interval = "60s" }
```

Respect third-party rate limits:

```toml
[mcp.servers.github]
url = "https://api.github.com/mcp"

[mcp.servers.github.auth]
token = "{{ env.GITHUB_TOKEN }}"

# Match GitHub's rate limits
[mcp.servers.github.rate_limits]
limit = 5000        # GitHub's limit for authenticated requests
interval = "3600s"  # Per hour

[mcp.servers.github.rate_limits.tools]
# GraphQL has a separate limit
graphql_query = { limit = 5000, interval = "3600s" }

# Search has a stricter limit
search_code = { limit = 30, interval = "60s" }
search_issues = { limit = 30, interval = "60s" }

# Regular API calls
get_repo = { limit = 5000, interval = "3600s" }
list_issues = { limit = 5000, interval = "3600s" }
```

Supported interval formats:

  • "60s" - 60 seconds
  • "5m" - 5 minutes
  • "1h" - 1 hour
  • "24h" - 24 hours
  • "300s" - 300 seconds
  • "3600s" - 3600 seconds (1 hour)
  • "86400s" - 86400 seconds (1 day)
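For illustration, the interval syntax above (an integer followed by an `s`, `m`, or `h` unit) can be normalized to seconds with a small parser. This `interval_seconds` helper is hypothetical, not part of the configuration API:

```python
import re

# Seconds per supported unit, mirroring the interval formats listed above.
_UNITS = {"s": 1, "m": 60, "h": 3600}

def interval_seconds(value: str) -> int:
    """Convert an interval string like "5m" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if not match:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]
```

Under this grammar, `"5m"` and `"300s"` denote the same interval, and `"24h"` equals `"86400s"`.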

Align limits with operation costs:

```toml
[mcp.servers.billing_api.rate_limits.tools]
# $0.001 per call
cheap_operation = { limit = 10000, interval = "3600s" }

# $0.01 per call
standard_operation = { limit = 1000, interval = "3600s" }

# $0.10 per call
expensive_operation = { limit = 100, interval = "3600s" }

# $1.00 per call
premium_operation = { limit = 10, interval = "3600s" }
```

Based on system resources:

```toml
[mcp.servers.compute.rate_limits.tools]
# CPU intensive
cpu_heavy = { limit = 10, interval = "60s" }

# Memory intensive
memory_heavy = { limit = 20, interval = "60s" }

# I/O intensive
io_heavy = { limit = 50, interval = "60s" }

# Network intensive
network_heavy = { limit = 100, interval = "60s" }
```

Combine with client identification:

```toml
# Note: This is conceptual - the actual implementation
# would use client identification features
[mcp.servers.premium_api.rate_limits]
# Default limits
limit = 100
interval = "60s"

[mcp.servers.premium_api.rate_limits.tools]
# Tool limits apply to all users
premium_feature = { limit = 10, interval = "60s" }

# User tiers would be handled by client identification
# See /docs/configuration/server/client-identification
```

When a rate limit is exceeded:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 42
```
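Well-behaved clients should wait for the number of seconds given in `Retry-After` before retrying. A minimal sketch, in which `send` is a hypothetical stand-in for your HTTP client call returning a `(status_code, headers)` pair:

```python
import time

def call_with_backoff(send, max_attempts=5):
    """Retry `send` until it succeeds or attempts run out.

    On a 429 response, sleep for the server-provided Retry-After value,
    falling back to exponential backoff when the header is absent.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        time.sleep(int(headers.get("Retry-After", 2 ** attempt)))
    return 429
```

Honoring the server's hint avoids hammering an already-exhausted limit, which would only keep the window saturated.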

Begin with lower limits and increase based on usage:

Start conservative:

```toml
[mcp.servers.new_api.rate_limits]
limit = 10
interval = "60s"
```

Then raise the limit once monitoring shows real usage:

```toml
[mcp.servers.new_api.rate_limits]
limit = 100  # Increased after testing
interval = "60s"
```

Track which tools are called most frequently:

```toml
[mcp.servers.api.rate_limits.tools]
# Adjust based on actual usage patterns
frequently_used = { limit = 1000, interval = "60s" }
rarely_used = { limit = 10, interval = "60s" }
```

Set stricter limits for expensive operations:

```toml
[mcp.servers.api.rate_limits.tools]
# Free operations
ping = { limit = 10000, interval = "60s" }

# Cheap operations
read_cache = { limit = 1000, interval = "60s" }

# Expensive operations
generate_report = { limit = 10, interval = "3600s" }
run_analysis = { limit = 5, interval = "3600s" }
```

Match intervals to operation characteristics:

```toml
[mcp.servers.api.rate_limits.tools]
# Burst protection - short interval
quick_query = { limit = 100, interval = "10s" }

# Sustained load - medium interval
normal_operation = { limit = 500, interval = "300s" }

# Daily quotas - long interval
expensive_job = { limit = 10, interval = "86400s" }
```

Verify limits work as expected:

```bash
# Test the tool-specific limit
for i in {1..20}; do
  curl -X POST http://localhost:8000/mcp \
    -d '{"tool": "expensive_operation"}'
  sleep 1
done
# Should see 429 responses after the limit is reached
```

Rate limits use the configured storage backend:

Memory storage (the default):

```toml
[server.rate_limits]
storage = "memory"
```

Redis storage (recommended for production):

```toml
[server.rate_limits]
storage = { type = "redis", url = "redis://localhost:6379" }
```

See Server Rate Limiting for storage configuration details.

If rate limits don't seem to apply:

  • Verify tool names match exactly
  • Check that the interval format is correct
  • Ensure the storage backend is configured
  • Review debug logs for rate limit evaluation

If requests are rejected unexpectedly:

  • Check all applicable rate limits (tool, server, IP, global)
  • Verify interval and limit values
  • Look for typos in tool names
  • Monitor actual usage patterns

To keep rate limiting overhead low:

  • Use Redis for distributed deployments
  • Consider longer intervals for expensive checks
  • Monitor rate limiter overhead
  • Optimize storage backend configuration
© Grafbase, Inc.