Configure fine-grained rate limits for individual MCP servers and their tools to protect expensive operations while maintaining high throughput for lightweight queries.
Set a server-wide limit that applies to all tools on an MCP server:
```toml
[mcp.servers.my_api]
url = "https://api.example.com/mcp"

[mcp.servers.my_api.rate_limits]
limit = 50
interval = "60s"
```
Set rate limits on individual tools within an MCP server:
```toml
[mcp.servers.my_api.rate_limits.tools]
expensive_operation = { limit = 10, interval = "60s" }
bulk_process = { limit = 5, interval = "300s" }
standard_query = { limit = 100, interval = "60s" }
```
Rate limits are evaluated in the following order (most to least specific):
- Tool-specific limits - Most granular control
- MCP server limits - Per-server restrictions
- Per-IP limits - Configured at server level
- Global limits - Overall system limits
All applicable limits are enforced - a request must pass all rate limit checks to succeed.
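For example, in the sketch below (the server name `example_api` and tool name `heavy_tool` are placeholders), a call to `heavy_tool` must fit within both its own 10-per-minute budget and the server-wide 100-per-minute budget; any per-IP or global limits configured at the server level apply on top:

```toml
# Hypothetical server with layered limits
[mcp.servers.example_api]
url = "https://example.com/mcp"

# Server-wide budget shared by every tool on this server
[mcp.servers.example_api.rate_limits]
limit = 100
interval = "60s"

# Tool-specific budget, checked in addition to the server budget
[mcp.servers.example_api.rate_limits.tools]
heavy_tool = { limit = 10, interval = "60s" }
```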
Set different limits for operations with different costs:
```toml
[mcp.servers.data_api]
url = "https://data.example.com/mcp"

# Overall server limit
[mcp.servers.data_api.rate_limits]
limit = 1000
interval = "60s"

# Tool-specific limits
[mcp.servers.data_api.rate_limits.tools]
# Expensive data exports
export_full_dataset = { limit = 5, interval = "3600s" } # 5 per hour
generate_report = { limit = 20, interval = "3600s" }    # 20 per hour

# Moderate operations
bulk_update = { limit = 50, interval = "600s" }     # 50 per 10 min
complex_search = { limit = 100, interval = "300s" } # 100 per 5 min

# Lightweight queries
simple_lookup = { limit = 1000, interval = "60s" } # 1000 per minute
get_status = { limit = 2000, interval = "60s" }    # 2000 per minute
```
Protect database resources:
```toml
[mcp.servers.database]
cmd = ["psql-mcp"]
env = { PGHOST = "{{ env.DB_HOST }}" }

# Conservative server limit
[mcp.servers.database.rate_limits]
limit = 100
interval = "60s"

[mcp.servers.database.rate_limits.tools]
# DDL operations - very restricted
create_table = { limit = 2, interval = "3600s" }
drop_table = { limit = 1, interval = "3600s" }
alter_schema = { limit = 5, interval = "3600s" }

# Write operations - moderate limits
insert_batch = { limit = 50, interval = "60s" }
update_records = { limit = 100, interval = "60s" }
delete_records = { limit = 20, interval = "60s" }

# Read operations - generous limits
select_query = { limit = 500, interval = "60s" }
count_records = { limit = 1000, interval = "60s" }
```
Manage compute-intensive operations:
```toml
[mcp.servers.ml_service]
url = "https://ml.example.com/mcp"

[mcp.servers.ml_service.rate_limits]
limit = 200
interval = "60s"

[mcp.servers.ml_service.rate_limits.tools]
# Training operations - very limited
train_model = { limit = 2, interval = "86400s" } # 2 per day
fine_tune = { limit = 5, interval = "86400s" }   # 5 per day

# Inference operations - moderate
batch_inference = { limit = 20, interval = "3600s" } # 20 per hour
single_inference = { limit = 100, interval = "60s" } # 100 per minute

# Preprocessing - generous
tokenize_text = { limit = 1000, interval = "60s" }
validate_input = { limit = 2000, interval = "60s" }
```
Respect third-party rate limits:
```toml
[mcp.servers.github]
url = "https://api.github.com/mcp"

[mcp.servers.github.auth]
token = "{{ env.GITHUB_TOKEN }}"

# Match GitHub's rate limits
[mcp.servers.github.rate_limits]
limit = 5000       # GitHub's limit for authenticated requests
interval = "3600s" # Per hour

[mcp.servers.github.rate_limits.tools]
# GraphQL has a separate limit
graphql_query = { limit = 5000, interval = "3600s" }

# Search has a stricter limit
search_code = { limit = 30, interval = "60s" }
search_issues = { limit = 30, interval = "60s" }

# Regular API calls
get_repo = { limit = 5000, interval = "3600s" }
list_issues = { limit = 5000, interval = "3600s" }
```
Supported interval formats:
- "60s" - 60 seconds
- "5m" - 5 minutes
- "1h" - 1 hour
- "24h" - 24 hours
- "300s" - 300 seconds
- "3600s" - 3600 seconds (1 hour)
- "86400s" - 86400 seconds (1 day)
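Assuming these unit suffixes are accepted anywhere an interval appears, the longer windows in the earlier examples can be written more readably:

```toml
[mcp.servers.data_api.rate_limits.tools]
# Equivalent to interval = "3600s"
generate_report = { limit = 20, interval = "1h" }
# Equivalent to interval = "300s"
complex_search = { limit = 100, interval = "5m" }
```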
Align limits with operation costs:
```toml
[mcp.servers.billing_api.rate_limits.tools]
# $0.001 per call
cheap_operation = { limit = 10000, interval = "3600s" }
# $0.01 per call
standard_operation = { limit = 1000, interval = "3600s" }
# $0.10 per call
expensive_operation = { limit = 100, interval = "3600s" }
# $1.00 per call
premium_operation = { limit = 10, interval = "3600s" }
```
Scale limits to the system resources each operation consumes:
```toml
[mcp.servers.compute.rate_limits.tools]
# CPU intensive
cpu_heavy = { limit = 10, interval = "60s" }
# Memory intensive
memory_heavy = { limit = 20, interval = "60s" }
# I/O intensive
io_heavy = { limit = 50, interval = "60s" }
# Network intensive
network_heavy = { limit = 100, interval = "60s" }
```
Combine with client identification:
```toml
# Note: This is conceptual - actual implementation
# would use client identification features
[mcp.servers.premium_api.rate_limits]
# Default limits
limit = 100
interval = "60s"

[mcp.servers.premium_api.rate_limits.tools]
# Tool limits apply to all users
premium_feature = { limit = 10, interval = "60s" }

# User tiers would be handled by client identification
# See /docs/configuration/server/client-identification
```
When a rate limit is exceeded:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 42
```
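A minimal client-side sketch (the endpoint and payload are borrowed from the testing example later on this page) that waits for the period the gateway requests before retrying:

```bash
# Hypothetical backoff step: capture the status code and response headers
status=$(curl -s -o /dev/null -D headers.txt -w "%{http_code}" \
  -X POST http://localhost:8000/mcp \
  -d '{"tool": "expensive_operation"}')
if [ "$status" = "429" ]; then
  # Honor the Retry-After header (seconds) before retrying
  delay=$(grep -i '^retry-after:' headers.txt | tr -d '[:space:]' | cut -d: -f2)
  sleep "${delay:-1}" # fall back to 1s if the header is missing
fi
```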
Begin with lower limits and increase based on usage:
```toml
# Start conservative
[mcp.servers.new_api.rate_limits]
limit = 10
interval = "60s"
```

```toml
# Increase after monitoring
[mcp.servers.new_api.rate_limits]
limit = 100 # Increased after testing
interval = "60s"
```
Track which tools are called most frequently:
```toml
[mcp.servers.api.rate_limits.tools]
# Adjust based on actual usage patterns
frequently_used = { limit = 1000, interval = "60s" }
rarely_used = { limit = 10, interval = "60s" }
```
Set stricter limits for expensive operations:
```toml
[mcp.servers.api.rate_limits.tools]
# Free operations
ping = { limit = 10000, interval = "60s" }
# Cheap operations
read_cache = { limit = 1000, interval = "60s" }
# Expensive operations
generate_report = { limit = 10, interval = "3600s" }
run_analysis = { limit = 5, interval = "3600s" }
```
Match intervals to operation characteristics:
```toml
[mcp.servers.api.rate_limits.tools]
# Burst protection - short interval
quick_query = { limit = 100, interval = "10s" }
# Sustained load - medium interval
normal_operation = { limit = 500, interval = "300s" }
# Daily quotas - long interval
expensive_job = { limit = 10, interval = "86400s" }
```
Verify limits work as expected:
```bash
# Test a tool-specific limit: print only the HTTP status of each call
for i in {1..20}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:8000/mcp \
    -d '{"tool": "expensive_operation"}'
  sleep 1
done
# Should print 429 once the limit is reached
```
Rate limits use the configured storage backend:
```toml
# Memory storage (default)
[server.rate_limits]
storage = "memory"
```

```toml
# Redis storage (recommended for production)
[server.rate_limits]
storage = { type = "redis", url = "redis://localhost:6379" }
```
See Server Rate Limiting for storage configuration details.
If a rate limit does not seem to apply:
- Verify tool names match exactly
- Check that the interval format is correct
- Ensure the storage backend is configured
- Review debug logs for rate limit evaluation
If requests are rejected more often than expected:
- Check all applicable rate limits (tool, server, per-IP, global)
- Verify interval and limit values
- Look for typos in tool names
- Monitor actual usage patterns
To keep rate-limiting overhead low:
- Use Redis for distributed deployments
- Consider longer intervals for expensive checks
- Monitor rate limiter overhead
- Optimize the storage backend configuration
Next steps:
- Configure Server-Level Rate Limits
- Set up monitoring for rate limit metrics
- Review Best Practices for production