Configure fine-grained rate limits for individual MCP servers and their tools to protect expensive operations while maintaining high throughput for lightweight queries.

Configure rate limits for individual MCP servers:

```toml
[mcp.servers.my_api]
url = "https://api.example.com/mcp"

[mcp.servers.my_api.rate_limits]
limit = 50
interval = "60s"
```

Set rate limits on individual tools within an MCP server:

```toml
[mcp.servers.my_api.rate_limits.tools]
expensive_operation = { limit = 10, interval = "60s" }
bulk_process = { limit = 5, interval = "300s" }
standard_query = { limit = 100, interval = "60s" }
```

Rate limits are evaluated in the following order, from most to least specific:

  1. Tool-specific limits - Most granular control
  2. MCP server limits - Per-server restrictions
  3. Per-IP limits - Configured at server level
  4. Global limits - Overall system limits

All applicable limits are enforced - a request must pass all rate limit checks to succeed.
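Conceptually, each layer behaves like an independent counter, and a request is admitted only if every applicable counter still has capacity. The following sketch models this with simple fixed-window counters; the class, layer names, and limit values are illustrative, not part of the actual implementation:

```python
import time

class FixedWindow:
    """Minimal fixed-window limiter: at most `limit` events per `interval` seconds."""
    def __init__(self, limit, interval):
        self.limit, self.interval = limit, interval
        self.window_start, self.count = 0.0, 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.interval:
            self.window_start, self.count = now, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# A request must pass every applicable layer, most to least specific.
layers = {
    "tool":   FixedWindow(limit=10,   interval=60),  # e.g. expensive_operation
    "server": FixedWindow(limit=50,   interval=60),  # e.g. my_api
    "ip":     FixedWindow(limit=500,  interval=60),
    "global": FixedWindow(limit=5000, interval=60),
}

def admit(now):
    # Denied as soon as any layer rejects.
    return all(layer.allow(now) for layer in layers.values())
```

Here the tool-level limit of 10 per minute is hit first, so the 11th request in a window is rejected even though the server, IP, and global counters still have room.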

Different limits for different operation costs:

```toml
[mcp.servers.data_api]
url = "https://data.example.com/mcp"

# Overall server limit
[mcp.servers.data_api.rate_limits]
limit = 1000
interval = "60s"

# Tool-specific limits
[mcp.servers.data_api.rate_limits.tools]
# Expensive data exports
export_full_dataset = { limit = 5, interval = "3600s" }  # 5 per hour
generate_report = { limit = 20, interval = "3600s" }     # 20 per hour

# Moderate operations
bulk_update = { limit = 50, interval = "600s" }          # 50 per 10 min
complex_search = { limit = 100, interval = "300s" }      # 100 per 5 min

# Lightweight queries
simple_lookup = { limit = 1000, interval = "60s" }       # 1000 per minute
get_status = { limit = 2000, interval = "60s" }          # 2000 per minute
```

Protect database resources:

```toml
[mcp.servers.database]
cmd = ["psql-mcp"]
env = { PGHOST = "{{ env.DB_HOST }}" }

# Conservative server limit
[mcp.servers.database.rate_limits]
limit = 100
interval = "60s"

[mcp.servers.database.rate_limits.tools]
# DDL operations - very restricted
create_table = { limit = 2, interval = "3600s" }
drop_table = { limit = 1, interval = "3600s" }
alter_schema = { limit = 5, interval = "3600s" }

# Write operations - moderate limits
insert_batch = { limit = 50, interval = "60s" }
update_records = { limit = 100, interval = "60s" }
delete_records = { limit = 20, interval = "60s" }

# Read operations - generous limits
select_query = { limit = 500, interval = "60s" }
count_records = { limit = 1000, interval = "60s" }
```

Manage compute-intensive operations:

```toml
[mcp.servers.ml_service]
url = "https://ml.example.com/mcp"

[mcp.servers.ml_service.rate_limits]
limit = 200
interval = "60s"

[mcp.servers.ml_service.rate_limits.tools]
# Training operations - very limited
train_model = { limit = 2, interval = "86400s" }      # 2 per day
fine_tune = { limit = 5, interval = "86400s" }        # 5 per day

# Inference operations - moderate
batch_inference = { limit = 20, interval = "3600s" }  # 20 per hour
single_inference = { limit = 100, interval = "60s" }  # 100 per minute

# Preprocessing - generous
tokenize_text = { limit = 1000, interval = "60s" }
validate_input = { limit = 2000, interval = "60s" }
```

Respect third-party rate limits:

```toml
[mcp.servers.github]
url = "https://api.github.com/mcp"

[mcp.servers.github.auth]
token = "{{ env.GITHUB_TOKEN }}"

# Match GitHub's rate limits
[mcp.servers.github.rate_limits]
limit = 5000        # GitHub's limit for authenticated requests
interval = "3600s"  # Per hour

[mcp.servers.github.rate_limits.tools]
# GraphQL has a separate limit
graphql_query = { limit = 5000, interval = "3600s" }

# Search has a stricter limit
search_code = { limit = 30, interval = "60s" }
search_issues = { limit = 30, interval = "60s" }

# Regular API calls
get_repo = { limit = 5000, interval = "3600s" }
list_issues = { limit = 5000, interval = "3600s" }
```

Supported interval formats:

  • "60s" - 60 seconds
  • "5m" - 5 minutes
  • "1h" - 1 hour
  • "24h" - 24 hours
  • "300s" - 300 seconds
  • "3600s" - 3600 seconds (1 hour)
  • "86400s" - 86400 seconds (1 day)
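For illustration, the interval syntax above (an integer followed by an `s`, `m`, or `h` unit) can be normalized to seconds with a small parser. This `interval_seconds` helper is hypothetical, not part of the configuration API:

```python
import re

# Seconds per supported unit, mirroring the interval formats listed above.
_UNITS = {"s": 1, "m": 60, "h": 3600}

def interval_seconds(value: str) -> int:
    """Convert an interval string like "5m" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if not match:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]
```

Under this grammar, `"5m"` and `"300s"` denote the same interval, and `"24h"` equals `"86400s"`.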

Align limits with operation costs:

```toml
[mcp.servers.billing_api.rate_limits.tools]
# $0.001 per call
cheap_operation = { limit = 10000, interval = "3600s" }

# $0.01 per call
standard_operation = { limit = 1000, interval = "3600s" }

# $0.10 per call
expensive_operation = { limit = 100, interval = "3600s" }

# $1.00 per call
premium_operation = { limit = 10, interval = "3600s" }
```

Based on system resources:

```toml
[mcp.servers.compute.rate_limits.tools]
# CPU intensive
cpu_heavy = { limit = 10, interval = "60s" }

# Memory intensive
memory_heavy = { limit = 20, interval = "60s" }

# I/O intensive
io_heavy = { limit = 50, interval = "60s" }

# Network intensive
network_heavy = { limit = 100, interval = "60s" }
```

Combine with client identification:

```toml
# Note: This is conceptual - the actual implementation
# would use client identification features
[mcp.servers.premium_api.rate_limits]
# Default limits
limit = 100
interval = "60s"

[mcp.servers.premium_api.rate_limits.tools]
# Tool limits apply to all users
premium_feature = { limit = 10, interval = "60s" }

# User tiers would be handled by client identification
# See /docs/configuration/server/client-identification
```

When a rate limit is exceeded:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 42
```
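Well-behaved clients should wait for the number of seconds given in `Retry-After` before retrying. A minimal sketch, in which `send` is a hypothetical stand-in for your HTTP client call returning a `(status_code, headers)` pair:

```python
import time

def call_with_backoff(send, max_attempts=5):
    """Retry `send` until it succeeds or attempts run out.

    On a 429 response, sleep for the server-provided Retry-After value,
    falling back to exponential backoff when the header is absent.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        time.sleep(int(headers.get("Retry-After", 2 ** attempt)))
    return 429
```

Honoring the server's hint avoids hammering an already-exhausted limit, which would only keep the window saturated.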

Begin with lower limits and increase based on usage:

Start conservative:

```toml
[mcp.servers.new_api.rate_limits]
limit = 10
interval = "60s"
```

Then raise the limit once monitoring shows real usage:

```toml
[mcp.servers.new_api.rate_limits]
limit = 100  # Increased after testing
interval = "60s"
```

Track which tools are called most frequently:

```toml
[mcp.servers.api.rate_limits.tools]
# Adjust based on actual usage patterns
frequently_used = { limit = 1000, interval = "60s" }
rarely_used = { limit = 10, interval = "60s" }
```

Set stricter limits for expensive operations:

```toml
[mcp.servers.api.rate_limits.tools]
# Free operations
ping = { limit = 10000, interval = "60s" }

# Cheap operations
read_cache = { limit = 1000, interval = "60s" }

# Expensive operations
generate_report = { limit = 10, interval = "3600s" }
run_analysis = { limit = 5, interval = "3600s" }
```

Match intervals to operation characteristics:

```toml
[mcp.servers.api.rate_limits.tools]
# Burst protection - short interval
quick_query = { limit = 100, interval = "10s" }

# Sustained load - medium interval
normal_operation = { limit = 500, interval = "300s" }

# Daily quotas - long interval
expensive_job = { limit = 10, interval = "86400s" }
```

Verify limits work as expected:

```bash
# Test the tool-specific limit
for i in {1..20}; do
  curl -X POST http://localhost:8000/mcp \
    -d '{"tool": "expensive_operation"}'
  sleep 1
done
# Should see 429 responses after the limit is reached
```

Rate limits use the configured storage backend:

Memory storage (the default):

```toml
[server.rate_limits]
storage = "memory"
```

Redis storage (recommended for production):

```toml
[server.rate_limits]
storage = { type = "redis", url = "redis://localhost:6379" }
```

See Server Rate Limiting for storage configuration details.

If rate limits don't seem to apply:

  • Verify tool names match exactly
  • Check that the interval format is correct
  • Ensure the storage backend is configured
  • Review debug logs for rate limit evaluation

If requests are rejected unexpectedly:

  • Check all applicable rate limits (tool, server, IP, global)
  • Verify interval and limit values
  • Look for typos in tool names
  • Monitor actual usage patterns

To keep rate limiting overhead low:

  • Use Redis for distributed deployments
  • Consider longer intervals for expensive checks
  • Monitor rate limiter overhead
  • Optimize storage backend configuration
© Grafbase, Inc.