Nexus provides rate limiting capabilities to protect your server from abuse and ensure fair resource usage. You can configure global and per-IP rate limits at the server level.
Enable rate limiting in your `nexus.toml`:

```toml
[server.rate_limits]
enabled = true
storage = "memory" # or use Redis for distributed rate limiting

[server.rate_limits.global]
limit = 1000
interval = "60s"

[server.rate_limits.per_ip]
limit = 100
interval = "60s"
```
- `enabled`: Enable or disable rate limiting (default: `false`)
- `storage`: Storage backend - either `"memory"` (default) or a Redis configuration

### Global Limits (`server.rate_limits.global`)

- `limit`: Maximum requests across all clients
- `interval`: Time window (e.g., `"60s"`, `"5m"`, `"1h"`)

### Per-IP Limits (`server.rate_limits.per_ip`)

- `limit`: Maximum requests per IP address
- `interval`: Time window
### Memory Storage

The default backend uses an in-memory rate limiter, suitable for single-instance deployments:

```toml
[server.rate_limits]
storage = "memory"
```
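Conceptually, an in-memory backend keeps a per-key counter for the current time window. The sketch below is illustrative only (a simple fixed-window counter, not Nexus's actual implementation); the class name and API are assumptions for the example:

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative in-memory fixed-window rate limiter (not Nexus's real one)."""

    def __init__(self, limit, interval_secs):
        self.limit = limit
        self.interval = interval_secs
        # (key, window index) -> request count for that window
        self.counters = defaultdict(int)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        window = int(now // self.interval)  # which fixed window "now" falls into
        bucket = (key, window)
        if self.counters[bucket] >= self.limit:
            return False  # limit reached for this key in this window
        self.counters[bucket] += 1
        return True
```

Because state lives in one process's memory, two Nexus instances would each enforce the limit independently - which is why multi-instance deployments need the Redis backend below.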
### Redis Storage

For distributed rate limiting across multiple Nexus instances:

```toml
[server.rate_limits]
storage = { type = "redis", url = "redis://localhost:6379" }
```
- `url`: Redis connection URL (default: `"redis://localhost:6379/0"`)
- `key_prefix`: Prefix for rate limit keys (default: `"nexus:rate_limits:"`)
- `pool.max_size`: Maximum connection pool size (default: 16)
- `pool.min_idle`: Minimum idle connections (default: 0)
- `pool.timeout_create`: Timeout for creating connections (optional)
- `pool.timeout_wait`: Timeout for waiting for a connection (optional)
- `pool.timeout_recycle`: Timeout for recycling connections (optional)
- `tls.enabled`: Enable TLS connection (default: `false`)
- `tls.insecure`: Skip TLS certificate verification (optional)
- `tls.ca_cert_path`: Path to CA certificate (optional)
- `tls.client_cert_path`: Path to client certificate (optional)
- `tls.client_key_path`: Path to client private key (optional)
- `response_timeout`: Timeout for Redis responses (optional)
- `connection_timeout`: Timeout for Redis connections (optional)
A full Redis configuration (note that TOML inline tables cannot span multiple lines, so a multi-key Redis configuration is written as a standard table):

```toml
[server.rate_limits.storage]
type = "redis"
url = "redis://username:password@redis.example.com:6379/0"
key_prefix = "nexus:rate_limits:prod:"
response_timeout = "10s"
connection_timeout = "10s"

# With connection pool configuration
[server.rate_limits.storage.pool]
max_size = 20
min_idle = 5
timeout_create = "5s"
timeout_wait = "5s"

# With TLS configuration
[server.rate_limits.storage.tls]
enabled = true
ca_cert_path = "/etc/ssl/certs/redis-ca.pem"
```
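A common way to count requests in shared Redis storage is a fixed window built from atomic `INCR` plus `EXPIRE`. The sketch below illustrates that pattern only - the class, key scheme, and choice of algorithm are assumptions for the example, not a description of Nexus's internal Redis logic. The `client` can be any object exposing `incr()` and `expire()` (redis-py's client does):

```python
class RedisFixedWindow:
    """Illustrative fixed-window limiter over Redis INCR/EXPIRE."""

    def __init__(self, client, limit, interval_secs,
                 key_prefix="nexus:rate_limits:"):
        self.client = client        # anything with incr() and expire()
        self.limit = limit
        self.interval = interval_secs
        self.prefix = key_prefix

    def allow(self, key):
        redis_key = f"{self.prefix}{key}"
        count = self.client.incr(redis_key)  # atomic, shared by all instances
        if count == 1:
            # first hit in this window: start the expiry clock
            self.client.expire(redis_key, self.interval)
        return count <= self.limit
```

Because the counter lives in Redis, every Nexus instance increments the same key, so the limit holds across the whole fleet rather than per process.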
When a client exceeds a rate limit, Nexus responds with:

- HTTP status code `429 Too Many Requests`
- A `Retry-After` header indicating when the client can retry (in seconds)
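A well-behaved client should back off for the number of seconds in `Retry-After` before retrying. A minimal client-side sketch - `send` stands in for whatever HTTP call your client makes (a hypothetical callable returning `(status, headers, body)`), and the retry policy is an assumption for the example:

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() and retry on HTTP 429, honoring Retry-After (seconds)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # wait as long as the server asked; fall back to 1s if absent
            sleep(int(headers.get("Retry-After", "1")))
    return status, body
```

Injecting `sleep` keeps the helper testable and lets callers substitute their own backoff.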
For more granular control, you can also configure:

### MCP Server Rate Limits

Limit requests to specific MCP servers:

```toml
[mcp.servers.expensive_api]
url = "https://api.example.com/mcp"

[mcp.servers.expensive_api.rate_limits]
limit = 10
interval = "60s"
```
### Tool-Level Rate Limits

Limit usage of specific tools within MCP servers:

```toml
[mcp.servers.my_api.rate_limits.tools]
expensive_operation = { limit = 5, interval = "300s" }
bulk_process = { limit = 2, interval = "600s" }
```

See MCP rate limiting for details.
### LLM Token Rate Limits

Limit token consumption for AI models:

```toml
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"
```

See LLM rate limiting for details.
Rate limits are evaluated in the following order:

1. Server-level limits (checked first via middleware):
   - Global limits - total requests across all clients
   - Per-IP limits - requests per IP address
2. Module-specific limits (checked after server limits pass):
   - For MCP requests:
     1. Tool-specific limits (most specific)
     2. MCP server limits (least specific)
   - For LLM requests:
     1. Model-specific token limits with user group (most specific)
     2. Model-specific token limits
     3. Provider-level token limits with user group
     4. Provider-level token limits (least specific)

All applicable limits are enforced: a request must pass every rate limit check to succeed. Server middleware limits are checked first; only if they pass does the request proceed to module-specific checks.
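The "must pass every check, in order" rule can be sketched as a short-circuiting chain. The check names below are hypothetical placeholders for the limits listed above, not real Nexus identifiers:

```python
def enforce(checks):
    """Build an allow() function that runs checks in order and stops at
    the first failure - server-level checks go first, module-specific
    checks after, mirroring the evaluation order above."""
    def allow(request):
        # all() short-circuits: later checks never run once one fails
        return all(check(request) for check in checks)
    return allow

# Hypothetical ordering for an MCP request:
# allow = enforce([global_limit, per_ip_limit, tool_limit, mcp_server_limit])
```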
- Start Conservative: Begin with lower limits and increase based on usage patterns
- Monitor Usage: Track rate limit hits to identify patterns
- Use Redis in Production: For multi-instance deployments
- Different Intervals: Use appropriate time windows for different resources
- Test Limits: Verify configuration before production deployment
- Log Rate Limit Events: Monitor for potential abuse or misconfiguration
Monitor these metrics to optimize rate limiting:
- Rate limit hit frequency by IP
- Top IPs hitting rate limits
- Rate limit effectiveness (blocked vs allowed requests)
- Redis connection pool utilization (if using Redis)
- Enable Client Identification for per-user rate limiting
- Configure CORS for browser-based clients
- Set up CSRF Protection for additional security