Nexus provides rate limiting capabilities to protect your server from abuse and ensure fair resource usage. You can configure global and per-IP rate limits at the server level.

Enable rate limiting in your nexus.toml:

```toml
[server.rate_limits]
enabled = true
storage = "memory" # or use Redis for distributed rate limiting

[server.rate_limits.global]
limit = 1000
interval = "60s"

[server.rate_limits.per_ip]
limit = 100
interval = "60s"
```

  • enabled: Enable or disable rate limiting (default: false)
  • storage: Storage backend - either "memory" (default) or a Redis configuration

Global Limits (server.rate_limits.global)

  • limit: Maximum requests across all clients
  • interval: Time window (e.g., "60s", "5m", "1h")

Per-IP Limits (server.rate_limits.per_ip)

  • limit: Maximum requests per IP address
  • interval: Time window
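
Each limit/interval pair describes a counting window. A minimal fixed-window sketch of these semantics (illustrative only; the exact algorithm Nexus uses isn't specified here):

```python
import time

class FixedWindow:
    """Allow at most `limit` events per `interval` seconds."""

    def __init__(self, limit, interval):
        self.limit = limit
        self.interval = interval
        self.window_start = float("-inf")  # forces a fresh window on first use
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.window_start >= self.interval:
            self.window_start, self.count = now, 0  # start a new window
        if self.count >= self.limit:
            return False  # over the limit for this window
        self.count += 1
        return True
```

With limit = 2 and interval = "60s", the third request inside a window is rejected, and the counter resets once the window rolls over.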

Memory Storage

The default "memory" backend uses an in-memory rate limiter, suitable for single-instance deployments:

```toml
[server.rate_limits]
storage = "memory"
```

Redis Storage

For distributed rate limiting across multiple Nexus instances, configure a Redis storage backend:

```toml
[server.rate_limits]
storage = { type = "redis", url = "redis://localhost:6379" }
```
  • url: Redis connection URL (default: "redis://localhost:6379/0")
  • key_prefix: Prefix for rate limit keys (default: "nexus:rate_limits:")
  • pool.max_size: Maximum connection pool size (default: 16)
  • pool.min_idle: Minimum idle connections (default: 0)
  • pool.timeout_create: Timeout for creating connections (optional)
  • pool.timeout_wait: Timeout for waiting for a connection (optional)
  • pool.timeout_recycle: Timeout for recycling connections (optional)
  • tls.enabled: Enable TLS connection (default: false)
  • tls.insecure: Skip TLS certificate verification (optional)
  • tls.ca_cert_path: Path to CA certificate (optional)
  • tls.client_cert_path: Path to client certificate (optional)
  • tls.client_key_path: Path to client private key (optional)
  • response_timeout: Timeout for Redis responses (optional)
  • connection_timeout: Timeout for Redis connections (optional)
```toml
[server.rate_limits.storage]
type = "redis"
url = "redis://username:password@redis.example.com:6379/0"
key_prefix = "nexus:rate_limits:prod:"
response_timeout = "10s"
connection_timeout = "10s"

# With connection pool configuration
[server.rate_limits.storage.pool]
max_size = 20
min_idle = 5
timeout_create = "5s"
timeout_wait = "5s"

# With TLS configuration
[server.rate_limits.storage.tls]
enabled = true
ca_cert_path = "/etc/ssl/certs/redis-ca.pem"
```
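
The distributed behavior comes from every instance incrementing shared counters stored under key_prefix. A sketch of the general INCR-and-EXPIRE pattern (an illustration of the scheme, not Nexus's actual key layout; MemoryRedis is an in-process stand-in for a real Redis client so the sketch runs without a server):

```python
class MemoryRedis:
    """Mimics the two Redis commands this sketch needs: INCR and EXPIRE."""

    def __init__(self):
        self.data = {}  # key -> (count, expires_at)

    def incr(self, key, now):
        count, expires_at = self.data.get(key, (0, float("inf")))
        if now >= expires_at:  # key expired: start a fresh counter
            count, expires_at = 0, float("inf")
        self.data[key] = (count + 1, expires_at)
        return count + 1

    def expire(self, key, ttl, now):
        count, _ = self.data[key]
        self.data[key] = (count, now + ttl)

def allow(store, key_prefix, ip, limit, interval, now):
    """Fixed-window check: one shared counter per (IP, window)."""
    window = int(now // interval)
    key = f"{key_prefix}per_ip:{ip}:{window}"
    count = store.incr(key, now)
    if count == 1:
        store.expire(key, interval, now)  # let Redis clean up old windows
    return count <= limit
```

Because all instances hit the same keys, the limit holds across the whole fleet rather than per process, which is exactly what the in-memory backend cannot provide.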

When a client exceeds the rate limit, Nexus responds with:

  • HTTP status code 429 Too Many Requests
  • Retry-After header indicating when the client can retry (in seconds)
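
On the client side, a 429 can be handled by honoring Retry-After before retrying. A sketch (the helper names are ours, not part of Nexus):

```python
import time

def retry_delay(headers, default=1.0):
    """Seconds to wait before retrying, read from Retry-After when present."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # header missing or malformed

def with_retries(send, max_attempts=3):
    """Call send() -> (status, headers, body); back off and retry on 429."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        time.sleep(retry_delay(headers))
```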

For more granular control, you can also configure:

MCP Server Rate Limits

Limit requests to specific MCP servers:

```toml
[mcp.servers.expensive_api]
url = "https://api.example.com/mcp"

[mcp.servers.expensive_api.rate_limits]
limit = 10
interval = "60s"
```

Tool Rate Limits

Limit usage of specific tools within MCP servers:

```toml
[mcp.servers.my_api.rate_limits.tools]
expensive_operation = { limit = 5, interval = "300s" }
bulk_process = { limit = 2, interval = "600s" }
```

See MCP rate limiting for details.

LLM Token Rate Limits

Limit token consumption for AI models:

```toml
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"
```

See LLM rate limiting for details.
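
Unlike request limits, token limits draw a variable amount from the window: each request consumes its input token count from a per-user budget. A sketch of these semantics (illustrative only, not Nexus internals):

```python
import collections

class TokenWindow:
    """Fixed window counting tokens rather than requests: each request
    draws its token count from a per-user budget of `limit` tokens
    per `interval` seconds."""

    def __init__(self, limit, interval):
        self.limit = limit
        self.interval = interval
        self.used = collections.defaultdict(int)  # (user, window) -> tokens used

    def allow(self, user, tokens, now):
        key = (user, int(now // self.interval))
        if self.used[key] + tokens > self.limit:
            return False  # this request would exceed the window's budget
        self.used[key] += tokens
        return True
```

With input_token_limit = 100000 and interval = "60s", a user can make one large request or many small ones, as long as their total input tokens in the window stay under the budget.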

Rate limits are evaluated in the following order:

Server-level limits (checked first via middleware):

  1. Global limits - total requests across all clients
  2. Per-IP limits - requests per IP address

Module-specific limits (checked after server limits pass):

For MCP requests:

  1. Tool-specific limits (most specific)
  2. MCP server limits (least specific)

For LLM requests:

  1. Model-specific token limits with user group (most specific)
  2. Model-specific token limits
  3. Provider-level token limits with user group
  4. Provider-level token limits (least specific)

All applicable limits are enforced: a request must pass every rate limit check to succeed. Server middleware limits run first; only if they pass does the request proceed to the module-specific checks.
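
The two-phase order above can be sketched as a simple chain (hypothetical helper names, not Nexus internals):

```python
def evaluate(server_checks, module_checks):
    """Return an HTTP status for a request given its applicable limiters."""
    for allow in server_checks:   # 1. global, then per-IP (middleware)
        if not allow():
            return 429
    for allow in module_checks:   # 2. module limits, most specific first
        if not allow():
            return 429
    return 200
```

Any single failing check is enough to reject the request; the module-specific limiters are never even consulted when a server-level limit trips.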

  1. Start Conservative: Begin with lower limits and increase based on usage patterns
  2. Monitor Usage: Track rate limit hits to identify patterns
  3. Use Redis in Production: For multi-instance deployments
  4. Different Intervals: Use appropriate time windows for different resources
  5. Test Limits: Verify configuration before production deployment
  6. Log Rate Limit Events: Monitor for potential abuse or misconfiguration

Monitor these metrics to optimize rate limiting:

  • Rate limit hit frequency by IP
  • Top IPs hitting rate limits
  • Rate limit effectiveness (blocked vs allowed requests)
  • Redis connection pool utilization (if using Redis)
© Grafbase, Inc.