Today we're excited to announce Nexus 0.3.0, a significant release that introduces stateless user identification and token-based rate limiting for LLM providers. This release gives administrators fine-grained control over AI resource consumption while maintaining the flexibility and scalability that Nexus is known for.
A key addition in this release is our completely stateless client identification system. Unlike traditional session-based approaches, Nexus extracts all user context directly from request headers, eliminating the need for server-side session storage.
This stateless architecture brings several benefits:
- Horizontal scaling without session synchronization
- Zero session storage requirements (no Redis or database needed for sessions)
- Resilience to server restarts with no session loss
- Perfect fit for containerized and serverless deployments
The system automatically extracts identity from JWT tokens or custom headers, with support for standard claims like `sub` and `email`, as well as custom group claims for membership validation.
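Conceptually, the extraction works like this. The sketch below is simplified Python (Nexus itself is not implemented this way), and it skips signature verification entirely, which a real deployment must never do; the header and claim names mirror the examples later in this post:

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature.

    Illustration only -- production code must verify the signature first.
    """
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def extract_identity(headers: dict) -> dict:
    """Pull client/group identity from a JWT bearer token or custom headers."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        claims = decode_jwt_claims(auth[len("Bearer "):])
        return {"client_id": claims.get("sub"),
                "email": claims.get("email"),
                "group_id": claims.get("grp")}
    # Fall back to custom headers when no token is present.
    return {"client_id": headers.get("X-User-Id"),
            "group_id": headers.get("X-Group-Id")}
```

Because everything needed lives in the request itself, any instance can serve any request with no shared session store.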
The centerpiece of this release is our new token-based rate limiting system for LLM providers. As AI usage grows within organizations, controlling costs and preventing abuse becomes critical. Nexus 0.3.0 addresses this with a sophisticated hierarchical rate limiting system that works at both the provider and individual model level.
For the quota system to work correctly, you must first configure client identification:
```toml
[server.client_identification]
enabled = true
client_id.http_header = "X-User-Id"
```
This configuration works well in private installations where you have control over the request headers. Alternatively, the `client_id` and `group_id` can be taken from JWT claims:
```toml
[server.client_identification]
enabled = true
client_id.jwt_claim = "sub"
```
Additionally, you can configure a set of groups/tiers for the user:
```toml
[server.client_identification]
enabled = true
group_id.http_header = "X-Group-Id"
# or
group_id.jwt_claim = "grp"

[server.client_identification.validation]
group_values = ["premium", "basic"]
```
Now every request must carry a valid client_id, and, when group validation is configured, a group_id matching one of the configured values. Requests missing these values are rejected with HTTP status code 400.
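The validation rule can be sketched as follows. This is a simplified Python illustration of the behavior described above, not Nexus's actual code, and whether a missing group_id is rejected when group validation is configured is an assumption here:

```python
# Mirrors group_values from the configuration above.
ALLOWED_GROUPS = {"premium", "basic"}

def validate_identification(client_id, group_id) -> int:
    """Return the HTTP status an identification check like this would produce."""
    if not client_id:
        return 400  # identification is enabled but no client_id was supplied
    if group_id not in ALLOWED_GROUPS:
        return 400  # group_id missing or not one of the configured values
    return 200
```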
After defining the client identification settings, you can configure rate limits for different groups:
```toml
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 1000
interval = "60s"

[llm.providers.openai.rate_limits.per_user.groups.premium]
input_token_limit = 10000
interval = "60s"

[llm.providers.openai.rate_limits.per_user.groups.basic]
input_token_limit = 500
interval = "60s"
```
The system uses a hierarchical resolution order, checking model-specific limits first, then falling back to provider-level defaults. This gives you maximum flexibility in how you allocate AI resources across your organization.
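For instance, a tighter limit could be placed on a single expensive model while cheaper models fall back to the provider default. Note that the model-level table path below is illustrative; check the documentation for the canonical syntax:

```toml
# Provider-level default, applied when no model-level limit matches
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 1000
interval = "60s"

# Hypothetical model-level override, checked first for gpt-4o
[llm.providers.openai.models."gpt-4o".rate_limits.per_user]
input_token_limit = 200
interval = "60s"
```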
Token counting is an approximation. We estimate the input token count with the `cl100k_base` tokenizer, which is close enough for most use cases. Keep in mind that output tokens are not counted, so set your token limits a bit lower and monitor usage closely.
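As a mental model for the `input_token_limit`/`interval` pair, a per-user fixed-window counter captures the behavior. This is a simplified Python sketch under that assumption; Nexus's actual accounting and tokenizer integration differ:

```python
import time
from collections import defaultdict

class FixedWindowTokenLimiter:
    """Approximate per-user input-token limiting over a fixed interval."""

    def __init__(self, input_token_limit: int, interval_secs: float):
        self.limit = input_token_limit
        self.interval = interval_secs
        # user -> [window_start, tokens_used_in_window]
        self.windows = defaultdict(lambda: [0.0, 0])

    def allow(self, user: str, estimated_tokens: int, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.windows[user]
        if now - window[0] >= self.interval:
            # The interval has elapsed; start a fresh window.
            window[0], window[1] = now, 0
        if window[1] + estimated_tokens > self.limit:
            return False  # would exceed this user's budget for the interval
        window[1] += estimated_tokens
        return True
```

In practice, `estimated_tokens` would come from running the prompt through the `cl100k_base` tokenizer before forwarding the request.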
Security and cost control get another boost with our new requirement for explicit model configuration. Every LLM provider must now have its models explicitly configured, preventing accidental access to unintended or expensive models:
```toml
[llm.providers.openai]
type = "openai"
api_key = "sk-..."

# Models must be explicitly configured
[llm.providers.openai.models."gpt-4o"]
rename = "gpt-4o-0123" # Map to the actual model name

[llm.providers.openai.models."gpt-4o-mini"]
```
This ensures that only approved models are accessible through your Nexus router, giving you complete control over which AI models your organization uses.
This release also brings enhanced configuration validation with clearer error messages at startup. When something is misconfigured, you'll know exactly what needs to be fixed before your service even starts.
We've also added over 1,000 lines of integration tests covering edge cases, boundary conditions, and distributed scenarios, ensuring that these new features work reliably under all conditions.
We're continuing to enhance Nexus's capabilities as an intelligent AI router. Future releases will focus on advanced routing strategies, enhanced monitoring and observability, and deeper integration with enterprise authentication systems.
Nexus 0.3.0 is available now. Check out the documentation for complete details, including migration steps and configuration examples.
We'd love to hear your feedback as you put these new features to use. Join us on GitHub to report issues, contribute, or just let us know how Nexus is working for your organization.
Happy routing!