Today we're excited to announce Nexus 0.3.0, a significant release that introduces stateless user identification and token-based rate limiting for LLM providers. This release gives administrators fine-grained control over AI resource consumption while maintaining the flexibility and scalability that Nexus is known for.
A key addition in this release is our completely stateless client identification system. Unlike traditional session-based approaches, Nexus extracts all user context directly from request headers, eliminating the need for server-side session storage.
This stateless architecture brings several benefits:
- Horizontal scaling without session synchronization
- Zero session storage requirements (no Redis or database needed for sessions)
- Resilience to server restarts with no session loss
- Perfect fit for containerized and serverless deployments
The system automatically extracts identity from JWT tokens or custom headers, with support for standard claims like `sub` and `email`, as well as custom group claims for membership validation.
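Conceptually, the extraction works like this. The sketch below is simplified Python (Nexus itself is not implemented this way), and it skips signature verification entirely, which a real deployment must never do; the header and claim names mirror the examples later in this post:

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature.

    Illustration only -- production code must verify the signature first.
    """
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def extract_identity(headers: dict) -> dict:
    """Pull client/group identity from a JWT bearer token or custom headers."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        claims = decode_jwt_claims(auth[len("Bearer "):])
        return {"client_id": claims.get("sub"),
                "email": claims.get("email"),
                "group_id": claims.get("grp")}
    # Fall back to custom headers when no token is present.
    return {"client_id": headers.get("X-User-Id"),
            "group_id": headers.get("X-Group-Id")}
```

Because everything needed lives in the request itself, any instance can serve any request with no shared session store.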
The centerpiece of this release is our new token-based rate limiting system for LLM providers. As AI usage grows within organizations, controlling costs and preventing abuse becomes critical. Nexus 0.3.0 addresses this with a sophisticated hierarchical rate limiting system that works at both the provider and individual model level.
For the quota system to work correctly, you must first configure client identification:
```toml
[server.client_identification]
enabled = true
client_id.http_header = "X-User-Id"
```
This configuration works well in private installations where you have control over the request headers. Alternatively, the `client_id` and `group_id` can be taken from JWT claims:
```toml
[server.client_identification]
enabled = true
client_id.jwt_claim = "sub"
```
Additionally, you can configure a set of groups/tiers for the user:
```toml
[server.client_identification]
enabled = true
group_id.http_header = "X-Group-Id"
# or
group_id.jwt_claim = "grp"

[server.client_identification.validation]
group_values = ["premium", "basic"]
```
Now every request must carry a valid client_id, and, when group validation is configured, a group_id matching one of the configured values. Requests missing these values are rejected with HTTP status code 400.
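The validation rule can be sketched as follows. This is a simplified Python illustration of the behavior described above, not Nexus's actual code, and whether a missing group_id is rejected when group validation is configured is an assumption here:

```python
# Mirrors group_values from the configuration above.
ALLOWED_GROUPS = {"premium", "basic"}

def validate_identification(client_id, group_id) -> int:
    """Return the HTTP status an identification check like this would produce."""
    if not client_id:
        return 400  # identification is enabled but no client_id was supplied
    if group_id not in ALLOWED_GROUPS:
        return 400  # group_id missing or not one of the configured values
    return 200
```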
After defining the client identification settings, you can configure rate limits for different groups:
```toml
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 1000
interval = "60s"

[llm.providers.openai.rate_limits.per_user.groups.premium]
input_token_limit = 10000
interval = "60s"

[llm.providers.openai.rate_limits.per_user.groups.basic]
input_token_limit = 500
interval = "60s"
```
The system uses a hierarchical resolution order, checking model-specific limits first, then falling back to provider-level defaults. This gives you maximum flexibility in how you allocate AI resources across your organization.
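For instance, a tighter limit could be placed on a single expensive model while cheaper models fall back to the provider default. Note that the model-level table path below is illustrative; check the documentation for the canonical syntax:

```toml
# Provider-level default, applied when no model-level limit matches
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 1000
interval = "60s"

# Hypothetical model-level override, checked first for gpt-4o
[llm.providers.openai.models."gpt-4o".rate_limits.per_user]
input_token_limit = 200
interval = "60s"
```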
Token counting is an approximation. We estimate the input token count with the `cl100k_base` tokenizer, which is close enough for most use cases. Keep in mind that output tokens are not counted, so set your token limits a bit lower and monitor usage closely.
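As a mental model for the `input_token_limit`/`interval` pair, a per-user fixed-window counter captures the behavior. This is a simplified Python sketch under that assumption; Nexus's actual accounting and tokenizer integration differ:

```python
import time
from collections import defaultdict

class FixedWindowTokenLimiter:
    """Approximate per-user input-token limiting over a fixed interval."""

    def __init__(self, input_token_limit: int, interval_secs: float):
        self.limit = input_token_limit
        self.interval = interval_secs
        # user -> [window_start, tokens_used_in_window]
        self.windows = defaultdict(lambda: [0.0, 0])

    def allow(self, user: str, estimated_tokens: int, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.windows[user]
        if now - window[0] >= self.interval:
            # The interval has elapsed; start a fresh window.
            window[0], window[1] = now, 0
        if window[1] + estimated_tokens > self.limit:
            return False  # would exceed this user's budget for the interval
        window[1] += estimated_tokens
        return True
```

In practice, `estimated_tokens` would come from running the prompt through the `cl100k_base` tokenizer before forwarding the request.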
Security and cost control get another boost with our new requirement for explicit model configuration. Every LLM provider must now have its models explicitly configured, preventing accidental access to unintended or expensive models:
```toml
[llm.providers.openai]
type = "openai"
api_key = "sk-..."

# Models must be explicitly configured
[llm.providers.openai.models."gpt-4o"]
rename = "gpt-4o-0123" # Map to the actual model name

[llm.providers.openai.models."gpt-4o-mini"]
```
This ensures that only approved models are accessible through your Nexus router, giving you complete control over which AI models your organization uses.
This release also brings enhanced configuration validation with clearer error messages at startup. When something is misconfigured, you'll know exactly what needs to be fixed before your service even starts.
We've also added over 1,000 lines of integration tests covering edge cases, boundary conditions, and distributed scenarios, ensuring that these new features work reliably under all conditions.
We're continuing to enhance Nexus's capabilities as an intelligent AI router. Future releases will focus on advanced routing strategies, enhanced monitoring and observability, and deeper integration with enterprise authentication systems.
Nexus 0.3.0 is available now. Check out the documentation for complete details, including migration steps and configuration examples.
We'd love to hear your feedback as you put these new features to use. Join us on GitHub to report issues, contribute, or just let us know how Nexus is working for your organization.
Happy routing!