The LLM router acts as a unified gateway that provides access to multiple AI model providers through a single OpenAI-compatible API. Configure providers, models, rate limits, and more.
- Provider Configuration - Set up OpenAI, Anthropic, Google, and AWS Bedrock
- Model Management - Configure models and create aliases
- Token Rate Limiting - Control token consumption per user and model
- Token Forwarding - Allow users to provide their own API keys
- Header Rules - Transform and manage HTTP headers for providers
Enable LLM routing in your `nexus.toml`:
```toml
[llm]
enabled = true    # Enable LLM functionality
path = "/llm"     # API endpoint path

# Configure OpenAI provider
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"

# Must explicitly configure models
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models."gpt-3.5-turbo"]

# Configure Anthropic provider
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"

[llm.providers.anthropic.models."claude-3-5-sonnet-20241022"]
```
```bash
# List available models
curl http://localhost:8000/llm/v1/models

# Chat completion
curl -X POST http://localhost:8000/llm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Models are prefixed with their provider name:
- Format: `{provider_name}/{model_id}`
- Examples: `openai/gpt-4`, `anthropic/claude-3-5-sonnet-20241022`
Note: All models must be explicitly configured. Models that are not configured will return a 404 error.
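A quick way to see this in practice is to request a model that is not in your configuration. Assuming Nexus is running on `localhost:8000` as in the examples above, and that `gpt-4o-mini` has not been configured, the gateway responds with a 404:

```bash
# Request an unconfigured model (illustrative; expect a 404 response)
curl -i -X POST http://localhost:8000/llm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```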
Here's a comprehensive configuration showing multiple providers and features:
```toml
[llm]
enabled = true
path = "/llm"

# OpenAI with multiple models
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
forward_token = true    # Allow user-provided keys

[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models."gpt-3.5-turbo"]

[llm.providers.openai.models.smart]
rename = "gpt-4"    # Alias: "openai/smart" → "gpt-4"

# Token rate limiting for OpenAI
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"

# Anthropic configuration
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"

[llm.providers.anthropic.models."claude-3-5-sonnet-20241022"]

[llm.providers.anthropic.models.fast]
rename = "claude-3-haiku-20240307"    # Anthropic model ID (the "-v1:0" suffix is Bedrock-only)

# AWS Bedrock
[llm.providers.bedrock]
type = "bedrock"
region = "us-east-1"

[llm.providers.bedrock.models.claude]
rename = "anthropic.claude-3-sonnet-20240229-v1:0"
```
The LLM router is compatible with any OpenAI client library:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/llm/v1",
    api_key="not-used"
)
```
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:8000/llm/v1',
  apiKey: 'not-used'
});
```
- Unified API: Single endpoint for all providers
- OpenAI Compatible: Works with existing OpenAI clients
- Model Aliases: Create custom names for models
- Token Rate Limiting: Control usage per user and model
- Token Forwarding: Users can provide their own API keys
- Header Transformation: Forward, insert, remove, and rename HTTP headers
- Streaming Support: Real-time responses via SSE (see the streaming example after this list)
- Multiple Providers: Mix models from different vendors
- Explicit Model Configuration: Only configure models you need
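Streaming follows the OpenAI convention of setting `stream: true` in the request body and reading the response as server-sent events. A minimal sketch, assuming the local gateway and the `openai/gpt-4` model configured above:

```bash
# Stream a chat completion; -N disables curl's buffering so SSE chunks print as they arrive
curl -N -X POST http://localhost:8000/llm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a short story."}]
  }'
```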
- Use Environment Variables: Never hardcode API keys (see the example after this list)
- Configure Rate Limits: Protect against excessive usage
- Create Meaningful Aliases: Simplify model names for users
- Monitor Usage: Track token consumption and costs
- Test Thoroughly: Verify models before production
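For example, the `{{ env.OPENAI_API_KEY }}` and `{{ env.ANTHROPIC_API_KEY }}` references in the configurations above are resolved from the environment when Nexus starts. A sketch (the exact launch command may differ in your setup):

```bash
# Keep provider keys out of nexus.toml; export them before starting Nexus
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
nexus
```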
For debugging, check:
- Model availability: `GET /llm/v1/models` (see the example after this list)
- Nexus logs: `nexus --log debug`
- Provider authentication
- Rate limit configuration
- Network connectivity to providers
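When a model appears to be missing, listing what the gateway actually exposes is usually the fastest check. Assuming the endpoint returns the OpenAI-compatible list format (an object with a `data` array of model entries), `jq` can extract the IDs:

```bash
# List every model ID the gateway currently exposes (requires jq)
curl -s http://localhost:8000/llm/v1/models | jq -r '.data[].id'
```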
- Start with Provider Configuration
- Set up Model Management
- Configure Token Rate Limiting
- Learn how to use the API