The LLM router acts as a unified gateway that provides access to multiple AI model providers through a single OpenAI-compatible API. Configure providers, models, rate limits, and more.

  1. Provider Configuration - Set up OpenAI, Anthropic, Google, and AWS Bedrock
  2. Model Management - Configure models and create aliases
  3. Token Rate Limiting - Control token consumption per user and model
  4. Token Forwarding - Allow users to provide their own API keys
  5. Header Rules - Transform and manage HTTP headers for providers

Enable LLM routing in your nexus.toml:

```toml
[llm]
enabled = true   # Enable LLM functionality
path = "/llm"    # API endpoint path

# Configure OpenAI provider
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"

# Must explicitly configure models
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models."gpt-3.5-turbo"]

# Configure Anthropic provider
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"

[llm.providers.anthropic.models."claude-3-5-sonnet-20241022"]
```
With routing enabled, the OpenAI-compatible endpoints are available under the configured path:

```bash
# List available models
curl http://localhost:8000/llm/v1/models

# Chat completion
curl -X POST http://localhost:8000/llm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
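The completion endpoint returns the standard OpenAI chat completion shape. The response below is purely illustrative: IDs, timestamps, and token counts are made up, and the exact contents of fields such as model depend on the configured provider:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "openai/gpt-4",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18 }
}
```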

Models are prefixed with their provider name:

  • Format: {provider_name}/{model_id}
  • Examples: openai/gpt-4, anthropic/claude-3-5-sonnet-20241022

Note: All models must be explicitly configured; requests for models that are not configured return a 404 error.

Here's a comprehensive configuration showing multiple providers and features:

```toml
[llm]
enabled = true
path = "/llm"

# OpenAI with multiple models
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
forward_token = true  # Allow user-provided keys

[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models."gpt-3.5-turbo"]

[llm.providers.openai.models.smart]
rename = "gpt-4"  # Alias: "openai/smart" → "gpt-4"

# Token rate limiting for OpenAI
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"

# Anthropic configuration
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"

[llm.providers.anthropic.models."claude-3-5-sonnet-20241022"]

[llm.providers.anthropic.models.fast]
rename = "claude-3-haiku-20240307"

# AWS Bedrock
[llm.providers.bedrock]
type = "bedrock"
region = "us-east-1"

[llm.providers.bedrock.models.claude]
rename = "anthropic.claude-3-sonnet-20240229-v1:0"
```

The LLM router is compatible with any OpenAI client library:

Python:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/llm/v1",
    api_key="not-used"
)
```
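A minimal usage sketch with the client above, using the prefixed model names described earlier (aliases such as openai/smart are requested the same way):

```python
# Request a completion through the router; the model name follows the
# {provider_name}/{model_id} format.
response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Summarize what an LLM router does."}],
)
print(response.choices[0].message.content)
```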
TypeScript:

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:8000/llm/v1',
  apiKey: 'not-used'
});
```
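Streaming works through the same clients. A minimal Python sketch using the client defined above, assuming the router streams chunks in the standard OpenAI SSE format:

```python
# Stream a response; content arrives incrementally as server-sent events.
stream = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Write a haiku about routers."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```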
Key features:

  • Unified API: Single endpoint for all providers
  • OpenAI Compatible: Works with existing OpenAI clients
  • Model Aliases: Create custom names for models
  • Token Rate Limiting: Control usage per user and model
  • Token Forwarding: Users can provide their own API keys (see the sketch after this list)
  • Header Transformation: Forward, insert, remove, and rename HTTP headers
  • Streaming Support: Real-time responses via SSE
  • Multiple Providers: Mix models from different vendors
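For token forwarding, a minimal sketch of the client side. The assumption here (not confirmed on this page) is that with forward_token = true the key supplied by the client as api_key, sent as the standard Authorization header, is forwarded to the provider in place of the key from nexus.toml; see the Token Forwarding docs for the exact header Nexus expects:

```python
import os
from openai import OpenAI

# Hypothetical token-forwarding setup: the user supplies their own provider
# key, which the router is assumed to forward when forward_token = true.
client = OpenAI(
    base_url="http://localhost:8000/llm/v1",
    api_key=os.environ["OPENAI_API_KEY"],  # the user's own OpenAI key
)
```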
Best practices:

  1. Explicit Model Configuration: Only configure models you need
  2. Use Environment Variables: Never hardcode API keys
  3. Configure Rate Limits: Protect against excessive usage (see the sketch after this list)
  4. Create Meaningful Aliases: Simplify model names for users
  5. Monitor Usage: Track token consumption and costs
  6. Test Thoroughly: Verify models before production
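For rate limits specifically, a sketch of client-side handling, assuming the router rejects requests that exceed a configured token limit with HTTP 429, which OpenAI clients surface as RateLimitError:

```python
import time
import openai

# Retry once after a pause if the per-user token limit is exceeded.
# Assumes the router answers with HTTP 429 when a configured limit is hit.
try:
    response = client.chat.completions.create(
        model="openai/gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.RateLimitError:
    time.sleep(60)  # wait out the configured interval (e.g. "60s")
    response = client.chat.completions.create(
        model="openai/gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
```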

For debugging, check:

  • Model availability: GET /llm/v1/models
  • Nexus logs: nexus --log debug
  • Provider authentication
  • Rate limit configuration
  • Network connectivity to providers