Nexus exports distributed traces using OpenTelemetry, providing detailed visibility into request flows across all components. Traces help you understand latency, identify bottlenecks, and debug issues by showing the complete execution path of each request.
To enable tracing, configure telemetry in your nexus.toml:
[telemetry.tracing]
enabled = true
sampling = 0.1 # Sample 10% of requests
parent_based_sampler = false # Use parent-based sampling (default: false)
See the complete telemetry configuration guide for all options including:
- Sampling strategies and recommendations
- Parent-based sampling for distributed systems
- Trace context propagation (W3C and AWS X-Ray)
- Integration with OTLP exporters
- Performance tuning for different environments
Nexus creates a hierarchical span structure that represents the complete request flow:
HTTP Request (root span)
├── Redis Rate Limit Check - HTTP-level rate limits
├── MCP Operation
│   ├── Redis Rate Limit Check - MCP-specific rate limits
│   ├── Tool Search
│   └── Tool Execution
└── LLM Operation
    └── Token Rate Limit Check (redis:check_and_consume_tokens) - Token-based rate limits
The root span for all incoming HTTP requests.
Span Name: {method} {path} (e.g., POST /mcp, GET /health)
Attributes:
- http.request.method - HTTP method (GET, POST, etc.)
- http.route - Request route pattern
- http.response.status_code - Response status code
- http.request.body.size - Request body size in bytes
- http.response.body.size - Response body size in bytes
- client.id - Client identifier from rate limiting
- client.group - Client group for rate limiting
- url.path - Full URL path
- url.query - Query parameters
- user_agent.original - Client user agent string
Spans for Model Context Protocol operations.
Span Name: MCP method names like tools/list, tools/call
Attributes:
- mcp.method - MCP method name (tools/list, tools/call, etc.)
- mcp.tool.name - Tool being called
- mcp.tool.type - Tool type (builtin/downstream)
- mcp.transport - Transport type (stdio/http)
- mcp.auth_forwarded - Whether auth was forwarded
- client.id - Client identifier (if configured)
- client.group - Client group (if configured)
- mcp.error.code - Error code if operation failed
Spans for Language Model operations.
Span Name: llm:chat_completion or llm:chat_completion_stream
Attributes:
- gen_ai.request.model - Model identifier
- gen_ai.request.max_tokens - Max tokens requested
- gen_ai.request.temperature - Temperature parameter
- gen_ai.request.has_tools - Whether tools were provided
- gen_ai.request.tool_count - Number of tools provided
- gen_ai.response.model - Model used for response
- gen_ai.response.finish_reason - Completion reason
- gen_ai.usage.input_tokens - Input token count
- gen_ai.usage.output_tokens - Output token count
- gen_ai.usage.total_tokens - Total token count
- llm.stream - Whether streaming was used
- llm.auth_forwarded - Whether auth was forwarded
- client.id - Client identifier (if configured)
- client.group - Client group (if configured)
- error.type - Error type if operation failed
Spans for Redis operations including rate limiting.
Span Names:
- redis:check_and_consume:global - Global rate limit
- redis:check_and_consume:ip - Per-IP rate limit
- redis:check_and_consume:server - Per-server rate limit
- redis:check_and_consume:tool - Per-tool rate limit
- redis:check_and_consume_tokens - Token-based rate limit
Attributes:
- redis.operation - Operation type (check_and_consume or check_and_consume_tokens)
- rate_limit.scope - Scope (global/ip/server/tool/token)
- rate_limit.limit - Request/token limit
- rate_limit.interval_ms - Time window in milliseconds
- rate_limit.tokens - Number of tokens (for token operations)
- rate_limit.allowed - Whether request was allowed
- rate_limit.retry_after_ms - Retry delay if rate limited
- redis.pool.size - Connection pool size
- redis.pool.available - Available connections
- redis.pool.in_use - Connections in use
- client.address_hash - Hashed IP for privacy (per-IP limits)
- llm.provider - Provider name (token limits)
- llm.model - Model name (token limits)
Nexus supports multiple trace context propagation formats:
- W3C Trace Context (default): Standard trace propagation using traceparent and tracestate headers
- AWS X-Ray: For AWS environments using the X-Amzn-Trace-Id header format
See the telemetry configuration guide for propagation setup.
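To make the W3C format concrete, the sketch below parses a `traceparent` header value and extracts the sampled flag that an upstream service sets. It illustrates the header layout defined by the W3C Trace Context specification and is not Nexus's own parser; `parse_traceparent` and `ParentContext` are hypothetical names, and the sketch accepts only the four-field layout used by version 00.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class ParentContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[ParentContext]:
    """Parse a traceparent value; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of trace-flags is the "sampled" flag.
    return ParentContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx)
```

The `sampled` field is the upstream decision that parent-based sampling relies on.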
Parent-based sampling ensures consistent trace sampling across distributed systems. When enabled, Nexus respects the sampling decision made by upstream services.
[telemetry.tracing]
sampling = 0.15 # Local sampling rate (15%)
parent_based_sampler = true # Enable parent-based sampling
When parent_based_sampler = true:
- If an incoming request has a sampled parent trace, Nexus will sample it
- If an incoming request has an unsampled parent trace, Nexus will not sample it
- If no parent trace exists, Nexus uses the local sampling rate
When parent_based_sampler = false (default):
- Nexus always uses the local sampling rate
- Parent trace sampling decisions are ignored
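The decision rules above can be summarized in a few lines of code. This is a minimal sketch of the described behavior, not Nexus's implementation: `should_sample` is a hypothetical function, `parent_sampled` stands for the upstream decision (`None` when the request carries no parent context), and a simple random draw stands in for whatever ratio-based sampler Nexus actually uses.

```python
import random
from typing import Callable, Optional


def should_sample(
    parent_sampled: Optional[bool],
    sampling: float,
    parent_based_sampler: bool,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to sample a request.

    parent_sampled: True/False if an upstream decision exists, None otherwise.
    sampling: local sampling rate as a ratio (0.15 means 15%).
    """
    if parent_based_sampler and parent_sampled is not None:
        # Honor the upstream decision, whichever way it went.
        return parent_sampled
    # No parent context, or parent-based sampling disabled: use the local rate.
    return rng() < sampling


# A sampled parent is honored even when the local rate is 0%.
print(should_sample(True, 0.0, parent_based_sampler=True))
# With parent-based sampling off, the same request falls back to the local rate.
print(should_sample(True, 0.0, parent_based_sampler=False))
```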
Parent-based sampling is recommended for:
- Microservice architectures: Ensures complete traces across all services
- API gateways: Maintains sampling consistency from edge to backend
- Multi-tier applications: Prevents partial traces with missing segments
- Complete distributed traces: With parent-based sampling, if a frontend service samples a request, all downstream services (including Nexus) will also sample it, ensuring a complete trace.
- Consistent sampling: In a chain of services A → B → Nexus, if A decides to sample, both B and Nexus will honor that decision, preventing trace fragmentation.
- Fallback behavior: For direct requests to Nexus without parent context, the local sampling rate applies.
Traces are exported via the OTLP exporter to any compatible backend. See the telemetry configuration guide for detailed setup with:
- Jaeger
- Grafana Tempo
- AWS X-Ray
- Datadog
- New Relic
- Zipkin
Look for spans with high duration:
- Filter: duration > 1s
- Group by: http.route
Find failed requests:
- Filter: http.response.status_code >= 500
- Or: error = true
Track LLM token consumption:
- Filter: span.name = "llm:*" (matches llm:chat_completion and llm:chat_completion_stream)
- Sum: gen_ai.usage.total_tokens
Find rate-limited requests:
- Filter: rate_limit.allowed = false
- Group by: client.id, rate_limit.scope
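Query syntax differs between backends, so the filters above are best read as pseudocode. The sketch below shows the same logic for the last two queries applied to spans held in memory. Everything here is illustrative: the span dictionaries, their values, and the helper names are made up for the example, while the span names and attribute keys are the ones documented earlier on this page (`llm:` span names, `gen_ai.usage.total_tokens`, `rate_limit.allowed`, `rate_limit.scope`, `client.id`).

```python
from collections import Counter

# Made-up sample data in a simplified shape; real exports depend on the backend.
spans = [
    {"name": "llm:chat_completion",
     "attributes": {"gen_ai.usage.total_tokens": 512, "client.id": "team-a"}},
    {"name": "llm:chat_completion_stream",
     "attributes": {"gen_ai.usage.total_tokens": 300, "client.id": "team-b"}},
    {"name": "redis:check_and_consume:ip",
     "attributes": {"rate_limit.allowed": False, "rate_limit.scope": "ip",
                    "client.id": "team-a"}},
    {"name": "redis:check_and_consume:global",
     "attributes": {"rate_limit.allowed": True, "rate_limit.scope": "global",
                    "client.id": "team-b"}},
]


def total_llm_tokens(spans: list) -> int:
    """Sum gen_ai.usage.total_tokens over spans whose name starts with 'llm:'."""
    return sum(
        s["attributes"].get("gen_ai.usage.total_tokens", 0)
        for s in spans
        if s["name"].startswith("llm:")
    )


def rate_limited_by_client(spans: list) -> Counter:
    """Count rejected rate-limit checks, grouped by (client.id, rate_limit.scope)."""
    counts = Counter()
    for s in spans:
        attrs = s["attributes"]
        if attrs.get("rate_limit.allowed") is False:
            counts[(attrs.get("client.id"), attrs.get("rate_limit.scope"))] += 1
    return counts


print(total_llm_tokens(spans))
print(rate_limited_by_client(spans))
```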
- Sampling rates: Use 1-5% in production, higher in development
- Span cardinality: High-cardinality attributes are automatically limited
- Network overhead: Controlled via batch export settings
See the telemetry configuration guide for optimization strategies.
Common issues and solutions are covered in the telemetry configuration guide, including:
- Traces not appearing
- Missing spans
- Context propagation issues
- Performance optimization
- Telemetry Overview - General telemetry concepts
- Metrics - Available metrics
- Configuration Guide - Full configuration reference