Nexus exports distributed traces using OpenTelemetry, providing detailed visibility into request flows across all components. Traces help you understand latency, identify bottlenecks, and debug issues by showing the complete execution path of each request.

To enable tracing, configure telemetry in your nexus.toml:

```toml
[telemetry.tracing]
enabled = true
sampling = 0.1                # Sample 10% of requests
parent_based_sampler = false  # Parent-based sampling (default: false)
```

See the complete telemetry configuration guide for all options including:

  • Sampling strategies and recommendations
  • Parent-based sampling for distributed systems
  • Trace context propagation (W3C and AWS X-Ray)
  • Integration with OTLP exporters
  • Performance tuning for different environments

Nexus creates a hierarchical span structure that represents the complete request flow:

```text
HTTP Request (root span)
├── Redis Rate Limit Check        - HTTP-level rate limits
├── MCP Operation
│   ├── Redis Rate Limit Check    - MCP-specific rate limits
│   ├── Tool Search
│   └── Tool Execution
└── LLM Operation
    └── Token Rate Limit Check (redis:check_and_consume_tokens) - Token-based rate limits
```

The root span for all incoming HTTP requests.

Span Name: {method} {path} (e.g., POST /mcp, GET /health)

Attributes:

  • http.request.method - HTTP method (GET, POST, etc.)
  • http.route - Request route pattern
  • http.response.status_code - Response status code
  • http.request.body.size - Request body size in bytes
  • http.response.body.size - Response body size in bytes
  • client.id - Client identifier from rate limiting
  • client.group - Client group for rate limiting
  • url.path - Full URL path
  • url.query - Query parameters
  • user_agent.original - Client user agent string
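
For illustration, a root span for a POST /mcp request might carry attributes like the following when exported (names from the list above; the values are hypothetical):

```json
{
  "name": "POST /mcp",
  "attributes": {
    "http.request.method": "POST",
    "http.route": "/mcp",
    "http.response.status_code": 200,
    "http.request.body.size": 512,
    "http.response.body.size": 2048,
    "client.id": "client-a",
    "client.group": "premium",
    "url.path": "/mcp",
    "user_agent.original": "curl/8.5.0"
  }
}
```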

Spans for Model Context Protocol operations.

Span Name: MCP method names like tools/list, tools/call

Attributes:

  • mcp.method - MCP method name (tools/list, tools/call, etc.)
  • mcp.tool.name - Tool being called
  • mcp.tool.type - Tool type (builtin/downstream)
  • mcp.transport - Transport type (stdio/http)
  • mcp.auth_forwarded - Whether auth was forwarded
  • client.id - Client identifier (if configured)
  • client.group - Client group (if configured)
  • mcp.error.code - Error code if operation failed

Spans for language model (LLM) operations.

Span Name: llm:chat_completion or llm:chat_completion_stream

Attributes:

  • gen_ai.request.model - Model identifier
  • gen_ai.request.max_tokens - Max tokens requested
  • gen_ai.request.temperature - Temperature parameter
  • gen_ai.request.has_tools - Whether tools were provided
  • gen_ai.request.tool_count - Number of tools provided
  • gen_ai.response.model - Model used for response
  • gen_ai.response.finish_reason - Completion reason
  • gen_ai.usage.input_tokens - Input token count
  • gen_ai.usage.output_tokens - Output token count
  • gen_ai.usage.total_tokens - Total token count
  • llm.stream - Whether streaming was used
  • llm.auth_forwarded - Whether auth was forwarded
  • client.id - Client identifier (if configured)
  • client.group - Client group (if configured)
  • error.type - Error type if operation failed
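
As an illustrative example (model name and token counts are hypothetical), a completed chat request might produce a span with these attributes:

```json
{
  "name": "llm:chat_completion",
  "attributes": {
    "gen_ai.request.model": "gpt-4",
    "gen_ai.request.max_tokens": 1024,
    "gen_ai.request.temperature": 0.7,
    "gen_ai.request.has_tools": false,
    "gen_ai.response.finish_reason": "stop",
    "gen_ai.usage.input_tokens": 350,
    "gen_ai.usage.output_tokens": 120,
    "gen_ai.usage.total_tokens": 470,
    "llm.stream": false
  }
}
```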

Spans for Redis operations including rate limiting.

Span Names:

  • redis:check_and_consume:global - Global rate limit
  • redis:check_and_consume:ip - Per-IP rate limit
  • redis:check_and_consume:server - Per-server rate limit
  • redis:check_and_consume:tool - Per-tool rate limit
  • redis:check_and_consume_tokens - Token-based rate limit

Attributes:

  • redis.operation - Operation type (check_and_consume or check_and_consume_tokens)
  • rate_limit.scope - Scope (global/ip/server/tool/token)
  • rate_limit.limit - Request/token limit
  • rate_limit.interval_ms - Time window in milliseconds
  • rate_limit.tokens - Number of tokens (for token operations)
  • rate_limit.allowed - Whether request was allowed
  • rate_limit.retry_after_ms - Retry delay if rate limited
  • redis.pool.size - Connection pool size
  • redis.pool.available - Available connections
  • redis.pool.in_use - Connections in use
  • client.address_hash - Hashed IP for privacy (per-IP limits)
  • llm.provider - Provider name (token limits)
  • llm.model - Model name (token limits)

Nexus supports multiple trace context propagation formats:

  • W3C Trace Context (default): Standard trace propagation using traceparent and tracestate headers
  • AWS X-Ray: For AWS environments using X-Amzn-Trace-Id header format
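
For reference, the two header formats look like this (the trace and span ids below are placeholders taken from the specifications' examples):

```text
# W3C Trace Context: version-traceid-parentid-flags (flags "01" = sampled)
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

# AWS X-Ray
X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
```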

See the telemetry configuration guide for propagation setup.

Parent-based sampling ensures consistent trace sampling across distributed systems. When enabled, Nexus respects the sampling decision made by upstream services.

```toml
[telemetry.tracing]
sampling = 0.15              # Local sampling rate (15%)
parent_based_sampler = true  # Enable parent-based sampling
```

When parent_based_sampler = true:

  • If an incoming request has a sampled parent trace, Nexus will sample it
  • If an incoming request has an unsampled parent trace, Nexus will not sample it
  • If no parent trace exists, Nexus uses the local sampling rate

When parent_based_sampler = false (default):

  • Nexus always uses the local sampling rate
  • Parent trace sampling decisions are ignored

Parent-based sampling is recommended for:

  • Microservice architectures: Ensures complete traces across all services
  • API gateways: Maintains sampling consistency from edge to backend
  • Multi-tier applications: Prevents partial traces with missing segments

In practice:

  1. Complete distributed traces: With parent-based sampling, if a frontend service samples a request, all downstream services (including Nexus) will also sample it, ensuring a complete trace.

  2. Consistent sampling: In a chain of services A → B → Nexus, if A decides to sample, both B and Nexus will honor that decision, preventing trace fragmentation.

  3. Fallback behavior: For direct requests to Nexus without parent context, the local sampling rate applies.
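
The decision rules above can be sketched in a few lines. This is assumed semantics (not Nexus source); the trace-id ratio fallback mirrors how standard OpenTelemetry ratio samplers key off the trace id so every service using the same rate reaches the same decision:

```python
def parent_based_sample(parent_sampled, local_rate, trace_id):
    """parent_sampled: True/False when a parent trace context is present,
    None when the request arrives with no parent trace."""
    if parent_sampled is not None:
        # Honor the upstream decision (parent_based_sampler = true)
        return parent_sampled
    # No parent context: fall back to the local sampling rate by comparing
    # the low 64 bits of the trace id against the rate threshold
    bound = int(local_rate * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```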

Traces are exported via the OTLP exporter to any compatible backend. See the telemetry configuration guide for detailed setup with:

  • Jaeger
  • Grafana Tempo
  • AWS X-Ray
  • Datadog
  • New Relic
  • Zipkin
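
As one possible setup (hostnames and ports are placeholders), an OpenTelemetry Collector can sit between Nexus and any of the backends above, receiving OTLP and forwarding traces, here to Grafana Tempo:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```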

To identify slow operations, look for spans with high duration:

  • Filter: duration > 1s
  • Group by: http.route
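
In a backend that supports TraceQL (Grafana Tempo, for example), the filters in this section might be sketched as follows; exact query syntax varies by backend:

```text
{ duration > 1s }                            # slow spans
{ span.http.response.status_code >= 500 }    # failed requests
```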

Find failed requests:

  • Filter: http.response.status_code >= 500
  • Or: error = true

Track LLM token consumption:

  • Filter: span.name = "llm:chat_completion*"
  • Sum: gen_ai.usage.total_tokens

Find rate-limited requests:

  • Filter: rate_limit.allowed = false
  • Group by: client.id, rate_limit.scope

Performance considerations:

  • Sampling rates: Use 1-5% in production, higher in development
  • Span cardinality: High-cardinality attributes are automatically limited
  • Network overhead: Controlled via batch export settings

See the telemetry configuration guide for optimization strategies.

Common issues and solutions are covered in the telemetry configuration guide, including:

  • Traces not appearing
  • Missing spans
  • Context propagation issues
  • Performance optimization
© Grafbase, Inc.