Nexus exports distributed traces using OpenTelemetry, providing detailed visibility into request flows across all components. Traces help you understand latency, identify bottlenecks, and debug issues by showing the complete execution path of each request.

To enable tracing, configure telemetry in your nexus.toml:

```toml
[telemetry.tracing]
enabled = true
sampling = 0.1                # Sample 10% of requests
parent_based_sampler = false  # Parent-based sampling (default: false)
```

See the complete telemetry configuration guide for all options including:

  • Sampling strategies and recommendations
  • Parent-based sampling for distributed systems
  • Trace context propagation (W3C and AWS X-Ray)
  • Integration with OTLP exporters
  • Performance tuning for different environments

Nexus creates a hierarchical span structure that represents the complete request flow:

```text
HTTP Request (root span)
├── Redis Rate Limit Check        - HTTP-level rate limits
├── MCP Operation
│   ├── Redis Rate Limit Check    - MCP-specific rate limits
│   ├── Tool Search
│   └── Tool Execution
└── LLM Operation
    └── Token Rate Limit Check (redis:check_and_consume_tokens) - Token-based rate limits
```

The root span for all incoming HTTP requests.

Span Name: {method} {path} (e.g., POST /mcp, GET /health)

Attributes:

  • http.request.method - HTTP method (GET, POST, etc.)
  • http.route - Request route pattern
  • http.response.status_code - Response status code
  • http.request.body.size - Request body size in bytes
  • http.response.body.size - Response body size in bytes
  • client.id - Client identifier from rate limiting
  • client.group - Client group for rate limiting
  • url.path - Full URL path
  • url.query - Query parameters
  • user_agent.original - Client user agent string
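
For illustration, a root span for a POST /mcp request might carry attributes like the following when exported (names from the list above; the values are hypothetical):

```json
{
  "name": "POST /mcp",
  "attributes": {
    "http.request.method": "POST",
    "http.route": "/mcp",
    "http.response.status_code": 200,
    "http.request.body.size": 512,
    "http.response.body.size": 2048,
    "client.id": "client-a",
    "client.group": "premium",
    "url.path": "/mcp",
    "user_agent.original": "curl/8.5.0"
  }
}
```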

Spans for Model Context Protocol operations.

Span Name: MCP method names like tools/list, tools/call

Attributes:

  • mcp.method - MCP method name (tools/list, tools/call, etc.)
  • mcp.tool.name - Tool being called
  • mcp.tool.type - Tool type (builtin/downstream)
  • mcp.transport - Transport type (stdio/http)
  • mcp.auth_forwarded - Whether auth was forwarded
  • client.id - Client identifier (if configured)
  • client.group - Client group (if configured)
  • mcp.error.code - Error code if operation failed

Spans for language model (LLM) operations.

Span Name: llm:chat_completion or llm:chat_completion_stream

Attributes:

  • gen_ai.request.model - Model identifier
  • gen_ai.request.max_tokens - Max tokens requested
  • gen_ai.request.temperature - Temperature parameter
  • gen_ai.request.has_tools - Whether tools were provided
  • gen_ai.request.tool_count - Number of tools provided
  • gen_ai.response.model - Model used for response
  • gen_ai.response.finish_reason - Completion reason
  • gen_ai.usage.input_tokens - Input token count
  • gen_ai.usage.output_tokens - Output token count
  • gen_ai.usage.total_tokens - Total token count
  • llm.stream - Whether streaming was used
  • llm.auth_forwarded - Whether auth was forwarded
  • client.id - Client identifier (if configured)
  • client.group - Client group (if configured)
  • error.type - Error type if operation failed
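
As an illustrative example (model name and token counts are hypothetical), a completed chat request might produce a span with these attributes:

```json
{
  "name": "llm:chat_completion",
  "attributes": {
    "gen_ai.request.model": "gpt-4",
    "gen_ai.request.max_tokens": 1024,
    "gen_ai.request.temperature": 0.7,
    "gen_ai.request.has_tools": false,
    "gen_ai.response.finish_reason": "stop",
    "gen_ai.usage.input_tokens": 350,
    "gen_ai.usage.output_tokens": 120,
    "gen_ai.usage.total_tokens": 470,
    "llm.stream": false
  }
}
```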

Spans for Redis operations including rate limiting.

Span Names:

  • redis:check_and_consume:global - Global rate limit
  • redis:check_and_consume:ip - Per-IP rate limit
  • redis:check_and_consume:server - Per-server rate limit
  • redis:check_and_consume:tool - Per-tool rate limit
  • redis:check_and_consume_tokens - Token-based rate limit

Attributes:

  • redis.operation - Operation type (check_and_consume or check_and_consume_tokens)
  • rate_limit.scope - Scope (global/ip/server/tool/token)
  • rate_limit.limit - Request/token limit
  • rate_limit.interval_ms - Time window in milliseconds
  • rate_limit.tokens - Number of tokens (for token operations)
  • rate_limit.allowed - Whether request was allowed
  • rate_limit.retry_after_ms - Retry delay if rate limited
  • redis.pool.size - Connection pool size
  • redis.pool.available - Available connections
  • redis.pool.in_use - Connections in use
  • client.address_hash - Hashed IP for privacy (per-IP limits)
  • llm.provider - Provider name (token limits)
  • llm.model - Model name (token limits)

Nexus supports multiple trace context propagation formats:

  • W3C Trace Context (default): Standard trace propagation using traceparent and tracestate headers
  • AWS X-Ray: For AWS environments using X-Amzn-Trace-Id header format
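
For reference, the two header formats look like this (the trace and span ids below are placeholders taken from the specifications' examples):

```text
# W3C Trace Context: version-traceid-parentid-flags (flags "01" = sampled)
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

# AWS X-Ray
X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
```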

See the telemetry configuration guide for propagation setup.

Parent-based sampling ensures consistent trace sampling across distributed systems. When enabled, Nexus respects the sampling decision made by upstream services.

```toml
[telemetry.tracing]
sampling = 0.15              # Local sampling rate (15%)
parent_based_sampler = true  # Enable parent-based sampling
```

When parent_based_sampler = true:

  • If an incoming request has a sampled parent trace, Nexus will sample it
  • If an incoming request has an unsampled parent trace, Nexus will not sample it
  • If no parent trace exists, Nexus uses the local sampling rate

When parent_based_sampler = false (default):

  • Nexus always uses the local sampling rate
  • Parent trace sampling decisions are ignored

Parent-based sampling is recommended for:

  • Microservice architectures: Ensures complete traces across all services
  • API gateways: Maintains sampling consistency from edge to backend
  • Multi-tier applications: Prevents partial traces with missing segments

In practice:

  1. Complete distributed traces: With parent-based sampling, if a frontend service samples a request, all downstream services (including Nexus) will also sample it, ensuring a complete trace.

  2. Consistent sampling: In a chain of services A → B → Nexus, if A decides to sample, both B and Nexus will honor that decision, preventing trace fragmentation.

  3. Fallback behavior: For direct requests to Nexus without parent context, the local sampling rate applies.
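
The decision rules above can be sketched in a few lines. This is assumed semantics (not Nexus source); the trace-id ratio fallback mirrors how standard OpenTelemetry ratio samplers key off the trace id so every service using the same rate reaches the same decision:

```python
def parent_based_sample(parent_sampled, local_rate, trace_id):
    """parent_sampled: True/False when a parent trace context is present,
    None when the request arrives with no parent trace."""
    if parent_sampled is not None:
        # Honor the upstream decision (parent_based_sampler = true)
        return parent_sampled
    # No parent context: fall back to the local sampling rate by comparing
    # the low 64 bits of the trace id against the rate threshold
    bound = int(local_rate * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```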

Traces are exported via the OTLP exporter to any compatible backend. See the telemetry configuration guide for detailed setup with:

  • Jaeger
  • Grafana Tempo
  • AWS X-Ray
  • Datadog
  • New Relic
  • Zipkin
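
As one possible setup (hostnames and ports are placeholders), an OpenTelemetry Collector can sit between Nexus and any of the backends above, receiving OTLP and forwarding traces, here to Grafana Tempo:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```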

To identify slow operations, look for spans with high duration:

  • Filter: duration > 1s
  • Group by: http.route
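
In a backend that supports TraceQL (Grafana Tempo, for example), the filters in this section might be sketched as follows; exact query syntax varies by backend:

```text
{ duration > 1s }                            # slow spans
{ span.http.response.status_code >= 500 }    # failed requests
```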

Find failed requests:

  • Filter: http.response.status_code >= 500
  • Or: error = true

Track LLM token consumption:

  • Filter: span.name = "llm:chat_completion*"
  • Sum: gen_ai.usage.total_tokens

Find rate-limited requests:

  • Filter: rate_limit.allowed = false
  • Group by: client.id, rate_limit.scope

Performance considerations:

  • Sampling rates: Use 1-5% in production, higher in development
  • Span cardinality: High-cardinality attributes are automatically limited
  • Network overhead: Controlled via batch export settings

See the telemetry configuration guide for optimization strategies.

Common issues and solutions are covered in the telemetry configuration guide, including:

  • Traces not appearing
  • Missing spans
  • Context propagation issues
  • Performance optimization
© Grafbase, Inc.