Nexus provides comprehensive OpenTelemetry metrics covering server operations, LLM interactions, and MCP tool executions. All metrics follow OpenTelemetry semantic conventions and can be exported to any OpenTelemetry-compatible backend.

Notes:

  • All histograms use delta temporality and also function as counters (the count field tracks the number of observations)
  • Delta histograms report the change since the last export, not cumulative values
  • Many metrics include client.id and client.group attributes when Client Identification is enabled. These attributes let you track usage per client and implement tiered access controls (a configuration sketch follows these notes).
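
The client.id and client.group attributes are derived from request headers once Client Identification is turned on. A minimal sketch of what that might look like in nexus.toml is shown below; the table and key names (client_identification, client_id, group_id, http_header) are illustrative assumptions and should be checked against the Client Identification guide.

  # Illustrative sketch; key names are assumptions, see Client Identification.
  [server.client_identification]
  enabled = true

  # Read the client identifier and group from the same headers the metric
  # attributes reference (x-client-id / x-client-group).
  [server.client_identification.client_id]
  http_header = "x-client-id"

  [server.client_identification.group_id]
  http_header = "x-client-group"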

To enable metrics, configure telemetry in your nexus.toml:

  [telemetry.exporters.otlp]
  enabled = true
  endpoint = "http://localhost:4317"

See the complete telemetry configuration guide for all options, including the following (an illustrative sketch of these options appears after the list):

  • Service identification and resource attributes
  • Protocol selection (gRPC vs HTTP)
  • Batch export optimization
  • Integration examples for popular backends
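
As a rough illustration of those options, the sketch below extends the minimal OTLP block above. The key names used for service identification, resource attributes, protocol selection, and batch export (service_name, resource_attributes, protocol, batch_export and its settings) are assumptions here; consult the telemetry configuration guide for the exact schema.

  # Illustrative sketch; key names are assumptions, see the telemetry guide.
  [telemetry]
  service_name = "nexus"

  # Resource attributes attached to every exported metric
  [telemetry.resource_attributes]
  environment = "production"

  [telemetry.exporters.otlp]
  enabled = true
  endpoint = "http://localhost:4317"
  # Protocol selection: gRPC or HTTP
  protocol = "grpc"

  # Batch export optimization
  [telemetry.exporters.otlp.batch_export]
  scheduled_delay = "5s"
  max_export_batch_size = 512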

Metric: http.server.request.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks the duration of HTTP server requests

Attributes:

  • http.request.method: HTTP method (GET, POST, etc.)
  • http.response.status_code: HTTP response status code
  • http.route: The matched route pattern

Use Case: Monitor API latency, identify slow endpoints, track error rates

The following Redis metrics are available when Redis is used as the rate limiting backend. See Rate Limiting Configuration for setup details.
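
For context, a minimal sketch of pointing rate limiting at Redis might look like the following; the table and key names (rate_limits, storage, url, pool_size) are illustrative assumptions, so refer to Rate Limiting Configuration for the actual schema.

  # Illustrative sketch; key names are assumptions, see Rate Limiting Configuration.
  [server.rate_limits]
  enabled = true

  # Using Redis instead of in-memory storage is what causes the
  # redis.command.* and redis.pool.* metrics below to be emitted.
  [server.rate_limits.storage]
  url = "redis://localhost:6379"
  pool_size = 16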

Metric: redis.command.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks execution time of Redis operations

Attributes:

  • operation: Type of Redis operation
    • check_and_consume: HTTP rate limit checking
    • check_and_consume_tokens: Token-based rate limit checking
  • status: Operation status (success or error)
  • tokens: Number of tokens (only for token operations)

Use Case: Monitor Redis performance, identify bottlenecks in rate limiting

Metric: redis.pool.connections.in_use
Type: Gauge
Description: Current number of connections checked out from the pool
Attributes: None
Use Case: Monitor connection pool utilization

Metric: redis.pool.connections.available
Type: Gauge
Description: Current number of connections available in the pool
Attributes: None
Use Case: Ensure adequate pool capacity

The following LLM metrics follow the OpenTelemetry GenAI semantic conventions for AI model operations. See LLM Configuration for provider setup details.
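
For reference, a minimal provider block that would generate these metrics might look like the sketch below; the provider table layout and key names (llm.providers, type, api_key) are assumptions, so see LLM Configuration for the supported providers and exact keys.

  # Illustrative sketch; key names are assumptions, see LLM Configuration.
  [llm]
  enabled = true

  # Requests routed to "openai/gpt-4" through this provider would carry
  # gen_ai.request.model = "openai/gpt-4" on the metrics below.
  [llm.providers.openai]
  type = "openai"
  api_key = "sk-..."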

Metric: gen_ai.client.operation.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks the total duration of LLM chat completion operations

Attributes:

  • gen_ai.system: Always "nexus.llm"
  • gen_ai.operation.name: Always "chat.completions"
  • gen_ai.request.model: The model identifier (e.g., "openai/gpt-4")
  • gen_ai.response.finish_reason: How the response ended (stop/length/tool_calls/content_filter)
  • client.id: Client identifier (from x-client-id header, see Client Identification)
  • client.group: Client group (from x-client-group header, see Client Identification)
  • error.type: Error type for failed requests:
    • invalid_request - Malformed request
    • authentication_failed - Invalid API key
    • insufficient_quota - Quota exceeded
    • model_not_found - Unknown model
    • rate_limit_exceeded - Provider or token rate limit hit
    • streaming_not_supported - Streaming unavailable for model
    • invalid_model_format - Incorrect model name format
    • provider_not_found - Unknown provider
    • internal_error - Server error
    • provider_api_error - Upstream provider error
    • connection_error - Network failure

Use Case: Monitor LLM latency, compare performance across providers and models

Metric: gen_ai.client.time_to_first_token
Type: Delta Histogram with Counter (milliseconds)
Description: Duration until the first token is received in streaming responses

Attributes:

  • gen_ai.system: Always "nexus.llm"
  • gen_ai.operation.name: Always "chat.completions"
  • gen_ai.request.model: The model identifier
  • client.id: Client identifier (see Client Identification)
  • client.group: Client group (see Client Identification)

Use Case: Monitor streaming response latency, critical for user experience

Metric: gen_ai.client.input.token.usage
Type: Counter
Description: Cumulative count of input tokens consumed

Attributes:

Metric: gen_ai.client.output.token.usage
Type: Counter
Description: Cumulative count of output tokens generated

Attributes:

Metric: gen_ai.client.total.token.usage
Type: Counter
Description: Cumulative total tokens (input + output)

Attributes:

The following metrics cover Model Context Protocol (MCP) operations and tool executions. See MCP Configuration for server setup details.
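
For reference, a minimal downstream MCP server entry might look like the sketch below; the table and key names (mcp.servers, url) are illustrative assumptions, so see MCP Configuration for the exact schema and transport options.

  # Illustrative sketch; key names are assumptions, see MCP Configuration.
  [mcp]
  enabled = true

  # The downstream server name ("github" here) is what shows up as the
  # server_name attribute on execute operations in mcp.tool.call.duration.
  [mcp.servers.github]
  url = "https://example.com/mcp"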

Metric: mcp.tool.call.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks the duration of MCP tool invocations including both built-in and downstream tools

Attributes:

  • tool_name: Name of the tool being called
  • tool_type: Type of tool (builtin or downstream)
  • status: Operation status (success or error)
  • client.id: Client identifier (see Client Identification)
  • client.group: Client group (see Client Identification)
  • Additional for search operations:
    • keyword_count: Number of keywords in search query
    • result_count: Number of results returned
  • Additional for execute operations on downstream tools:
    • server_name: Name of the downstream MCP server
  • Additional for errors:
    • error.type: Specific error type:
      • parse_error - Invalid JSON (-32700)
      • invalid_request - Not a valid request (-32600)
      • method_not_found - Method/tool does not exist (-32601)
      • invalid_params - Invalid method parameters (-32602)
      • internal_error - Internal server error (-32603)
      • rate_limit_exceeded - Rate limit hit (-32000)
      • server_error - Other server errors (-32001 to -32099)
      • unknown - Any other error code

Use Case: Monitor tool performance, identify slow or failing tools, track usage patterns

Metric: mcp.tools.list.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks the duration of listing available tools from MCP servers

Attributes:

Use Case: Monitor tool discovery performance and server responsiveness

Metric: mcp.prompt.request.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks the duration of prompt-related operations (list/get)

Attributes:

  • method: Operation type (list_prompts or get_prompt)
  • status: Operation status (success or error)
  • client.id: Client identifier (see Client Identification)
  • client.group: Client group (see Client Identification)
  • Additional for errors:
    • error.type: Same error types as tool call duration

Use Case: Monitor prompt template retrieval and listing performance

Metric: mcp.resource.request.duration
Type: Delta Histogram with Counter (milliseconds)
Description: Tracks the duration of resource-related operations (list/read)

Attributes:

  • method: Operation type (list_resources or read_resource)
  • status: Operation status (success or error)
  • client.id: Client identifier (see Client Identification)
  • client.group: Client group (see Client Identification)
  • Additional for errors:
    • error.type: Same error types as tool call duration

Use Case: Monitor resource access patterns and performance
