Skip to content

Observability

Cognition provides a three-pillar observability stack — distributed traces, time-series metrics, and experiment tracking — all independently toggleable and all with graceful degradation when the underlying packages are not installed.


Three Pillars

Pillar Technology Purpose Toggle
Distributed traces OpenTelemetry → OTLP → Collector Request and LLM call tracing COGNITION_OTEL_ENABLED
Time-series metrics Prometheus Counters and histograms COGNITION_METRICS_PORT
Experiment tracking MLflow LLM evaluation and run history COGNITION_MLFLOW_ENABLED

All three subsystems are initialised in the FastAPI lifespan in server/app/main.py. Disabling any one of them requires only an environment variable change — no code changes.


OpenTelemetry Traces

Implemented in server/app/observability/__init__.py:setup_tracing().

What Gets Traced

  • HTTP requests — every inbound request with method, path, status code, and duration (via FastAPI auto-instrumentation)
  • LLM calls — provider, model, input/output token counts, latency (via LangChain auto-instrumentation)
  • Tool executions — tool name, duration, success/failure

Configuration

Variable Default Description
COGNITION_OTEL_ENABLED true Enable/disable tracing
COGNITION_OTEL_ENDPOINT null OTLP collector URL (e.g. http://localhost:4317)

Transport is auto-detected: gRPC for http://host:4317-style endpoints, HTTP for /v1/traces paths. When COGNITION_OTEL_ENDPOINT is null, traces are not exported but the instrumentation is still active (useful for local development with a local Jaeger or similar).

Manual Instrumentation

from server.app.observability import traced, span, get_tracer

# Decorator form
@traced("my_operation")
async def do_work():
    ...

# Context manager form
async def process():
    async with span("processing_step", {"input_size": len(data)}):
        result = await heavy_computation(data)

Docker Compose Stack

The docker-compose.yml ships a full OTel pipeline:

Cognition → OTel Collector (port 4317) → Jaeger (traces)
                                        → Loki (logs via Promtail)

See the Deployment guide for the complete stack.


Prometheus Metrics

Implemented in server/app/observability/__init__.py. All metrics are defined at module level and imported by the middleware and agent layers.

Defined Metrics

Metric Type Labels Description
cognition_requests_total Counter method, endpoint, status Total HTTP requests
cognition_request_duration_seconds Histogram method, endpoint HTTP request latency
cognition_llm_call_duration_seconds Histogram provider, model LLM API call latency
cognition_tool_calls_total Counter tool_name, status Tool invocations (success/error)
cognition_active_sessions Gauge Currently active sessions

When prometheus_client is not installed, all metrics fall back to DummyMetric — a no-op object that accepts any call without error.

Scrape Configuration

Metrics are served on a separate port from the API:

COGNITION_METRICS_PORT=9090

Prometheus prometheus.yml scrape target:

scrape_configs:
  - job_name: cognition
    static_configs:
      - targets: ["cognition:9090"]

Where Metrics Are Recorded

  • REQUEST_COUNT and REQUEST_DURATIONserver/app/api/middleware.py:ObservabilityMiddleware (every request)
  • LLM_CALL_DURATIONserver/app/agent/middleware.py:CognitionObservabilityMiddleware (every LLM invocation)
  • TOOL_CALL_COUNTserver/app/agent/middleware.py:CognitionObservabilityMiddleware (every tool invocation, labelled success or error)
  • SESSION_COUNT — updated by server/app/api/routes/sessions.py on session create and delete

Decorator Form

from server.app.observability import timed, TOOL_CALL_COUNT

@timed(TOOL_CALL_COUNT, {"tool_name": "my_tool"})
async def my_tool_handler(args):
    ...

MLflow Experiment Tracking

Implemented in server/app/observability/mlflow_config.py.

How It Works

MLflow receives traces via the OTel Collector — there is no direct MLflow SDK call in the hot path. The flow is:

Cognition (OTel SDK) → OTel Collector → MLflow Tracking Server

setup_mlflow_tracing(settings) performs two things at startup: 1. Sets the MLflow tracking URI 2. Creates or sets the MLflow experiment (default name: cognition)

Configuration

Variable Default Description
COGNITION_MLFLOW_ENABLED false Enable MLflow integration
COGNITION_MLFLOW_TRACKING_URI null MLflow server URL (e.g. http://localhost:5000)
COGNITION_MLFLOW_EXPERIMENT_NAME cognition Experiment name

What Gets Recorded

Each agent turn becomes an MLflow run with: - Trace of every LLM call (model, tokens, latency) - Trace of every tool invocation (name, args, output, duration) - Nested delegation spans when subagents are involved - Custom evaluation runs from the offline evaluation pipeline

Offline Evaluation Pipeline

The evaluation pipeline runs independently of the live server. It replays sessions from the StorageBackend, scores them with built-in scorers, and logs results as MLflow runs.

Built-in scorers: - Faithfulness — LLM-as-judge scoring whether the response is grounded in sources - Relevance — Whether the response addresses the question - Tool efficiency — Whether the agent used the minimum necessary tool calls

See MLFLOW-INTEROPERABILITY.md in the repository root for the full evaluation workflow.


Structured Logging

server/app/observability/__init__.py:setup_logging() configures structlog.

In development (log_level: debug), logs are rendered in a human-readable console format with coloured level names and formatted timestamps.

In production (log_level: info or warning), logs are rendered as JSON for ingestion by Loki, Datadog, CloudWatch, or any structured log aggregator.

COGNITION_LOG_LEVEL=info    # debug | info | warning | error

Log output from the Docker Compose stack is collected by Promtail and forwarded to Loki, then queryable in Grafana.


Grafana Dashboards

The docker/grafana/dashboards/ directory ships pre-built Grafana dashboard JSON for:

  • Cognition Overview — Request rate, latency, error rate, active sessions
  • LLM Performance — Per-provider call latency, token usage, circuit breaker state
  • Tool Execution — Tool call rate by name, success/error ratio
  • Session Activity — Session creation rate, message throughput

Dashboards are provisioned automatically when starting the Docker Compose stack.


Graceful Degradation

All three observability subsystems are wrapped in conditional imports:

try:
    import prometheus_client
    PROMETHEUS_AVAILABLE = True
except ImportError:
    PROMETHEUS_AVAILABLE = False
    # All metric objects become DummyMetric()

The server starts and runs normally regardless of whether prometheus_client, opentelemetry-sdk, or mlflow are installed. Observability is additive, not required.