Sessions & Messages¶

A session is the unit of conversation in Cognition. It owns a thread of messages, binds to a specific agent, carries optional tenant scope, and persists across server restarts. The agent's response to each message sent to a session streams back over Server-Sent Events (SSE).

Session Lifecycle¶

POST /sessions                →  Session created (status: active)
    │
POST /sessions/{id}/messages  →  User message persisted, agent streams response
    │                              (token events, tool calls, tool results, ...)
    │                              done event → assistant message persisted
    │
POST /sessions/{id}/abort     →  In-progress stream cancelled gracefully
    │
DELETE /sessions/{id}         →  Session and all messages deleted

Sessions start with status: active. The active/inactive/error states are tracked in server/app/models.py:SessionStatus.

Session Model¶

Defined in server/app/models.py:Session:

Field	Type	Description
`id`	`str` (UUID)	Unique session identifier
`thread_id`	`str` (UUID)	LangGraph checkpoint thread ID; one-to-one with session
`title`	`str`	Human-readable session name
`status`	`SessionStatus`	`active`, `inactive`, or `error`
`agent_name`	`str`	Name of the bound agent (default: `"default"`)
`config`	`SessionConfig`	Per-session LLM overrides (provider, model, temperature)
`scopes`	`dict[str, str]`	Builder-defined scope key-value pairs (e.g. `{"user": "alice", "project": "proj-123"}`)
`metadata`	`dict[str, str]`	Arbitrary builder-defined correlation metadata
`created_at`	`datetime`	Creation timestamp
`updated_at`	`datetime`	Last-modified timestamp
`message_count`	`int`	Running count of messages in the session
`workspace_path`	`str`	Absolute path to the agent's workspace

Creating a Session¶

# Default agent
curl -X POST http://localhost:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{"title": "My session"}'

# Specific agent
curl -X POST http://localhost:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{"title": "Code review", "agent_name": "readonly"}'

# Session with orchestration metadata
curl -X POST http://localhost:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "title": "PR review session",
    "metadata": {
      "workflow_id": "pr-review",
      "repository": "myorg/myrepo",
      "pr_number": "42"
    }
  }'

Per-Session LLM Override¶

A session can override the server's default LLM:

curl -X PATCH http://localhost:8000/sessions/{id} \
  -H "Content-Type: application/json" \
  -d '{"config": {"model": "gpt-4o-mini", "temperature": 0.2}}'

If only model is provided, Cognition resolves the provider from the ConfigRegistry (first enabled provider by priority). Use provider_id to pin a specific provider config.
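The resolution rule above can be sketched roughly as follows. This is an illustration only: `ProviderConfig`, its fields, `resolve_provider`, and the convention that a lower `priority` number wins are all assumptions made for the example, not the actual ConfigRegistry interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderConfig:
    # Hypothetical shape; real ConfigRegistry entries may differ.
    provider_id: str
    enabled: bool
    priority: int  # assumption: lower number means higher priority


def resolve_provider(
    providers: list[ProviderConfig],
    provider_id: Optional[str] = None,
) -> ProviderConfig:
    """Pick the provider config to use for a session override."""
    if provider_id is not None:
        # An explicit provider_id pins one specific config.
        for provider in providers:
            if provider.provider_id == provider_id:
                return provider
        raise LookupError(f"unknown provider_id: {provider_id}")
    # Otherwise fall back to the first enabled provider by priority.
    enabled = [p for p in providers if p.enabled]
    if not enabled:
        raise LookupError("no enabled provider configured")
    return min(enabled, key=lambda p: p.priority)
```

Whether a pinned provider must also be enabled is not stated here, so the sketch does not enforce it.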

Session Metadata¶

Builders can attach arbitrary flat string metadata to a session at creation time or replace it later with PATCH /sessions/{id}. Cognition does not interpret these values; they are intended for correlation with external systems such as workflow engines, repositories, tickets, or billing units.

Example patch:

curl -X PATCH http://localhost:8000/sessions/{id} \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "workflow_id": "pr-review",
      "repository": "myorg/myrepo",
      "pr_number": "42"
    }
  }'

Filtering Sessions by Metadata¶

GET /sessions accepts query parameters of the form metadata.<key>=<value>.

Examples:

# Find sessions for a repository
curl "http://localhost:8000/sessions?metadata.repository=myorg/myrepo"

# Find a specific orchestration target
curl "http://localhost:8000/sessions?metadata.repository=myorg/myrepo&metadata.pr_number=42"

All supplied metadata predicates must match for a session to be returned.
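As a small client-side sketch, the query string can be built from a plain dict, and the all-predicates-must-match rule can be mirrored locally. The helper names `metadata_query` and `matches` are hypothetical; only the `metadata.<key>=<value>` parameter form comes from the text above.

```python
from urllib.parse import urlencode


def metadata_query(filters: dict[str, str]) -> str:
    """Build the GET /sessions query string from metadata filters."""
    # Values are percent-encoded, so "myorg/myrepo" becomes "myorg%2Fmyrepo".
    return urlencode({f"metadata.{key}": value for key, value in filters.items()})


def matches(session_metadata: dict[str, str], filters: dict[str, str]) -> bool:
    """Mirror the AND semantics: every supplied predicate must match."""
    return all(session_metadata.get(key) == value for key, value in filters.items())
```

For example, `metadata_query({"repository": "myorg/myrepo", "pr_number": "42"})` yields a string that can be appended after `?` in the request URL.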

Message Model¶

Defined in server/app/models.py:Message:

Field	Type	Description
`id`	`str` (UUID)	Unique message identifier
`session_id`	`str`	Parent session
`role`	`str`	`user`, `assistant`, `system`, or `tool`
`content`	`str`	Message text
`tool_calls`	`list[ToolCall]`	Tool invocations made by this assistant turn
`tool_call_id`	`str | None`	For `tool` role messages, the call this responds to
`token_count`	`int | None`	Token count for the message
`model_used`	`str | None`	Model that produced this message
`parent_id`	`str | None`	Parent message for threaded structure
`created_at`	`datetime`	Creation timestamp

Persistence Model¶

Cognition stores conversation history in two related forms:

- LangGraph checkpoint state: the authoritative runtime state for a thread. This is what the agent relies on to resume, continue multi-step execution, and survive reconnects or restarts.
- Messages table: a read-optimized projection used by the REST API for pagination, timestamps, token/model metadata, and per-message lookups.

Under normal operation, messages are written to the StorageBackend in two stages:

- User message: written immediately when POST /sessions/{id}/messages is called, before the agent starts.
- Assistant message: accumulated as token events stream in, then written when the done event fires.

The important contract is that the checkpoint is authoritative and the messages table is derived. If a projection write is missed or becomes inconsistent, Cognition can rebuild the API-facing message projection from checkpoint state for that session/thread.

This gives Cognition a recovery path for cases like interrupted writes, projection drift, or future maintenance/migration tasks, while still preserving a query-friendly messages API.
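The two-stage write can be illustrated with a minimal sketch. `InMemoryStore` is a hypothetical stand-in for the StorageBackend, and `run_turn` is an invented helper; the real interfaces are not shown in this document and will differ.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    # Hypothetical stand-in for the StorageBackend, for illustration only.
    messages: list[dict] = field(default_factory=list)

    def save(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


def run_turn(
    store: InMemoryStore,
    user_text: str,
    events: list[tuple[str, dict]],
) -> None:
    """Persist one conversational turn in two stages."""
    # Stage 1: the user message is stored before the agent produces anything.
    store.save("user", user_text)
    parts: list[str] = []
    for event_type, data in events:
        if event_type == "token":
            parts.append(data["content"])
        elif event_type == "done":
            # Stage 2: the accumulated assistant text is stored on done.
            store.save("assistant", "".join(parts))
            break
```

If the stream ends without a done event, this sketch stores no assistant message, which is the kind of gap the checkpoint-based rebuild is meant to cover.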

SSE Streaming¶

Cognition uses Server-Sent Events for streaming. Every call to POST /sessions/{id}/messages returns a text/event-stream response.

Wire Format¶

Each event follows the SSE protocol:

id: 42
event: token
data: {"content": "Here"}

id: 43
event: token
data: {"content": " is"}

id: 44
event: done
data: {"assistant_data": {...}}
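A minimal parser for this wire format might look like the sketch below. The function name `parse_sse` is hypothetical, and the sketch assumes every `data:` payload is JSON, as in the examples above. It follows the general SSE rules: a blank line ends an event, and lines starting with `:` are comments.

```python
import json
from collections.abc import Iterable, Iterator
from typing import Any


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per SSE event from an iterable of decoded lines."""
    event_id = None
    event_type = "message"  # SSE default when no event: field is present
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event.
            if data_lines:
                yield {
                    "id": event_id,
                    "event": event_type,
                    "data": json.loads("\n".join(data_lines)),
                }
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, for example a keepalive
        name, _, value = line.partition(":")
        value = value.removeprefix(" ")  # one optional space after the colon
        if name == "id":
            event_id = value
        elif name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
```

The generator accepts any iterable of text lines, so it can be fed from a streaming HTTP client's line iterator or from a list in tests.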

Event Types¶

All event types are defined in server/app/agent/runtime.py and serialized to SSE via server/app/api/sse.py:EventBuilder:

Event	`event:` field	Key `data` fields	Description
Token	`token`	`content: str`	A single LLM output token
Tool call	`tool_call`	`name: str`, `args: dict`, `id: str`	Agent invoking a tool
Tool result	`tool_result`	`tool_call_id: str`, `output: str`, `exit_code: int`	Tool execution result
Planning	`planning`	`todos: list[str]`	Agent creating a task plan
Step complete	`step_complete`	`step_number: int`, `total_steps: int`, `description: str`	A plan step finished
Delegation	`delegation`	`target_agent: str`, `task: str`	Primary agent delegating to a subagent
Status	`status`	`status: "thinking" | "idle"`	Agent status change
Usage	`usage`	`input_tokens: int`, `output_tokens: int`, `estimated_cost: float`, `provider: str`, `model: str`	Token accounting
Error	`error`	`message: str`, `code: str`	Recoverable error
Done	`done`	`assistant_data: dict`	Stream complete; contains the full assistant message

Reading an SSE Stream¶

# -N disables curl buffering so tokens appear as they arrive
curl -N -X POST http://localhost:8000/sessions/${SESSION}/messages \
  -H "Content-Type: application/json" \
  -d '{"content": "List the files in this directory."}'

Python example using httpx:

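import json

import httpx

session_id = "YOUR_SESSION_ID"  # placeholder: use the id returned by POST /sessions

# timeout=None: a long-running stream can exceed httpx's default 5-second timeout
with httpx.stream(
    "POST",
    f"http://localhost:8000/sessions/{session_id}/messages",
    json={"content": "List files."},
    timeout=None,
) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        # Only data: lines carry JSON; id:, event:, and heartbeat lines are skipped
        if line.startswith("data:"):
            event = json.loads(line[5:])
            print(event)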

Async Completion Callback¶

If you do not want to keep the SSE connection open until the run completes, you can provide callback_url in POST /sessions/{id}/messages. Cognition will still stream SSE to the caller, but it will also send a best-effort POST to the callback URL when the run finishes.

Example:

curl -X POST http://localhost:8000/sessions/{id}/messages \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Review this PR and summarize the key risks.",
    "callback_url": "https://example.com/cognition-callback"
  }'

Completion payload shape:

{
  "session_id": "sess_abc123",
  "message_id": "msg_xyz",
  "status": "done",
  "output": "The review has been posted to the PR.",
  "token_usage": { "input": 1840, "output": 412 },
  "model_used": "claude-3-5-sonnet",
  "completed_at": "2026-03-21T12:34:56Z"
}

Current behavior notes:

- Delivery is best-effort and logged on failure.
- Retries, signatures, and persistent delivery tracking are not yet implemented.
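On the receiving side, the completion payload can be decoded into a typed value. The sketch below assumes the payload shape shown above; `Completion` and `parse_completion` are hypothetical names, and the web framework that receives the POST is left out.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Completion:
    session_id: str
    message_id: str
    status: str
    output: str
    input_tokens: int
    output_tokens: int
    model_used: str
    completed_at: datetime


def parse_completion(body: bytes) -> Completion:
    """Decode a callback request body into a Completion."""
    payload = json.loads(body)
    usage = payload["token_usage"]
    return Completion(
        session_id=payload["session_id"],
        message_id=payload["message_id"],
        status=payload["status"],
        output=payload["output"],
        input_tokens=usage["input"],
        output_tokens=usage["output"],
        model_used=payload["model_used"],
        # Replace the trailing "Z" so older Python versions accept the timestamp.
        completed_at=datetime.fromisoformat(
            payload["completed_at"].replace("Z", "+00:00")
        ),
    )
```

Because delivery is best-effort, a receiver may want a fallback such as reading GET /sessions/{id}/messages when no callback arrives.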

Reconnection¶

The SSE stream supports automatic reconnection via the Last-Event-ID mechanism. The server side lives in server/app/api/sse.py.

How It Works¶

1. Every event is assigned a sequential numeric ID and sent as id: <n>.
2. The server maintains an EventBuffer (default capacity: 100 events) per session in memory.
3. The client sends the Last-Event-ID header on reconnection.
4. The server replays any buffered events with IDs greater than Last-Event-ID.
5. A reconnected event is sent first to confirm the reconnection, followed by replayed events.
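The buffering and replay steps above can be sketched with a bounded deque. This is not the actual EventBuffer from server/app/api/sse.py; the method names `append` and `replay_after` are assumptions chosen for the example.

```python
from collections import deque
from typing import Any


class EventBuffer:
    """Bounded in-memory buffer of recent events (illustrative sketch)."""

    def __init__(self, capacity: int = 100) -> None:
        # deque(maxlen=...) silently drops the oldest event when full.
        self._events: deque[tuple[int, str, Any]] = deque(maxlen=capacity)
        self._next_id = 1

    def append(self, event_type: str, data: Any) -> int:
        """Store an event and return its sequential numeric ID."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, event_type, data))
        return event_id

    def replay_after(self, last_event_id: int) -> list[tuple[int, str, Any]]:
        """Return buffered events with IDs greater than last_event_id."""
        return [event for event in self._events if event[0] > last_event_id]
```

One consequence of a fixed capacity is that a client reconnecting after more than `capacity` events have passed can only recover the most recent ones from the buffer.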

Heartbeat¶

The server sends a keepalive comment every 15 seconds (configurable via COGNITION_SSE_HEARTBEAT_INTERVAL):

:heartbeat

This prevents proxies and load balancers from closing idle SSE connections. The comment is invisible to application-level event handlers.

Reconnection Configuration¶

Setting	Environment Variable	Default
SSE retry hint	`COGNITION_SSE_RETRY_INTERVAL`	`3000` ms
Heartbeat interval	`COGNITION_SSE_HEARTBEAT_INTERVAL`	`15.0` s
Event buffer size	`COGNITION_SSE_BUFFER_SIZE`	`100` events
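For reference, these settings could be read from the environment along the following lines. The environment variable names and defaults come from the table above; `SSESettings`, `load_sse_settings`, and the parsing behavior are assumptions, not the server's actual configuration code.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class SSESettings:
    retry_interval_ms: int = 3000
    heartbeat_interval_s: float = 15.0
    buffer_size: int = 100


def load_sse_settings(env: Mapping[str, str] = os.environ) -> SSESettings:
    """Read SSE settings from env, falling back to the documented defaults."""
    return SSESettings(
        retry_interval_ms=int(env.get("COGNITION_SSE_RETRY_INTERVAL", "3000")),
        heartbeat_interval_s=float(
            env.get("COGNITION_SSE_HEARTBEAT_INTERVAL", "15.0")
        ),
        buffer_size=int(env.get("COGNITION_SSE_BUFFER_SIZE", "100")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.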

Message Pagination¶

GET /sessions/{id}/messages returns messages in reverse-chronological order with offset-based pagination:

# First page (most recent 50)
curl "http://localhost:8000/sessions/${SESSION}/messages"

# Second page
curl "http://localhost:8000/sessions/${SESSION}/messages?limit=50&offset=50"

Response:

{
  "messages": [...],
  "total": 142,
  "has_more": true
}
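A client can walk all pages by advancing `offset` until `has_more` is false. In the sketch below, `iter_messages` and the injected `fetch_page` callable are hypothetical; `fetch_page(limit, offset)` is expected to return the response body shown above as a dict.

```python
from collections.abc import Callable, Iterator
from typing import Any

Page = dict[str, Any]


def iter_messages(
    fetch_page: Callable[[int, int], Page],
    limit: int = 50,
) -> Iterator[Any]:
    """Yield every message by following offset-based pages."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page["messages"]
        # Stop on the last page; also stop on an empty page to avoid looping.
        if not page["has_more"] or not page["messages"]:
            break
        offset += limit
```

In practice `fetch_page` would wrap an HTTP GET to /sessions/{id}/messages with `limit` and `offset` as query parameters. Note that with offset pagination over a reverse-chronological list, messages added while paging shift later offsets, so a long walk over an active session may see duplicates.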

Abort¶

POST /sessions/{id}/abort cancels any in-progress agent operation. The streaming response stops emitting events, the assistant message is persisted with whatever content had accumulated, and the agent returns to idle.

curl -X POST http://localhost:8000/sessions/${SESSION}/abort

Response:

{"success": true, "message": "Operation aborted"}

Abort is implemented via a thread-ID-based cancellation set in DeepAgentRuntime. On the next event-processing iteration, the runtime detects the abort flag and exits the stream loop cleanly.
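The cancellation-set idea can be sketched as follows. This is a simplified, synchronous illustration: `AbortRegistry`, `consume`, and `stream_events` are hypothetical names, and the real DeepAgentRuntime may well be asynchronous and structured differently.

```python
import threading
from collections.abc import Iterable, Iterator
from typing import Any


class AbortRegistry:
    """Set of thread IDs that have been asked to abort (illustrative)."""

    def __init__(self) -> None:
        self._aborted: set[str] = set()
        self._lock = threading.Lock()

    def abort(self, thread_id: str) -> None:
        with self._lock:
            self._aborted.add(thread_id)

    def consume(self, thread_id: str) -> bool:
        """Return True once if thread_id was aborted, clearing the flag."""
        with self._lock:
            if thread_id in self._aborted:
                self._aborted.discard(thread_id)
                return True
            return False


def stream_events(
    registry: AbortRegistry,
    thread_id: str,
    events: Iterable[Any],
) -> Iterator[Any]:
    """Forward events until an abort is detected for this thread."""
    for event in events:
        # The flag is checked on each iteration, so an abort takes effect
        # at the next event rather than interrupting one in flight.
        if registry.consume(thread_id):
            return
        yield event
```

Clearing the flag when it is consumed keeps a stale abort from cancelling the next run on the same thread.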