
Storage & Execution

Cognition decouples where state is stored from where code runs through two independent protocol abstractions: StorageBackend (Layer 2) and the execution backends (Layer 3). Both are pluggable — swap implementations via configuration with no code changes in any layer above.


StorageBackend Protocol

Defined in server/app/storage/backend.py. The protocol is composed of three sub-protocols, each responsible for a distinct concern:

SessionStore

class SessionStore(Protocol):
    async def create_session(
        self,
        thread_id: str,
        config: SessionConfig,
        title: str | None = None,
        scopes: dict[str, str] | None = None,
        agent_name: str = "default",
        metadata: dict[str, str] | None = None,
    ) -> Session: ...

    async def get_session(self, session_id: str) -> Session | None: ...

    async def list_sessions(
        self,
        filter_scopes: dict[str, str] | None = None,
        metadata_filters: dict[str, str] | None = None,
    ) -> list[Session]: ...

    async def update_session(
        self,
        session_id: str,
        title: str | None = None,
        status: str | None = None,
        config: SessionConfig | None = None,
        agent_name: str | None = None,
        metadata: dict[str, str] | None = None,
    ) -> Session | None: ...

    async def update_message_count(self, session_id: str, delta: int) -> None: ...

    async def delete_session(self, session_id: str) -> bool: ...

MessageStore

class MessageStore(Protocol):
    async def create_message(self, message: Message) -> Message: ...

    async def get_message(self, message_id: str) -> Message | None: ...

    async def get_messages_by_session(
        self,
        session_id: str,
        limit: int = 50,
        offset: int = 0,
    ) -> tuple[list[Message], int]: ...  # (messages, total_count)

    async def list_messages_for_session(self, session_id: str) -> list[Message]: ...

    async def delete_messages_for_session(self, session_id: str) -> int: ...
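
A caller can walk a long session page by page using the (messages, total_count) tuple returned by get_messages_by_session. The sketch below is only an illustration: InMemoryMessages, fetch_all, and the simplified Message dataclass are hypothetical stand-ins, and only the method name and its parameters come from the protocol above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    # Simplified stand-in for the real Message model (assumption).
    id: str
    session_id: str
    content: str


class InMemoryMessages:
    """Hypothetical store implementing only the calls this example needs."""

    def __init__(self) -> None:
        self._rows: list[Message] = []

    async def create_message(self, message: Message) -> Message:
        self._rows.append(message)
        return message

    async def get_messages_by_session(
        self, session_id: str, limit: int = 50, offset: int = 0
    ) -> tuple[list[Message], int]:
        rows = [m for m in self._rows if m.session_id == session_id]
        # Return one page plus the total so the caller knows when to stop.
        return rows[offset : offset + limit], len(rows)


async def fetch_all(
    store: InMemoryMessages, session_id: str, page_size: int = 50
) -> list[Message]:
    """Collect every message by advancing offset until total_count is reached."""
    collected: list[Message] = []
    offset = 0
    while True:
        page, total = await store.get_messages_by_session(
            session_id, limit=page_size, offset=offset
        )
        collected.extend(page)
        offset += len(page)
        if not page or offset >= total:
            return collected
```

Returning the total alongside each page lets a client render pagination controls without issuing a second count query.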

CheckpointerStore

class CheckpointerStore(Protocol):
    async def get_checkpointer(self) -> BaseCheckpointSaver: ...
    async def close_checkpointer(self) -> None: ...

The checkpointer is passed to LangGraph and stores agent state at every step — enabling resumable workflows that survive server restarts.

StoreBackend (Cross-Thread Memory)

class StorageBackend(Protocol):
    async def get_store(self) -> BaseStore | None: ...

Each storage backend also exposes get_store(), which returns a LangGraph BaseStore instance for cross-thread persistent memory. This is separate from the checkpointer: the checkpointer stores agent graph state (messages, tool call history) per thread; the Store holds long-lived structured data that spans threads and sessions.

| Implementation | Store Backend |
| --- | --- |
| MemoryStorageBackend | InMemoryStore (ephemeral; suitable for tests and development) |
| SqliteStorageBackend | AsyncSqliteStore (persisted to same database file as checkpointer) |
| PostgresStorageBackend | AsyncPostgresStore (separate psycopg connection to same Postgres instance) |

The Store is passed to create_deep_agent() and available inside agent nodes and middleware via runtime.store. Namespace scoping (via CognitionContext.effective_scope) ensures one tenant cannot read another's stored data. See CognitionContext and Cross-Thread Memory for details.

Unified StorageBackend

StorageBackend combines session, message, checkpoint, and Store operations plus lifecycle methods:

class StorageBackend(SessionStore, MessageStore, CheckpointerStore, Protocol):
    async def initialize(self) -> None: ...   # Create tables, pools, migrations
    async def close(self) -> None: ...        # Drain connections, release resources
    async def health_check(self) -> dict[str, Any]: ...

ArtifactStore

Artifacts are durable, scope-aware files managed separately from session/message data. The ArtifactStore provides CRUD and versioning for artifacts with six types: scratch, artifact, contract, eval, memory, policy.

Key properties:

  • Scope-aware: artifacts are filtered by effective_scope on every read
  • Versioned: content changes automatically increment the version number
  • Type-safe: artifact types control lifecycle semantics
  • Visibility-controlled: private, run, or public visibility levels

Artifacts are accessible via GET/POST/PUT/DELETE /artifacts and are exposed to agents through the tool system.

Message Projection Recovery

The LangGraph checkpoint is the authoritative record of runtime conversation state. The messages table is a read-optimized projection used by the API. Storage backends therefore support rebuilding the message projection from checkpoint messages when API-visible message rows drift or must be recovered after an interrupted write path.

This lets Cognition repair user-visible session history without treating the messages table as the source of truth for runtime continuity.
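
A rebuild along these lines could discard the projected rows for a session and re-create them from the checkpoint's message list. The sketch below is a minimal illustration under stated assumptions: rebuild_projection, TinyMessageStore, the simplified Message dataclass, and the dict shape of checkpoint messages are all hypothetical; only the store method names are taken from the MessageStore protocol.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    # Simplified stand-in for the real Message model (assumption).
    id: str
    session_id: str
    role: str
    content: str


class TinyMessageStore:
    """Hypothetical in-memory projection table."""

    def __init__(self) -> None:
        self.rows: dict[str, Message] = {}

    async def create_message(self, message: Message) -> Message:
        self.rows[message.id] = message
        return message

    async def list_messages_for_session(self, session_id: str) -> list[Message]:
        return [m for m in self.rows.values() if m.session_id == session_id]

    async def delete_messages_for_session(self, session_id: str) -> int:
        doomed = [mid for mid, m in self.rows.items() if m.session_id == session_id]
        for mid in doomed:
            del self.rows[mid]
        return len(doomed)


async def rebuild_projection(
    store: TinyMessageStore, session_id: str, checkpoint_messages: list[dict[str, str]]
) -> int:
    """Replace the projected rows for one session with the checkpoint's messages."""
    await store.delete_messages_for_session(session_id)
    for index, raw in enumerate(checkpoint_messages):
        await store.create_message(
            Message(
                # Fall back to a positional id when the checkpoint message has none.
                id=raw.get("id") or f"{session_id}:{index}",
                session_id=session_id,
                role=raw["role"],
                content=raw["content"],
            )
        )
    return len(checkpoint_messages)
```

The checkpoint is only read, never written, which keeps the direction of authority one-way: checkpoint to projection.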


Storage Implementations

In server/app/storage/factory.py, create_storage_backend(settings) creates the backend:

match settings.persistence_backend:
    case "sqlite":   return SqliteStorageBackend(settings)
    case "postgres": return PostgresStorageBackend(settings)
    case "memory":   return MemoryStorageBackend(settings)
    case _:          raise StorageBackendError(f"Unknown backend: ...")

No silent fallback. An unrecognised COGNITION_PERSISTENCE_BACKEND value raises immediately at startup.

SQLite (server/app/storage/sqlite.py)

Default development backend.

  • Async I/O via aiosqlite
  • LangGraph checkpoints via AsyncSqliteSaver
  • Database path resolved relative to workspace if not absolute
  • Parent directories created automatically
  • Suitable for single-node deployments; not safe for concurrent multi-process access

Configuration:

COGNITION_PERSISTENCE_BACKEND=sqlite
COGNITION_PERSISTENCE_URI=.cognition/state.db
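
The path handling listed above (relative paths resolved against the workspace, parent directories created on demand) can be sketched with pathlib. The function name resolve_db_path and its parameters are hypothetical and are not taken from sqlite.py.

```python
from pathlib import Path


def resolve_db_path(uri: str, workspace: Path) -> Path:
    """Resolve a database path and make sure its parent directory exists."""
    path = Path(uri)
    if not path.is_absolute():
        # Relative values such as ".cognition/state.db" live under the workspace.
        path = workspace / path
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

With COGNITION_PERSISTENCE_URI=.cognition/state.db and a workspace of /srv/project, this sketch would yield /srv/project/.cognition/state.db and create /srv/project/.cognition if it is missing.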

PostgreSQL (server/app/storage/postgres.py)

Production backend for multi-node or high-availability deployments.

  • Async I/O via asyncpg connection pool (default: 1–10 connections)
  • LangGraph checkpoints via AsyncPostgresSaver
  • Schema managed by Alembic migrations
  • DSN normalisation: postgresql+asyncpg:// → postgresql:// for asyncpg compatibility

Configuration:

COGNITION_PERSISTENCE_BACKEND=postgres
COGNITION_PERSISTENCE_URI=postgresql://user:pass@host:5432/cognition
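
DSN normalisation for asyncpg amounts to dropping the SQLAlchemy-style +asyncpg driver suffix from the URL scheme, since asyncpg itself expects a plain postgresql:// URL. A minimal sketch, assuming a hypothetical helper named normalise_dsn:

```python
def normalise_dsn(dsn: str) -> str:
    """Rewrite a postgresql+asyncpg:// DSN to the plain postgresql:// scheme."""
    prefix = "postgresql+asyncpg://"
    if dsn.startswith(prefix):
        return "postgresql://" + dsn[len(prefix):]
    # Anything else is passed through unchanged.
    return dsn
```

This lets the same COGNITION_PERSISTENCE_URI value be shared with tooling that expects the SQLAlchemy form, such as Alembic.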

Memory (server/app/storage/memory.py)

In-process dict-backed store used in unit tests.

  • Zero dependencies
  • State lost on process exit
  • Fastest possible; no I/O overhead

Configuration:

COGNITION_PERSISTENCE_BACKEND=memory


ExecutionBackend

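Command execution goes through a pluggable backend rather than being invoked ad hoc by the agent. Cognition provides two sandbox backends: the Docker-backed one routes commands through DockerExecutionBackend for hard isolation, while the local one runs commands on the host with no process isolation.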

DockerExecutionBackend (server/app/execution/backend.py)

Runs commands in a Docker container with full kernel-level isolation:

| Security Control | Value |
| --- | --- |
| Linux capabilities | All dropped (cap_drop: ALL) |
| Privilege escalation | Blocked (no-new-privileges: true) |
| Root filesystem | Read-only (read_only: true) |
| Writable paths | /tmp and /home via tmpfs mounts |
| Network | Configurable; none by default |
| Memory limit | Configurable (default: 512m) |
| CPU limit | Configurable (default: 1.0 core) |

Container lifecycle: the backend checks for an existing running container for the session before creating a new one. Containers are reused within a session for performance. Command output is truncated at 100 KB.
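
The hardening controls above map naturally onto container options. The sketch below builds them as a keyword-argument dict using the option names accepted by containers.run in the Docker SDK for Python; whether DockerExecutionBackend uses that SDK is an assumption, as are the helper names and the reading of 100 KB as 100 × 1024 bytes.

```python
MAX_OUTPUT_BYTES = 100 * 1024  # "100 KB" interpreted as KiB (assumption)


def container_kwargs(
    network: str = "none", memory: str = "512m", cpus: float = 1.0
) -> dict[str, object]:
    """Build hardened container options (hypothetical helper)."""
    return {
        "cap_drop": ["ALL"],                         # drop every Linux capability
        "security_opt": ["no-new-privileges:true"],  # block privilege escalation
        "read_only": True,                           # read-only root filesystem
        "tmpfs": {"/tmp": "", "/home": ""},          # the only writable paths
        "network_mode": network,                     # no network unless configured
        "mem_limit": memory,
        "nano_cpus": int(cpus * 1_000_000_000),      # 1.0 core = 1e9 nano-CPUs
        "detach": True,
    }


def truncate_output(data: bytes, limit: int = MAX_OUTPUT_BYTES) -> bytes:
    """Cap command output so one noisy command cannot flood the agent context."""
    return data if len(data) <= limit else data[:limit]
```

With the Docker SDK installed, these options could be passed as docker.from_env().containers.run(image, command, **container_kwargs()).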

Sandbox Backends (server/app/agent/sandbox_backend.py)

The two sandbox backends are Cognition's concrete wrappers around the execution abstraction:

CognitionLocalSandboxBackend

Commands execute in the local process under the server's user.

  • Command parsing with shlex.split() + subprocess with shell=False, so shell metacharacters are passed as literal arguments and never interpreted by a shell
  • Protected paths list (.cognition/ by default): write operations that target protected paths are blocked before execution
  • File operations operate directly on the host filesystem
  • No process isolation from the Cognition server process

Best for: local development, trusted codebases, CI pipelines.

COGNITION_SANDBOX_BACKEND=local
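
The two safeguards of the local backend, shell-free command execution and protected paths, can be sketched as follows. The helper names is_protected and run_command, the 30-second timeout, and the way a write target is checked are assumptions for illustration; the real backend may decide what counts as a write differently.

```python
import shlex
import subprocess
from pathlib import Path

PROTECTED = (".cognition",)  # default protected directory named in the text


def is_protected(target: str, workspace: Path, protected: tuple[str, ...] = PROTECTED) -> bool:
    """Return True if target resolves to a protected directory or anything inside it."""
    resolved = (workspace / target).resolve()  # collapses ".." before comparing
    for name in protected:
        root = (workspace / name).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


def run_command(command: str, cwd: Path, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a command without a shell, so ';', '|' and '$(...)' stay literal arguments."""
    argv = shlex.split(command)
    return subprocess.run(
        argv, shell=False, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )


# With no shell involved, the injection attempt is just more arguments to echo.
argv_example = shlex.split("echo hi; rm -rf /")
```

Resolving the path before the check is what stops a target such as src/../.cognition/state.db from slipping past a naive prefix comparison.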

CognitionDockerSandboxBackend

File operations run directly on the host filesystem (for performance); command execution is routed through DockerExecutionBackend (for isolation).

  • Each session gets its own container (lazy creation on first command)
  • Container is reused for the session lifetime
  • Requires Docker daemon and cognition-sandbox:latest image
  • host_workspace setting maps the workspace path into the container

Best for: production, multi-tenant deployments, any untrusted code.

COGNITION_SANDBOX_BACKEND=docker
COGNITION_DOCKER_IMAGE=cognition-sandbox:latest
COGNITION_DOCKER_NETWORK=none
COGNITION_DOCKER_MEMORY_LIMIT=512m
COGNITION_DOCKER_CPU_LIMIT=1.0
COGNITION_DOCKER_TIMEOUT=300

Factory

from server.app.agent.sandbox_backend import create_sandbox_backend

backend = create_sandbox_backend(settings)
# Returns CognitionLocalSandboxBackend or CognitionDockerSandboxBackend

How Storage and Execution Compose

A session involves both layers simultaneously:

Client sends message
Layer 6: API persists user message in StorageBackend
Layer 4: AgentRuntime streams events
        ├── Tool call: bash("ls -la")
        │       └── Layer 3: SandboxBackend.execute("ls -la")
        │               └── Returns ExecutionResult(output, exit_code)
        └── Stream complete (done event)
                └── Layer 6: API persists assistant message in StorageBackend

The storage and execution backends never call each other. Composition happens only at Layer 4 and Layer 6 — the correct level in the dependency hierarchy.


Built-in Tools

Beyond the sandbox backends, the agent has three built-in tools provided by server/app/agent/tools.py:

| Tool | Class | Description |
| --- | --- | --- |
| browser | BrowserTool | Fetch web pages as text, markdown, or HTML via httpx |
| search | SearchTool | DuckDuckGo web search, returns titles, links, and snippets |
| inspect_package | InspectPackageTool | Inspect Python packages: list submodules, classes, and functions |

These run in the local process (not inside the Docker sandbox) and are always available regardless of sandbox_backend setting.


Circuit Breaker (server/app/execution/circuit_breaker.py)

The circuit breaker protects downstream services from cascading failures. It is used by the execution layer for Docker container management.

States:

CLOSED ──[failures ≥ threshold]──► OPEN
  ▲                                   │
  │                          [timeout expires]
  │                                   ▼
  └──[successes ≥ threshold]── HALF_OPEN

Default configuration:

  • failure_threshold: 5 consecutive failures to open
  • success_threshold: 3 consecutive successes to close from half-open
  • timeout_seconds: 60 s in open state before transitioning to half-open
  • half_open_max_calls: 3 test calls allowed in half-open state
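
The state machine and its default thresholds can be sketched in a few dozen lines. This is not the code in circuit_breaker.py: the class layout, the method names allow, record_success, and record_failure, and the injectable clock are assumptions, as is the choice to reopen the circuit on any failure while half-open, which the text above does not specify.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        success_threshold: int = 3,
        timeout_seconds: float = 60.0,
        half_open_max_calls: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.half_open_max_calls = half_open_max_calls
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._successes = 0
        self._half_open_calls = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may proceed right now."""
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.timeout_seconds:
                return False
            # Timeout expired: start probing with a limited number of calls.
            self.state = State.HALF_OPEN
            self._successes = 0
            self._half_open_calls = 0
        if self.state is State.HALF_OPEN:
            if self._half_open_calls >= self.half_open_max_calls:
                return False
            self._half_open_calls += 1
        return True

    def record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED
                self._failures = 0
        else:
            self._failures = 0  # failures must be consecutive to count

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._trip()  # assumption: one failed probe reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = State.OPEN
        self._opened_at = self._clock()
```

Injecting the clock keeps the 60-second timeout testable without sleeping: a test can advance a fake clock and observe the OPEN to HALF_OPEN transition immediately.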

Circuit breaker status is reported in /health:

{
  "circuit_breakers": {
    "openai": {
      "state": "closed",
      "total_calls": 142,
      "failed_calls": 1,
      "consecutive_failures": 0
    }
  }
}