Agent Runtime¶

The agent runtime is the Layer 4 component responsible for translating a declarative AgentDefinition into a running agent, driving its event stream, and managing its lifecycle.

AgentRuntime Protocol¶

Defined in server/app/agent/runtime.py, AgentRuntime is a Python Protocol that establishes the contract between the API layer and any agent implementation:

class AgentRuntime(Protocol):
    async def astream_events(
        self,
        input_data: str | dict[str, Any],
        thread_id: str | None = None,
    ) -> AsyncIterator[AgentEvent]: ...

    async def ainvoke(
        self,
        input_data: str | dict[str, Any],
        thread_id: str | None = None,
    ) -> AgentEvent: ...

    async def get_state(self, thread_id: str | None = None) -> dict[str, Any] | None: ...

    async def abort(self, thread_id: str | None = None) -> bool: ...

    async def get_checkpointer(self) -> BaseCheckpointSaver: ...

DeepAgentRuntime is the only production implementation. It wraps Deep Agents, transforms its event stream into the canonical AgentEvent types, and handles abort via a thread-ID cancellation set. The protocol boundary exists so the underlying framework can be swapped without touching any Layer 5 or Layer 6 code.

Canonical Event Types¶

All events emitted by the runtime are typed dataclasses defined in server/app/agent/runtime.py. The API layer (server/app/api/sse.py:EventBuilder) serializes them to SSE wire format.

@dataclass
class TokenEvent:
    content: str

@dataclass
class ToolCallEvent:
    name: str
    args: dict[str, Any]
    id: str

@dataclass
class ToolResultEvent:
    tool_call_id: str
    output: str
    exit_code: int

@dataclass
class PlanningEvent:
    todos: list[str]

@dataclass
class StepCompleteEvent:
    step_number: int
    total_steps: int
    description: str

@dataclass
class DelegationEvent:
    from_agent: str
    to_agent: str
    task: str

@dataclass
class InterruptEvent:
    tool_call_id: str
    tool_name: str
    args: dict[str, Any]
    session_id: str | None = None

@dataclass
class StatusEvent:
    status: str    # "thinking" | "idle"

@dataclass
class UsageEvent:
    input_tokens: int
    output_tokens: int
    estimated_cost: float
    provider: str
    model: str

@dataclass
class DoneEvent:
    assistant_data: dict[str, Any]

@dataclass
class ErrorEvent:
    message: str
    code: str

Every consumer of the runtime (streaming endpoint, tests, evaluators) deals only with these types — not with raw LangGraph or Deep Agents events.
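
As an illustration of how typed events might map to the SSE wire format, here is a minimal sketch. It is not the actual `EventBuilder`: the `to_sse` helper and the event names `token` and `tool_call` are assumptions made for the example. Only the frame layout (an `event:` line, a `data:` line, then a blank line) follows the Server-Sent Events format itself.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class TokenEvent:
    content: str


@dataclass
class ToolCallEvent:
    name: str
    args: dict[str, Any]
    id: str


# Assumed mapping from dataclass to SSE event name; the real names may differ.
EVENT_NAMES: dict[type, str] = {TokenEvent: "token", ToolCallEvent: "tool_call"}


def to_sse(event: Any) -> str:
    """Render one event as an SSE frame: event line, data line, blank line."""
    name = EVENT_NAMES[type(event)]
    payload = json.dumps(asdict(event), sort_keys=True)
    return f"event: {name}\ndata: {payload}\n\n"


frame = to_sse(TokenEvent(content="Hi"))
```

Keeping serialization in one function like this means the dataclasses stay free of transport concerns.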

AgentDefinition¶

AgentDefinition (server/app/agent/definition.py) is the declarative configuration for a single agent. It is a Pydantic v2 model:

class AgentDefinition(BaseModel):
    name: str
    system_prompt: str | PromptConfig | None = None
    tools: list[str] = []           # registry tool names
    skills: list[str] = []          # registry skill names
    memory: list[str] = []          # paths to instruction files (AGENTS.md)
    subagents: list[SubagentDefinition] = []
    interrupt_on: dict[str, bool] = {}
    middleware: list[str | dict] = []
    config: AgentConfig = AgentConfig()
    mode: Literal["primary", "subagent", "all"] = "primary"
    description: str | None = None
    hidden: bool = False
    native: bool = False            # True for built-in agents
    a2a_exposed: bool = False       # Expose via A2A protocol (default: off)

Agent Modes¶

| Mode | Meaning |
| --- | --- |
| `primary` | Can be selected as the main agent for a session via `agent_name` |
| `subagent` | Can only be invoked by another agent via the `task` tool |
| `all` | Can function as either |

A2A Exposure¶

The a2a_exposed field controls whether an agent is exposed via the A2A (Agent-to-Agent) protocol. When True:

The agent has an Agent Card at GET /a2a/{agent_name}/.well-known/agent-card.json
The agent appears in GET /.well-known/agent-card.json (scope-filtered list)
The agent gets a dedicated JSON-RPC endpoint at POST /a2a/{agent_name}
External A2A clients can discover and invoke the agent

Built-in agents (default, readonly, etc.) have a2a_exposed=False by default. Set it to True explicitly for agents you want to expose:

# .cognition/agents/deploy-agent.yaml
name: deploy-agent
mode: primary
a2a_exposed: true
system_prompt: |
  You are a deployment agent...

Or via the API:

curl -X POST http://localhost:8000/agents \
  -H "Content-Type: application/json" \
  -d '{"name": "deploy-agent", "system_prompt": "...", "a2a_exposed": true}'

System Prompt Sources¶

The system_prompt field accepts three forms: an inline string, or a PromptConfig that points to a prompt file or to an MLflow prompt:

| Source | Config | Description |
| --- | --- | --- |
| Inline | `system_prompt: "You are..."` | Direct text in the definition |
| File | `{file: "deploy-agent"}` | Loaded from `.cognition/prompts/deploy-agent.md` |
| MLflow | `{mlflow: "my-prompt@v3"}` | Loaded from an MLflow prompt registry at startup |

AgentConfig¶

Per-agent LLM configuration that overrides the server default:

class AgentConfig(BaseModel):
    provider: str | None = None
    model: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
    recursion_limit: int | None = None
    tool_token_limit_before_evict: int | None = None

Tool Resolution¶

Tools are referenced by registry name. At runtime, RuntimeResolver.build_tools() looks up each name in the ConfigRegistry and returns the corresponding callable. File-seeded tools (from tool_sources) and API-registered tools are both resolved this way. The allowed_tool_names parameter on build_tools() filters to only the tools attached to the agent definition.
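
The name-based lookup can be pictured with a small stand-in. Here a plain dict plays the role of the ConfigRegistry and `build_tools` is a simplified free function; the real `RuntimeResolver.build_tools()` signature and its behavior on unknown names are not shown in this document, so raising `KeyError` is an assumption.

```python
from typing import Callable


def run_semgrep(path: str) -> str:
    return f"scanned {path}"


def check_dependencies(path: str) -> str:
    return f"checked {path}"


# Stand-in for the ConfigRegistry: registry name -> callable.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "run_semgrep": run_semgrep,
    "check_dependencies": check_dependencies,
}


def build_tools(allowed_tool_names: list[str]) -> list[Callable[..., str]]:
    """Resolve the allowed names in order, failing loudly on unknown names."""
    missing = [name for name in allowed_tool_names if name not in TOOL_REGISTRY]
    if missing:
        raise KeyError(f"Unknown tools: {missing}")
    return [TOOL_REGISTRY[name] for name in allowed_tool_names]


tools = build_tools(["run_semgrep"])
```

Only `run_semgrep` is returned even though the registry holds two tools, which mirrors how an agent sees just the tools named in its definition.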

Loading Agent Definitions¶

From YAML¶

# .cognition/agents/security-auditor.yaml
name: security-auditor
mode: subagent
description: Audits code for security vulnerabilities
system_prompt: |
  You are a security expert. Audit code for vulnerabilities.
  Report findings with severity ratings.
tools:
  - "run_semgrep"
  - "check_dependencies"
config:
  model: gpt-4o
  temperature: 0.1

From Markdown (with YAML frontmatter)¶

The file name becomes the agent name; the Markdown body becomes the system_prompt.

---
mode: subagent
description: Read-only research assistant
tools:
  - "web_search"
---

You are a research assistant. Gather information from the web and summarize findings.
Do not execute any code or modify files.
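
The split between frontmatter and body can be sketched with the standard library alone. `split_frontmatter` and the file name `researcher.md` are hypothetical; the sketch returns the frontmatter as raw text and leaves YAML parsing to whichever parser the loader uses.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a document into (frontmatter_text, body)."""
    if not text.startswith("---\n"):
        return "", text.strip()
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return "", text.strip()  # no closing delimiter: treat everything as body
    return header, body.strip()


source = Path("researcher.md")  # only the name is used; nothing is read from disk
text = """---
mode: subagent
description: Read-only research assistant
---

You are a research assistant.
"""

frontmatter, system_prompt = split_frontmatter(text)
agent_name = source.stem  # the file name becomes the agent name
```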

Programmatically¶

from server.app.agent.definition import AgentDefinition, AgentConfig

definition = AgentDefinition(
    name="my-agent",
    system_prompt="You are a helpful assistant.",
    tools=["analyze"],
    config=AgentConfig(model="gpt-4o-mini", temperature=0.3),
    mode="primary",
)

Agent Registry¶

AgentDefinitionRegistry (server/app/agent/agent_definition_registry.py) is the server-level catalog of all available agents.

Built-in Agents¶

| Name | Mode | Description |
| --- | --- | --- |
| `default` | `primary` | Full-access coding agent; all built-in tools enabled |
| `readonly` | `primary` | Analysis-only; write and execute tools disabled |
| `hitl_test` | `primary` | Manual HITL verification agent; attempts protected tool calls immediately |

User-Defined Agents¶

On startup, the registry scans .cognition/agents/ for *.md and *.yaml files and loads each as an AgentDefinition. The file watcher (server/app/file_watcher.py) calls registry.reload() when files change, enabling hot-reload without a server restart.
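
The directory scan can be approximated as below. `discover_agent_files` is a hypothetical helper, and keying every file by its stem is a simplification: YAML definitions carry their own `name` field, which the real loader presumably honors.

```python
import tempfile
from pathlib import Path


def discover_agent_files(agents_dir: Path) -> dict[str, Path]:
    """Map file stem to path for every *.md and *.yaml file in the directory."""
    found: dict[str, Path] = {}
    for path in sorted(agents_dir.iterdir()):
        if path.is_file() and path.suffix in {".md", ".yaml"}:
            found[path.stem] = path
    return found


with tempfile.TemporaryDirectory() as tmp:
    agents_dir = Path(tmp)
    (agents_dir / "security-auditor.yaml").write_text("name: security-auditor\n")
    (agents_dir / "researcher.md").write_text("You are a research assistant.\n")
    (agents_dir / "notes.txt").write_text("ignored\n")

    names = sorted(discover_agent_files(agents_dir))
```

A reload triggered by the file watcher would amount to running a scan like this again and rebuilding the definitions from the result.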

Registry API¶

registry = get_agent_definition_registry()

# List all non-hidden agents
agents = registry.get_all()

# Get a specific agent
agent = registry.get("security-auditor")

# Only agents that can own sessions
primaries = registry.primaries()

# Only agents invocable by other agents
subs = registry.subagents()

# Check if a name is valid for session creation
registry.is_valid_primary("readonly")  # True
registry.is_valid_primary("my-subagent")  # False
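
The filtering behind the calls above can be illustrated with a small in-memory stand-in. `MiniRegistry` and `Agent` are hypothetical. The mode checks follow the Agent Modes table, while excluding hidden agents from `primaries()` and `subagents()` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    mode: str = "primary"  # "primary" | "subagent" | "all"
    hidden: bool = False


class MiniRegistry:
    def __init__(self, agents: list[Agent]) -> None:
        self._agents = {agent.name: agent for agent in agents}

    def get_all(self) -> list[Agent]:
        return [a for a in self._agents.values() if not a.hidden]

    def primaries(self) -> list[Agent]:
        return [a for a in self.get_all() if a.mode in ("primary", "all")]

    def subagents(self) -> list[Agent]:
        return [a for a in self.get_all() if a.mode in ("subagent", "all")]

    def is_valid_primary(self, name: str) -> bool:
        agent = self._agents.get(name)
        return agent is not None and agent.mode in ("primary", "all")


registry = MiniRegistry(
    [
        Agent("readonly"),
        Agent("my-subagent", mode="subagent"),
        Agent("helper", mode="all"),
    ]
)
```

An agent in `all` mode appears in both filtered lists, which is why the two checks test membership in a pair of modes rather than equality with one.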

REST Interface¶

# List all available agents
curl http://localhost:8000/agents

# Get a specific agent
curl http://localhost:8000/agents/readonly

Response fields include name, description, mode, hidden, native, a2a_exposed, model, temperature, response_format, interrupt_on, tools, skills, and a truncated system_prompt (max 500 characters).

Capability Discovery¶

# Get deployment capabilities and feature flags
curl http://localhost:8000/capabilities

Returns installed package versions, supported stream protocols, sandbox backends, feature flags (including a2a, mcp, artifacts), middleware names, and scope configuration. See API Reference for the full response schema.

Agent Factory¶

create_cognition_agent() (server/app/agent/cognition_agent.py) is the async factory that builds a Deep Agent from an AgentDefinition. It is called by DeepAgentRuntime when creating a new session's runtime.

The factory:

Selects the sandbox backend from settings (local or docker)
Loads built-in tools: BrowserTool, SearchTool, InspectPackageTool
Loads MCP tools from configured remote servers
Resolves tools from the ConfigRegistry by registry name (filtered by allowed_tool_names from the AgentDefinition)
Attaches the middleware stack:
    ToolSecurityMiddleware: blocks tools on the COGNITION_BLOCKED_TOOLS deny-list
    CognitionObservabilityMiddleware: tracks LLM and tool Prometheus metrics
    CognitionStreamingMiddleware: emits thinking/idle status events
Loads upstream middleware specified in the definition (see Extending Agents)
Injects subagents as Deep Agents SubAgent dicts
Passes store= (LangGraph BaseStore) and context_schema=CognitionContext for cross-thread memory

Agent Caching¶

Agents are cached by an MD5 hash of their definition, so two sessions with identical AgentDefinitions reuse the same agent object. invalidate_agent_cache(name) evicts a single agent and clear_agent_cache() empties the whole cache. The file watcher triggers invalidation when files under .cognition/ change.

Multi-Agent Delegation¶

When a primary agent needs to delegate a task, it invokes the task tool with a target agent name and a task description. The runtime:

Emits a DelegationEvent to the stream (visible to the client)
Creates a DeepAgentRuntime for the target subagent
Runs the subagent to completion
Returns the result to the primary agent's context

Subagents have their own tool sets and system prompts but run within the same session's thread, preserving shared state. Clients can observe delegation via the delegation SSE event and the subsequent events from the subagent's execution.

CognitionContext and Cross-Thread Memory¶

CognitionContext (server/app/agent/cognition_agent.py) is a typed invocation context injected into every agent run. It is built from session.scopes and forwarded to astream() and ainvoke() so that nodes and middleware can access it via runtime.context.

@dataclass
class CognitionContext:
    effective_scope: dict[str, str]  # e.g. {"tenant": "acme", "project": "ios", "end_user": "user_123"}
    session_id: str | None = None
    thread_id: str | None = None
    agent_name: str | None = None
    metadata: dict[str, str] = field(default_factory=dict)

effective_scope carries the builder-authorized scope. Cognition does not hardcode a vocabulary — builders own the scope keys. The scope is propagated through the full pipeline: HTTP headers → SessionScope → effective_scope dict → CognitionContext → LangGraph runtime.context → middleware (trusted) → tools (via runtime context, not model-supplied arguments).

This context serves two purposes:

Store namespace scoping — runtime.store (a LangGraph BaseStore) is available inside agent nodes and middleware. effective_scope is the natural key for building per-tenant memory namespaces, ensuring one tenant cannot read another's stored memories.
Middleware access — any custom middleware can read runtime.context to branch on builder-defined scope dimensions without coupling to the HTTP layer.

# Inside a custom tool or middleware
import uuid

from langchain_core.runnables import RunnableConfig


async def save_to_memory(content: str, config: RunnableConfig) -> str:
    store = config["store"]
    context = config["configurable"].get("cognition_context")
    # Use a scope key as the namespace prefix
    tenant = context.effective_scope.get("tenant", "default")
    namespace = (tenant, "memories")
    # Each memory needs a unique key within the namespace; a UUID is one option
    key = str(uuid.uuid4())
    await store.aput(namespace, key, {"content": content})
    return "Saved."

Note: Built-in memory tools (save_memory, search_memories) are planned but not yet shipped. See GitHub issue #45 for the design discussion. In the meantime, the Store infrastructure is fully wired and custom tools can use it today.