Agent Runtime¶
The agent runtime is the Layer 4 component responsible for translating a declarative AgentDefinition into a running agent, driving its event stream, and managing its lifecycle.
AgentRuntime Protocol¶
Defined in server/app/agent/runtime.py, AgentRuntime is a Python Protocol that establishes the contract between the API layer and any agent implementation:
class AgentRuntime(Protocol):
    async def astream_events(
        self,
        input_data: str | dict[str, Any],
        thread_id: str | None = None,
    ) -> AsyncIterator[AgentEvent]: ...

    async def ainvoke(
        self,
        input_data: str | dict[str, Any],
        thread_id: str | None = None,
    ) -> AgentEvent: ...

    async def get_state(self, thread_id: str | None = None) -> dict[str, Any] | None: ...

    async def abort(self, thread_id: str | None = None) -> bool: ...

    async def get_checkpointer(self) -> BaseCheckpointSaver: ...
DeepAgentRuntime is the only production implementation. It wraps Deep Agents, transforms its event stream into the canonical AgentEvent types, and handles abort via a thread-ID cancellation set. The protocol boundary exists so the underlying framework can be swapped without touching any Layer 5 or Layer 6 code.
Canonical Event Types¶
All events emitted by the runtime are typed dataclasses defined in server/app/agent/runtime.py. The API layer (server/app/api/sse.py:EventBuilder) serializes them to SSE wire format.
@dataclass
class TokenEvent:
    content: str


@dataclass
class ToolCallEvent:
    name: str
    args: dict[str, Any]
    id: str


@dataclass
class ToolResultEvent:
    tool_call_id: str
    output: str
    exit_code: int


@dataclass
class PlanningEvent:
    todos: list[str]


@dataclass
class StepCompleteEvent:
    step_number: int
    total_steps: int
    description: str


@dataclass
class DelegationEvent:
    from_agent: str
    to_agent: str
    task: str


@dataclass
class InterruptEvent:
    tool_call_id: str
    tool_name: str
    args: dict[str, Any]
    session_id: str | None = None


@dataclass
class StatusEvent:
    status: str  # "thinking" | "idle"


@dataclass
class UsageEvent:
    input_tokens: int
    output_tokens: int
    estimated_cost: float
    provider: str
    model: str


@dataclass
class DoneEvent:
    assistant_data: dict[str, Any]


@dataclass
class ErrorEvent:
    message: str
    code: str
Every consumer of the runtime (streaming endpoint, tests, evaluators) deals only with these types — not with raw LangGraph or Deep Agents events.
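To make the serialization step concrete, here is a minimal sketch of turning a dataclass event into an SSE frame. It is not the `EventBuilder` implementation: the `to_sse` helper and the `EVENT_NAMES` mapping (including the event names themselves) are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class TokenEvent:
    content: str


@dataclass
class ToolCallEvent:
    name: str
    args: dict[str, Any]
    id: str


# Assumed mapping from event class to SSE event name (hypothetical).
EVENT_NAMES = {TokenEvent: "token", ToolCallEvent: "tool_call"}


def to_sse(event: Any) -> str:
    """Render one dataclass event as an SSE frame: event line, data line, blank line."""
    if not is_dataclass(event) or isinstance(event, type):
        raise TypeError(f"expected a dataclass instance, got {type(event).__name__}")
    name = EVENT_NAMES[type(event)]
    payload = json.dumps(asdict(event), separators=(",", ":"))
    return f"event: {name}\ndata: {payload}\n\n"
```

Keeping the wire format in one function means the event dataclasses stay free of transport concerns.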
AgentDefinition¶
AgentDefinition (server/app/agent/definition.py) is the declarative configuration for a single agent. It is a Pydantic v2 model:
class AgentDefinition(BaseModel):
    name: str
    system_prompt: str | PromptConfig | None = None
    tools: list[str] = []                # registry tool names
    skills: list[str] = []               # registry skill names
    memory: list[str] = []               # paths to instruction files (AGENTS.md)
    subagents: list[SubagentDefinition] = []
    interrupt_on: dict[str, bool] = {}
    middleware: list[str | dict] = []
    config: AgentConfig = AgentConfig()
    mode: Literal["primary", "subagent", "all"] = "primary"
    description: str | None = None
    hidden: bool = False
    native: bool = False                 # True for built-in agents
    a2a_exposed: bool = False            # Expose via A2A protocol (default: off)
Agent Modes¶
| Mode | Meaning |
|---|---|
| primary | Can be selected as the main agent for a session via agent_name |
| subagent | Can only be invoked by another agent via the task tool |
| all | Can function as either |
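The mode semantics in the table reduce to two membership checks. The sketch below uses a hypothetical `AgentStub` dataclass rather than the real AgentDefinition, and the helper names are illustrative only.

```python
from dataclasses import dataclass
from typing import Literal

Mode = Literal["primary", "subagent", "all"]


@dataclass
class AgentStub:
    """Hypothetical stand-in carrying only the fields needed here."""

    name: str
    mode: Mode = "primary"


def can_own_session(agent: AgentStub) -> bool:
    # "primary" and "all" agents may be selected as a session's main agent.
    return agent.mode in ("primary", "all")


def can_be_delegated_to(agent: AgentStub) -> bool:
    # "subagent" and "all" agents may be invoked by another agent.
    return agent.mode in ("subagent", "all")
```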
A2A Exposure¶
The a2a_exposed field controls whether an agent is exposed via the A2A (Agent-to-Agent) protocol. When True:
- The agent has an Agent Card at GET /a2a/{agent_name}/.well-known/agent-card.json
- The agent appears in GET /.well-known/agent-card.json (scope-filtered list)
- The agent gets a dedicated JSON-RPC endpoint at POST /a2a/{agent_name}
- External A2A clients can discover and invoke the agent
Built-in agents (default, readonly, etc.) have a2a_exposed=False by default. Set it to True explicitly for agents you want to expose:
# .cognition/agents/deploy-agent.yaml
name: deploy-agent
mode: primary
a2a_exposed: true
system_prompt: |
  You are a deployment agent...
Or via the API:
curl -X POST http://localhost:8000/agents \
-H "Content-Type: application/json" \
-d '{"name": "deploy-agent", "system_prompt": "...", "a2a_exposed": true}'
System Prompt Sources¶
The system_prompt field accepts three forms: an inline string, or a PromptConfig that points at a file or an MLflow prompt:
| Source | Config | Description |
|---|---|---|
| Inline | system_prompt: "You are..." | Direct text in the definition |
| File | {file: "deploy-agent"} | Loaded from .cognition/prompts/deploy-agent.md |
| MLflow | {mlflow: "my-prompt@v3"} | Loaded from an MLflow prompt registry at startup |
AgentConfig¶
Per-agent LLM configuration that overrides the server default:
class AgentConfig(BaseModel):
    provider: str | None = None
    model: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
    recursion_limit: int | None = None
    tool_token_limit_before_evict: int | None = None
Tool Resolution¶
Tools are referenced by registry name. At runtime, RuntimeResolver.build_tools() looks up each name in the ConfigRegistry and returns the corresponding callable. File-seeded tools (from tool_sources) and API-registered tools are both resolved this way. The allowed_tool_names parameter on build_tools() filters to only the tools attached to the agent definition.
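The name-based lookup and filtering can be sketched with a plain dict standing in for the ConfigRegistry. `select_tools` is a hypothetical helper, not `RuntimeResolver.build_tools()`, and raising on an unknown name is an assumption about error handling.

```python
from collections.abc import Callable, Iterable


def select_tools(
    registry: dict[str, Callable[..., object]],
    allowed_tool_names: Iterable[str],
) -> list[Callable[..., object]]:
    """Return registry callables for the allowed names, in the order requested."""
    tools: list[Callable[..., object]] = []
    for name in allowed_tool_names:
        if name not in registry:
            # Assumed behaviour: fail loudly instead of silently skipping.
            raise KeyError(f"tool {name!r} is not registered")
        tools.append(registry[name])
    return tools
```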
Loading Agent Definitions¶
From YAML¶
# .cognition/agents/security-auditor.yaml
name: security-auditor
mode: subagent
description: Audits code for security vulnerabilities
system_prompt: |
  You are a security expert. Audit code for vulnerabilities.
  Report findings with severity ratings.
tools:
  - "run_semgrep"
  - "check_dependencies"
config:
  model: gpt-4o
  temperature: 0.1
From Markdown (with YAML frontmatter)¶
The file name becomes the agent name; the Markdown body becomes the system_prompt.
---
mode: subagent
description: Read-only research assistant
tools:
- "web_search"
---
You are a research assistant. Gather information from the web and summarize findings.
Do not execute any code or modify files.
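The Markdown convention above (file name as agent name, body as prompt) can be sketched with the standard library alone. `split_frontmatter` and `agent_name_from_path` are hypothetical helpers. The frontmatter is returned as raw YAML text, since parsing it would need a YAML library that this sketch does not assume.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a Markdown document into (raw frontmatter, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text.strip()
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).strip()
    return "", text.strip()  # no closing delimiter: treat everything as body


def agent_name_from_path(path: Path) -> str:
    # The file name without its extension becomes the agent name.
    return path.stem
```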
Programmatically¶
from server.app.agent.definition import AgentDefinition, AgentConfig

definition = AgentDefinition(
    name="my-agent",
    system_prompt="You are a helpful assistant.",
    tools=["analyze"],
    config=AgentConfig(model="gpt-4o-mini", temperature=0.3),
    mode="primary",
)
Agent Registry¶
AgentDefinitionRegistry (server/app/agent/agent_definition_registry.py) is the server-level catalog of all available agents.
Built-in Agents¶
| Name | Mode | Description |
|---|---|---|
| default | primary | Full-access coding agent; all built-in tools enabled |
| readonly | primary | Analysis-only; write and execute tools disabled |
| hitl_test | primary | Manual HITL verification agent; attempts protected tool calls immediately |
User-Defined Agents¶
On startup, the registry scans .cognition/agents/ for *.md and *.yaml files and loads each as an AgentDefinition. The file watcher (server/app/file_watcher.py) calls registry.reload() when files change, enabling hot-reload without a server restart.
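The startup scan can be approximated as follows. `discover_agent_files` is a hypothetical helper that only finds candidate files. It does not parse them or reproduce the registry's reload logic.

```python
from pathlib import Path


def discover_agent_files(agents_dir: Path) -> list[Path]:
    """Return every *.md and *.yaml file directly inside agents_dir, sorted by path."""
    if not agents_dir.is_dir():
        return []
    return [
        path
        for path in sorted(agents_dir.iterdir())
        if path.is_file() and path.suffix in {".md", ".yaml"}
    ]
```

A reload triggered by the file watcher could rerun the same scan and rebuild the catalog from the result.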
Registry API¶
registry = get_agent_definition_registry()
# List all non-hidden agents
agents = registry.get_all()
# Get a specific agent
agent = registry.get("security-auditor")
# Only agents that can own sessions
primaries = registry.primaries()
# Only agents invocable by other agents
subs = registry.subagents()
# Check if a name is valid for session creation
registry.is_valid_primary("readonly") # True
registry.is_valid_primary("my-subagent") # False
REST Interface¶
# List all available agents
curl http://localhost:8000/agents
# Get a specific agent
curl http://localhost:8000/agents/readonly
Response fields include name, description, mode, hidden, native, a2a_exposed, model, temperature, response_format, interrupt_on, tools, skills, and a truncated system_prompt (max 500 characters).
Capability Discovery¶
The capability discovery response reports installed package versions, supported stream protocols, sandbox backends, feature flags (including a2a, mcp, artifacts), middleware names, and scope configuration. See API Reference for the full response schema.
Agent Factory¶
create_cognition_agent() (server/app/agent/cognition_agent.py) is the async factory that builds a Deep Agent from an AgentDefinition. It is called by DeepAgentRuntime when creating a new session's runtime.
The factory:
- Selects the sandbox backend from settings (local or docker)
- Loads built-in tools: BrowserTool, SearchTool, InspectPackageTool
- Loads MCP tools from configured remote servers
- Resolves tools from the ConfigRegistry by registry name (filtered by allowed_tool_names from the AgentDefinition)
- Attaches the middleware stack:
  - ToolSecurityMiddleware: blocks tools on the COGNITION_BLOCKED_TOOLS deny-list
  - CognitionObservabilityMiddleware: tracks LLM and tool Prometheus metrics
  - CognitionStreamingMiddleware: emits thinking/idle status events
  - Loads upstream middleware specified in the definition (see Extending Agents)
- Injects subagents as Deep Agents SubAgent dicts
- Passes store= (LangGraph BaseStore) and context_schema=CognitionContext for cross-thread memory
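The deny-list check performed by the security middleware can be pictured with a small sketch. The comma-separated format for COGNITION_BLOCKED_TOOLS is an assumption, and both helpers are hypothetical, not taken from ToolSecurityMiddleware.

```python
def parse_blocked_tools(raw: str) -> frozenset[str]:
    """Parse a comma-separated deny-list (assumed format) into a set of tool names."""
    return frozenset(name.strip() for name in raw.split(",") if name.strip())


def is_tool_allowed(tool_name: str, blocked: frozenset[str]) -> bool:
    # A tool is allowed unless its name appears on the deny-list.
    return tool_name not in blocked
```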
Agent Caching¶
Agents are cached by an MD5 hash of their definition. If two sessions share the same AgentDefinition, they reuse the same agent object. The cache is invalidated by invalidate_agent_cache(name) and cleared entirely by clear_agent_cache(). The file watcher triggers cache invalidation on .cognition/ changes.
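A minimal sketch of definition-keyed caching follows. How the real factory serializes a definition before hashing is not specified here, so hashing canonical JSON of a plain dict is an assumption, and `definition_cache_key` and `get_or_build` are hypothetical names.

```python
import hashlib
import json
from collections.abc import Callable
from typing import Any


def definition_cache_key(definition: dict[str, Any]) -> str:
    """MD5 hex digest of a canonical JSON rendering of the definition."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    # The digest is only a cache key, not a security measure.
    return hashlib.md5(canonical.encode("utf-8"), usedforsecurity=False).hexdigest()


def get_or_build(
    definition: dict[str, Any],
    build: Callable[[dict[str, Any]], object],
    cache: dict[str, object],
) -> object:
    """Return the cached agent for this definition, building it on first use."""
    key = definition_cache_key(definition)
    if key not in cache:
        cache[key] = build(definition)
    return cache[key]
```

Sorting keys before hashing makes the digest independent of field order, so two equal definitions map to the same cached object.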
Multi-Agent Delegation¶
When a primary agent needs to delegate a task, it invokes the task tool with a target agent name and a task description. The runtime:
- Emits a DelegationEvent to the stream (visible to the client)
- Creates a DeepAgentRuntime for the target subagent
- Runs the subagent to completion
- Returns the result to the primary agent's context
Subagents have their own tool sets and system prompts but run within the same session's thread, preserving shared state. Clients can observe delegation via the delegation SSE event and the subsequent events from the subagent's execution.
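The delegation sequence can be sketched end to end with stand-ins. `run_subagent` and `delegate` are hypothetical, and a plain list plays the role of the event stream. Nothing here reflects how the task tool is actually wired.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DelegationEvent:
    from_agent: str
    to_agent: str
    task: str


async def run_subagent(name: str, task: str) -> str:
    """Hypothetical stand-in for running a subagent runtime to completion."""
    await asyncio.sleep(0)
    return f"{name} finished: {task}"


async def delegate(
    from_agent: str, to_agent: str, task: str, stream: list[object]
) -> str:
    # Emit the delegation event first so the client can observe the hand-off.
    stream.append(DelegationEvent(from_agent, to_agent, task))
    # Create and run the subagent until it completes.
    result = await run_subagent(to_agent, task)
    # Hand the result back to the caller (the primary agent's context).
    return result
```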
CognitionContext and Cross-Thread Memory¶
CognitionContext (server/app/agent/cognition_agent.py) is a typed invocation context injected into every agent run. It is built from session.scopes and forwarded to astream() and ainvoke() so that nodes and middleware can access it via runtime.context.
@dataclass
class CognitionContext:
    effective_scope: dict[str, str]  # e.g. {"tenant": "acme", "project": "ios", "end_user": "user_123"}
    session_id: str | None = None
    thread_id: str | None = None
    agent_name: str | None = None
    metadata: dict[str, str] = field(default_factory=dict)
effective_scope carries the builder-authorized scope. Cognition does not hardcode a vocabulary — builders own the scope keys. The scope is propagated through the full pipeline: HTTP headers → SessionScope → effective_scope dict → CognitionContext → LangGraph runtime.context → middleware (trusted) → tools (via runtime context, not model-supplied arguments).
This context serves two purposes:
- Store namespace scoping: runtime.store (a LangGraph BaseStore) is available inside agent nodes and middleware. effective_scope is the natural key for building per-tenant memory namespaces, ensuring one tenant cannot read another's stored memories.
- Middleware access: any custom middleware can read runtime.context to branch on builder-defined scope dimensions without coupling to the HTTP layer.
# Inside a custom tool or middleware
import uuid

from langchain_core.runnables import RunnableConfig


async def save_to_memory(content: str, config: RunnableConfig) -> str:
    store = config["store"]
    context = config["configurable"].get("cognition_context")
    # Use a scope key as the namespace prefix
    tenant = context.effective_scope.get("tenant", "default")
    namespace = (tenant, "memories")
    # Each memory needs its own key; a random UUID is one simple choice
    key = str(uuid.uuid4())
    await store.aput(namespace, key, {"content": content})
    return "Saved."
Note: Built-in memory tools (save_memory, search_memories) are planned but not yet shipped. See GitHub issue #45 for the design discussion. In the meantime, the Store infrastructure is fully wired and custom tools can use it today.