Skip to main content

Tools: Capabilities in the Environment

The system reduces to two things: memory and tools.

If it's not data (memory), it's a tool.

Tools are operations — verbs. Read, write, search, query, send, execute. The operation itself has no persistent state. It acts on memory and returns.

Memory is the nouns. Tools are the verbs. Clean binary.

Current Architectures are inefficient

Current architectures generally assume tools need a dedicated subsystem — registries, protocols, discovery frameworks, execution engines.

But your hand doesn't need a tool management department between you and your hammer. And your AI system doesn't need one either. That's why this architecture has no dedicated tool subsystem.

Tools are data. The AI model and the memory (the brain) tell the agent loop (the hand) what tool calls to execute. There's nothing left over.

Same Tools, Better Architecture

This architecture uses the same tools — MCP servers, CLI tools, APIs, whatever the ecosystem produces.

What's different is that it doesn't build dedicated infrastructure around them. Tools are just operations that act on data (memory) — and everything they need already has a home.

The architecture has four components, two APIs, and three external dependencies. Tools are one of the external dependencies — not a component, not an API. Tool calls flow through the Model API and are executed by the Agent Loop. No separate tool protocol needed.

                              ┌───────────────────────────────────────────────┐
│ YOUR MEMORY │
│ (the platform) │
└───────────────────────▲───────────────────────┘

tools (read/write)

Clients ──→ Gateway API ──→ Gateway ──→ Agent Loop ──→ Model API ──→ Models
(external) (API) (component) (component) (API) (external)

─── Auth ─── └──→ Tools (verbs) ──→ External Memory (nouns)
(cross-cutting ├── MCP servers ├── Salesforce data
component, ├── CLI tools ├── Weather services
applies to all └── Native functions └── The internet
requests)

Your Memory is the platform. Tools are how you interact with it.

Three things follow from this:

The architecture stays small. Everything this system processes is either a noun (memory) or a verb (tool). No third category. That binary is what keeps it at four components. New capabilities arrive by adding tools to the environment, not by adding components to the system.

Your Memory stays dependency-free. Memory never knows what's reading or writing to it — tools handle that. Swap the Agent Loop, swap the storage, swap the tools themselves — Memory doesn't change. That's what makes it truly portable and ownable.

The system adapts at the speed of AI. Because the architecture expands via tools rather than via new infrastructure, a new model capability, a new protocol, a new integration — all tool additions, not system changes. The thing that doesn't need to be rebuilt is the thing that survives.

The rest of this document shows why.

This is an Architecture document — it defines how tools work at the generic, unopinionated level. Product-specific tool choices (which tools ship, what defaults, MCP as default protocol) are implementation opinions.

Related documents: foundation-spec.md (architecture overview, links to all component specs)


The Memory/Tool Binary

In the physical world, a hammer is a distinct object from a filing cabinet, and a screwdriver is a separate thing from the book about screwdrivers. In the digital world, that distinction dissolves. A tool is its definition, its code, its instructions — all just data. The tool's definition comes from the tool itself — MCP tools self-describe via protocol, others provide manifest files. The tool's executable — the software that runs when the Agent Loop makes a call — lives in the environment. The only thing that isn't data is the act of execution itself — the moment the Agent Loop runs the call. That's runtime, not persistent.

The AI industry typically defines "tool" as an MCP server or function call — conflating the operation (the verb) with the data it accesses (the noun). This system is more precise. What looks like "a RAG tool" is actually a pattern that decomposes:

What people call itMemory (data/noun)Tool (operation/verb)
RAG systemVector store (indexed content)semantic_search(query)
DatabaseStored recordssql_query(statement)
Salesforce integrationCRM data (externally hosted)salesforce_query(filter)
Knowledge graphNodes and edgesgraph_traverse(start, relation)
Search indexIndexed contentfull_text_search(terms)
Email via GmailEmail history (Google-hosted)send_email tool calling Gmail API (tool calling tool)

Pure tools with no memory component do exist: sending a message, running a calculation, controlling a browser. These are verbs with side effects, not data.


Why Tools Are Not a Component

The Agent Loop is a component — it has a distinct job (run the agent loop), a contract, and an implementation. Your Memory is a component — it has a distinct job (persist and retrieve data), a contract, and an implementation. Auth is a component — it has a distinct job (control access), a contract, and an implementation.

Tools don't have a distinct job that isn't already covered by existing components. Every concern that seems like it belongs to a "Tools component" actually belongs somewhere else:

ConcernWhere it livesWhy not a Tools component
Tool definitions (name, description, parameters)The tools themselves — self-describing via MCP protocol or manifest filesTools know what they can do. Nobody writes definitions manually
Tool code / executablesThe environment — software the Agent Loop can callAvailable software isn't a component of this system
Instructions for when/how to use toolsYour Memory — skills, AGENT.mdThe model reads instructions from Your Memory — these are personal data
Tool preferences (always-send set, policies)Your Memory — personal data about how you work"Always load git tools" is about you, not about this desk
Tool plumbing (server addresses, ports, API keys)Environment config — deployment-specificPlumbing describes the desk, not the person sitting at it
Tool discovery (what's available)Configuration — which tool sources to search; tools self-describe once foundDiscovery is just knowing where to look, not a separate system
Tool executionAgent Loop — it executes callsPart of the agent loop — the Agent Loop already does this
Tool resultsAgent Loop — returned to the modelPart of the agent loop — the Agent Loop already does this
Tool permissions (who can use what)AuthAccess control is Auth's job
Tool installationThe environment — environment-specificInstalling a tool changes the deployment environment

Nothing is left over. There is no gap that requires a dedicated component.

Tools are also not an API. The foundation spec originally proposed a Tool Protocol API alongside the Gateway API and Model API. But trace where each part actually lives.

How tool calls are expressed — that's the Model API. The model's API format already defines tool call syntax. How tool calls are executed — that's the Agent Loop's implementation detail (MCP server, CLI command, native function). How results come back — that's the Model API again, in the next message to the model.

The Model API covers one side. The Agent Loop's implementation covers the other. There's no gap between them that needs a dedicated API. MCP, CLI tools, native functions — these aren't "the tool protocol." They're implementation options for how the Agent Loop executes specific tools.


How Scope Works

The expanding sphere — library folder → full filesystem → external services → inbound connections — is entirely driven by which tools exist in the environment and what Auth allows.

  • V1: The environment has library-scoped file tools. The Agent Loop can only operate within the library folder. Not because a scope enforcer limits it — but because those are the only tools that exist.
  • V2: System-level file tools are added to the environment. The Agent Loop can now access the full filesystem. The Agent Loop didn't change. Your Memory didn't change. More data (tool definitions) entered the environment.
  • V3: External API tools are added. Same pattern.
  • V4: Inbound integration tools are added. Same pattern.

Each expansion is more data entering the environment. Each contraction is data being removed. The Agent Loop doesn't know what scope it's operating in. It uses whatever tools exist. Scope equals what tools are available plus what Auth permits.

Three layers of tool availability

LayerWhat it controlsWho decides
Environment — what tools existWhich tools are physically present and executableConfiguration. On local: defaults + owner adds anything. On managed hosting: the provider curates + allow list.
Auth — who can use whatWhich of the available tools each actor can accessPermissions. Owner has full access. Collaborators, agents, and other actors have scoped access.
Granularity — how you can use themRead-only vs read-write, restricted parameters, conditional accessPermissions refined. Future concern — Auth getting more specific over time.

Configuration decides what exists. Auth decides who can use what. The model sees both. The Agent Loop only sees what it can execute.

The model benefits from knowing about tools that exist but aren't permissioned for the current actor. This lets the model say "that capability is available but your account doesn't have access" rather than pretending the capability doesn't exist. The Agent Loop doesn't need this distinction — if a tool isn't executable (for any reason), it can't run it.


Tools and Skills

Skills and tools are related but different things.

A skill is a recipe — instructions in Your Memory that tell the model what to do, step by step, with judgment. "Interview the user about this topic. Ask 10 rounds of questions. Challenge vague answers. Produce a spec at the end."

A tool is kitchen equipment — a capability in the environment that lets the model do something specific. Read a file. Write a file. Search for content. Call an API.

The skill references the tools. "Use the read tool to check AGENT.md first. Use the write tool to save the spec." But the skill is not a tool. The skill is data in Your Memory that the model reads and follows. The tool is a capability in the environment that the Agent Loop executes.

Skills can include code or scripts alongside natural language instructions — bash commands, configuration snippets, templates. These are still data. A script on disk is just bytes in Your Memory. The model reads it, understands what it does, and executes it through available tools (bash, shell, etc.).

The skill tool pattern

Every skill is a prompt — a markdown file in Your Memory. Every prompt is the same from the system's standpoint. There's no structural difference between an interview skill, a spec-generation skill, or any future skill. This means skills don't each need their own tool.

One generic skill tool handles all skills. The skill tool's job: load a skill prompt from Your Memory and inject it into the model's context. The model then follows the skill's instructions using whatever tools are available. The interview skill, the spec-generation skill, and the plan-generation skill all work through the same skill tool — the only difference is which markdown file gets loaded.

This has three consequences:

  1. Skills are owner-editable without developer involvement. Want to change 8 things about the interview prompt? Edit the markdown file. No code changes, no redeployment, no developer required.

  2. New skills are free. Adding a new skill is creating a new markdown file. The skill tool doesn't need to know about it in advance — it just loads whatever skill the model (or owner) selects.

  3. Coded tools are the escalation path, not the starting point. If a skill-as-prompt isn't sufficient — if prompt engineering hits a ceiling and the model can't handle a particular workflow through instructions alone — then a specific coded tool can be built. But not preemptively. Start with the simplest thing (prompt), see how it works, code only when necessary. As models improve, what requires coded tools today may be handled by prompts tomorrow.

Approval gates — why they can't be a prompt

One tool that must be a coded tool from the start: approval gates.

When the system is about to write to the user's files, it must pause and ask for confirmation. This cannot be handled by the system prompt alone — you can't guarantee the model will comply. A system prompt that says "always ask before writing" is a suggestion the model might follow. An approval gate tool that intercepts write operations is a guarantee enforced by software. The approval gate is still a tool — a capability in the environment that the Agent Loop executes — but when the consequence of non-compliance is writing to the user's files without permission, you don't rely on compliance.

Example: a minimal tool set

A deployment needs very few tools to be functional. Filesystem operations (read, write, edit, delete, search, list) are the foundation — they give the Agent Loop access to Your Memory. On top of those, three tools cover the core patterns:

ToolTypeWhat it illustrates
Skill toolGeneric loaderOne tool handles all skills — loads a prompt from Your Memory and injects it into context. No per-skill tools needed.
Git toolVersion controlA standard external tool wrapped for the Agent Loop — initializes repos, commits changes, provides diff.
Approval gateWrite guardA coded tool that must exist (see above) — intercepts write operations, pauses for owner confirmation.

This set is illustrative, not prescriptive. An implementation decides which tools ship. The point is that a small number of tools — filesystem access, a skill loader, version control, and a write guard — is enough to run a fully functional system.


Tool Context Management — What the Model Sees

Tool definitions are sent through the Model API alongside system instructions and conversation history. Every tool definition consumes context window space and costs tokens on every API call. With a handful of MVP tools, this is negligible. As the system expands — more filesystem tools, external API tools, marketplace tools — sending every registered tool definition with every prompt becomes wasteful and degrades model performance. Models make worse tool selections when presented with dozens of irrelevant options.

Two sets: always-send and discoverable

CategoryWhat it isHow it works
Always-send setTools the owner uses regularly — their daily driversDefinitions are sent with every prompt through the Model API. Zero overhead to use.
Discoverable setEverything else that's registered in the environmentAvailable on demand through a discovery tool. Not sent with every prompt.

The always-send set is owner-configured. The owner decides which tools belong in their daily set based on how they work. A developer might always-send filesystem, git, and deployment tools. A writer might always-send filesystem and research tools. A business owner might always-send calendar, email, and CRM tools. The set reflects the individual, not a product decision.

The discovery tool is a meta-tool that's always in the always-send set. When the model needs a capability that isn't in the current context, it searches available tools, finds what it needs, and loads it. The model decides when to search — the Agent Loop stays a pass-through.

Why the owner controls this

On big-tech AI platforms, the platform decides which tools every user gets. You can't remove tools you don't use (they still consume your context). You can't add specialized tools that matter to your workflow. You can't tune the tradeoff between tool availability and context efficiency.

In this architecture, the owner makes that call.

Scaling properties

This approach degrades gracefully in both directions:

  • Small tool count (MVP, V1): Put everything in the always-send set. No discovery needed. The discovery tool exists but rarely fires.
  • Medium tool count (V2): Owner's daily drivers in always-send, the rest discoverable. One extra round trip when the model needs something unusual.
  • Large tool count (marketplace, V3+): Discovery becomes essential. The always-send set stays lean. The long tail lives in discovery. Same architecture, no changes.

Discovery at scale — hundreds or thousands of tools

The two-set split handles context window scaling — don't send everything in every prompt. But when the discoverable set grows to hundreds or thousands of tools, the discovery tool itself needs to work at that scale. Three things make this work.

The tool catalog is just data. The Agent Loop collects self-descriptions from every tool in the environment into a searchable index. The catalog doesn't own the tools — it just reflects what's available. Tools appear and disappear; the catalog tracks that.

Discovery is two-step: search, then load. The model calls the discovery tool with a natural-language description of what it needs. The discovery tool searches the catalog and returns a short list of matches — names and descriptions only, not full definitions. The model picks the right one and loads its full definition into context. The model never sees every definition at once, so this stays lightweight at any scale.

Tools identify themselves by source and name. MCP servers expose a namespace, CLI tools have command names, native functions have function names. When tools come from different sources, the source is part of the identity — two tools named search from different sources are distinguishable without a central naming authority.

None of this changes the architecture. The discovery tool is still a tool. The catalog is data. The Agent Loop still executes whatever it's told.


How the Agent Loop Uses Tools

The previous sections explain what tools are and what they enable. This section explains how they work at runtime — how the Agent Loop finds them, calls them, and gets results back.

The execution loop

When the model decides to use a tool, the sequence is:

  1. Model produces a tool call — "call read with {path: '/library/finances/spec.md'}" — expressed in the model's completion format through the Model API
  2. Agent Loop receives the tool call — the Agent Loop sees a tool call in the model's response and executes it
  3. Agent Loop calls the tool — using whatever execution mechanism the tool requires (MCP protocol, CLI command, native function call)
  4. Tool returns a result — data flows back to the Agent Loop
  5. Agent Loop passes the result to the model — as the next message in the conversation
  6. Model continues — may call more tools or produce a text response

The Agent Loop doesn't interpret, filter, or modify tool calls or results. It's a pass-through executor. The model decides what to call. The Agent Loop handles how to call it. Auth controls whether the call is permitted.

Tool registration — how the Agent Loop knows what's available

There's no special registration system. The Agent Loop's normal loop (see engine-spec.md) already includes tool definitions as an input alongside the system prompt and conversation history. "Registration" is just how those definitions get there.

Tool plumbing — where tools live, what addresses to connect to — is environment configuration that travels with the deployment. Tool preferences — which tools to always-send, what policies to apply — are personal data in Your Memory. At startup, the Agent Loop reads environment config to know where to find tools, then discovers what they offer. How it gets the actual definitions depends on the tool:

MechanismHow definitions arriveExample
MCP serverAgent Loop connects to the configured server and discovers what tools it offers via the MCP protocol.Filesystem tools (read, write, search, list)
Native functionDefinition is part of the configuration data directly.Approval gate, audit
CLI toolWrapped in a tool definition that describes the command and its parameters.Git operations

Either way, the result is the same: tool definitions end up in the prompt, the model sees them, and the model decides when to use them. The Agent Loop doesn't understand what the tools do — it just includes their definitions and executes calls.

Where tool executables live

Tool definitions come from the tools themselves — they self-describe what they can do. But a definition alone doesn't do anything — somewhere, there's actual software that runs when the Agent Loop executes a tool call. That software — the tool executable — lives in the environment, not in Your Memory.

LocationWhat it meansWhen it appliesExample
In-processTool code runs inside the Agent Loop's process. Calls are function calls — fast, simple, no network overhead.MVP. The simplest thing that works.MCP tool server as a library, native functions (approval gate, audit)
Sibling containerTool code runs in a separate Docker container alongside the Agent Loop. Calls go over the network.V1+. When isolation, independent deployment, or language diversity matters.Per-tool or per-group containers
Remote serviceTool code runs on an external server.V2+. Third-party tools, marketplace tools, partner integrations.A hosted code execution service, a weather API

The Agent Loop treats all three identically. It calls the configured protocol, passes the input, gets a result. The difference between in-process, sibling container, and remote service is a deployment decision — it doesn't change how the Agent Loop works, how the model sees the tool, or how Auth permissions it.

Communication mechanisms

MechanismHow it worksWhen it's used
MCP over stdioAgent Loop and tool communicate through standard input/output streams within the same process or a child process.In-process tools, locally spawned MCP servers
MCP over networkAgent Loop sends requests to a tool's network endpoint (HTTP, WebSocket). Same MCP protocol, different transport.Sibling containers, remote MCP servers
CLI executionAgent Loop spawns a child process, passes arguments, reads stdout/stderr.Host binaries (Git, system utilities)
Native function callDirect function invocation within the Agent Loop process. No serialization, no protocol overhead.Built-in Agent Loop functions (approval gate, audit logging)

MCP is the default protocol — this is an implementation opinion. MCP provides structured tool definitions, typed parameters, and a standard discovery mechanism. But the architecture doesn't mandate it. Any mechanism that takes input and returns output works.

Tool isolation

Tools that run outside the Agent Loop process can be isolated in their own Docker containers. This is a V1+ concern, but the architecture supports it from the start.

Trust levelIsolationWhoExample
Full trust (in-process)None — runs in Agent Loop's process with same accessPlatform-provided tools — code reviewed and maintained by the platform teamFilesystem tools, skill loader, approval gate
Shared containerGrouped tools share a container, separated from Agent LoopRelated tools from the same source where inter-tool access is expectedA tool suite from one third-party vendor
Dedicated containerOwn filesystem, own network policy, own resource limitsUntrusted third-party or marketplace toolsCommunity-contributed tools, marketplace installs

Container isolation protects against four risks: filesystem access (tool sees only its own filesystem, specific directories mounted as needed), network access (restricted to specific hosts/ports or none), cross-tool interference (no shared state or process space), and resource exhaustion (CPU/memory/disk limits per container — a runaway tool is throttled or killed without affecting the Agent Loop or anything else).

Owner-installed tools — tools the owner writes or installs themselves — are isolated by default (separate container), but the owner can override. On local, the owner has full control over the isolation level. On managed hosting, the provider applies minimum isolation requirements.

Tool lifecycle, health, and upgrades

Starting: In-process tools start with the Agent Loop — no separate step. Containerized tools start at Agent Loop startup (via Docker Compose or equivalent) or on demand when first needed. Remote tools are independently managed — already running when the Agent Loop connects.

Stopping: In-process tools stop with the Agent Loop. Containerized tools can stop independently — the Agent Loop treats a stopped tool the same as a crashed tool and reports unavailability to the model. Remote tools may go down at any time — same handling.

Health: The Agent Loop detects unresponsive tools through configurable timeouts. Failures are reported to the model as tool results ("tool X is unavailable"), not as system errors. The model decides the next step — retry, alternative approach, or inform the owner. Container crashes don't take down the Agent Loop. In-process crashes are more serious (they can crash the Agent Loop), which is why in-process is reserved for trusted, well-tested code.

Upgrades: Containerized and remote tools can be upgraded without restarting the Agent Loop — stop the old container, start the new one, the Agent Loop reconnects and re-discovers capabilities. In-process tools require an Agent Loop restart (acceptable for platform-provided tools, which change alongside the Agent Loop).

Tools and the Gateway API — the parity principle

Any external caller — whether a human client or a programmatic tool/agent — should be able to use the same API to interact with the system.

The Gateway API is the single entry point for all clients. This creates a clean separation between two kinds of tool:

CategoryRelationship to Agent LoopExample
Tools the Agent Loop callsAgent Loop → Tool. The model decides to use a capability and the Agent Loop executes the call.Filesystem read/write, Git, skill loader
Agents/tools that call the systemTool → Gateway API → Agent Loop. An external program sends a message the same way a human interface does.Another AI agent, a scheduled job, a webhook handler

When software talks to the system (not for the system), it should go through the Gateway API — the same contract any client uses. This ensures that a programmatic caller has the same capabilities, the same auth checks, and the same conversation management as a human owner.

For the MVP, this distinction doesn't matter — there's one owner at a terminal, and tools are in-process. For V1+, maintaining this parity is a discipline checkpoint: the CLI must not have any special path to the Agent Loop that a programmatic caller couldn't also use.


Decisions Made

#DecisionRationale
D51Tools are not a component — they are capabilities in the environment (data executed by the Engine, permissioned by Auth)Every tool-related concern maps to an existing component: definitions are self-describing (from the tools themselves, D146), execution is the Engine's job, permissions are Auth's job, configuration is data, installation is data entering the system. There is no gap that requires a dedicated component. Same pattern as D41 (the original "Harness" dissolved into the other components working together).
D52Tools are data — in the digital world, a tool IS its definition, code, and instructionsIn the physical world, a screwdriver is a separate object from the book about screwdrivers. In the digital world, that distinction dissolves. Tool definitions, executable code, and usage instructions are all data. There's no special category of "tool stuff" separate from data. Refined by D141-refined: three-way split — tool definitions come from the tools themselves (self-describing, D146), tool plumbing (addresses, ports) is environment config, tool preferences (always-send, policies) are personal data in Your Memory (D145). Skills and instructions remain in Your Memory.
D53The Tool Protocol connector is not needed — tool calls flow through the Provider API, tool execution is an Engine implementation detailThe Provider API already defines how tool calls are expressed (model completions) and how results are returned (next message). The Engine already handles execution (MCP, CLI, native functions are implementation options). There's no gap between them that needs a separate connector. The architecture has two connectors, not three.
D54A tool is a tool — no categories, no taxonomies, no special handling for different kinds of toolsWhether a tool reads Your Memory, calls an external API, or does computation, the Engine treats it identically. Auth handles permissions. The tool system doesn't need to distinguish between "memory tools" and "capability tools" and "integration tools." A tool is a tool.
D55Scope is determined by what tools exist in the environment plus what Auth allows — the expanding sphere is just adding dataEach expansion (library → filesystem → external → inbound) is more tool data entering the environment. Each contraction is data being removed. The Engine doesn't know what scope it's in. No scope enforcer, no boundary manager, no special mechanism. Scope = available tools + permissions.
D56The model should know about tools that exist but aren't permissioned — the Engine doesn't need toThe model benefits from knowing "this tool exists but you can't use it" so it can give the owner a useful response. The Engine doesn't care why a tool is unavailable — it just knows it can't execute it. Richer information flows to the intelligence (model), simpler information flows to the executor (Engine).
D109Owner controls which tools are sent with every prompt (always-send set) vs. available on demand (discoverable set) — a discovery meta-tool bridges the gapSending every registered tool definition with every prompt wastes context, costs tokens, and degrades model quality as tool count grows. Splitting into owner-configured always-send + discoverable sets keeps the prompt lean while preserving access to everything. The owner — not the platform — decides the tradeoff. This is a concrete expression of user-owned AI: you control what your AI has at its fingertips.
D135Memory/tool binary — if it's data, it's memory; if it's not data, it's a toolAll data, anywhere, is memory — the only question is whose. Tools are the verbs, memory is the nouns. Clean binary that decomposes everything the system processes.
D141Tool configs are environment configuration, not personal data — they travel with the deployment, not with Your MemoryDave J identified that storing tool configs in Your Memory creates a false dependency: move your memory to another system and it references tools that don't exist. Tool plumbing (server addresses, ports) and installation describe the deployment environment, not the user's personal data. Same principle as .env files or settings.json. Refined by D141-refined: original scope was too broad. Tool definitions → provided by tools themselves (D146). Tool preferences (always-send, policies) → Your Memory (D145). Only plumbing stays in environment config. See configuration-spec.md §Impact on D141.
D141-refinedThree-way split: tool definitions → self-describing (D146), plumbing → environment config, preferences → Your Memory (D145)Original D141 scope was too broad — moving everything to environment config pulled personal preferences away from the owner. The refinement keeps only plumbing in environment config, returns preferences to Your Memory, and lets tools self-describe their own definitions.
D145Tool preferences are personal data in Your Memory — they travel with you"Always load git tools" is about you, not about this desk. Preferences (always-send set, policies, interaction style) belong in Your Memory because they describe the owner, not the deployment.
D146Tools self-describe — definitions come from the tools themselves, not from manual configurationMCP tools describe themselves via protocol. Other tools provide manifest files. Nobody writes tool definitions manually. This is the first leg of the D141-refined three-way split.

Open Questions

OQ-1: Tool discovery/registration path in V1+

In the MVP, tools are in-process — the Agent Loop calls them directly. In V1+, when tools run as separate services or containers, how does discovery work? Two options:

  1. Agent-Loop-direct — The Agent Loop connects to configured tool servers at startup and discovers their capabilities via MCP. Tools register with the Agent Loop, not the Gateway.
  2. Gateway-mediated — Tools register through the Gateway API, the same way any client connects. The Gateway tells the Agent Loop what tools are available.

Option 1 is simpler and matches how MCP works today. Option 2 enforces the parity principle (tools use the same API as clients) but adds complexity to the Gateway. This doesn't need to be resolved for the MVP — tools are in-process either way. Revisit for V1 Gateway design.

Source: 2026-02-25 foundation architecture discussion (Dave W + Dave J). Dave J described tools "talking to the agent loop through the API gateway." Current spec has the Agent Loop executing tools directly. Both are valid — the question is which path V1 takes.


Success Criteria

  • Adding a tool to the environment requires only adding data (configuration, files) — no code changes to Agent Loop, Your Memory, Client, or Auth
  • Removing a tool from the environment requires only removing data — no code changes
  • The Agent Loop executes tool calls without knowing or caring what the tools do
  • Auth controls tool access without the Agent Loop or Your Memory knowing about permissions
  • Scope expansion (library → filesystem → external → inbound) works by adding tools — no architectural changes
  • The model can distinguish between "tool doesn't exist" and "tool exists but not permissioned" — the Agent Loop doesn't need to
  • Skills (recipes in Your Memory) reference tools (equipment in the environment) and both work independently — changing a skill doesn't require changing tools, adding a tool doesn't require changing skills
  • The owner can configure which tools are in the always-send set — no code changes, just configuration
  • A discovery tool can find and load tools from the discoverable set at runtime — the model decides when to search, the Agent Loop stays a pass-through

Security Requirements

Per-component requirements from security-spec.md. Security-spec owns the "why"; this section owns the "what" for Tools.

  • Untrusted tools must run in isolated containers — mandatory, not configurable. Restricted filesystem, restricted network, resource limits
  • Tool isolation must be independent of Auth — even if Auth fails, the tool can't escape its container
  • The system must warn the owner when installing unverified tools on local deployment
  • Managed hosting must enforce a curated tool allow list — no unvetted tools
  • Tool crashes must not take down the Agent Loop — containerized tools fail independently

Changelog

DateChangeSource
2026-03-01Codex cross-reference audit fixes: (1) Untrusted tool isolation — "by default" → "mandatory, not configurable" in Security Requirements, aligns with security-spec "mandatory isolation." (2) Owner-installed tools — "fall wherever the owner decides" → "isolated by default, owner can override," aligns with security-spec language.Codex audit (Dave W + Claude)
2026-03-01"No users, only owners" language pass: user → owner throughoutOwnership model alignment (Dave W + Claude)
2026-02-28Added "Discovery at scale" subsection — how discovery works at hundreds/thousands of tools: catalog as data, two-step search-then-load, source-based tool identity. No new components or architectural changes.Working session (Dave W + Claude)
2026-02-27D141-refined: Three-way walkback of D141. Tool definitions → self-describing (D146). Tool preferences (always-send, policies) → back to Your Memory (D145). Only plumbing (addresses, ports) stays in environment config. Table, registration section, D52/D141 rationales, and closing summary updated.Configuration spec interview (Dave W + Claude + Codex)
2026-02-27D141: Tool definitions, configuration, and installation moved from "Memory" to "Engine/environment config." Skills and instructions remain in Your Memory. D52 rationale refined. Closing summary updated.Foundation architecture lock-in debate (Dave W + Dave J + Claude)
2026-02-27Reorder + trim: why→what→how flow, merged "What Tools Actually Are" into binary section, folded "Not a Connector" into "Not a Component", moved Scope/Skills/Context Management before implementation detail, condensed isolation (table replaces ASCII art), condensed lifecycle/health/upgrades, removed Level 1/2/3 section (covered in intro + context management). ~450 lines from 570.Spec maintenance (Dave W + Claude)
2026-02-27Added Security Requirements section — cross-referenced from security-spec.md per T-219T-219 (Dave W + Claude)
2026-02-23Initial Tools spec created from interview — established that Tools is not a component or connectorTools interview session (Dave W + Claude)
2026-02-23Consistency pass — updated "Impact on Architecture" to reflect final architecture (D64): 4 components, 2 connectors, 3 external dependencies. Replaced stale component/connector tables (had wrong counts, "Engine API" → "Gateway API", Interface/Models listed as components). Replaced "updates deferred" note with reconciliation confirmation.Cross-doc consistency audit (Dave W + Claude)
2026-02-25Added "How the Engine Discovers and Executes Tools" section — execution loop, registration model, MVP vs V1+ path, Gateway API parity principle, tool isolation. Expanded "Tools and Skills" with skill-tool pattern, approval gates rationale, MVP tool inventory. Added OQ-1 (tool discovery path for V1+).Foundation architecture deep dive (Dave W + Dave J)
2026-02-25Added "Tool Context Management" section — always-send set vs discoverable set, owner controls which tools are sent with every prompt, discovery meta-tool for the long tail. Added D109.Working session (Dave W + Claude)
2026-02-25Expanded "Where tools live & isolation model" — replaced thin MVP/V1+ and isolation subsections with comprehensive treatment: where tool executables live (in-process, sibling container, remote service), how the Engine communicates with tools (MCP stdio/network, CLI, native), expanded isolation model (trust levels, isolation spectrum, crash recovery), and tool lifecycle (start, stop, health, upgrades).Working session (Dave W + Claude)

Tools is the spec that explains why it doesn't need to be a spec. Every architecture assumes tools need a dedicated subsystem — registries, protocols, discovery frameworks, execution engines. This one doesn't. Tools are data — definitions come from the tools themselves (self-describing), plumbing (addresses, ports) is environment config, preferences (always-send, policies) and skills are personal data in Your Memory. The Agent Loop executes tool calls. Auth permissions them. There's nothing left over.