Models: External Intelligence

Every AI system uses a model — ChatGPT runs GPT, Copilot runs its own models, agent frameworks wire up whatever LLM the developer chooses. So what's different here?

Nothing about how the model is used. What's different is where it sits in the architecture. Models are not a component of the system — they are external intelligence accessed through the Model API. The system doesn't contain a model. It calls one. The system has four components (Your Memory, Agent Loop, Auth, Gateway). Models aren't one of them.

In the human analogy that runs through this architecture: Memory is the persistent half of the brain. The model is the other half — the intelligence, the reasoning, the processing. Memory + Model = the brain (D44). Neither is complete alone. Memory without the model is a filing cabinet nobody's reading. The model without memory is a genius with amnesia.

But unlike a human brain, a digital brain can separate memory from intelligence — and that changes everything. In a biological brain, your memory and your intelligence are fused to the same hardware. You can't upgrade your neurons. You can't swap in better reasoning. You're stuck with both. A digital brain doesn't have that constraint. Your Memory persists as the platform. Intelligence arrives fresh through the Model API — a clean boundary that makes the model pluggable. Different models for different tasks, from any provider, swapped with a config change. The brain reconstitutes from the same Memory with upgraded intelligence every time a better model ships.

This is the superpower a digital brain has over a biological one, and it's why the architecture treats models as external intelligence rather than a component. The Model API is the mechanism that makes it real. Models are the most volatile part of AI — new capabilities, new providers, new paradigms arrive constantly. The fastest-changing part of AI is the cheapest thing to change in this system.

Models are external regardless of where they run:

| Model location | How the system calls it | External? |
| --- | --- | --- |
| Cloud API (OpenRouter, Anthropic, OpenAI) | HTTPS request through Model API | Yes — someone else's servers |
| Local (Ollama on your machine) | HTTP request through Model API | Yes — same interface, different endpoint. The model is third-party weights running locally. |
| Future (on-device, embedded) | Through Model API | Yes — the calling pattern is the same |
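The table's point can be made concrete with a hypothetical config sketch (the `provider`, `base_url`, and `model` keys, the endpoints, and the model identifiers are illustrative, not part of the spec): swapping a cloud model for a local one changes the endpoint and nothing about the calling pattern.

```python
# Hypothetical configs: only the endpoint and model id differ.
CLOUD = {
    "provider": "openrouter",
    "base_url": "https://openrouter.ai/api/v1",  # cloud: someone else's servers
    "model": "some-provider/some-model",
}
LOCAL = {
    "provider": "ollama",
    "base_url": "http://localhost:11434/v1",  # local: same interface, different endpoint
    "model": "some-local-model",
}

def build_request(config: dict, prompt: str) -> dict:
    """One request shape regardless of where the model runs."""
    return {
        "url": f"{config['base_url']}/chat/completions",
        "body": {
            "model": config["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Swapping cloud for local is a config change; the calling pattern is identical.
assert (build_request(CLOUD, "hi")["body"]["messages"]
        == build_request(LOCAL, "hi")["body"]["messages"])
```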

This is an Architecture spec — it defines what models are at the generic, unopinionated level. Product-specific model choices (OpenRouter default, single-model V1, pricing/allowances) are implementation opinions.

Related documents: foundation-spec.md (architecture overview, links to all component specs), research/memory-tool-completeness.md (completeness proof — why the architecture needs no new components)


The Model API

The Model API is one of the system's two APIs. It defines how the Agent Loop calls models:

| Direction | What flows |
| --- | --- |
| Agent Loop → Model | Prompt (system instructions + conversation history + tool definitions + context) |
| Model → Agent Loop | Streamed completion (text + tool calls) |

Prompts in, completions out. The pattern is the same regardless of which model, which provider, or what capabilities the model has. Provider-specific API formats are abstracted behind the adapter — a thin translation layer between the Agent Loop's internal interface and whatever format the provider expects. Switching models is a config change; switching providers is a config change plus an adapter swap. See adapter-spec.md §How Model Configuration Works in Practice for the concrete walkthrough.
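A minimal sketch of the adapter idea, under assumed names (`ModelAdapter`, `to_provider_request`, and both adapter classes are hypothetical, not from the spec): the Agent Loop codes against one internal interface, and each adapter translates to its provider's wire format, so switching providers swaps an adapter while the calling code stays fixed.

```python
from typing import Protocol

class ModelAdapter(Protocol):
    """The internal interface the Agent Loop codes against."""
    def to_provider_request(self, prompt: str, tools: list) -> dict: ...

class ChatCompletionsAdapter:
    """Translates to an OpenAI-style chat-completions payload."""
    def __init__(self, model: str):
        self.model = model
    def to_provider_request(self, prompt: str, tools: list) -> dict:
        return {"model": self.model,
                "messages": [{"role": "user", "content": prompt}],
                "tools": tools}

class MessagesAdapter:
    """Translates to an Anthropic-style messages payload (shape differs)."""
    def __init__(self, model: str):
        self.model = model
    def to_provider_request(self, prompt: str, tools: list) -> dict:
        return {"model": self.model,
                "max_tokens": 1024,  # required by this provider style
                "messages": [{"role": "user", "content": prompt}],
                "tools": tools}

def call_model(adapter: ModelAdapter, prompt: str) -> dict:
    """The Agent Loop side: identical no matter which adapter is configured."""
    return adapter.to_provider_request(prompt, tools=[])
```

Switching models within a provider means changing the `model` string; switching providers means constructing a different adapter. `call_model` never changes.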

The Model API is a pass-through — it doesn't decide what goes into the prompt (Your Memory provides instructions and context), which tools to use (the model decides), or where responses are stored (Gateway manages conversations). It connects the Agent Loop to whatever model is configured.


Why Models Are Not a Component

Every model-related concern maps to an existing component or configuration:

| Concern | Where it lives | Why not a Models component |
| --- | --- | --- |
| Which model to call (including per-task selection) | Configuration | A config value, not a component |
| Provider routing (OpenRouter vs Anthropic vs Ollama) | Agent Loop implementation or SDK | How the Agent Loop calls models is an implementation detail |
| Fallback if a provider is down | Agent Loop implementation | Error handling is the Agent Loop's job |
| Context window awareness | Agent Loop or Model API | The Agent Loop knows the limits of what it's calling |

Nothing is left over. There is no gap that requires a dedicated component.
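As an illustration of the fallback concern above, provider fallback can be ordinary error handling inside the Agent Loop. This is a hypothetical sketch, not a mechanism the spec mandates; the provider callables are stand-ins.

```python
class ProviderDown(Exception):
    """Raised when a provider endpoint is unreachable."""

def complete_with_fallback(providers, prompt):
    """Try each configured provider in order. Fallback is plain error
    handling inside the Agent Loop, not a separate Models component."""
    errors = []
    for call in providers:
        try:
            return call(prompt)
        except ProviderDown as exc:
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical provider callables, for illustration only.
def primary(prompt):
    raise ProviderDown("primary offline")

def backup(prompt):
    return f"completion for: {prompt}"

assert complete_with_fallback([primary, backup], "hi") == "completion for: hi"
```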

This follows the same pattern as Tools (D51) and Client (D57). Tools dissolved into Memory + Agent Loop + Auth. Client dissolved into Gateway + external clients. Models dissolve into the Model API + Agent Loop implementation + configuration. No operational concern requires a dedicated component.

But there's a harder question the architecture needs to answer.


Why Models Don't Dissolve Into Memory + Tools

The memory/tool binary says everything the system processes is either data (memory) or an operation (tool). Three proposed components were tested against this binary during the architecture interviews, and Tools and Client both dissolved into existing components. Why don't models dissolve the same way?

Conceptually, they do. Weights are the provider's memory — trained knowledge stored as parameters. Inference is a verb — the operation of generating a response. Noun and verb. The model decomposes.

But the architecture elevates it to an external dependency with its own API anyway. Three reasons.

Swappable intelligence needs its own boundary

If models dissolved into memory + tools, model access would flow through whatever internal mechanism the Agent Loop uses for tool execution — coupling it to Agent Loop implementation details. The Model API exists as a clean, swappable boundary separate from tool execution. That's what makes "swap your intelligence" a config change instead of a rebuild.

Storage is not computation

Weights are data — store them wherever you want. But running inference isn't a storage operation. Memory's interface is read, write, search, version — data operations. Inference is fundamentally different: take a prompt, apply learned patterns across billions of parameters, generate a response token by token. Putting inference inside Memory would add an execution capability to a data substrate. Memory would need to know how to run a model, not just store one.

Memory is inert. It sits there and waits to be read. The model is the opposite — it tells the Agent Loop what to do, decides which tools to call, what to read, what to write. It's an active participant in the execution loop, not a passive data store being accessed.
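The storage-versus-computation distinction can be made concrete with two hypothetical interfaces (the names and the `DictMemory` class are illustrative; only the read/write/search/version operations come from the spec): Memory exposes data operations and nothing else, while the model exposes an execution operation.

```python
from typing import Iterator, Protocol

class Memory(Protocol):
    """Inert data substrate: every operation is a data operation."""
    def read(self, key: str) -> str: ...
    def write(self, key: str, value: str) -> None: ...
    def search(self, query: str) -> list: ...
    def version(self, key: str) -> int: ...

class Model(Protocol):
    """Active participant: consumes a prompt, emits tokens, and may
    direct the Agent Loop through tool calls in its output."""
    def infer(self, prompt: str) -> Iterator[str]: ...

class DictMemory:
    """Toy Memory implementation: storage only, no execution capability."""
    def __init__(self) -> None:
        self._data: dict = {}
        self._versions: dict = {}
    def read(self, key: str) -> str:
        return self._data[key]
    def write(self, key: str, value: str) -> None:
        self._data[key] = value
        self._versions[key] = self._versions.get(key, 0) + 1
    def search(self, query: str) -> list:
        return [k for k, v in self._data.items() if query in v]
    def version(self, key: str) -> int:
        return self._versions.get(key, 0)
```

Nothing in `Memory` can run anything; nothing in `Model` stores anything. Folding `infer` into `Memory` would add an execution capability to a data substrate, which is exactly the coupling the architecture avoids.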

The sibling exception

Auth is the other thing the binary doesn't fully dissolve. Auth's data (policies, tokens) is memory. Auth's operations (enforcement, validation) are tools. It decomposes cleanly — but exists as a component anyway, because security can't depend on swappable intelligence. Swap to a weaker model, and your security breaks.

Both exceptions trace to the same root: making intelligence swappable. The Model needs its own API so you can swap it. Auth needs independence so swapping it doesn't break security. One cause, two consequences. See research/memory-tool-completeness.md §2 for the full argument.


Why the Model API Is a Contract, Not a Tool

The primary model can't be a tool — it is the intelligence making tool decisions. Tools are things the model decides to use. If "call the model" were itself a tool, you'd have a bootstrap problem: who decides to call it? You need intelligence to make tool decisions. You can't use a tool to call the thing that decides which tools to use.

The exception: sub-agents. Once a primary model is running, it can call a secondary model as a tool — cheap classification, summarization, a specialized task. The primary intelligence delegates, the way your brain can delegate a task to someone else. But the primary intelligence itself is not a tool. It's the thing doing the delegating.
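A sketch of the sub-agent exception, with hypothetical names (`secondary_model`, the `TOOLS` registry, `execute_tool_call`): a secondary model can sit behind a tool the primary model chooses to call, but no entry ever wraps the primary model itself, because that would be the circular dependency described above.

```python
def secondary_model(prompt: str) -> str:
    """Stand-in for a cheap secondary model (e.g. a summarizer)."""
    return prompt[:20] + "..."

# Tools the primary model may choose from. One of them wraps a
# secondary model: intelligence used *as* a tool, once a primary
# intelligence exists to do the choosing.
TOOLS = {
    "summarize": secondary_model,
}

def execute_tool_call(decision: dict) -> str:
    """The Agent Loop executes whatever tool the primary model chose.
    Note there is no 'call_primary_model' entry: the primary model is
    the chooser, so it cannot also be one of the choices."""
    return TOOLS[decision["tool"]](decision["args"])

assert execute_tool_call({"tool": "summarize", "args": "x" * 30}) == "x" * 20 + "..."
```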


Decisions Made

| # | Decision | Rationale |
| --- | --- | --- |
| D44 | Your Memory + Model = the brain — neither is complete alone | Your Memory is the persistent half (stored patterns, knowledge, skills). The model is the intelligence half. The brain reconstitutes every time — the model arrives fresh and reads Your Memory to become capable. |
| D63 | Models are not a component — they are external intelligence accessed through the Model API | Every model-related concern maps to existing components or configuration: provider routing (Agent Loop implementation), API keys (configuration/Auth), model selection (configuration), fallback (Agent Loop), context window awareness (Agent Loop/Model API). There is no gap. The system calls a model. It doesn't contain one. |
| D64 | The system has four components, two connectors, and three external dependencies | The six component interviews resolved the architecture. Four components (Your Memory, Agent Loop, Auth, Gateway), two connectors (Gateway API, Model API), three external dependencies (Clients, Models, Tools). What started as six components and three connectors simplified as each interview revealed concerns mapping to existing components. |
| D108 | The Model API must remain a connector, not a tool | Intelligence is a requirement — the model decides which tools to use, so "call the model" can't itself be a tool (circular dependency). Sub-agents are the exception: the primary model can delegate to a secondary model as a tool. |
| D135 | Memory/tool binary — if it's data, it's memory; if it's not data, it's a tool | Models conceptually decompose (weights = memory, inference = tool) but are elevated to an external dependency because swappable intelligence needs a clean boundary, and storage is not computation. |

Open Questions

None. Models is the thinnest spec because models are the most external thing in the system. The Model API handles the connection. Configuration handles the choices. The Agent Loop handles the calling. There's nothing else to define.


Success Criteria

  • Models are accessed exclusively through the Model API — no component depends on a specific model
  • Switching models requires only a configuration change — no code changes to Agent Loop, Your Memory, Auth, Gateway, or any client
  • The system works with any model that accepts prompts and returns completions — cloud, local, future
  • The Model API absorbs model evolution — new capabilities, new providers, new paradigms are configuration changes
  • Local models (Ollama) and cloud models (OpenRouter) are interchangeable from the Agent Loop's perspective

Changelog

| Date | Change | Source |
| --- | --- | --- |
| 2026-03-01 | Conciseness pass: 193→147 lines (~24%). Removed "Brain Reconstitutes" section (absorbed by intro). Collapsed Provider API subsections to two paragraphs. Not-a-Component table 8→4 rows. "Swappable intelligence" collapsed to coupling argument. "Connector Not Tool" leads with circular dependency. Source provenance removed. | Dave W + Claude |
| 2026-03-01 | Intro reshaped: biological vs digital brain argument integrated into "How we define models." Split brain analogy paragraph, added digital brain separation insight (biological brain fuses memory+intelligence, digital brain separates them via Provider API). Same length, stronger argument chain. | Dave W + Claude |
| 2026-03-01 | Reordered: Provider API definition moved before "Why" sections. "Why the Provider API Is a Connector, Not a Tool" promoted to standalone section. Reader now understands what the Provider API is before reading why it needs to exist. | Dave W + Claude |
| 2026-03-01 | Added "Why Models Don't Dissolve Into Memory + Tools" — binary decomposition (weights = memory, inference = tool), swappable intelligence boundary, storage vs computation, ownership case, Auth as sibling exception. D135 added to Decisions Made. Completeness doc added to Related Documents. | Dave W + Claude |
| 2026-02-27 | Reordered sections (why → what → how → reference). Merged "What This Document Is", "What Is a Model?", and "Models Are External" into single "How we define models" opener. Collapsed related docs table to single line. Removed "Impact on the Architecture" section (redundant with foundation-spec.md). Removed "Level 1/2/3 Distinction" section (Level 1 note in opener). | Spec reorder + trim (Dave W + Claude) |
| 2026-02-26 | Rewrote Provider API section — lead with brain analogy (intelligence is a requirement, swappable is the difference), sub-agents as when intelligence becomes a tool, circular dependency as technical confirmation. | Dave W reframing session |
| 2026-02-25 | Added "Why the Provider API is a connector, not a tool" — circular dependency argument, model-calls-model exception. | Foundation architecture deep dive (Dave W + Dave J) |
| 2026-02-23 | Initial Models spec created from interview — established that Models is external intelligence, not a component. Final architecture resolved: 4 components, 2 connectors, 3 external dependencies. | Models interview session (Dave W + Claude) |

Models are the most volatile part of AI and the most swappable part of this system. That's not a coincidence — it's the architecture working as designed. The thing that changes fastest in AI is the cheapest thing to change here. Your Memory persists. Models come and go. The brain reconstitutes every time.