# Real Anthropic SDK + prompt caching for /api/chat — deferred to CP-N+

`/api/chat` currently returns canned stub frames (a fixed greeting, a fixed thinking block, a fixed assistant message), all wrapped in the SSE envelope shape the real stream will use, so the frontend can be exercised end-to-end. The frontend's events drawer, slash-dispatch surface, and command palette are editorially functional today against this stub: the user can type, send, see the loop close, and watch BUS events route through the chat panel exactly as they will when the real LLM round-trip lands.

The audit flagged this in Cluster 5B as one of three "wired UI, stubbed engine" surfaces (chat being the largest of the three). The Console v2 fix scope explicitly excludes the LLM round-trip itself — the failure modes the fix targets (Cluster 1: hierarchy unwalkable, Cluster 2: chrome-as-fixture, Cluster 4: Python registry asymmetry) are upstream of the chat surface and would not be improved by wiring real Anthropic SDK calls.

The deferred work, when a CP picks it up:

1. Replace the stub generator in `recoil/api/chat.py` with `anthropic.AsyncAnthropic` calls.
2. Add prompt caching via `cache_control: {type: "ephemeral"}` on the system prompt and tool definitions. The engine-memory and bible payloads are large and stable across a session, so the cache pays for itself within one back-and-forth.
3. Wire `ANTHROPIC_API_KEY` through `recoil.core.config` (already exists for other surfaces) — no new env var, no new credential shape.
4. Surface model selection (`claude-opus-4-7`, `claude-sonnet-4-6`) via the existing slash command `/model` whitelist (Cluster 5B / `jt-priority-slash-commands.md` will grow this command at the same time).
5. Stream tokens through the existing SSE wire shape — the frontend already reads the right envelope.
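
The steps above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation. `build_request`, `sse_frame`, `stream_chat`, the `token`/`done` event names, the `max_tokens` value, and the default model are all assumptions made for the example. The real envelope is whatever the frontend already reads, and the key would come from `recoil.core.config`, whose API is not shown here, so it is passed in as a plain argument.

```python
import json
from typing import Any, AsyncIterator

# Assumption: a default picked for illustration; the real choice would come
# from the /model whitelist.
DEFAULT_MODEL = "claude-sonnet-4-6"


def build_request(
    system_prompt: str,
    tools: list[dict[str, Any]],
    messages: list[dict[str, Any]],
    model: str = DEFAULT_MODEL,
) -> dict[str, Any]:
    """Build keyword arguments for a Messages API call with cache breakpoints
    on the stable prefix (tool definitions and system prompt)."""
    cached_tools = [dict(tool) for tool in tools]  # shallow copies; inputs stay untouched
    if cached_tools:
        # A breakpoint on the last tool covers the whole tool list before it.
        cached_tools[-1]["cache_control"] = {"type": "ephemeral"}
    request: dict[str, Any] = {
        "model": model,
        "max_tokens": 1024,  # assumption: illustrative limit
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,
    }
    if cached_tools:
        request["tools"] = cached_tools
    return request


def sse_frame(event: str, data: dict[str, Any]) -> str:
    """Format one SSE frame. The event names used below are hypothetical."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_chat(
    api_key: str,
    system_prompt: str,
    tools: list[dict[str, Any]],
    messages: list[dict[str, Any]],
    model: str = DEFAULT_MODEL,
) -> AsyncIterator[str]:
    """Yield SSE frames for one chat turn using the Anthropic SDK."""
    # Imported here so the pure helpers above load without the SDK installed.
    from anthropic import AsyncAnthropic

    client = AsyncAnthropic(api_key=api_key)
    request = build_request(system_prompt, tools, messages, model)
    async with client.messages.stream(**request) as stream:
        async for text in stream.text_stream:
            yield sse_frame("token", {"text": text})
    yield sse_frame("done", {})
```

Keeping request construction and frame formatting as pure functions means they can be unit-tested without network access. Only `stream_chat` touches the SDK.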

ADR-0006 (EventBus contract Day 1, SSE Phase 19) is the structural precedent: the contract was defined upfront, the transport landed later, no consumer change was required. The chat surface follows the same shape — wire is defined, stub speaks the wire, real engine swaps in later.
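
As a rough illustration of that shape, the consumer can depend only on the frame contract while the engine behind it is swapped. Everything named here (`Frame`, `Engine`, `stub_engine`, `collect`, the `kind`/`text` keys, and the canned strings) is hypothetical and stands in for the real wire shape and stub.

```python
import asyncio
from typing import AsyncIterator, Callable

# Hypothetical frame shape standing in for the real SSE envelope.
Frame = dict[str, str]
# An engine is any callable that turns user text into an async stream of frames.
Engine = Callable[[str], AsyncIterator[Frame]]


async def stub_engine(user_text: str) -> AsyncIterator[Frame]:
    """Canned frames that ignore the input, mirroring what a stub does."""
    yield {"kind": "greeting", "text": "Hello."}
    yield {"kind": "thinking", "text": "(canned thinking)"}
    yield {"kind": "assistant", "text": "(canned reply)"}


async def collect(engine: Engine, user_text: str) -> list[Frame]:
    """Stand-in for the consumer: it relies on the frame shape only,
    so a real engine can replace the stub without changes here."""
    return [frame async for frame in engine(user_text)]


if __name__ == "__main__":
    for frame in asyncio.run(collect(stub_engine, "hi")):
        print(frame)
```

A real engine would be another function with the same signature, passed to `collect` in place of `stub_engine`.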

Cited: console-v2-audit-2026-05-04 Cluster 5B. See also `proposal-execution-glue.md` (sibling deferral for the proposal execution paths) and `jt-priority-slash-commands.md` (sibling deferral for the slash-dispatch wiring). All three share the same shape: "frontend is real, server stub is the next CP."
