# SYNTHESIS: Prompt Intelligence System Architecture

## Consultation Summary
- **Rounds:** 4 (Gemini 3.1 Pro) — Round 4 was a deep code review with full source (88K tokens)
- **Date:** 2026-03-04
- **Topic:** Should Starsend evolve from static prompt compilation into a prompt intelligence system with model-specific strategies, Flash enrichment, and learning from generation history?

## Round 4 Corrections (Code-Level Pressure Test)

Round 4 fed all 8,800 lines of source code to Gemini, which caught three critical bugs:

1. **Double-Enrichment Collision:** `keyframe_context.py` already does Flash enrichment via `build_smart_prompt()`. Adding Flash enrichment to `prompt_engine.py` would have double-enriched keyframe prompts. **Fix:** Externalize keyframe_context.py's existing system prompt into `flash_to_nbp_v1.0.txt` — don't build a second enrichment path.

2. **Database History Destruction:** The plan said to add `final_prompt_text`/`seed` as new columns on the `shots` table. But each shot has multiple takes (stored as a JSON array), so root-level columns would be overwritten on every retry. **Fix:** Inject prompt/seed/version into each take's JSON object in the `takes` array. No schema change needed.

3. **Previz is Already Flash-Native:** The plan said previz had "no enrichment." That was wrong: `previz_context.py` uses Flash as both the prompt writer AND the generator via a massive system instruction. Adding Flash enrichment would have been Flash writing a prompt for Flash. **Fix:** Skip enrichment for previz. Just move its system instruction to a versioned file.

Additional findings:
- 60+ hardcoded locations (we had counted ~50, an undercount of roughly 20%)
- Prompt order refactoring is HIGH effort (subject/environment tangled in `is_env` block)
- Regex coupling risk: `_visual_is_non_human()` regex could break if identity lock text changes
- Additional hidden constants: `_BEAUTY_PASS_ORGANIC_TEXTURE`, `_VISION_EXTRACTION_PROMPT`, `_KINETIC_FALLBACK`
- Legacy Recoil prompt_compiler.py: only consumers are killed Flux2 pipeline tools. Pre-Production Console never calls it. Wardrobe arcs fully migrated to Starsend's ref_selector.py.

---

## Agreed Decisions (Locked)

### D1: prompt_constants.json — Single Source of Truth
**What:** Create `starsend/config/prompt_constants.json` with canonical versions of ALL production and pre-production guard texts currently hardcoded across **8 files** (prompt_engine.py, ref_selector.py, screen_test_gen.py, generate_location_refs.py, prep_expressions.py, previz_context.py, keyframe_context.py, recoil/editors/serve.py).

**Contents — Production Constants:**
- `quality_guard` (anatomical positive embedding)
- `non_human_identity_lock` (full and short variants consolidated to ONE)
- `camera_direction_guard`
- `env_only_guard`
- `wide_shot_footer`
- `close_shot_footer`
- `medium_shot_footer`
- `film_style_suffix` (ONE canonical default)

**Contents — Pre-Production / Casting Constants:**
- `casting_camera` — Currently hardcoded as "Arri Alexa 65, 85mm f/2.8" in ref_selector.py and screen_test_gen.py (DIFFERENT from production "Arri Alexa Mini LF")
- `casting_lighting` — "5600K daylight-balanced or pure white diffusion"
- `casting_texture_human` — "Unretouched photorealism. Visible skin pores, peach fuzz, micro-imperfections, natural subsurface scattering, matte skin"
- `casting_texture_synthetic` — "Stan Winston Studio style practical effects, brushed polycarbonate, realistic weathering"
- `casting_background` — neutral gray (18%) for grids, white for final refs
- `casting_anti_airbrush` — "DO NOT AIRBRUSH" directive
- `expression_emotions` — the 9 emotion sets × 3 intensities
- `grid_diegetic_framing` — "photographic contact sheet" (NOT "character design sheet")

**Camera divergence note:** Production uses Alexa Mini LF (cinema camera for narrative frames). Pre-production uses Alexa 65 (medium format for studio reference photography). This is **intentionally different** — casting refs are studio photography, production frames are cinema. Both values must be config-driven, not hardcoded.

**Scope: 8 files, ~54 hardcoded locations by our per-file count below (Round 4 found 60+):**
| File | Hardcoded locations | Pipeline |
|------|-------------------|----------|
| `lib/prompt_engine.py` | ~28 | Production |
| `lib/ref_selector.py` | ~6 | Pre-production (casting) |
| `tools/screen_test_gen.py` | ~5 | Pre-production (screen tests) |
| `tools/generate_location_refs.py` | ~3 | Pre-production (location refs) |
| `tools/prep_expressions.py` | ~3 | Pre-production (expressions) |
| `lib/previz_context.py` | ~3 | Production (previz) |
| `lib/keyframe_context.py` | ~4 | Production (keyframes) |
| `recoil/editors/serve.py` | ~2 | Legacy (prompt preview) |

**Depends on:** Nothing. Do first.

### D2: lexicon.json — Externalized Kinetic Descriptors
**What:** Move the 14 regex→descriptor mappings from prompt_engine.py into `starsend/config/lexicon.json`. Python detects semantic action via the patterns, then injects the corresponding descriptor as a strict instruction to Flash.

**Format:**
```json
{
  "kinetic_map": [
    {
      "pattern": "push|shov|pry|wrench|wedge|lever|haul|heav|strain|brace|forc",
      "descriptor": "muscles taut, unbalanced dynamic pose, off-axis framing"
    }
  ],
  "fallback": "natural posture, documentary framing, ambient atmosphere"
}
```

**Depends on:** D1 (so both config files ship together).

### D3: Flash Enrichment Layer (Python → Flash → Model)
**What:** Add a Flash 3.1 enrichment step between Python prompt compilation and model API calls. Covers BOTH production AND pre-production pipelines. Python handles business logic (wide-shot branching, ENV sanitization, identity resolution, kinetic injection). Flash rewrites into model-optimized prose.

**Architecture (Production — keyframes):**
```
Plan data → Python builder (business logic, lexicon injection, locked terms)
  → Intermediate payload (structured but not final prose)
  → Flash 3.1 (system prompt from flash_to_nbp_v1.0.txt)
  → Validation (LOCKED_TERMS check)
  → If fail: retry at temp=0 once, then fallback to Python output
  → Final prompt → NBP generation
```

**CORRECTED (Round 4):** `keyframe_context.py` already does Flash enrichment via `build_smart_prompt()` with a hardcoded 5-bracket system prompt. We do NOT add a second enrichment path. Instead, we externalize keyframe_context.py's existing system prompt into `flash_to_nbp_v1.0.txt` and wire `keyframe_context.py` to read its constants from `prompt_constants.json`.
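
The validation, single retry at `temp=0`, and fallback steps from the keyframe diagram could look roughly like this. It is a sketch under assumptions: `call_flash` is a hypothetical callable standing in for the real Flash client, the default temperature of 0.7 is invented for illustration, and the locked-terms check is a plain case-insensitive substring match.

```python
from typing import Callable, Sequence


def missing_locked_terms(text: str, locked_terms: Sequence[str]) -> list[str]:
    """Return every locked term that does not appear in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in locked_terms if term.lower() not in lowered]


def enrich_with_fallback(
    base_prompt: str,
    locked_terms: Sequence[str],
    call_flash: Callable[[str, float], str],  # HYPOTHETICAL: (prompt, temperature) -> text
    default_temperature: float = 0.7,          # ASSUMPTION: not specified in the plan
) -> str:
    """Try Flash once, retry once at temperature 0, then fall back to Python output."""
    for temperature in (default_temperature, 0.0):
        candidate = call_flash(base_prompt, temperature)
        if not missing_locked_terms(candidate, locked_terms):
            return candidate
    # Both attempts dropped a locked term: ship the deterministic prompt instead.
    return base_prompt
```

The fallback returns the unmodified Python output, so a Flash failure degrades to current behavior rather than blocking generation.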

**Architecture (Production — video):**
```
Plan data → Python builder (business logic, locked terms)
  → Flash 3.1 (model-specific: flash_to_kling/seeddance/veo)
  → Validation (word limit, LOCKED_TERMS)
  → Final prompt → Video model
```

**Architecture (Production — previz):**
```
SKIP FLASH ENRICHMENT. Previz is already Flash-native — previz_context.py uses Flash
as both prompt author AND image generator via a massive system instruction.
Just move the system instruction to a versioned file (flash_previz_v1.0.txt).
```

**Architecture (Pre-production — casting, screen tests):**
```
Bible + breakdown data → Python builder (character resolution, wardrobe phase, casting constants)
  → Intermediate payload (structured casting brief)
  → Flash 3.1 (casting-specific system prompt — already exists for continuity grids)
  → Validation (identity lock preserved, anti-airbrush preserved)
  → Final prompt → NBP/Flash generation
```

**Note:** ref_selector.py already HAS Flash enrichment for continuity grids (`_enrich_continuity_grid_prompt`). The change is to extend this pattern to casting grids, turnarounds, screen tests, and location refs — all of which currently use deterministic-only prompts.

**ENV-only auto-bypass (Round 4 correction):** ENV-only shots MUST bypass Flash enrichment entirely. Python sanitizes human-presence language via `_HUMAN_PATTERNS` regex, but Flash would hallucinate humans back into the prompt. Enforce at top of enrichment wrapper: `if is_env: return base_prompt`.

**Cost:** ~$0.01/shot × 1,800 production shots + ~$0.01 × ~200 casting assets = ~$20/season total
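
The estimate above works out as follows, using integer cents to avoid floating-point drift. The one-call-per-asset assumption is ours: retries at `temp=0` would add a second call for any prompt that fails validation.

```python
COST_PER_CALL_CENTS = 1      # ~$0.01 per Flash enrichment call
PRODUCTION_SHOTS = 1800
CASTING_ASSETS = 200         # approximate

# ASSUMPTION: exactly one enrichment call per shot or asset, no retries.
total_cents = COST_PER_CALL_CENTS * (PRODUCTION_SHOTS + CASTING_ASSETS)
total_dollars = total_cents / 100
print(f"${total_dollars:.2f} per season")
```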

**Bypass rules:**
- Global flag: `skip_flash_enrichment: true` in project_config.json (for debugging/A/B testing)
- Auto-bypass: ENV-only shots skip Flash (LLMs hallucinate subjects into empty rooms)
- Auto-bypass: Expression matrix skips Flash (universal generic actor, no enrichment needed)
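
A minimal sketch of how the three bypass rules could sit at the top of the enrichment wrapper. The names `should_skip_enrichment`, `enrich_prompt`, and the `asset_type` label `"expression_matrix"` are hypothetical; only the `skip_flash_enrichment` flag and the `is_env` check come from the plan.

```python
from typing import Callable


def should_skip_enrichment(project_config: dict, is_env: bool, asset_type: str) -> bool:
    """Apply the bypass rules in order: global flag, ENV-only, expression matrix."""
    if project_config.get("skip_flash_enrichment", False):
        return True   # global debugging / A/B flag from project_config.json
    if is_env:
        return True   # ENV-only shots keep the deterministic Python prompt
    if asset_type == "expression_matrix":  # HYPOTHETICAL label
        return True   # generic actor, nothing to enrich
    return False


def enrich_prompt(
    base_prompt: str,
    project_config: dict,
    is_env: bool,
    asset_type: str,
    enrich: Callable[[str], str],  # HYPOTHETICAL: the actual Flash call
) -> str:
    """Return base_prompt untouched when any bypass rule applies."""
    if should_skip_enrichment(project_config, is_env, asset_type):
        return base_prompt
    return enrich(base_prompt)
```

Keeping the decision in one function means every pipeline (keyframes, video, casting) applies the same rules instead of re-implementing them per caller.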

**Depends on:** D1 (prompt_constants.json), D2 (lexicon.json)

### D4: Model-Specific Flash System Prompts
**What:** Versioned system prompt files per model, with shared base instructions.

**File structure:**
```
starsend/config/prompts/
├── flash_base_instructions.txt        ← Universal rules (locked terms, lexicon, camera-artifact language)
│
│  Production (generation models):
├── flash_to_nbp_v1.0.txt             ← Externalized from keyframe_context.py build_smart_prompt()
├── flash_to_kling_i2v_v1.0.txt       ← 30 words max, action-focused
├── flash_to_kling_t2v_v1.0.txt       ← 75-100 words, balanced
├── flash_to_seeddance_v1.0.txt       ← Multi-shot JSON output, scene coherence
├── flash_to_veo_v1.0.txt             ← 1500 char max, ENV-friendly cinematic prose
├── flash_previz_v1.0.txt             ← Externalized from previz_context.py build_system_instruction()
│
│  Pre-production (casting/ref assets):
├── flash_casting_grid_v1.0.txt       ← Externalized from ref_selector.py _enrich_continuity_grid_prompt()
├── flash_casting_turnaround_v1.0.txt ← 4-angle consistency, wardrobe phase accuracy
├── flash_screen_test_v1.0.txt        ← Full-body 9:16 portraits, wardrobe detail
└── flash_location_ref_v1.0.txt       ← ENV moodboard, no people, atmosphere-focused
```

**Runtime:** `system_prompt = load("flash_base_instructions.txt") + load(f"flash_to_{model}_v{version}.txt")`
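
The runtime expression above is shorthand. One way to implement it is sketched below; `load_system_prompt` and its parameters are hypothetical names. Note that the `flash_to_{model}` pattern only covers the generation-model files: `flash_previz_*` and the pre-production files in the tree above lack the `to_` segment, so they would need a separate name mapping.

```python
from pathlib import Path


def load_system_prompt(prompt_dir: Path, model: str, version: str = "1.0") -> str:
    """Concatenate the shared base instructions with one model-specific file.

    Raises FileNotFoundError if either file is missing, so a typo in the
    model name or version fails loudly instead of silently using base rules only.
    """
    base = (prompt_dir / "flash_base_instructions.txt").read_text(encoding="utf-8")
    specific = (prompt_dir / f"flash_to_{model}_v{version}.txt").read_text(encoding="utf-8")
    # Blank line between the two parts so the model sees them as separate sections.
    return base.rstrip() + "\n\n" + specific.strip() + "\n"
```

Usage would be along the lines of `load_system_prompt(Path("starsend/config/prompts"), "kling_i2v")`.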

**SeedDance exception:** Flash receives the entire batch (3-8 shots) as a Scene Block. The request sets `response_mime_type="application/json"`, and Flash returns a JSON array of coherent, sequentially-aware prompts.
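
Since the batch response feeds several shots at once, it is worth checking its shape before use. The sketch below assumes each array element is a plain prompt string, one per shot in batch order; the plan does not pin down the element shape, and `parse_scene_block_response` is a hypothetical name.

```python
import json


def parse_scene_block_response(raw_json: str, expected_shots: int) -> list[str]:
    """Parse Flash's JSON reply and verify it has one non-empty prompt per shot."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of prompts")
    if len(data) != expected_shots:
        raise ValueError(f"expected {expected_shots} prompts, got {len(data)}")
    # ASSUMPTION: elements are plain strings, not objects.
    if not all(isinstance(item, str) and item.strip() for item in data):
        raise ValueError("every prompt must be a non-empty string")
    return data
```

A count mismatch is treated as a hard failure because a silently dropped prompt would shift every later prompt onto the wrong shot.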

**Depends on:** D1, D3

### D5: NBP Prompt Order (T5-Optimized)
**What:** Restructure NBP prompt element order based on T5 text encoder attention weighting.

**Old order:** Camera → Film Stock → Subject → Action → Lighting → Emotion → Quality Guard

**New order:** Medium/Camera/Format (combined opening) → Subject + Identity Lock + Emotion → Action/Kinetic → Environment → Lighting → Quality Guards

**Key change:** Emotion moves from end to directly after subject (T5 drops late modifiers). Camera/film stock merge into a single opening clause.

**Depends on:** D4 (encoded in flash_to_nbp system prompt)
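
For the deterministic Python output (used on bypass and fallback), the new order can be expressed as a parts dict joined in a fixed sequence, in line with the Round 4 advice to extract parts first and reorder afterwards. The key names and placeholder texts below are hypothetical.

```python
# New D5 order. Key names are illustrative, not taken from prompt_engine.py.
NEW_ORDER = [
    "opening",         # medium / camera / format, merged into one clause
    "subject",         # subject + identity lock + emotion
    "action",          # action / kinetic descriptor
    "environment",
    "lighting",
    "quality_guards",
]


def assemble_nbp_prompt(parts: dict[str, str]) -> str:
    """Join non-empty parts in NEW_ORDER, regardless of dict insertion order."""
    return " ".join(
        parts[key].strip() for key in NEW_ORDER if parts.get(key, "").strip()
    )


parts = {
    "quality_guards": "QUALITY.",
    "subject": "SUBJECT, EMOTION.",
    "opening": "CAMERA AND FORMAT.",
    "lighting": "LIGHTING.",
    "action": "ACTION.",
    "environment": "",  # empty parts are skipped
}
print(assemble_nbp_prompt(parts))
```

Changing the order later then means editing one list rather than moving `sections.append()` calls around the `is_env` block.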

### D6: Generation Data Capture
**What:** Log prompt text, seed, and system prompt version for every generation attempt.

**~~New columns in `shots` table~~ CORRECTED (Round 4):** Do NOT alter the SQLite schema. Shots have multiple takes stored as a JSON array in the `takes` column. Adding root-level columns would overwrite on every retry, destroying history.

**Instead:** Modify `update_shot()` where `key == "append_take"` so that it injects the three new fields into each take's JSON object:
```python
take = {
    "take_id": "take_003",
    "timestamp": 1709...,
    "prompt": "A cinematic 35mm film close-up...",    # NEW: exact prompt sent to model
    "seed": 42,                                        # NEW: generation seed
    "system_prompt_version": "v1.0",                   # NEW: Flash system prompt version
    "model": "gemini-3-pro-image-preview",
    "cost": 0.134,
    "output_path": "...",
    "status": "pending_review"
}
```
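
A sketch of the append path against an in-memory SQLite database, showing that earlier takes survive a retry. The two-column table layout and the `append_take` helper are assumptions for illustration; the real `update_shot()` in `execution_store.py` will differ.

```python
import json
import sqlite3
import time


def append_take(conn: sqlite3.Connection, shot_id: str, take: dict) -> int:
    """Append one take to the shot's JSON array and return the new take count."""
    row = conn.execute(
        "SELECT takes FROM shots WHERE shot_id = ?", (shot_id,)
    ).fetchone()
    if row is None:
        raise KeyError(shot_id)
    takes = json.loads(row[0] or "[]")
    takes.append(take)  # history is extended, never overwritten
    conn.execute(
        "UPDATE shots SET takes = ? WHERE shot_id = ?", (json.dumps(takes), shot_id)
    )
    conn.commit()
    return len(takes)


# ASSUMPTION: minimal stand-in schema with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shots (shot_id TEXT PRIMARY KEY, takes TEXT)")
conn.execute("INSERT INTO shots VALUES ('shot_001', '[]')")

for number, seed in enumerate((42, 43), start=1):
    append_take(conn, "shot_001", {
        "take_id": f"take_{number:03d}",
        "timestamp": time.time(),
        "prompt": "placeholder prompt text",
        "seed": seed,
        "system_prompt_version": "v1.0",
        "model": "gemini-3-pro-image-preview",
        "status": "pending_review",
    })

stored = json.loads(conn.execute("SELECT takes FROM shots").fetchone()[0])
print([t["seed"] for t in stored])
```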

**CostTracker extension:**
- New `pass_type: "prompt_enrichment"` for Flash enrichment calls
- Logged against same shot_id/episode for total landed cost per shot

**Depends on:** Nothing. Can start Day 1.

### D7: Contextual Review UI
**What:** Upgrade Dailies review to show filmstrip context (Previous + Current + Next frames) and structured rejection tags.

**Rejection taxonomy (checkboxes):**
- Anatomy Failure (hands, proportions, deformities)
- Continuity Error (doesn't match adjacent frames)
- Camera/Composition (wrong angle, framing, movement)
- Style/Lighting (wrong color temperature, mood, grain)
- Optional free-text field

**Continuity reference:** When rejecting for continuity, UI shows adjacent frames and allows annotating which reference frame the current shot conflicts with.

**Depends on:** D6 (needs prompt logging to correlate decisions with prompts)
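
On the storage side, the taxonomy maps naturally onto a small enum plus a record type. This is one possible shape, not a settled schema: the tag values, the `Rejection` class, and the `conflicts_with` field are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RejectionTag(str, Enum):
    ANATOMY = "anatomy_failure"
    CONTINUITY = "continuity_error"
    CAMERA = "camera_composition"
    STYLE = "style_lighting"


@dataclass
class Rejection:
    take_id: str
    tags: list[RejectionTag]
    note: str = ""                        # optional free-text field
    conflicts_with: Optional[str] = None  # adjacent frame the take conflicts with

    def __post_init__(self) -> None:
        if not self.tags:
            raise ValueError("a rejection needs at least one tag")
        if self.conflicts_with and RejectionTag.CONTINUITY not in self.tags:
            raise ValueError("conflicts_with only applies to continuity rejections")


rejection = Rejection("take_003", [RejectionTag.CONTINUITY], conflicts_with="shot_012")
print(rejection.tags[0].value)
```

Storing `take_id` rather than only the shot lets each rejection join back to the exact prompt, seed, and system prompt version logged under D6.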

---

## Execution Order

| Phase | Items | Effort | Impact |
|-------|-------|--------|--------|
| **Phase 1 (Days 1-3)** | D1 (prompt_constants.json — production + pre-production) + D2 (lexicon.json) + D6 (logging) | Medium-High | Fix tech debt across 8 files (~50 locations) + enable data capture |
| **Phase 2 (Days 4-6)** | D3 (Flash enrichment — production pipeline) + D4 (production system prompts) + D5 (NBP order) | Medium-High | Production quality improvement |
| **Phase 3 (Days 7-8)** | D3 (Flash enrichment — pre-production pipeline: casting, screen tests, location refs) + D4 (casting system prompts) | Medium | Pre-production quality + consistency with production |
| **Phase 4 (Days 9+)** | D7 (review UI upgrade — filmstrip, rejection tags) | Medium | Human-in-the-loop learning |

---

## Rejected Ideas

| Idea | Why Rejected |
|------|-------------|
| Automated prompt learning from rejection patterns | Not enough data at 1,800 shots. Seed noise makes causal inference unreliable. Manual weekly review is sufficient. |
| Universal Shot Object / IR layer | Already exists as Plan Pass structured data. Adding another abstraction adds complexity without value. |
| Unifying Recoil prompt_compiler + Starsend prompt_engine | Different input contracts, output contracts, model targets. Zero overlap in models. Not worth the abstraction cost. |
| Hashing prompt versions | Unreadable. Semantic versioning (v1.0, v1.1) enables human-readable A/B comparison in SQL queries. |
| Flash enrichment for ENV-only shots | Flash hallucinates subjects into empty rooms. Deterministic Python output is better for ENV shots. |
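
To illustrate the readable A/B comparison that semantic versions allow, the query below groups takes by `system_prompt_version` directly from the `takes` JSON array. It assumes the SQLite build includes the JSON functions (`json_each`, `json_extract`), and the `"approved"`/`"rejected"` status values and sample rows are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shots (shot_id TEXT PRIMARY KEY, takes TEXT)")

# Invented sample data: two shots, each tried under v1.0 and v1.1.
sample = {
    "shot_001": [
        {"system_prompt_version": "v1.0", "status": "rejected"},
        {"system_prompt_version": "v1.1", "status": "approved"},
    ],
    "shot_002": [
        {"system_prompt_version": "v1.0", "status": "approved"},
        {"system_prompt_version": "v1.1", "status": "approved"},
    ],
}
conn.executemany(
    "INSERT INTO shots VALUES (?, ?)",
    [(shot_id, json.dumps(takes)) for shot_id, takes in sample.items()],
)

# json_each expands the takes array into one row per take.
QUERY = """
SELECT json_extract(t.value, '$.system_prompt_version') AS version,
       COUNT(*) AS take_count,
       SUM(json_extract(t.value, '$.status') = 'approved') AS approved
FROM shots, json_each(shots.takes) AS t
GROUP BY version
ORDER BY version
"""
rows = conn.execute(QUERY).fetchall()
for version, take_count, approved in rows:
    print(version, take_count, approved)
```

With hashed versions the same query would still run, but the `version` column would not tell a reviewer which prompt revision each row refers to.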

## Model Behavior Assumptions

| Decision | Depends On | Model Behavior | Re-check Trigger |
|----------|-----------|----------------|------------------|
| T5 prompt order (D5) | NBP uses T5-XXL encoder | First 20-30 tokens weighted heaviest | New Gemini image model release |
| Seed determinism (D6) | Imagen 3 seed reproducibility | Same prompt + seed = same image | Google API changes |
| Flash writes good NBP prompts | Same-family model synergy | Flash and NBP share foundational training | New Flash or NBP version |
| SeedDance JSON output from Flash | Flash structured output | `response_mime_type="application/json"` works reliably | Flash API changes |
| ENV auto-bypass (D3) | Flash subject hallucination | LLMs insert people into empty rooms | Empirical testing |

## Files to Create/Modify

### New Files
- `starsend/config/prompt_constants.json` — ALL production + pre-production constants
- `starsend/config/lexicon.json` — Kinetic descriptor map
- `starsend/config/prompts/flash_base_instructions.txt` — Universal rules
- `starsend/config/prompts/flash_to_nbp_v1.0.txt` — Production (NBP keyframes)
- `starsend/config/prompts/flash_to_kling_i2v_v1.0.txt` — Production (Kling I2V)
- `starsend/config/prompts/flash_to_kling_t2v_v1.0.txt` — Production (Kling T2V)
- `starsend/config/prompts/flash_to_seeddance_v1.0.txt` — Production (SeedDance multi-shot)
- `starsend/config/prompts/flash_to_veo_v1.0.txt` — Production (Veo)
- `starsend/config/prompts/flash_casting_grid_v1.0.txt` — Pre-production (casting)
- `starsend/config/prompts/flash_casting_turnaround_v1.0.txt` — Pre-production (turnarounds)
- `starsend/config/prompts/flash_screen_test_v1.0.txt` — Pre-production (screen tests)
- `starsend/config/prompts/flash_location_ref_v1.0.txt` — Pre-production (location refs)

### Modified Files — Production Pipeline
- `starsend/lib/prompt_engine.py` — Replace ~28 hardcoded values with config reads; add Flash enrichment call at end of `build_prompt_from_plan()` (line 290) with ENV bypass; add validation/fallback
- `starsend/lib/previz_context.py` — Move system instruction (line 316) to `flash_previz_v1.0.txt`; replace hardcoded framing constants with config reads. NO enrichment added (Flash-native).
- `starsend/lib/keyframe_context.py` — Move `build_smart_prompt()` system prompt (line 160) to `flash_to_nbp_v1.0.txt`; replace hardcoded constants with config reads. This is the EXISTING enrichment path — centralize, don't duplicate.
- `starsend/lib/execution_store.py` — Modify `update_shot()` to inject prompt/seed/version into take JSON objects (NO schema change)
- `starsend/orchestrator/cost_tracker.py` — Add "prompt_enrichment" pass_type

### Modified Files — Pre-Production Pipeline
- `starsend/lib/ref_selector.py` — Replace ~6 hardcoded casting constants (Alexa 65, 85mm, 5600K, texture rules) with config reads; extend Flash enrichment from continuity grids to casting grids and turnarounds
- `starsend/tools/screen_test_gen.py` — Replace ~5 hardcoded constants with config reads; add Flash enrichment for phase portraits
- `starsend/tools/generate_location_refs.py` — Replace ~3 hardcoded constants with config reads; add Flash enrichment
- `starsend/tools/prep_expressions.py` — Replace ~3 hardcoded emotion/grid constants with config reads (expressions skip Flash enrichment — generic actor, no enrichment needed)

### Modified Files — Review UI
- `starsend/editors/review_server.py` — Filmstrip view endpoint, rejection tag storage
- `starsend/editors/tabs/dailies.js` — Filmstrip UI, rejection checkboxes

### Unchanged (Let Die)
- `recoil/lib/prompt_compiler.py` — Legacy pipeline. Only consumers are killed Flux2 tools. Pre-Production Console never calls it. Wardrobe arcs fully migrated to Starsend's ref_selector.py. Let it die.
- `recoil/editors/serve.py` — `/api/preview-prompt` and `/api/shot-lab` endpoints are defined but no UI module ever calls them. Dead endpoints.
- `starsend/config/model_profiles.json` — Capabilities stay as-is. Prompt strategy lives in system prompt files, not capability profiles.

### prompt_constants.json Canonical Schema (Gemini-Drafted from Round 4)

```json
{
  "production": {
    "camera_body": "Arri Alexa Mini LF",
    "film_stock": "Kodak Vision3 500T",
    "film_style_suffix": "visible grain, photorealistic",
    "quality_guard": "Correct human anatomy, anatomically correct proportions, five fingers per hand, sharp focus, clean detailed image, natural skin texture with pores",
    "wide_shot_footer": "Focus on full body silhouette, posture, and environmental scale. Facial features are indistinct at this distance. Do not attempt high-detail eyes or mouth.",
    "medium_shot_footer": "Anatomically flawless hands, exactly five fingers per hand. Natural body proportions, correct skeletal structure.",
    "close_shot_footer": "Anatomically flawless hands, perfect skeletal symmetry. Highly detailed facial features, accurate skin texture with pores.",
    "non_human_identity_lock": "This character is NOT a baseline human. Preserve whatever helmet, chassis, head covering, or non-human structure is visible in the reference. Do NOT add human hair where none exists. Do NOT infer a bare human head.",
    "camera_direction_guard": "Camera direction: Subject does not look directly into the lens.",
    "env_only_guard": "CRITICAL: This is an ENVIRONMENT-ONLY shot. ABSOLUTELY NO PEOPLE in this image."
  },
  "casting": {
    "casting_camera": "Arri Alexa 65, 85mm f/2.8",
    "casting_lighting": "5600K daylight-balanced or pure white diffusion",
    "casting_texture_human": "Unretouched photorealism. Visible skin pores, peach fuzz, micro-imperfections, natural subsurface scattering, matte skin. Cinematic makeup test.",
    "casting_texture_synthetic": "Stan Winston Studio style practical effects, brushed polycarbonate, realistic weathering, mechanical micro-details. Tangible, physical materials shot in-camera.",
    "casting_background": "18% neutral gray seamless studio backdrop",
    "casting_anti_airbrush": "DO NOT AIRBRUSH. NO ILLUSTRATION. NO 3D RENDER. NO CONCEPT ART.",
    "grid_diegetic_framing": "photographic contact sheet"
  },
  "shared": {
    "kinetic_fallback": "natural posture, documentary framing, ambient atmosphere",
    "universal_expression_subject": "A generic, bald, androgynous human actor with no distinct features, no makeup, and no styling."
  }
}
```
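
Reading from this schema could look like the sketch below. `load_constants` and `get_constant` are hypothetical helpers; the demo uses an inline subset of the schema instead of the real file. A missing key raises rather than falling back to a hardcoded string, which would reintroduce the duplication D1 removes.

```python
import json
from pathlib import Path


def load_constants(path: Path) -> dict:
    """Load prompt_constants.json from disk."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def get_constant(constants: dict, dotted_key: str) -> str:
    """Look up 'section.name', e.g. 'production.camera_body'."""
    section, _, name = dotted_key.partition(".")
    try:
        return constants[section][name]
    except KeyError as exc:
        raise KeyError(f"unknown prompt constant: {dotted_key}") from exc


# Inline subset of the schema above, for demonstration only.
CONSTANTS = {
    "production": {"camera_body": "Arri Alexa Mini LF"},
    "casting": {"casting_camera": "Arri Alexa 65, 85mm f/2.8"},
}

print(get_constant(CONSTANTS, "production.camera_body"))
print(get_constant(CONSTANTS, "casting.casting_camera"))
```

Namespacing by section keeps the intentional production/casting camera divergence explicit at every call site.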

### Risk Mitigations (from Round 4)

| Risk | Severity | Mitigation |
|------|----------|------------|
| Double-enrichment (keyframe_context.py + prompt_engine.py) | HIGH | Externalize keyframe_context's existing Flash call — don't add a second one |
| Take history destruction (SQLite schema change) | HIGH | Inject into takes JSON array, don't add root columns |
| ENV Flash hallucination (humans reappearing) | MEDIUM | Auto-bypass: `if is_env: return base_prompt` |
| Regex coupling (`_visual_is_non_human`) | MEDIUM | Keep detection regex separate from display text; change display text in constants, leave detection patterns in code |
| Prompt order refactoring breaks `is_env` block | MEDIUM | Extract to parts dict first, then reorder — don't just swap `sections.append()` calls |
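
The regex-coupling mitigation can be illustrated as follows: detection runs on the character's visual description with a pattern that lives in code, while the lock text is only ever read from constants. The pattern and function names here are hypothetical stand-ins, not the real `_visual_is_non_human()` implementation.

```python
import re

# Detection pattern stays in code. HYPOTHETICAL terms, for illustration only.
_NON_HUMAN_PATTERN = re.compile(r"\b(helmet|chassis|android|robot)\b", re.IGNORECASE)


def visual_is_non_human(visual_description: str) -> bool:
    """Decide from the character description, never from the lock text."""
    return bool(_NON_HUMAN_PATTERN.search(visual_description))


def identity_lock_for(visual_description: str, constants: dict) -> str:
    """Return the display text from constants when the character is non-human."""
    if visual_is_non_human(visual_description):
        return constants["production"]["non_human_identity_lock"]
    return ""


constants = {"production": {"non_human_identity_lock": "This character is NOT a baseline human."}}
print(identity_lock_for("Matte black chassis with a visor", constants))

# Rewording the lock text does not affect detection.
constants["production"]["non_human_identity_lock"] = "Reworded lock text."
print(identity_lock_for("Matte black chassis with a visor", constants))
```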