# Backlog

> Last current as of: 2026-02-28 (verify before relying on for current architecture)

Upcoming changes, bugs, and improvements to track.

---

## Bugs

- [ ] **`recoil/lib/` and `recoil/pipeline/lib/` package-name collision** (latent, surfaced 2026-05-03)
  - *Reported:* 2026-05-03 (during `consult.py` post-Phase-D fix)
  - *Symptom:* Three tools fail at `from lib.recoil_bridge` / `from lib.ref_resolver`: `pipeline/tools/batch_gate2_test.py`, `pipeline/tools/migrate_refs.py`, `pipeline/tools/recoil_checks/execution_health.py`. The bootstrap path is fixed (by the `consult.py` root-cause fix), but these tools then fail at the next layer.
  - *Root cause:* Both `recoil/lib/` and `recoil/pipeline/lib/` exist with overlapping file names (e.g., both contain `config_schema.py`). `recoil/lib/` is a Python namespace package (no `__init__.py`); `recoil/pipeline/lib/` is a regular package. When `core/paths.py` imports `from lib.config_schema` at module-load time, Python caches `lib` in `sys.modules` as whichever directory resolves first, which depends on the contents and order of `sys.path` at that moment. Once cached, files that exist only in the OTHER `lib/` (e.g., `recoil_bridge.py`, present only in `pipeline/lib/`) become unreachable.
  - *Why it's been latent:* Tools that don't reach the failing imports work fine. The collision only triggers when a tool needs files from BOTH `lib/` directories in the same process. Phase D's caller migration created the conditions where this could surface but didn't itself cause it.
  - *Fix options (decision required, not just code):*
    1. **Rename one of the two `lib/` directories.** Cleanest. Suggested: `recoil/lib/` → `recoil/recoil_lib/`. Roughly a few dozen file edits to update imports.
    2. **Add `__init__.py` to `recoil/lib/`** AND adopt a discipline that no file name overlaps. Doesn't really solve the collision — overlapping names like `config_schema.py` still pick one based on sys.path order.
    3. **Move all imports to fully-qualified form** (`from recoil.lib.X` vs `from recoil.pipeline.lib.X`). Requires `recoil/__init__.py` and ~100 file edits.
  - *Recommendation:* (1) is the right structural fix. Worth its own /consult round to size the blast radius before harnessing. Flag for the architectural-laws SYNTHESIS as a candidate "no overlapping package names across major directories" anti-pattern.
  - *Workaround until fixed:* `consult.py` and `single_take_sh03.py` work fine. The other 3 tools were already broken on this axis before — bootstrap fix didn't make them worse.
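
The failure mode described in the root cause can be reproduced in isolation. The sketch below is a hedged illustration, not the repo's actual layout: it builds two throwaway directories in a temp folder and uses a hypothetical package name `demo_lib` (standing in for `lib`) so that no real package is touched. The file names mirror the ones mentioned above; everything else is an assumption.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path

PKG = "demo_lib"  # hypothetical stand-in for `lib`


def reproduce_collision() -> tuple[str, str]:
    """Return (which config_schema won, whether recoil_bridge is importable)."""
    root = Path(tempfile.mkdtemp())
    ns_parent = root / "recoil"                # parent of the namespace package
    reg_parent = root / "recoil" / "pipeline"  # parent of the regular package

    # Namespace package: no __init__.py, mirrors recoil/lib/.
    (ns_parent / PKG).mkdir(parents=True)
    (ns_parent / PKG / "config_schema.py").write_text("ORIGIN = 'recoil/lib'\n")

    # Regular package: has __init__.py, mirrors recoil/pipeline/lib/.
    (reg_parent / PKG).mkdir(parents=True)
    (reg_parent / PKG / "__init__.py").write_text("")
    (reg_parent / PKG / "config_schema.py").write_text("ORIGIN = 'recoil/pipeline/lib'\n")
    (reg_parent / PKG / "recoil_bridge.py").write_text("OK = True\n")

    saved_path = list(sys.path)
    try:
        # Only the first parent is on sys.path when the package is first imported.
        sys.path.insert(0, str(ns_parent))
        importlib.invalidate_caches()
        first = importlib.import_module(f"{PKG}.config_schema")

        # Adding the second parent later does not rebind the cached package.
        sys.path.insert(0, str(reg_parent))
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"{PKG}.recoil_bridge")
            bridge = "imported"
        except ModuleNotFoundError:
            bridge = "unreachable"
        return first.ORIGIN, bridge
    finally:
        # Leave the interpreter as we found it.
        sys.path[:] = saved_path
        for name in [n for n in sys.modules if n == PKG or n.startswith(PKG + ".")]:
            del sys.modules[name]
        shutil.rmtree(root, ignore_errors=True)
```

Calling `reproduce_collision()` shows the package staying bound to the first directory even after the second parent is added to `sys.path`. Note that if both parents are on `sys.path` before the first import, Python prefers the regular package over the namespace package, so which tools break depends on how each entry point sets up its path.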

- [x] **`/autogenerate` stops after batch completion instead of continuing**
  - *Reported:* 2026-01-20
  - *Fixed:* 2026-01-21
  - *Solution:* Added "AUTONOMOUS GENERATION LOOP" section with explicit loop instruction and "DO NOT STOP" guidance
  - *Location:* `.claude/skills/autogenerate/SKILL.md`

- [x] **`/generate` and `/autogenerate` don't explicitly call `/load-context` at start**
  - *Reported:* 2026-01-20
  - *Fixed:* 2026-01-21
  - *Solution:* Added mandatory `/load-context [project] generate` invocation as Step 1/3 in both skills
  - *Location:* `.claude/skills/generate/SKILL.md`, `.claude/skills/autogenerate/SKILL.md`

- [x] **`/autogenerate` has no mandatory context reload between batches**
  - *Reported:* 2026-01-20
  - *Fixed:* 2026-01-21
  - *Solution:* Added "MANDATORY CONTEXT RELOAD BETWEEN BATCHES" section requiring re-read of treatment.md, characters.md, format_v12/SKILL.md after each batch
  - *Location:* `.claude/skills/autogenerate/SKILL.md`

- [ ] **`validate_behavioral_dna.py` character name regex overwrites characters with similar prefixes**
  - *Reported:* 2026-01-21
  - *Description:* The header regex `^##\s+([A-Z][A-Za-z0-9\s\-\_]+?)(?:\s*[-—].*)?$` uses non-greedy `+?` which captures only "DAVID" from both `## DAVID — The Prodigal Son` and `## DAVID-AS-JASON — The Mirror`. Since both match as "DAVID", the second section overwrites the first in the characters dict.
  - *Impact:* DAVID-AS-JASON's content is labeled as DAVID, causing DAVID's actual behavioral DNA (signature line, orthogonal trait, contradiction) to be incorrectly flagged as missing.
  - *Workaround:* Rename character sections to avoid prefix collisions (e.g., `## JASON (DAVID IN LOOP)` instead of `## DAVID-AS-JASON`)
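  - *Suggested Fix:* Change the regex to greedy `+` (stripping the trailing whitespace it then captures; note that a greedy match would also swallow a subtitle separated by a plain hyphen), or add logic to treat hyphenated names as distinct characters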
  - *Location:* `.claude/hooks/validate_behavioral_dna.py` (line 127)
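
The prefix collision can be shown with the header regex quoted in the description. The sketch below is a minimal illustration: `ORIGINAL_HEADER` is copied from the item above, while `PROPOSED_HEADER` and the `character_names` helper are hypothetical and not taken from `validate_behavioral_dna.py`. The proposed variant assumes the name/subtitle separator is always surrounded by whitespace.

```python
import re

# Pattern as quoted in the bug description.
ORIGINAL_HEADER = re.compile(r"^##\s+([A-Z][A-Za-z0-9\s\-\_]+?)(?:\s*[-—].*)?$")

# Hypothetical alternative: the separator must have whitespace on both sides,
# so a hyphen inside a name no longer ends the capture.
PROPOSED_HEADER = re.compile(r"^##\s+([A-Z][A-Za-z0-9\s\-\_]+?)(?:\s+[-—]\s.*)?$")

SAMPLE = "\n".join(
    [
        "## DAVID — The Prodigal Son",
        "Signature line: (omitted)",
        "## DAVID-AS-JASON — The Mirror",
    ]
)


def character_names(markdown: str, header: re.Pattern) -> list[str]:
    """Collect the captured character name from every matching header line."""
    names = []
    for line in markdown.splitlines():
        match = header.match(line)
        if match:
            names.append(match.group(1).strip())
    return names


# Both headers collapse to the same key with the original pattern.
print(character_names(SAMPLE, ORIGINAL_HEADER))
# The proposed pattern keeps the hyphenated name intact.
print(character_names(SAMPLE, PROPOSED_HEADER))
```

With the original pattern, both sections would be stored under the same dict key, which matches the overwrite described in the impact line.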

---

## Improvements

- [ ] **Quality gate continuity check produces false positives due to string matching**
  - *Reported:* 2026-01-22
  - *Description:* The continuity check in `validate_batch.py` compares the `**NEXT:**` line from episode N with the hook text of episode N+1 using simple string comparison. It flags "possible continuity break" when the strings don't textually match, even when continuity is semantically correct.
  - *Example:* Episode 10's NEXT says `"The bird — Audrey's first wonder"` and Episode 11's hook is `"EXT. ABANDONED FARMHOUSE - DAWN"`. The validator flags this as a break, but the episode IS about Audrey seeing a bird — the slugline just describes WHERE it happens, not WHAT happens.
  - *Impact:* Generates noisy warnings during checkpoint validation that must be mentally filtered as false positives.
  - *Suggested Fix Options:*
    1. Remove the continuity check entirely (it's not catching real issues)
    2. Make it an optional/verbose-only warning
    3. Implement semantic comparison (compare NEXT content to full hook section, not just slugline)
    4. Check for keyword overlap between NEXT and the first 50 words of the episode
  - *Location:* `.claude/hooks/validate_batch.py` (quality gate, continuity check section)
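
Fix option 4 (keyword overlap) could look roughly like the sketch below. It is an illustration under stated assumptions, not the logic in `validate_batch.py`: the stop-word list, the function names, the three-letter minimum, and the sample episode text are all hypothetical. Only the 50-word window comes from the option itself.

```python
import re

# Hypothetical stop-word list; a real validator would likely need a longer one.
STOP_WORDS = {"the", "and", "for", "with", "ext", "int", "day", "night"}


def keywords(text: str) -> set[str]:
    """Lowercase alphabetic tokens of 3+ letters that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if len(t) >= 3 and t not in STOP_WORDS}


def has_keyword_overlap(next_line: str, episode_text: str,
                        window: int = 50, min_shared: int = 1) -> bool:
    """True if NEXT shares enough keywords with the episode's first `window` words."""
    opening = " ".join(episode_text.split()[:window])
    return len(keywords(next_line) & keywords(opening)) >= min_shared


next_line = "The bird - Audrey's first wonder"
slugline_only = "EXT. ABANDONED FARMHOUSE - DAWN"
# Hypothetical opening text for the following episode.
episode_opening = slugline_only + "\n\nAudrey stops at the broken window. A bird lands on the sill."

print(has_keyword_overlap(next_line, slugline_only))    # slugline alone shares nothing
print(has_keyword_overlap(next_line, episode_opening))  # opening action shares keywords
```

Comparing against the opening words rather than the slugline alone is what removes the false positive in the example above; the threshold `min_shared` would need tuning against real episodes.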

- [x] **Consolidate HTML editors into a single app**
  - *Reported:* 2026-02-01
  - *Completed:* 2026-02-09
  - *Description:* Replace separate HTML editor files with a single SPA. Shared sidebar navigation, shared project context, URL routing per tool.
  - *Solution:* Built **Production Console** (`editors/production_console.html`) — unified 6-tab app (Grammar, Breakdown, Storyboard, Shotlist, Revision, Dailies). Shared state event bus (`modules/app.js`), API client (`modules/api.js`), per-tab modules, CSS variables theme (`styles/console.css`). Grammar tab shows corpus grammar targets from Visual Grammar Bible research. Standalone editors remain functional at original URLs for backward compatibility. 3 new serve.py API endpoints: `/api/project/<name>/corpus-summary`, `/score/<episode>`, `/pipeline-status`.
  - *Location:* `editors/production_console.html`, `editors/modules/`, `editors/styles/`

- [ ] **Panel Fix Tool — selective inpainting editor for triptych/grid images**
  - *Reported:* 2026-02-06
  - *Description:* When generating triptych strips (3 panels) or multi-camera grids (4 panels), individual panels sometimes need correction (wrong action physics, character facing wrong direction, etc.). Build an HTML editor where the user can: (1) load a triptych/grid image, (2) click which panel is wrong, (3) optionally edit the prompt for that panel, (4) click "Regenerate Panel" which auto-generates a mask for that panel region and runs ComfyUI inpainting (`SetLatentNoiseMask`) on just the masked area, preserving the good panels pixel-identical. Should integrate with the existing Editor Hub (`serve.py` on port 8420). Mask generation is trivial — divide image into equal regions (thirds for triptych, quadrants for grid) and mask the selected region.
  - *Context:* See `INNOVATIONS.md` — Triptych Strip Generation (Innovation #4) and Panel Fix section (Innovation #5)
  - *Priority:* High — this is the correction loop for the core generation pipeline
  - *Location:* Production Console → Shotlist tab (`editors/_standalone/shotlist_editor.html`)
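
The equal-region mask split described above can be sketched as a small helper. This is a hypothetical function, not part of the existing editors; it assumes a triptych is a horizontal strip of three panels and a grid is 2x2, with panels indexed left to right, top to bottom.

```python
def panel_box(width: int, height: int, layout: str, index: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for one panel; right/bottom are exclusive."""
    if layout == "triptych":
        cols, rows = 3, 1
    elif layout == "grid":
        cols, rows = 2, 2
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    if not 0 <= index < cols * rows:
        raise IndexError(f"panel index {index} out of range for {layout}")

    row, col = divmod(index, cols)
    # Integer division on both edges keeps neighbouring panels gap-free
    # even when the image size is not evenly divisible.
    left = col * width // cols
    right = (col + 1) * width // cols
    top = row * height // rows
    bottom = (row + 1) * height // rows
    return left, top, right, bottom


print(panel_box(3000, 1000, "triptych", 1))
print(panel_box(1024, 1024, "grid", 3))
```

The returned box is the region that would be painted white in the inpainting mask, with everything outside it left black so the other panels are preserved.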

- [x] **Rename `templates/dev/` to `dev_templates/`**
  - *Reported:* 2026-02-02
  - *Completed:* 2026-02-06
  - *Description:* The `dev/` subfolder inside `templates/` is ambiguous — could mean "development environment" or anything else. Should be `dev_templates/` to match the naming pattern of sibling files (`series_bible_template.md`, `episode_arc_template.md`, etc.).
  - *Location:* `templates/dev_templates/`

- [x] **Retrain Jinx LoRA — remove skeletal training images**
  - *Reported:* 2026-02-07
  - *Completed:* 2026-02-07 (Z-Image retrain submitted)
  - *Description:* Actual training data was 28 scene stills from `Jinx_Lora_training_stills/` (NOT the white-bg `images/` folder). 7 images pulled: 4 cryo chamber stills with skeletal corpses in chairs (#18, 20, 21, 23), 3 off-setting stills (#16 saw-blade drones, #17/#19 desert). Clean dataset: 21 images. Submitted to Z-Image Turbo Trainer V2 at 2000 steps ($1.70). Also A/B testing Z-Image vs Flux 2 for anatomy quality.
  - *Request ID:* `ea2f6923-2c8d-45e7-b1a3-64dbc071c8c0`
  - *Location:* `~/Desktop/jinx_lora_training/z_image_clean/`, `leviathan/visual/lora_registry.json`

- [ ] **Build visual QC gate system — anatomy, eye-line, consistency, character ID**
  - *Reported:* 2026-02-07
  - *Priority:* HIGH — without automated QC, every frame requires manual human review for deformations
  - *Description:* Expand the existing `visual_gate.py` (Gate 1: artifacts, Gate 2: semantic alignment) into a comprehensive multi-gate visual QC system. New gates needed:
    - **Gate: Anatomy** — Gemini 2.5 Flash vision check for: correct limb count (2 arms, 2 legs), proportional body (no skeletal/stretched limbs), natural hand rendering (5 fingers, correct wrist attachment), correct head/neck/torso proportions. Score 1-10, auto-reject < 5.
    - **Gate: Eye-line & Screen Direction** — Does character look where prompt specifies? Does camera angle match metadata (low/high/eye-level)? Character facing correct direction for scene blocking? Score 1-10.
    - **Gate: First/Last Consistency** — Pairwise comparison of first_frame and last_frame for same shot: same character identity, same costume, same location, logical physical progression. Flag identity drift > threshold.
    - **Gate: Character ID vs Reference** — Compare generated frame against hero_portrait reference image. Is this the same person? Same hair color/length, same facial structure, same key props (debt counter, rebreather)?
  - *Architecture:* Each gate = separate function in `visual_gate.py`, called sequentially. Results aggregate into per-shot JSON with gate scores, overall pass/fail, and specific failure descriptions. Integrate with dailies editor for color-coded status display.
  - *Model:* Gemini 2.5 Flash vision (free tier sufficient for QC volume). Structured JSON output with rubric scoring.
  - *Location:* `tools/visual_gate.py`, Production Console → Dailies tab (`editors/_standalone/dailies_editor.html`)
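
The per-shot aggregation described under *Architecture* might be shaped like the sketch below. The `GateResult` type, field names, and `aggregate_shot` function are hypothetical. The thresholds reuse the routing recorded for the existing two-gate pipeline (auto-pass when all scores are >= 8, auto-reject when any is < 5, otherwise edge case); applying them unchanged to the new gates is an assumption.

```python
import json
from dataclasses import asdict, dataclass

AUTO_PASS_MIN = 8      # all gates at or above this: auto-pass
AUTO_REJECT_BELOW = 5  # any gate below this: auto-reject


@dataclass
class GateResult:
    gate: str            # e.g. "anatomy", "eye_line"
    score: int           # rubric score, 1-10
    failures: list[str]  # specific failure descriptions from the vision model


def aggregate_shot(shot_id: str, results: list[GateResult]) -> dict:
    """Combine sequential gate results into one per-shot record."""
    if not results:
        raise ValueError("at least one gate result is required")
    scores = [r.score for r in results]
    if any(s < AUTO_REJECT_BELOW for s in scores):
        status = "auto_reject"
    elif all(s >= AUTO_PASS_MIN for s in scores):
        status = "auto_pass"
    else:
        status = "edge_case"
    return {
        "shot": shot_id,
        "status": status,
        "gates": [asdict(r) for r in results],
        "failures": [f"{r.gate}: {f}" for r in results for f in r.failures],
    }


record = aggregate_shot(
    "demo_shot_01",  # hypothetical shot id
    [GateResult("anatomy", 9, []), GateResult("eye_line", 6, ["looks camera-left"])],
)
print(json.dumps(record, indent=2))
```

Keeping each gate's score and failure text in the record is what would let the dailies editor colour-code per gate rather than only per shot.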

- [ ] **Evaluate T2I model alternatives to Flux 2 for anatomy quality**
  - *Reported:* 2026-02-07
  - *Priority:* Medium — Flux 2 has consistently worse anatomy than Z-Image/Lumina 2
  - *Description:* Flux 2 produces more hand deformations, eye-line errors, and leg placement problems than Z-Image Turbo (Lumina 2 architecture). Z-Image handles anatomy better but doesn't support LoRA. Investigate: (1) Does fal.ai support LoRA on any Lumina/Z-Image variant? (2) SDXL + LoRA as anatomy-safe alternative? (3) Hybrid approach: Z-Image for wide/body shots, Flux 2+LoRA for face close-ups? (4) IP-Adapter as LoRA alternative for character lock on non-Flux models? (5) Wait for Flux 3 which may fix anatomy issues?
  - *Location:* `tools/generate_storyboard_keyframes.py`

- [x] **Re-submit Kian T2I LoRA training on fal.ai**
  - *Reported:* 2026-02-06
  - *Completed:* 2026-02-07
  - *Description:* Kian Flux 2 LoRA training completed. 317MB, 7 images, 1000 steps. Solo shots (14, 16, 19) render well. Dual-character shots work at 0.5/0.5 LoRA scale (total 1.0). Scale above 1.3 total causes artifact corruption.
  - *Location:* `~/Desktop/kian_lora_training/kian_lora_v1.safetensors`, `leviathan/visual/lora_registry.json`

- [ ] **Test Jinx WAN 2.2 video LoRA with FLF generation**
  - *Reported:* 2026-02-06
  - *Priority:* High — validates video LoRA pipeline before scaling to more characters
  - *Description:* Jinx video LoRA (dual high_noise + low_noise adapters) trained successfully on 2026-02-06 but never tested due to fal.ai balance exhaustion. Balance now topped off. Need to run a test: generate a Jinx keyframe via Flux 2+LoRA, then run WAN 2.2 FLF with video LoRA applied. Compare character identity drift with vs without video LoRA.
  - *Endpoint:* `fal-ai/wan/v2.2-a14b/image-to-video/lora` with `loras=[{path: HIGH_URL, transformer: "high"}, {path: LOW_URL, transformer: "low"}]`
  - *Location:* URLs in `leviathan/visual/lora_registry.json` under `jinx.video`

- [ ] **Spatial Blocking System — scene geography + 180° line enforcement**
  - *Reported:* 2026-02-07
  - *Priority:* HIGH — without spatial metadata, every shot generates with random character orientation
  - *Description:* Current pipeline has zero spatial data — each shot's prompt describes characters independently with no reference to scene geography. This causes: (1) screen direction violations (character facing wrong way between shots), (2) 180° rule violations (hands entering from wrong side of frame), (3) spatial incoherence in multi-shot sequences (Jinx faces away from Kian when she should face him). Need a **scene blocking schema** that defines per-scene: character positions, facing directions, line of action, camera side. Then a **spatial prompt prefix** system that prepends explicit direction to every T2I prompt (e.g., "Screen direction: subject faces camera-right. Camera low angle looking up."). The storyboard agent should generate blocking metadata, and the prompt builder should enforce it.
  - *Architecture:* Scene block JSON → per-shot spatial directives → prompt prefix → generation. QC gate checks screen direction compliance after generation.
  - *Reference:* 180° rule in cinema — imaginary line between two subjects, camera stays on one side. If character A is screen-left in one shot, they stay screen-left in all shots of that scene.
  - *Location:* `templates/storyboard_schema.json` (add blocking fields), `tools/generate_storyboard_keyframes.py` (spatial prefix), `agents/storyboard_agent.md` (blocking generation)
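
The flow from scene block to prompt prefix can be sketched as follows. The field names (`screen_side`, `facing`), the scene contents, and both helper functions are hypothetical; the real blocking fields in `storyboard_schema.json` are still to be designed. The prefix wording follows the example in the description.

```python
# Hypothetical scene block; field names are illustrative, not the real schema.
SCENE_BLOCK = {
    "scene": "demo_scene",
    "characters": {
        "JINX": {"screen_side": "left", "facing": "camera-right"},
        "KIAN": {"screen_side": "right", "facing": "camera-left"},
    },
}


def spatial_prefix(block: dict, character: str, camera_angle: str) -> str:
    """Build the directive that is prepended to a shot's T2I prompt."""
    facing = block["characters"][character]["facing"]
    return f"Screen direction: subject faces {facing}. Camera {camera_angle} angle."


def screen_side_violations(block: dict, shot_sides: dict[str, str]) -> list[str]:
    """List characters whose screen side in a shot differs from the scene block."""
    blocked = block["characters"]
    return [name for name, side in shot_sides.items()
            if blocked[name]["screen_side"] != side]


print(spatial_prefix(SCENE_BLOCK, "JINX", "low"))
# A shot that puts KIAN on the left crosses the line established by the block.
print(screen_side_violations(SCENE_BLOCK, {"JINX": "left", "KIAN": "left"}))
```

The same violation check could serve both as a pre-generation lint on storyboard metadata and as the comparison target for the post-generation QC gate.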

- [ ] **Prompt Engine — systematic T2I prompt builder with validation**
  - *Reported:* 2026-02-07
  - *Priority:* HIGH — prompts are currently ad-hoc prose with no systematic template or validation
  - *Description:* Build a core prompt engine that acts as a source of truth for T2I prompt construction. Should be a template-based system that: (1) takes structured shot metadata (character, action, emotion, camera, spatial direction) and builds a prompt from a consistent template, (2) enforces rules: active verbs over static descriptions, peak-action frozen moments for triptychs, no VFX/digital overlay language (HUDs, data streams, reticles — those go in motion prompts only), camera/film stock always included, (3) validates prompts before generation: word count 150-180, no LoRA triggers in prose body, no HEX codes, spatial prefix present. The template should be inspectable, debuggable, and intentionally departable — if we need to break a rule for a specific shot, we can, but by default every prompt follows the same structure.
  - *Rules to enforce:* Action prompts with kinetic verbs (not posed descriptions), explicit full-body grounding, no "targeting reticles/data streams/holographic" VFX language in T2I prompts, spatial direction prefix from blocking metadata, camera/lens/film stock DNA
  - *Validation hooks:* Pre-generation check that prompt meets all rules. Post-generation QC that verifies the rendered image matches spatial directives.
  - *Location:* `tools/prompt_engine.py` (new), integrated into `generate_storyboard_keyframes.py`
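
A pre-generation check covering the validation rules listed above might look like this sketch. The function name, the regexes, and the `Screen direction:` prefix marker are assumptions; only the rules themselves (150-180 words, no HEX codes, no VFX overlay language, no LoRA triggers in the prose, spatial prefix present) come from the item.

```python
import re

SPATIAL_PREFIX = "Screen direction:"  # assumed marker for the spatial prefix
HEX_CODE = re.compile(r"#[0-9A-Fa-f]{6}\b")
# VFX overlay terms named in the rules; the exact list is an assumption.
VFX_TERMS = re.compile(r"\b(huds?|data streams?|reticles?|holographic)\b", re.IGNORECASE)


def validate_prompt(prompt: str, lora_triggers: tuple[str, ...] = ()) -> list[str]:
    """Return a list of rule violations; an empty list means the prompt passes."""
    problems = []
    word_count = len(prompt.split())
    if not 150 <= word_count <= 180:
        problems.append(f"word count {word_count} outside 150-180")
    if not prompt.startswith(SPATIAL_PREFIX):
        problems.append("missing spatial prefix")
    if HEX_CODE.search(prompt):
        problems.append("contains HEX color code")
    for match in VFX_TERMS.finditer(prompt):
        problems.append(f"VFX term in T2I prompt: {match.group(0)}")
    lowered = prompt.lower()
    for trigger in lora_triggers:
        if trigger.lower() in lowered:
            problems.append(f"LoRA trigger in prose body: {trigger}")
    return problems


good = "Screen direction: subject faces camera-right. " + " ".join(["word"] * 150)
bad = "A HUD glows #FF00AA"
print(validate_prompt(good))
print(validate_prompt(bad))
```

Returning a list of violations rather than raising keeps the check "intentionally departable": a caller can log the problems and still proceed for a shot that deliberately breaks a rule.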

- [x] **Dailies Editor — take comparison + annotation export**
  - *Reported:* 2026-02-07
  - *Completed:* 2026-02-07
  - *Description:* Side-by-side take comparison (all takes visible simultaneously as columns), per-take annotations with tag system (approved/retake/fix-prompt/fix-lora/custom), generation metadata display (endpoint, seed, steps, guidance, LoRA scales, full prompt) from cumulative manifest.json, export retake annotations as JSON for batch regeneration. Also: cumulative manifest (new takes merge, don't overwrite), mp4/html MIME types in serve.py.
  - *Location:* Production Console → Dailies tab (`editors/_standalone/dailies_editor.html`), `editors/serve.py`, `tools/generate_storyboard_keyframes.py` (cumulative manifest)

- [ ] **Named Reference Image Composition — research + integration**
  - *Reported:* 2026-02-07
  - *Priority:* Medium — could solve spatial composition for multi-character shots
  - *Description:* Some models (Kling, Midjourney, potentially Flux Kontext) allow assigning names to reference images and using those names in prompts to control spatial composition (e.g., "image_1 on camera left, image_2 in background"). Research which models support this via API (not just web UI), whether they support LoRA simultaneously, and whether they can handle cinematic spatial directives. If viable, integrate as an alternative generation path for complex two-character compositions that pure text prompting struggles with.
  - *Context:* Dual-LoRA works at 0.5/0.5 scale (1.0 total), but spatial control is still limited by text-only prompting.

- [ ] **Dynamic Action Prompts — enforce kinetic prose for keyframes**
  - *Reported:* 2026-02-07
  - *Priority:* HIGH — current prompts produce static/posed images instead of kinetic action
  - *Description:* Early E-style prompting tests produced much more dynamic shots (different angles, kinetic action, less posed). Current pipeline prompts describe static positions instead of frozen peak-action moments. Need to enforce: (1) All triptych hero frames MUST use peak-action verbs (lunges, pivots, slams, catches, etc.), (2) Standard FLF first/last frames must describe the START and END of motion, not a posed middle, (3) Validation hook checks for static verbs (stands, holds, looks, sits) and flags them. This should be a rule in the prompt engine with validation enforcement, not just a guideline.
  - *Location:* `tools/prompt_engine.py`, `.claude/hooks/` (validation)
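
The static-verb validation hook in point (3) could start from something as small as the sketch below. The function name is hypothetical, and the verb list contains only the four examples given above; a production hook would presumably need a longer list and some handling of legitimate uses.

```python
import re

# Static verbs named in the item; extend as needed.
STATIC_VERBS = ("stands", "holds", "looks", "sits")
_STATIC_PATTERN = re.compile(r"\b(" + "|".join(STATIC_VERBS) + r")\b", re.IGNORECASE)


def find_static_verbs(prompt: str) -> list[str]:
    """Return the distinct static verbs found in a prompt, sorted alphabetically."""
    return sorted({m.group(1).lower() for m in _STATIC_PATTERN.finditer(prompt)})


print(find_static_verbs("Jinx stands by the door and looks at Kian."))
print(find_static_verbs("Jinx lunges across the table and slams the case shut."))
```

An empty result means no flagged verbs; a non-empty one gives the hook the exact words to report back to whoever wrote the prompt.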

- [ ] **Middle frame necessity testing — does every shot need 3 keyframes?**
  - *Reported:* 2026-02-06
  - *Priority:* Medium — informs generation cost and quality tradeoffs
  - *Description:* Not every shot may benefit from 3-keyframe (first/middle/last) generation. Observation areas: (1) Very short action shots (fight hits, impacts < 1s) may only need first frame — no interpolation gap to fill. (2) Standard dialogue/motion: does adding a middle frame always improve quality over first+last only, or only for complex action/occlusion? (3) Speed factor: fast action with simple A-to-B motion may decohere MORE with a constrained middle keyframe than with free interpolation. Test: generate same shot with 2-keyframe FLF vs 3-keyframe split FLF, compare quality. Start with held-frame-push and standard-flf shots to see if middle frame adds value outside triptych territory.

- [ ] **WAN 2.2 Video LoRA — frontier model training data pipeline**
  - *Reported:* 2026-02-06
  - *Priority:* Low — test in next few weeks, not blocking production
  - *Description:* Train WAN 2.2 character video LoRAs (dual high_noise + low_noise adapters) using training clips generated by frontier video models (Sora 2, Veo 3, Kling 3) mixed with WAN 2.2 FML outputs as aesthetic anchors. This gives character identity lock in video generation, not just T2I. Frontier clips provide high-quality motion diversity; WAN 2.2 FML clips prevent overfitting to aesthetics the target model can't reproduce.
  - *Test first:* Before adding to pipeline permanently, run ONE proof-of-concept with Jinx: generate ~30 frontier I2V clips from LoRA-locked Flux 2 keyframes + ~10 WAN 2.2 FML clips → train via fal.ai or AI Toolkit → compare identity consistency vs base WAN 2.2. If the test shows measurable improvement in character lock during video generation, proceed to full pipeline.
  - *Build after test passes:* Reusable script/workflow that takes any character's hero images + LoRA-locked keyframes as input and outputs a trained WAN 2.2 video LoRA. Should handle: (1) generating varied keyframes from hero images, (2) sending to frontier I2V APIs for training clip generation, (3) generating WAN 2.2 FML anchor clips, (4) curating/filtering clips where identity drifts, (5) submitting to video LoRA trainer. Goal: any new character gets a video LoRA from their existing T2I LoRA + reference images.
  - *Estimated cost:* ~$25-60 per character (frontier I2V clips + training)
  - *Training requirements:* 150-200+ clips, 41-81 frames each. AI Toolkit and WaveSpeedAI trainers support WAN 2.2 video LoRA.
  - *Context:* See memory notes on WAN 2.2 FLF and LoRA training

---

## Visual Pipeline Rethink (from inbox 2026-02-08)

- [ ] **Unified Pre-Production Workflow — multi-tab Resolve-style interface**
  - *Reported:* 2026-02-08
  - *Priority:* HIGH — architectural
  - *Description:* Storyboard, shotlist, breakdown editors are connected but built ad-hoc. Need a cohesive, automated pipeline with intuitive human intervention points. Vision: multi-tab interface (like DaVinci Resolve) where changes ripple through stages automatically. Flag at any step → changes propagate upstream/downstream through breakdown → storyboard → shotlist → keyframes → video.
  - *Related:* Consolidate HTML editors backlog item (intermediate step already started with Editor Hub)

- [ ] **Script-to-Shot Translation Engine — deterministic emotional cinematography**
  - *Reported:* 2026-02-08
  - *Priority:* HIGH — core gap
  - *Description:* Build a deterministic process for translating scripts into emotionally compelling shots. Study anime grammar/pacing as a model for the right visual device at each dramatic moment. Need to distill patterns from film/television/anime into a codified grammar: what shot type, camera angle, movement, and framing conveys what emotion at what story beat. Currently the storyboard agent makes these decisions but without a systematic visual grammar.

- [ ] **Directing Styles — parameterized genre-specific visual translation**
  - *Reported:* 2026-02-08
  - *Priority:* Medium — depends on script-to-shot translation engine
  - *Description:* Once the script-to-shot translation is deterministic, parameterize it into directing styles. Same script → different shot plans depending on genre (action thriller, sci-fi horror, drama, etc.). Each style defines preferred shot types, pacing rhythms, camera movement vocabulary, and lighting philosophy.

- [ ] **Hybrid Actor Workflow — test and integrate live-action performance capture**
  - *Reported:* 2026-02-08
  - *Priority:* Medium — parallel track
  - *Description:* Test the hybrid actor workflow: actor footage (webcam/phone) → DWPose skeletal extraction → WAN 2.2 Animate Replace mode → AI character with actor's performance. Already researched (WAN 2.2 Animate $0.04-0.08/s, no LoRA support in Replace mode). Need end-to-end test with real actor footage.

- [ ] **Content Slate Generation — research successful properties, generate inspired scripts**
  - *Reported:* 2026-02-08
  - *Priority:* Low — parallel creative track
  - *Description:* Research most successful properties in action, sci-fi, thriller genres. Come up with inspired story concepts (e.g., "Indiana Jones in space") and generate a slate of genuinely decent project pitches to fill the production pipeline. Run through /develop for each.

- [ ] **Shot-to-Shot Cutting Grammar — how to prompt shots that cut together**
  - *Reported:* 2026-02-08
  - *Priority:* HIGH — core gap
  - *Description:* The fundamental unsolved problem: how do you prompt individual shots so they cut together into a coherent sequence that builds pacing and emotional immersion? This is the gap between generating good individual frames and generating a watchable scene. Needs research into editing grammar (continuity editing, match cuts, eye-line match, action/reaction), then translation into prompt-level constraints and validation hooks.

- [x] **Visual Grammar Bible — cinematic grammar codification (Phase 1+2 COMPLETE)**
  - *Reported:* 2026-02-09
  - *Completed:* 2026-02-09 (Phase 1 corpus + Phase 2 findings)
  - *Description:* Research pipeline to extract, codify, and test cinematic grammar patterns for storyboard generation.
  - *Results:* 33 scenes, 1,955 shots, 14 microdrama (4 series), 13 anime, 5 cinema. Two-Peak confirmed at 93% prevalence. 10 production rules derived. FINDINGS.md written as shareable research document. Grammar tab in Production Console visualizes corpus grammar targets.
  - *Tool:* `tools/analyze_reference_scene.py` (7 modes: --file, --url, --batch, --validate, --summary, --patterns, --detect-episodes)
  - *Corpus:* `_research/visual_grammar_bible/corpus/` (33 scenes)
  - *Findings:* `_research/visual_grammar_bible/FINDINGS.md`
  - *Remaining:* Phase 3 (draft Visual Grammar Bible for Leviathan), Phase 4 (A/B test), Phase 5 (engine integration)

- [ ] **3.7.21 Rule — microdrama opening engagement pattern**
  - *Reported:* 2026-02-09
  - *Priority:* Medium — integrate after Visual Grammar Bible Phase 1 validates
  - *Description:* From Davenport production breakdown: By second 3 = visual hook (theme), by second 7 = verbal hook (character states theme), by second 21 = emotional engagement (addiction trigger). Maps to opening funnel before the Two-Peak structure (Spike/Button). Consider: should this become a timing constraint in the Kill Box structure? Does Leviathan need tighter opening hooks? Current episodes read more like feature film scenes than microdrama hooks.
  - *Source:* "All the W's of Micro-Dramas" (C. Neil Davenport, Medium, Oct 2025)
  - *Related:* Visual Grammar Bible Phase 1, Kill Box beat timing
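
If the 3.7.21 pattern does become a timing constraint, the check itself is simple. The sketch below is hypothetical: the beat names and the `late_beats` function are invented for illustration, and only the three deadlines (3, 7, and 21 seconds) come from the description.

```python
# Deadlines in seconds from the 3.7.21 pattern; beat names are hypothetical.
DEADLINES = {
    "visual_hook": 3.0,
    "verbal_hook": 7.0,
    "emotional_engagement": 21.0,
}


def late_beats(beat_times: dict[str, float]) -> list[str]:
    """Return beats that are missing or land after their deadline."""
    late = []
    for beat, deadline in DEADLINES.items():
        timestamp = beat_times.get(beat)
        if timestamp is None or timestamp > deadline:
            late.append(beat)
    return late


# Hypothetical episode opening where the verbal hook arrives too late.
print(late_beats({"visual_hook": 2.0, "verbal_hook": 9.5, "emotional_engagement": 20.0}))
```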

- [ ] **Microdrama licensing slate — conventional tone series for platform sales**
  - *Reported:* 2026-02-09
  - *Priority:* Medium — parallel creative track
  - *Description:* Even if Recoil's aspirational projects (Leviathan, ASI Bridge) use anime/K-drama aesthetics, the licensing strategy for ReelShort/DramaBox/FlexTV may require series in the conventional microdrama mold: romance/revenge/betrayal, PG-13, female protagonist, melodramatic tone. Worth exploring what Leviathan would look like with Two-Peak + 3.7.21 imposed, and developing 2-3 conventional-tone series as "foot in the door" revenue alongside the differentiated properties.
  - *Context:* Davenport breakdown confirms: female 20-60 demo, $30K-$300K budgets, non-union, 60-70 eps/season, freemium monetization ($15-$50/season). Prediction: action and horror are next big genres.

- [ ] **Prompt Engine Workflow Integration — modular, testable, maintainable**
  - *Reported:* 2026-02-08
  - *Priority:* HIGH — already built (`tools/prompt_engine.py`) but needs testing, refinement, and proper integration
  - *Description:* The prompt engine (10-layer system) exists but needs: (1) systematic testing with real storyboard data, (2) refinement based on generation results, (3) proper integration into the storyboard→keyframe pipeline, (4) maintenance protocol for model updates. Should be modular like the still and video engines — swappable layers, configurable per model.

---

## Tech Debt — Deferred from engine-fix Phase D (2026-05-03)

These items were captured in the Phase D build log's "Items deferred to follow-up CP" section and are tracked here so they don't get lost. None blocked Phase D closure; all need follow-on CPs to address.

- [ ] **DEBT-3 helper consolidation in `sidecar.py` + `sidecar_writer.py`**
  - *Reported:* Phase D build log (2026-05-03)
  - *Description:* The two modules share several helpers via duplication. `SIDECAR_VALID_SOURCES` is duplicated as a frozenset in both places. Phase D scope didn't permit body consolidation; deferred to a follow-on CP.
  - *Suggested fix:* Lift the duplicated helpers + `SIDECAR_VALID_SOURCES` into a third module (`workspace/_sidecar_internal.py` or similar) and re-import from both. Delete the duplicates.
  - *Location:* `recoil/workspace/sidecar.py`, `recoil/core/sidecar_writer.py`

- [ ] **`reload` name collision in `core/model_profiles.py` + `core/prompt_config.py`**
  - *Reported:* Phase D build log (2026-05-03)
  - *Description:* Both modules export a top-level `reload` function. If anything ever does `from core.model_profiles import *` followed by `from core.prompt_config import *` (or vice versa), the second `reload` silently shadows the first. Latent — no current caller does this — but the collision is a Tenet-5 boundary smell.
  - *Suggested fix:* Rename one of them (`reload_model_profiles`, `reload_prompt_config`) and update callers. Keep the old names as one-cycle deprecation shims.
  - *Location:* `recoil/core/model_profiles.py`, `recoil/core/prompt_config.py`
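
The one-cycle deprecation shim from the suggested fix could be sketched like this. The `deprecated_alias` helper and the placeholder body of `reload_model_profiles` are hypothetical; only the old and new function names come from the item.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name: str):
    """Wrap `new_func` so calls through the old name emit a DeprecationWarning."""
    @functools.wraps(new_func)
    def shim(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the shim
        )
        return new_func(*args, **kwargs)
    return shim


def reload_model_profiles() -> str:
    """Placeholder body; the real function would re-read the profile config."""
    return "model_profiles reloaded"


# Old name kept for one release cycle, then deleted.
reload = deprecated_alias(reload_model_profiles, "reload")
```

Existing callers of `reload()` keep working during the transition and get a warning that names the replacement, which makes the remaining call sites easy to find in test output.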

- [ ] **Dead `SidecarFieldError` + `_SIDECAR_EXTRA_ALLOWED` in `workspace/sidecar.py`**
  - *Reported:* Phase D build log (2026-05-03), post-Phase-4b
  - *Description:* Both symbols became unused after Phase 4b's sidecar refactor. Phase D scope didn't permit deletion; deferred for cleanup.
  - *Suggested fix:* Delete the symbols + any lingering references. `git grep` across the repo to confirm zero usages first.
  - *Location:* `recoil/workspace/sidecar.py`

- [ ] **Pre-existing 289 ruff errors (consistent across Phase D baseline)**
  - *Reported:* Phase D build log (2026-05-03), Phase 9 verification
  - *Description:* `ruff check` reports 289 errors against the post-D tree. Same count as the pre-D baseline — Phase D introduced zero new lint errors. The errors are pre-existing F401 (unused imports) and E402 (imports not at top of file) issues across the codebase.
  - *Suggested fix:* A dedicated lint-cleanup CP. Not urgent — these are warnings, not bugs — but the gradient should bend toward zero, not stay flat.
  - *Tools:* `ruff check recoil/`, fix in batches by error type.

- [ ] **EvalRegistry test migration off `_ClassOrInstanceMethod` descriptor**
  - *Reported:* Phase D build log (2026-05-03)
  - *Description:* `EvalRegistry` tests use class-level static-style calls that require a custom `_ClassOrInstanceMethod` descriptor. The descriptor exists solely to satisfy these tests. Migrating the tests to use `_default_eval_registry` directly would let us delete the descriptor entirely.
  - *Suggested fix:* Update the tests to instantiate `_default_eval_registry` (or a fresh `EvalRegistry()`) and call instance methods. Delete the `_ClassOrInstanceMethod` descriptor.

- [ ] **`consult.py` sys.path order issue**
  - *Reported:* Phase D build log (2026-05-03) + earlier sessions
  - *Status:* Root cause is the `lib/` package-name collision tracked under **Bugs** at the top of this file. The bootstrap path itself was fixed in a same-day patch — `consult.py` works again. The remaining `lib/` blast radius (3 other tools) is the bigger open question.
  - *Cross-reference:* See "`recoil/lib/` and `recoil/pipeline/lib/` package-name collision" entry above.

---

## Completed

- [x] **`/autogenerate` stops after batch completion** — Fixed 2026-01-21
- [x] **`/generate` and `/autogenerate` don't call `/load-context`** — Fixed 2026-01-21
- [x] **`/autogenerate` missing mandatory context reload** — Fixed 2026-01-21
- [x] **Documentation audit and WORKFLOW_GUIDE.md update** — Completed 2026-01-26
  - Added Hooks Configuration section
  - Added `/promote` and `/autogenerate` to command reference
  - Reorganized Python Scripts into Validation/Workflow/Utility categories
  - Verified all values consistent with CONSTANTS.md (validate_docs.py PASSED)

- [x] **Visual Gate QC Pipeline — two-gate automated frame review** — Completed 2026-02-06
  - `visual_gate.py` — Gate 1 (artifact detection) + Gate 2 (semantic alignment via Gemini 2.5 Flash vision)
  - `storyboard_review.html` — gate-aware frame review tool: auto-loads gate scores, pre-fills statuses/issues, "Needs Review" filter, human override
  - `serve.py` — POST `/api/project/<name>/run-gate/<episode>` route to trigger gate batch from review tool
  - Output: `storyboards/reviews/visual_gate_ep_NNN.json` per episode
  - Routing: auto-pass (all gate scores >= 8), auto-reject (any gate score < 5), `edge_case` (mixed)
  - Phase 2 planned: Gate 3 multi-agent edge case resolution, continuity gate, pairwise comparison
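
The routing rule above can be sketched as follows. The thresholds (8 and 5) and the `edge_case` label come from this entry; the function name, the `auto_pass` / `auto_reject` labels, and the flat list of scores are assumptions, and the real `visual_gate.py` may structure this differently.

```python
def route_frame(scores, pass_min=8, reject_below=5):
    """Route one frame from its gate scores (hypothetical helper)."""
    if not scores:
        raise ValueError("at least one gate score is required")
    if any(s < reject_below for s in scores):
        return "auto_reject"  # any score below 5
    if all(s >= pass_min for s in scores):
        return "auto_pass"  # every score at 8 or above
    return "edge_case"  # mixed: nothing below 5, but not all at 8+


print(route_frame([9, 8]))
print(route_frame([9, 4]))
print(route_frame([9, 6]))
```

The two threshold conditions cannot both be true for the same frame, so the order of the first two checks does not affect the result.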

- [x] **EP1 Storyboard Pipeline Upgrade + Naming Convention + Dailies Editor** — Completed 2026-02-06
  - `asset_naming.py` — Naming convention utility with `{PRJ}_EP{NNN}_S{NN}_T{NN}_{CHAR}[_{suffix}].{ext}` format, project code from ORCHESTRATION.md
  - Storyboard schema v3 — `generation_approach`, `hero_frame`, `triptych_prompt`, `asset_name`, `characters_in_shot` fields
  - EP001 storyboard upgraded — all 21 shots classified (6 triptych, 9 standard FLF, 4 held push, 2 held static), 6 E-style hero prompts, 6 triptych strip prompts, all motion prompts upgraded with timing cues
  - `generate_storyboard_keyframes.py` — triptych strip generation + auto-split, asset naming paths, flat output directory
  - `dailies_editor.html` — Timeline panel, shot card, asset gallery, thumbnail strip, lightbox, keyboard shortcuts, status workflow
  - `serve.py` — `GET /api/project/<name>/dailies/<episode>` route for asset scanning
  - `index.html` — Dailies button added to project cards
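
As an illustration of the naming format above, here is a minimal sketch of a name builder. It is not the code in `asset_naming.py`: the function name, the parameter names, the zero-padding widths (inferred from `NNN` and `NN`), and the sample values are all assumptions.

```python
def build_asset_name(prj, episode, s, t, char, ext, suffix=None):
    """Format {PRJ}_EP{NNN}_S{NN}_T{NN}_{CHAR}[_{suffix}].{ext} (hypothetical helper)."""
    stem = f"{prj}_EP{episode:03d}_S{s:02d}_T{t:02d}_{char}"
    if suffix:
        stem += f"_{suffix}"  # optional trailing segment
    return f"{stem}.{ext}"


# Sample values are made up for the example.
print(build_asset_name("ABC", 1, 3, 2, "MAYA", "png"))
print(build_asset_name("ABC", 1, 3, 2, "MAYA", "png", suffix="hero"))
```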

- [x] **Editor Hub — local dev server + landing page for visual editors** — Completed 2026-02-01
  - `serve.py` (Python stdlib HTTP server, 127.0.0.1:8420) with project scanning and data APIs
  - `index.html` landing page with project card grid and editor launch buttons
  - Auto-load snippets added to all 4 editors (breakdown, references_editor, storyboard, revision)
  - `launch.sh` startup script + `/editors` skill for CLI launch
  - Intermediate step toward full SPA consolidation — API designed for migration

- [x] **WORKFLOW_GUIDE.md + .html updated with visual pipeline progress** — Completed 2026-02-06, updated 2026-02-09
  - Storyboard v3 schema, generation approach classification, asset naming convention
  - Cloud-first strategy (fal.ai), triptych generation, split FLF, E-style prompts
  - Dailies editor + API route, updated technology stack, 8 new glossary terms
  - Visual production phase rewritten for cloud-first pipeline
  - 2026-02-09: Section 6.5 rewritten — Production Console as primary, Editor Hub as legacy

- [x] **Visual Grammar Bible Phase 1+2 — corpus analysis + FINDINGS.md** — Completed 2026-02-09
  - 33 scenes, 1,955 shots across 14 microdrama, 13 anime, 5 cinema
  - Two-Peak confirmed at 93% prevalence, 10 production rules derived
  - `_research/visual_grammar_bible/FINDINGS.md` — shareable research document

- [x] **Production Console — unified tabbed visual pipeline app** — Completed 2026-02-09
  - 7 tabs: Revision, Grammar, Visual Bible, Breakdown, Storyboard, Shotlist, Dailies
  - Grammar tab: corpus grammar visualization, Two-Peak timeline, deviation warnings
  - Visual Bible tab: camera/film stock, lens package, color palettes, characters, locations, lighting guides. Saves directly to `visual_bible.md` (added 2026-02-14)
  - Shared state event bus, modular architecture, `serve.py` API endpoints
  - Standalone editors remain functional for backward compatibility

- [x] **VISUAL_PIPELINE_STATUS.md overhaul** — Completed 2026-02-06
  - Updated from the 2026-02-04 version to reflect 2 days of major progress
  - Cloud-first architecture, LoRA status table, model comparison results
  - Production pipeline v3, all 4 generation approaches documented
  - Updated models table (Qwen, Z-Image, WAN low-noise added)
  - Next steps rewritten, lessons learned expanded

- [x] **Orchestrated generation architecture** — Completed 2026-01-26
  - Two-tier architecture: Orchestrator Agent + Batch Sub-Agents
  - Added `/generate-orchestrated` command
  - Created verification tools: `verify_thread_continuity.py`, `verify_emotional_beats.py`, `verify_pattern_variety.py`, `orchestrator_verify.py`
  - Created state management: `init_orchestrator_state.py`, `update_orchestrator_state.py`, `generate_batch_summary.py`
  - Created agent protocols: `orchestrator_agent.md`, `batch_agent.md`
  - Wired `baseline_comparison.py` into orchestrator verification at batches 3, 6, 9, 12
  - Updated WORKFLOW_SPEC.md, WORKFLOW_GUIDE.md, CLAUDE.md, ENGINE_GUIDE.html

---

## Documentation Status

**Last audit:** 2026-02-09

**Consistency check:** PASSED (Production Console + FINDINGS.md documented across CLAUDE.md, WORKFLOW_SPEC.md, WORKFLOW_GUIDE.md, VISUAL_PIPELINE_STATUS.md, README.md)

**Single sources of truth verified:**
- `/CONSTANTS.md` — All numeric values
- `/WORKFLOW_SPEC.md` — Authoritative workflow
- `/skills/format_v12/SKILL.md` — Format rules
- `/tools/engine_constants.py` — Python parser for CONSTANTS.md

**Hard gates verified:**
- Promotion blocked without 34/34 + behavioral DNA validation
- Treatment blocked without promotion
- Generation blocked without treatment.md + context loaded

**Orchestrator verification (NEW):**
- Thread continuity: plant → advance → payoff tracking
- Emotional beats: 11 beats at scheduled episodes (±2)
- Pattern variety: no 4+ consecutive same type
- Voice contamination: batches 3, 6, 9, 12
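
Two of these checks reduce to small predicates, sketched below. The function names and input shapes are assumptions and do not reflect the interfaces of `verify_pattern_variety.py` or `verify_emotional_beats.py`; only the limits (no run of 4 or more, ±2 episodes) come from the list above.

```python
from itertools import groupby


def longest_run(types):
    """Length of the longest run of identical consecutive items."""
    return max((sum(1 for _ in group) for _, group in groupby(types)), default=0)


def pattern_variety_ok(types, max_run=3):
    """True when no type appears 4 or more times in a row."""
    return longest_run(types) <= max_run


def beat_on_schedule(actual_episode, scheduled_episode, tolerance=2):
    """True when a beat lands within +/- tolerance episodes of its slot."""
    return abs(actual_episode - scheduled_episode) <= tolerance


print(pattern_variety_ok(["A", "A", "A", "B"]))
print(pattern_variety_ok(["A", "A", "A", "A"]))
print(beat_on_schedule(12, 10))
```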

---

*Check off items as they're fixed. Move to Completed section with date.*
