# Visual Production Findings — Living Document

**Last updated:** 2026-02-23
**Purpose:** Persistent record of all visual production discoveries, decisions, and open questions. Updated each session to prevent knowledge loss across context windows.

---

## Session Log

| Date | Session | Key Findings |
|------|---------|-------------|
| Feb 4-8 | Lab testing | 18 experiments, ~277 frames, ~$15 R&D. Z-Image Turbo wins as default engine. LoRA essential. |
| Feb 14-15 | Three-pass pipeline | Qwen MA → NBP → SeedVR2 recommended. SeedVR2 73% win rate. MidJourney originals 80%. |
| Feb 15 | LoRA candidate shootout | 76+ runs across JINX/KIAN/VAREK. Three-pass final: 46% win rate (lowest). |
| Feb 22 | Production keyframe test | EP001 Flux 2 Dev + LoRA: 66 keyframes, 0 failures, identity holds. First/last prompt bug found and fixed. |
| Feb 22 | Mid-frame coherence fix | Mid frames bypassed prompt compiler and had no img2img conditioning. Both issues fixed. |
| Feb 22 | Transcript knowledge recovery | Searched 6+ transcript files. Documented undocumented findings below. |
| Feb 23 | FMLF anchor workflow fix | Generation order was backwards (first→mid→last). Fixed to mid→first→last. Mid is anchor. |
| Feb 23 | Triptych prompt gap | EP001 triptych shots had null triptych_prompt — silently fell to per-frame gen. Auto-composition fallback added + storyboard agent updated. |
| Feb 23 | Knowledge recovery | Searched 182 transcript files (764 MB). Codified 21 findings from LoRA/identity and pipeline architecture agents. 7 immediate, 5 high-priority pipeline decisions, 5 medium-priority future items. |

---

## Confirmed Decisions

| Decision | Winner | Date | Evidence |
|----------|--------|------|----------|
| Default T2I engine | Z-Image Turbo | Feb 7 | A/B test: anatomy, speed, cost (10x cheaper) |
| Override engine | Flux 2 Dev | Feb 22 | Multi-char dual LoRA, hero stills with LoRA, identity-critical shots |
| Identity lock method | LoRA (per character) | Feb 6 | With/without comparison conclusive |
| LoRA scale (Z-Image) | 1.3 solo, 0.5 dual | Feb 7 | Lab testing |
| LoRA scale (Flux 2 Dev) | 1.0 solo, 0.5 dual | Feb 22 | 1.3 causes blowout (muddy textures, destroyed faces) |
| Prompt style (Z-Image) | BFL structured + HEX + verbs | Feb 4 | 7-strategy comparison |
| Prompt style (Flux 2 Dev) | Narrative E-style (~170 words) | Feb 22 | Structured prompts fail, narrative works |
| LoRA candidate pipeline | Three-pass: Qwen MA → NBP → SeedVR2 | Feb 15 | Redesigned from two-pass. NBP added for identity/expression. |
| Hero stills | MidJourney + NBP (manual) | Feb 22 | JT preference. Pipeline-generated images too "presentational." |
| Triptych format | 1536x912 wide strip, split 3 panels | Feb 4 | Best panel separation, ~195 words for 3 panels |
| Generation order | Hero-first (if hero_action exists) | Feb 6 | Innovation tests confirmed |
| Dual LoRA max scale | 1.0 total (0.5/0.5) | Feb 7 | Corruption above 1.0 |
| Training data quality | Single-source, face-heavy (33-40% close-ups) | Feb 8 | Mixed-source dilutes geometry |
| LoRA caption length | 2-7 words optimal (Z-Image Turbo) | Feb 12 | CivitAI "Minimalist/Clean Label Method." Rich captions (60+ words) poison training — noise swamps signal. 25-40 words marginal. Trigger + class word + angle is the formula. |
| Z-Image attention window | ~75 tokens / ~55 words | Feb 4 | Content past ~55 words increasingly ignored. Prompt compiler generates 200+ words — only first ~55 matter. |
| Prompt framing position | Layer 0 (before LoRA triggers) | Feb 12 | Was Layer 8 of 10 — past Z-Image's attention window. Shot type (ECU/CU/WIDE) encoded as prose in `first_frame`, not as structured data. Camera angle/lens at Layer 8 largely ignored. |
| LoRA training steps | 2000-3000 recommended, 1200 current | Feb 12 | fal.ai recommendation. 5000+ risks overfitting. Current LoRAs at 1200 steps — may benefit from 2000. |
| Presentational bias cause | LoRA face-heavy prior + prompt layer ordering | Feb 12 | LoRA trained on 33-40% frontal close-ups creates face-forward prior. Framing buried in Layer 8 past attention window compounds the problem. Fix: short prompts with natural camera angles, framing at position 0. |
| VFX language in T2I | Poisons output — literal spirals/noise on faces | Feb 22 | "Targeting reticles," "data streams," "holographic HUD" interpreted literally. Save VFX language for motion prompts only. |
| Kinetic verbs vs static poses | Kinetic verbs produce better results | Feb 22 | "A woman mid-stride pivots" > "A woman stands in a corridor." VerbStrengthValidator enforces this. |

---

## Bug Fixes Applied

### 1. First/Last Frame Prompts Identical (Feb 22)
- **Symptom:** First and last keyframes produced nearly identical images
- **Root cause:** `_build_action_pose()` in `prompt_compiler.py` expected `anticipation_action`/`aftermath_action` fields that storyboard doesn't populate. Both frame types fell through to same `action` field.
- **Fix:** Added fallback to `shot["last_frame"]` for last frames and `shot["hero_frame"]` for hero frames in `_build_action_pose()`
- **Verification:** Shot 2 now produces different prompt hashes (1f71dd vs 712a9f)
- **File:** `lib/prompt_compiler.py:495-521`

### 2. Mid Frames Bypass Prompt Compiler (Feb 22)
- **Symptom:** Mid/hero frames had radically different lighting, wardrobe, and skin from first frames
- **Root cause (a):** Mid frames used raw `shot["hero_frame"]` prose as prompt, skipping all 10 prompt compiler layers (camera, lens, wardrobe, environment, lighting, etc.)
- **Root cause (b):** Mid frames had no img2img conditioning — generated completely fresh with no visual anchor to first frame
- **Fix (a):** Mid frames now route through prompt compiler with `frame_type="hero"`
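- **Fix (b):** Mid frames now get img2img conditioning from `first_frame_url` (strength = `--img2img-strength`; the default was 0.55 at the time and has since been corrected to 0.40). Superseded on Feb 23 by fix 4, which makes mid the anchor.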
- **File:** `tools/generate_storyboard_keyframes.py:726-815`

### 3. Engine-Specific LoRA Scale Not Passed Through (Feb 22)
- **Symptom:** `get_inference_config()` in `train_lora.py` didn't return `flux2_scale_solo`/`flux2_scale_dual`
- **Fix:** Added conditional pass-through of engine-specific fields
- **File:** `tools/train_lora.py:196-237`
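
A minimal sketch of the conditional pass-through, assuming the registry entry is a plain dict. Only `flux2_scale_solo` and `flux2_scale_dual` come from the fix above; the other key names (`trigger`, `scale_solo`, `scale_dual`) and the function body are illustrative assumptions, not the code in `tools/train_lora.py`.

```python
# Hypothetical sketch; not the actual code in tools/train_lora.py.
ENGINE_SPECIFIC_KEYS = ("flux2_scale_solo", "flux2_scale_dual")


def get_inference_config(entry: dict) -> dict:
    """Return base inference settings plus any engine-specific scales present."""
    config = {
        "trigger": entry["trigger"],        # assumed key name
        "scale_solo": entry["scale_solo"],  # assumed key name
        "scale_dual": entry["scale_dual"],  # assumed key name
    }
    # Pass engine-specific fields through only when the registry defines them,
    # so characters without Flux 2 calibration keep a minimal config.
    for key in ENGINE_SPECIFIC_KEYS:
        if key in entry:
            config[key] = entry[key]
    return config


jinx = {"trigger": "JNXCHAR", "scale_solo": 1.3, "scale_dual": 0.5,
        "flux2_scale_solo": 1.0, "flux2_scale_dual": 0.5}
config = get_inference_config(jinx)
```

Passing fields through only when present means a caller can check `"flux2_scale_solo" in config` to decide whether a character is calibrated for Flux 2 Dev.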

### 4. FMLF Anchor Workflow Backwards (Feb 23)
- **Symptom:** Mid frames nearly identical to first frames. No visual differentiation between f1 and f2.
- **Root cause (a):** Generation order was `first → mid → last`. Mid was derived FROM first via img2img (0.40 strength = 60% identical). Should be `mid → first → last` — mid is the anchor.
- **Root cause (b):** `_build_action_pose()` had no fallback for `first_frame` prose. Both first and mid fell through to the same `action` field, producing identical prompts.
- **Root cause (c):** `mid_frame_url` was never captured, so first/last couldn't condition from mid even if the order was correct.
- **Fix (a):** FMLF order changed to `["mid", "first", "last"]` — mid generated first as anchor, no img2img
- **Fix (b):** Added `first_frame` prose fallback in `_build_action_pose()` for frame_type="first"
- **Fix (c):** Added `mid_frame_url` capture + first/last now img2img from mid anchor
- **Result:** All three frames get genuinely different prompts (first_frame/action/last_frame) and correct img2img chain
- **Files:** `lib/prompt_compiler.py:510-515`, `tools/generate_storyboard_keyframes.py:329-356,776-830,866-872`
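
The fallback logic from fixes 1 and 4 can be sketched as an ordered field lookup per frame type. This is a hypothetical reconstruction, not the real `_build_action_pose()`: the field names come from the notes above, while the lookup table, the function name `build_action_pose`, and the sample shot text are assumptions made for illustration.

```python
# Hypothetical sketch of the prose lookup; not the real _build_action_pose().
# Fields are tried in order; the first populated one wins.
PROSE_FIELDS = {
    "first": ("anticipation_action", "first_frame"),
    "hero": ("hero_frame",),
    "last": ("aftermath_action", "last_frame"),
}


def build_action_pose(shot: dict, frame_type: str) -> str:
    for field in PROSE_FIELDS.get(frame_type, ()):
        text = shot.get(field)
        if text:
            return text
    # Final fallback shared by all frame types.
    return shot.get("action", "")


# Made-up shot: no anticipation/aftermath/hero fields, as in most EP001 shots.
shot = {
    "first_frame": "She braces against the bulkhead.",
    "action": "She pivots and fires.",
    "last_frame": "She lowers the weapon, breathing hard.",
}
prompts = {ft: build_action_pose(shot, ft) for ft in ("first", "hero", "last")}
```

With the per-frame fallbacks in place, only the mid/hero frame reaches the shared `action` field, so the three prompts differ.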

### 5. Triptych Prompts Never Written — Silent Fallback (Feb 23)
- **Symptom:** Triptych shots generated as separate frames instead of shared-context strips. No visual consistency benefit from triptych classification.
- **Root cause (a):** `/storyboard` agent classifies shots as `triptych_split_flf` and writes `first_frame`/`last_frame` prose, but never writes the `triptych_prompt` field. EP001: 4 triptych shots, 0 with `triptych_prompt` populated.
- **Root cause (b):** Pipeline check was `if is_triptych and shot.get("triptych_prompt")` — when `triptych_prompt` is null, silently falls to per-frame generation. No warning printed.
- **Fix (a):** Added `compose_triptych_prompt()` function that auto-composes a triptych strip prompt from the shot's existing `first_frame`, `action`/`hero_frame`, and `last_frame` fields using the validated INNOVATIONS.md template. Pulls character visual, wardrobe, environment from BREAKDOWN and camera/film from PROJECT_CONFIG.
- **Fix (b):** Changed pipeline check to `if is_triptych:` — always takes the triptych path. Uses explicit `triptych_prompt` if present, auto-composes if null. Prints word count for auto-composed prompts.
- **Fix (c):** Updated storyboard agent (`agents/storyboard_agent.md`) with mandatory triptych prompt section — when assigning `triptych_split_flf`, agent MUST also write `triptych_prompt` using the 3-panel template.
- **Files:** `tools/generate_storyboard_keyframes.py:287-380,714-729`, `agents/storyboard_agent.md`, `.claude/skills/storyboard/SKILL.md`
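
A rough sketch of the revised pipeline check, under stated assumptions: the `generation_approach` key name and the helper `resolve_triptych_prompt` are hypothetical, and `compose_triptych_prompt` here is a stand-in that only joins the three beats. The real function uses the INNOVATIONS.md template and pulls from BREAKDOWN and PROJECT_CONFIG, which is not reproduced here.

```python
from typing import Optional


def compose_triptych_prompt(shot: dict) -> str:
    """Hypothetical stand-in for the real composer: one clause per panel."""
    mid = shot.get("hero_frame") or shot.get("action", "")
    beats = [shot.get("first_frame", ""), mid, shot.get("last_frame", "")]
    return " ".join(f"Panel {i}: {text}" for i, text in enumerate(beats, start=1))


def resolve_triptych_prompt(shot: dict) -> Optional[str]:
    """Return the strip prompt for triptych shots, or None for other shots."""
    if shot.get("generation_approach") != "triptych_split_flf":
        return None
    prompt = shot.get("triptych_prompt")
    if not prompt:
        # The old check fell through to per-frame generation here, silently.
        prompt = compose_triptych_prompt(shot)
        print(f"[triptych] auto-composed prompt: {len(prompt.split())} words")
    return prompt
```

The point of the design is that a null `triptych_prompt` now produces a visible log line and a triptych strip, rather than a silent change of generation path.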

---

## Three-Pass Pipeline Details (Feb 14-15)

**Architecture:** Qwen Multi-Angle → NBP (Gemini 3 Pro Image) → SeedVR2

| Pass | Engine | Purpose | Cost | Notes |
|------|--------|---------|------|-------|
| 1 | Qwen Multi-Angle | Angle geometry from hero image | $0.035 | No text prompts (causes regeneration) |
| 2 | NBP (Gemini 3 Pro) | Background swap + expression + identity lock + skin detail | $0.134 | Skip for back angles (adds smiles to back-of-head) |
| 3 | SeedVR2 | Non-generative quality upscale | $0.001 | Cannot alter pose/angle |

**Smart routing by angle type:**
- Face angles (front, closeups, 3/4): Pass 1 → Pass 2 (skip SeedVR2 — skin priority)
- Body angles (low, high, full_body): Pass 1 → Pass 2 → Pass 3 (proportionality + upscale)
- Back/profile: Pass 1 → Pass 3 (skip NBP — adds smiles)
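
The routing rules and per-pass costs above can be expressed as a lookup. This is an illustrative sketch: the angle labels in `ANGLE_CLASS` and the function names are assumptions, and the real angle list may use different names.

```python
from decimal import Decimal

# Per-image cost of each pass, from the table above.
PASS_COST = {1: Decimal("0.035"), 2: Decimal("0.134"), 3: Decimal("0.001")}

# Which passes run for each angle class.
ROUTES = {
    "face": (1, 2),     # skip SeedVR2: skin detail takes priority
    "body": (1, 2, 3),
    "back": (1, 3),     # skip NBP: it adds smiles to back-of-head shots
}

# Assumed angle labels; the real angle list may differ.
ANGLE_CLASS = {
    "front": "face", "closeup": "face", "three_quarter": "face",
    "low": "body", "high": "body", "full_body": "body",
    "back": "back", "profile": "back",
}


def route_for(angle: str) -> tuple:
    """Passes to run, in order, for the given angle."""
    return ROUTES[ANGLE_CLASS[angle]]


def route_cost(angle: str) -> Decimal:
    """Total per-image cost of the passes that run for this angle."""
    return sum((PASS_COST[p] for p in route_for(angle)), Decimal("0"))
```

NBP dominates the cost, so the back/profile route is the cheapest by a wide margin, while face and body routes differ only by the SeedVR2 pass.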

**Curation results (Feb 15):**

| Source | Selected | Rejected | Win Rate |
|--------|----------|----------|----------|
| MidJourney originals | 4 | 1 | **80%** |
| Pass 1 (Qwen MA only) | 3 | 1 | **75%** |
| Pass 3 (SeedVR2 final) | 8 | 3 | **73%** |
| NBP pass | 4 | 2 | **67%** |
| Three-pass final (picks/) | 11 | 13 | **46%** |

**Takeaway:** MidJourney originals still win on quality. SeedVR2 adds clear value as final pass. Three-pass as a batch pipeline needs curation — quality is inconsistent.
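
For reference, the win rates in the curation table are selected / (selected + rejected), rounded to a whole percent. A small sketch that recomputes them from the counts (the variable and function names are illustrative):

```python
# (selected, rejected) counts from the Feb 15 curation table.
CURATION = {
    "MidJourney originals": (4, 1),
    "Pass 1 (Qwen MA only)": (3, 1),
    "Pass 3 (SeedVR2 final)": (8, 3),
    "NBP pass": (4, 2),
    "Three-pass final (picks/)": (11, 13),
}


def win_rate(selected: int, rejected: int) -> int:
    """Percentage of reviewed images that were selected, rounded to an integer."""
    return round(100 * selected / (selected + rejected))


rates = {source: win_rate(*counts) for source, counts in CURATION.items()}
for source, rate in sorted(rates.items(), key=lambda item: -item[1]):
    reviewed = sum(CURATION[source])
    print(f"{source}: {rate}% of {reviewed} reviewed")
```

Printing the number reviewed next to each rate makes the sample sizes visible: every source except the three-pass final rests on 11 or fewer images.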

---

## img2img Conditioning Values

| Context | Strength | Source |
|---------|----------|--------|
| First frame from mid (FMLF) | 0.40 | CLI flag `--img2img-strength` |
| Last frame from mid (FMLF) | 0.40 | CLI flag `--img2img-strength` |
| First frame from hero (hero-first) | 0.40 | Hardcoded in `generate_storyboard_keyframes.py` |
| Last frame from hero (hero-first) | 0.40 | Hardcoded |
| Last frame from first (FLF fallback) | 0.40 | CLI flag `--img2img-strength` |
| Location reference | 0.35 | CLI flag `--location-ref-strength` |
| Cross-shot punch-in | 0.30 | Storyboard `continuity_from.strength` |
| Qwen Image test (ComfyUI) | 0.55 | `labs/tests/test_qwen_img2img.py` (different context — Qwen img2img, not production pipeline) |

**CORRECTION (Feb 22):** Transcript search confirmed the intended img2img strength is **0.40**, not 0.55. The 0.40 value appears in multiple places in session 1004b837 (Feb 12). The CLI default has been corrected from 0.55 to 0.40.

**CORRECTION (Feb 23):** FMLF anchor workflow was backwards. Mid frame IS the anchor (peak action), generated FIRST with no img2img. First (anticipation) and last (aftermath) derive FROM mid. Previous implementation generated first→mid→last with mid derived from first — produced nearly identical f1/f2.

- **Chain (FMLF):** `mid (anchor, fresh) → first (img2img 0.40 from mid) → last (img2img 0.40 from mid)`
- **Chain (hero-first):** `hero (anchor, fresh) → first (img2img 0.40 from hero) → last (img2img 0.40 from hero)`
- **Chain (FLF fallback):** `first (fresh) → last (img2img 0.40 from first)`
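
The three chains share one shape: generate the anchor fresh, then derive every other frame from it via img2img. A minimal sketch of that planner, where the mode strings (`fmlf`, `hero_first`, `flf`), the `Step` type, and `plan_chain` are hypothetical names rather than anything in `generate_storyboard_keyframes.py`:

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    frame: str
    source: Optional[str]      # frame to img2img from; None means generated fresh
    strength: Optional[float]  # img2img strength; None for the anchor


def plan_chain(mode: str, strength: float = 0.40) -> List[Step]:
    """Return generation steps in execution order for one shot."""
    if mode == "fmlf":
        anchor, derived = "mid", ("first", "last")
    elif mode == "hero_first":
        anchor, derived = "hero", ("first", "last")
    elif mode == "flf":
        anchor, derived = "first", ("last",)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # The anchor is always generated first, with no img2img conditioning.
    steps = [Step(anchor, None, None)]
    steps += [Step(frame, anchor, strength) for frame in derived]
    return steps
```

Because derived frames always point back at the anchor, the anchor's URL has to be captured before any derived frame is requested, which is the gap fix 4(c) closed.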

---

## EP001 Shot Distribution

| Generation Approach | Count | % |
|-------------------|-------|---|
| standard_flf | 25 | 80.6% |
| triptych_split_flf | 5 | 16.1% |
| held_frame_push | 1 | 3.2% |

- Only 3 of 31 shots have `hero_frame` populated
- 0 of 31 shots have `triptych_prompt`. Before the Feb 23 fix, all triptych shots silently fell to per-frame generation; they now take the triptych path with an auto-composed prompt.
- Most shots have rich `first_frame` and `last_frame` prose descriptions

---

## Qwen Multi-Angle Parameters

```python
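# Qwen Multi-Angle request parameters; tuples are the allowed (min, max) range.
qwen_multi_angle_params = {
    "horizontal_angle": (0, 360),
    "vertical_angle": (-30, 90),
    "zoom": (0, 10),
    "lora_scale": 0.9,
    "image_size": "square_hd",  # 1024x1024
    "num_inference_steps": 40,
    "guidance_scale": 4.5,
}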
```

**Critical:** Text prompts (`additional_prompt`) must be EMPTY. Text causes regeneration instead of pure geometric rotation.

**Reddit technique (Qwen Edit Standard):** Requesting "Turn the camera 90 degrees to the left/right" works better than "180 camera rotation" (which rotates the subject, not the camera). Iterate 90 degrees at a time. Short prompts work better. See Obsidian inbox note for details.

**OPEN:** Reddit technique (Qwen Edit Standard with text prompts) has NOT been compared against the purpose-built Multi-Angle endpoint. Head-to-head comparison needed.

---

## Unsolved Production Challenges

### 1. Episode-Level Visual Coherence
The biggest unsolved challenge. Independent frame generation causes:
- Lighting drift between shots in the same scene
- Environment inconsistency across cuts
- Wardrobe/skin variation between independently generated shots

**Possible approaches:**
- Shared location reference images (partially implemented via `reference_image_url`)
- Cross-shot img2img conditioning (partially implemented via `continuity_from`, `same_angle_from`)
- Environment anchoring from breakdown.json habitat zones
- Post-generation visual QC via `visual_gate.py` (exists but doesn't run automatically)

### 2. "Presentational" / To-Camera Bias — ROOT CAUSE FOUND (Feb 12)
LoRA-generated images tend to be too direct/presentational. Characters face the camera instead of engaging with the scene.

**Root cause (confirmed Feb 12):** Two factors combine:
1. **LoRA training data is face-heavy** (33-40% close-ups, frontal) — LoRA has a strong prior toward front-facing compositions
2. **Prompt framing buried in Layer 8** of 10 layers — past Z-Image Turbo's ~75-token attention window. By the time the model reads camera angle/framing, it's stopped listening.

**Working fix (implemented Feb 12):** Short prompts (40-65 words from `first_frame` prose) with natural camera angles. `build_previz_prompt()` uses `first_frame` text directly. Verified: over-the-shoulder, from-behind, from-below, low angle, high angle all come through naturally.

**Production fix (implemented Feb 23):** Prompt compiler `framing-first` architecture — shot framing at Layer 0 (position 0, before LoRA triggers). Layer skipping per shot type (ECU skips wardrobe/environment/color, CU skips wardrobe).
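
A simplified sketch of the framing-first idea with per-shot-type layer skipping. The layer names, their order after Layer 0, and the sample text are assumptions for illustration; the real compiler has 10 layers and is not reproduced here.

```python
# Hypothetical layer names and order; the real compiler has 10 layers.
LAYER_ORDER = [
    "framing",       # Layer 0: shot type and camera angle, before LoRA triggers
    "lora_trigger",
    "action",
    "wardrobe",
    "environment",
    "lighting",
    "color",
]

# Layers dropped per shot type, as described above.
SKIP_BY_SHOT_TYPE = {
    "ECU": {"wardrobe", "environment", "color"},
    "CU": {"wardrobe"},
}


def compile_prompt(layers: dict, shot_type: str) -> str:
    """Join populated layers in order, omitting those skipped for this shot type."""
    skipped = SKIP_BY_SHOT_TYPE.get(shot_type, set())
    parts = [
        layers[name]
        for name in LAYER_ORDER
        if name not in skipped and layers.get(name)
    ]
    return ", ".join(parts)


# Made-up layer text for one shot.
layers = {
    "framing": "extreme close-up from below",
    "lora_trigger": "JNXCHAR",
    "action": "she glances over her shoulder",
    "wardrobe": "grey flight jacket",
    "environment": "rust-lit corridor",
    "lighting": "amber emergency light",
}
ecu_prompt = compile_prompt(layers, "ECU")
```

Skipping layers that cannot be seen in a tight shot keeps the prompt short, so more of it lands inside the ~55-word window Z-Image Turbo attends to.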

**Additional approaches (untested):**
- Qwen Multi-Angle rotation of generated frames (rotate scene 90 degrees)
- Off-axis camera angles in storyboard data

### 3. WAN 2.2 Identity Drift
No metrics exist for how much character identity shifts during FLF video interpolation. This is the gap between "good keyframes" and "good video."

### 4. Shot Grammar Enforcement
The storyboard validator checks for jump cuts and establishing shots, but the generation pipeline has no mechanism to:
- Enforce the 180-degree rule
- Ensure 30% punch-in or 30-degree axis shift between cuts
- Maintain frame position continuity between shots
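
As a starting point, the 30% / 30-degree rule could be checked from storyboard data before generation. This is a sketch only: it assumes each shot carries a subject scale (for example, subject height as a fraction of frame height) and a camera heading in degrees, neither of which is confirmed to exist in the current storyboard schema.

```python
def axis_shift_deg(prev_angle: float, next_angle: float) -> float:
    """Smallest angular difference between two camera headings, in degrees."""
    delta = abs(next_angle - prev_angle) % 360
    return min(delta, 360 - delta)


def passes_30_rule(prev_scale: float, next_scale: float,
                   prev_angle: float, next_angle: float) -> bool:
    """True if the cut punches in by >= 30% or shifts the axis by >= 30 degrees."""
    punch_in = (next_scale - prev_scale) / prev_scale
    return punch_in >= 0.30 or axis_shift_deg(prev_angle, next_angle) >= 30
```

The wrap-around in `axis_shift_deg` matters: headings of 350 and 10 degrees are only 20 degrees apart, so that cut fails unless it also punches in.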

### 5. Visual QC Automation
`visual_gate.py` exists (two-gate: Gemini 2.5 Flash vision for artifact detection + semantic alignment) but doesn't run automatically after generation. JT wants Gemini 3.1 for video review when available.

### 6. SeedVR2 Smoothing vs. Detail Preservation (Unresolved)
SeedVR2 has a `noise_scale` parameter (0-1, default 0.1) that controls smoothing. Higher = more smoothing, loses skin detail. JT initially agreed to skip SeedVR2 on NBP face outputs, then reversed ("I was wrong, revert and return SeedVR2 to all outputs"), then wanted an upscaler shootout (SeedVR2 vs Crystal vs Creative vs Topaz). Status: **unresolved.**

Options available:
- `noise_scale: 0.01-0.03` — minimal smoothing
- `upscale_factor: 1` — no upscale, just quality pass
- Skip SeedVR2 on face-heavy angles entirely

### 7. Kian (Non-Human Character) Identity Lock
Universal identity lock template included "hair color, hair texture" which caused NBP to generate hair on Kian (helmeted android). Fixed with `mandatory_traits` in `rendering_directives` — character-specific visual markers separate from universal skull geometry lock. Check that `identity_type: non_human` is set for non-human characters.

### 8. Sandwich Workflow Repositioned
The Sandwich Workflow (Two-Anchor Interpolation: Frame 1 + Frame 96 both generated with correct LoRA identity, WAN interpolates between them) is **no longer the primary path**. It's "built and waiting." Current v1 approach uses flexible per-shot generation with different tools per shot type. May become relevant again with WAN 2.6 or Seedance 2.0.

**Implementation gap:** Current `generate_from_storyboard.py` uses **single-frame I2V only**. The last_frame image is generated but NOT used as a video endpoint. True two-anchor interpolation requires the FLF2V model:
- **Model:** `Wan2_1-FLF2V-14B-720P_fp8_e4m3fn.safetensors` (~14GB)
- **Acceleration LoRA:** `Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors` (~350MB)
- **Source:** `https://huggingface.co/Kijai/WanVideo_comfy/tree/main`
- **Status:** Evaluated but NOT installed. Install when ready to test sandwich workflow.

### 9. Trigger Word Convention (Undocumented)
LoRA trigger words follow an implicit convention: all-caps, consonant-cluster abbreviation + CHAR suffix (e.g., JNXCHAR, KIANCHAR, VRKCHAR). Must be non-word tokens to avoid collision with real vocabulary. Convention exists in `lora_registry.json` but is not documented as a standard for future characters.
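
One way to make the convention checkable is a pattern test when adding a character to the registry. The regex below is an assumption inferred from the three examples (all caps, at least two letters, then `CHAR`); it does not check for collisions with real vocabulary, which would need a word list.

```python
import re

# All caps, at least two letters, then the literal CHAR suffix.
TRIGGER_PATTERN = re.compile(r"[A-Z]{2,}CHAR")


def is_valid_trigger(token: str) -> bool:
    """True if the token matches the inferred trigger-word convention."""
    return TRIGGER_PATTERN.fullmatch(token) is not None
```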

---

## Habitat Zone System

72 script locations collapse into 5 visually distinct zones (from `breakdown.json`):

| Zone | Visual DNA |
|------|-----------|
| Lower Decks | Rust, amber emergency light, corroded steel |
| Mid-Ship | Sterile white corridors vs grimy maintenance |
| The Root | Organic, bioluminescent amber/green |
| Crown Level | Chrome, gold, ceremonial white |
| Planet Surface | Natural light, dirt, open sky |

Each zone has `visual_dna` descriptions. The three-pass pipeline reads habitat zones from breakdown.json and uses their visual_dna as the environment pool. Known issue: "split personality" backgrounds (split down the middle between two locations) caused by conflicting prompt data.

---

## Cost Model

### Per-Episode (EP001, 31 shots)
| Engine | Per Frame | Per Episode (FLF) | Per Episode (FMLF) |
|--------|-----------|-------------------|-------------------|
| Z-Image Turbo | $0.005 | $0.31 | $0.47 |
| Flux 2 Dev + LoRA | $0.02 | $1.24 | $1.86 |

### Full Series (60 Episodes)
| Engine | FLF | FMLF |
|--------|-----|------|
| Z-Image Turbo | ~$19 | ~$28 |
| Flux 2 Dev + LoRA | ~$74 | ~$112 |
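
The tables above follow from per-frame cost × frames per shot × shots per episode. A sketch of the arithmetic, assuming every episode has 31 shots like EP001 and every shot is generated as FLF (2 frames) or FMLF (3 frames); triptych and held-frame shots are ignored, and the names are illustrative.

```python
from decimal import Decimal

SHOTS_PER_EPISODE = 31  # EP001; assumed typical for every episode
EPISODES = 60
FRAMES_PER_SHOT = {"FLF": 2, "FMLF": 3}
COST_PER_FRAME = {
    "Z-Image Turbo": Decimal("0.005"),
    "Flux 2 Dev + LoRA": Decimal("0.02"),
}


def episode_cost(engine: str, mode: str) -> Decimal:
    """Keyframe cost in USD for one episode."""
    frames = SHOTS_PER_EPISODE * FRAMES_PER_SHOT[mode]
    return COST_PER_FRAME[engine] * frames


def series_cost(engine: str, mode: str) -> Decimal:
    """Keyframe cost in USD for the full series."""
    return episode_cost(engine, mode) * EPISODES


for engine in COST_PER_FRAME:
    for mode in FRAMES_PER_SHOT:
        print(engine, mode, episode_cost(engine, mode), series_cost(engine, mode))
```

`Decimal` is used so that values such as 0.465 stay exact; the tables round these to two decimals per episode and to whole dollars for the series.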

### LoRA Candidate Generation (per character)
| Pipeline | Per Character | Notes |
|----------|--------------|-------|
| Three-pass (14 angles × expressions) | ~$25-35 | Smart routing reduces unnecessary passes |
| LoRA training (z-image) | $2.26/1K steps | ~$5-8 per LoRA |
| LoRA training (flux-lora-fast) | $2.00/1K steps | ~$5-7 per LoRA |
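
Training cost can be estimated from the per-1K-step rates, assuming cost scales linearly with step count (an assumption; actual billing may include minimums or rounding). The trainer keys and function name are illustrative.

```python
from decimal import Decimal

# USD per 1,000 training steps, from the table above.
RATE_PER_1K_STEPS = {
    "z-image": Decimal("2.26"),
    "flux-lora-fast": Decimal("2.00"),
}


def training_cost(trainer: str, steps: int) -> Decimal:
    """Estimated USD cost of one LoRA training run, assuming linear pricing."""
    return RATE_PER_1K_STEPS[trainer] * steps / 1000


for steps in (1200, 2000, 3000):
    print(steps, training_cost("z-image", steps), training_cost("flux-lora-fast", steps))
```

Under that assumption, the current 1200-step z-image runs cost about $2.71 each, and moving to the recommended 2000-3000 steps would cost about $4.52-$6.78.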

---

## Pipeline Architecture Decisions

### Kontext Removal (Jan-Feb 2026)
Original pipeline (Jan 29) included Flux Kontext as an intermediate step for per-shot character adjustments between keyframe generation and video generation:
```
MidJourney → Nano Banana Pro → fal.ai LoRA training → Flux 2 Klein + LoRA → Flux Kontext → Wan 2.1
```
**Why removed:** Kontext required different text encoders (CLIP-L + T5-XXL instead of Qwen) and couldn't share model weights with Flux 2 Klein. LoRA-based identity lock made per-shot Kontext adjustments unnecessary. Dropped without explicit documentation.

### Wan 2.1 vs Wan 2.2 Architecture Split
| Factor | Wan 2.1 (Local ComfyUI) | Wan 2.2 (fal.ai Cloud) |
|--------|-------------------------|------------------------|
| Architecture | Single model | MoE (2 experts) |
| Files | 1 GGUF (14.2 GB Q6_K) | Managed by fal.ai |
| Setup | Standard ComfyUI nodes | API call |
| LoRA support | Single LoRA (LightX2V) | Dual LoRA adapters |
| Quality | Very good | Slightly better motion |

**Decision:** fal.ai (Wan 2.2) for production. Local ComfyUI (Wan 2.1) for testing/development. Wan 2.2's MoE dual-sampler workflow too complex for local setup.

### Local ComfyUI Wan 2.1 Workflow
14-node pipeline: LoRA Select → Block Swap → Model Load → T5 Text Encoder → VAE → CLIP Vision → Load Image → CLIP Vision Encode → Image-to-Video Encode → Text Encode → Sampler → Decode → Video Combine

**Key parameters:**
- Steps: 4 (with LightX2V step-distillation LoRA) vs 40 (without)
- CFG: 1.0, shift: 5.0, scheduler: `dpm++_sde`
- Block swap: 20 of 40 transformer blocks (48GB Apple Silicon memory management)
- Frames: 81 (5s at 16fps)
- `merge_loras: false` — REQUIRED with GGUF quantized weights (cannot merge LoRA directly)
- Chinese negative prompt (standard for Wan 2.1 training data)

**Acceleration:**
- LightX2V LoRA: `Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors` (705MB) — reduces steps from 40 to 4-8
- SageAttention (INT8) and Nunchaku (4-bit): CUDA-only, do NOT work on Apple Silicon/MPS

### Identity Methods Comparison (Why LoRA Won)
| Method | Identity Strength | Ecosystem | Cloud API |
|--------|-------------------|-----------|-----------|
| **LoRA** | Highest | fal.ai, ComfyUI | Yes |
| PuLID | High (faces only) | Limited | No |
| IPAdapter | Medium-high | ComfyUI only | No |
| Redux (BFL) | Medium (style > identity) | Flux only | No |
| ControlNet | Structure only (no appearance) | Wide | Partial |
| img2img | Low-medium | Universal | Yes |

**Decision:** LoRA provides strongest identity lock AND integrates with fal.ai cloud generation. IPAdapter/PuLID may complement LoRA for Tier 2/3 multi-character shots but aren't tested.

---

## LoRA Scale Decision Needed

**OPEN:** Z-Image LoRA scale has changed multiple times:
1. Feb 7: 1.3 (original LoRAs, 1000 steps, rich captions)
2. Feb 12: Tested 1.1 (looked "more consistent/defined"), then 0.8 (after retrain with minimal captions)
3. Feb 22: Set back to 1.3 in production pipeline
4. Current `lora_registry.json`: 1.3 for all characters

**Problem:** The 1.3 value was calibrated with OLD LoRAs (Feb 7, 1000 steps, rich captions). Current LoRAs (Feb 21, 1200 steps, minimal 2-7 word captions) have never been A/B tested at 1.3. They WERE tested at 1.1 and 0.8 but not systematically compared.

**Needs:** A/B comparison of current LoRAs at 0.8 / 1.0 / 1.1 / 1.3 to set the production value.
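
A possible run matrix for that comparison, sketched under assumptions: the seed values are placeholders, and holding seed and prompt fixed per character is a suggested design so that scale is the only variable, not an established protocol.

```python
from itertools import product

CHARACTERS = ("JINX", "KIAN", "VAREK")
SCALES = (0.8, 1.0, 1.1, 1.3)
SEEDS = (101, 202)  # hypothetical fixed seeds so scale is the only variable


def build_ab_matrix(characters=CHARACTERS, scales=SCALES, seeds=SEEDS):
    """One run per (character, seed, scale) combination."""
    return [
        {"character": character, "seed": seed, "lora_scale": scale}
        for character, seed, scale in product(characters, seeds, scales)
    ]


runs = build_ab_matrix()
print(len(runs), "runs")
```

With three characters, two seeds, and four scales this is 24 images, which at the Z-Image Turbo per-frame cost listed in the cost model is a negligible spend.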

---

## Future Engines & Extensions

### LTX-2 (Video Generation Alternative)
- Up to 20 seconds per clip (vs Wan 2.1's ~5s)
- Native synchronized audio generation
- Up to 4K resolution (vs 720p)
- Both Mac machines (M1 Ultra Studio + M4 MacBook Pro) can run at 48GB
- **Decision:** Don't swap out Wan pipeline now. Evaluate after Wan pipeline is proven (Phase 2+).
- No LoRA ecosystem exists yet.

### Veo 3.1 (Video Generation Alternative)
Optimal prompt formula: `[Shot Composition] + [Subject] + [Action] + [Environment] + [Lighting/Mood] + [Audio]`
- 4-8 second clips optimal
- Include audio cues in prompt
- Separate JT-built app with Gemini-powered prompt generator that analyzes first/last frame images
- **Status:** Separate tool, not integrated into Recoil pipeline.

### Wan 2.6 (Reference-to-Video)
- Introduces "reference-to-video" feature — could improve identity consistency
- **Status:** Released but not evaluated. Evaluate when back in ComfyUI.

### Flux 2 Native Multi-Reference (Up to 10 Images)
- Flux 2 natively supports up to 10 reference images without LoRA
- Prompt can say "the woman from image 1"
- ComfyUI-IPAdapter-Flux exists for finer control
- **Status:** Not used in production. LoRA provides stronger identity lock. Available for non-LoRA characters or quick prototyping.

### Performance Capture via ControlNet (Future)
```
1. SHOOT ACTORS (iPhone) → Real people performing scene
2. EXTRACT POSE → DWPose or MediaPipe per-frame
3. EXTRACT AUDIO → Separate track for TTS
4. CONTROLNET + LORA → ControlNet (pose) + LoRA (identity) + IPAdapter (style)
```
Wan 2.1 supports ControlNet conditioning natively. Not tested in Recoil pipeline.

### Z-Image Edit (Kontext Alternative)
Z-Image Edit could solve character consistency without LoRA training (similar to Flux Kontext). Z-Image ecosystem (ControlNet, IPAdapter, PuLID, LoRA tooling) hadn't developed at time of evaluation. Low priority — Z-Image Turbo + LoRA is the production path.

### Evaluated and Rejected
- **Seedream 4.0/4.5 (ByteDance):** API-only, proprietary, no local execution, no ComfyUI integration, no LoRA/ControlNet ecosystem. Doesn't slot into the pipeline.

---

## Known Gaps (Film Stock Field Unused)
`generate_from_storyboard.py` does NOT append the `cinematic` field from storyboard JSON to video generation prompts. The field exists in every storyboard shot but is ignored by the video pipeline. Film stock modifiers from T2I (via prompt compiler) are NOT carried forward to video.
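
Closing the gap could be as small as appending the field when the video prompt is built. A hypothetical sketch, assuming `cinematic` is a plain string on each shot; `build_video_prompt` is an illustrative name, not an existing function in `generate_from_storyboard.py`.

```python
def build_video_prompt(motion_prompt: str, shot: dict) -> str:
    """Append the shot's `cinematic` text to the motion prompt when present."""
    cinematic = (shot.get("cinematic") or "").strip()
    if not cinematic:
        return motion_prompt
    return f"{motion_prompt.rstrip()} {cinematic}"
```

Whether film stock language helps or hurts video output is untested, so this would need the same kind of A/B check as the T2I prompt findings.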

---

## Files Modified This Session (Feb 22)

| File | Change |
|------|--------|
| `docs/archive/generation_workflows.md` | Master generation reference (archived Feb 27, 2026) |
| `docs/visual_production_findings.md` | NEW — This document |
| `lib/prompt_compiler.py` | Fixed first/last frame differentiation |
| `tools/generate_storyboard_keyframes.py` | Engine-aware LoRA scales, mid-frame coherence fix, `--fmlf` flag |
| `tools/train_lora.py` | Pass through flux2 scales in `get_inference_config()` |
| `leviathan/visual/lora_registry.json` | Added flux2_scale_solo: 1.0, flux2_scale_dual: 0.5 |
| `docs/candidate_generation_engines.md` | Cross-ref link |
| `appendix_e_flux2_protocols.md` | LoRA scale 1.0 note |
| `CLAUDE.md` | Added generation_workflows.md to Key Documents |
| `docs/PRODUCTION_PIPELINE_GUIDE.md` | Phase 6 engine reference |

## Files Modified This Session (Feb 23)

| File | Change |
|------|--------|
| `tools/generate_storyboard_keyframes.py` | FMLF anchor order fix (mid→first→last), `compose_triptych_prompt()` auto-composition, triptych pipeline fallback |
| `lib/prompt_compiler.py` | `first_frame` prose fallback for frame_type="first", `mid_frame_url` capture |
| `agents/storyboard_agent.md` | Mandatory triptych prompt section, v3 generation types, output schema update |
| `.claude/skills/storyboard/SKILL.md` | Steps 4+6 (generation approach + triptych prompts), downstream consumers |
| `docs/visual_production_findings.md` | Codified 21 findings from transcript knowledge recovery |