> **ARCHIVED 2026-03-02.** LoRA pipeline eliminated Feb 27, 2026. Retained for potential future
> re-adoption of open-source identity models. For the current visual pipeline, see
> `../PRODUCTION_PIPELINE_GUIDE.md` Phase 6.

# Visual Pipeline — STATUS

**Last updated:** 2026-02-07
**Phase:** Pre-Production (Cloud-first pipeline tested, LoRA training active, automated candidate generation built)

---

## CURRENT STATE

### What's on the Network Volume (KEEP)
- [x] **WAN 2.2 I2V** — all models downloaded and persisted
  - `wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors` (14GB, models/unet/)
  - `wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors` (14GB, models/unet/, AniSora variant)
  - `umt5_xxl_fp8_e4m3fn_scaled.safetensors` (6.3GB, models/text_encoders/)
  - `wan2.2_vae.safetensors` (1.4GB, models/vae/)
  - `clip_vision_h.safetensors` (1.2GB, models/clip_vision/)
- [x] **Qwen Image** — PRIMARY T2I model
  - `qwen_image_fp8_e4m3fn.safetensors` (~18.6GB)
  - `qwen_2.5_vl_7b_fp8_scaled.safetensors` (~6.5GB)
  - `qwen_image_vae.safetensors` (~243MB)
- [x] **Z-Image Turbo** — fast sketch model
  - `z_image_turbo_bf16.safetensors` (12GB)
  - `qwen_3_4b.safetensors` (7.5GB)
  - `ae.safetensors` (320MB)
- [x] **Flux 2 Dev FP8** — for LoRA testing + fal.ai reference
  - `flux2_dev_fp8mixed.safetensors` (34GB, models/diffusion_models/)
  - `mistral_3_small_flux2_fp8.safetensors` (17GB, models/text_encoders/)
  - `flux2-vae.safetensors` (321MB, models/vae/)
- [x] **Custom nodes**: WanVideoWrapper, VideoHelperSuite, KJNodes, ComfyUI-Manager, ComfyUI-Wan22FMLF

### What's Been Retired (DELETED)
- ~~Flux.1-dev~~ → deleted, replaced by Flux 2 Dev
- ~~IPAdapter Flux (Shakker-Labs)~~ → deleted, replaced by Flux 2 native multi-reference
- ~~clip_l.safetensors, t5xxl_fp8_e4m3fn.safetensors~~ → Flux 1 text encoders
- ~~ae.safetensors (Flux 1)~~ → Flux 1 VAE
- ~~ip-adapter.bin~~ → IPAdapter weights
- ~~sigclip_vision_patch14_384.safetensors~~ → IPAdapter CLIP vision

### T2I Model Comparison Results (2026-02-05)

| Model | Speed | Quality | Decision |
|-------|-------|---------|----------|
| Flux 2 Dev (local FP8) | 8-25 min | Plastic skin, video-game look | **RETIRED from T2I** |
| Z-Image Turbo | 45s | Photographic but wrong hair | **Fast sketches only** |
| **Qwen Image** | **90-105s** | **Film-quality, correct character** | **PRODUCTION T2I** |

**Decision:** Qwen Image for production keyframes. Z-Image for fast iteration. Flux 2 retired from local T2I (still used on fal.ai with LoRA).

### Cloud vs Local Test Results (2026-02-06)

| Pipeline | Platform | Time | Cost | Quality |
|----------|----------|------|------|---------|
| Flux 2+LoRA (FP8, 20 steps) | RunPod A100 | 46s | ~$0.015 | Good |
| **Flux 2+LoRA (FP16, 28 steps)** | **fal.ai** | **9s** | **$0.02** | **Better — CLOUD WINS** |
| Qwen img2img (denoise 0.55) | RunPod A100 | 72s | ~$0.024 | Good composition match |

**Verdict:** Use fal.ai for production T2I. RunPod only for uncensored video fallback.
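The RunPod cost column is the GPU hourly rate multiplied by render time. Below is a minimal sketch of that arithmetic. It assumes the $1.19/hr A100 rate listed under the pod details and ignores pod startup, model loading, and idle time. RunPod bills for all three, while per-request fal.ai pricing does not.

```python
# Per-render cost on hourly-billed GPUs. Assumes the $1.19/hr A100 rate
# from the pod details; startup, model load, and idle time are not counted.
A100_HOURLY_USD = 1.19


def gpu_cost_per_render(seconds: float, hourly_usd: float = A100_HOURLY_USD) -> float:
    """Dollar cost of one render that occupies the GPU for `seconds`."""
    return hourly_usd * seconds / 3600.0


if __name__ == "__main__":
    for label, secs in [("Flux 2+LoRA FP8, 20 steps", 46), ("Qwen img2img", 72)]:
        print(f"{label}: ${gpu_cost_per_render(secs):.4f}/image")
```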

### LoRA Training Status

| Character | T2I LoRA | Video LoRA | Status |
|-----------|----------|------------|--------|
| **Jinx** | COMPLETE (Flux 2 + Z-Image, in registry) | COMPLETE (high+low noise URLs) | A/B test complete, user review pending |
| **Kian** | COMPLETE (Flux 2, trigger: KIANCHAR) | Not started | Solo shots good; dual-character shots work at 0.5/0.5 LoRA scale |

**Jinx T2I LoRAs:**
- Flux 2: Trigger `JNXCHAR`, 28 images, 1000 steps, 317 MB
- Z-Image Turbo: 21 clean images, 2000 steps, 81 MB, $1.70 (roughly 5x cheaper than the ~$8 Flux 2 run)
- Identity lock confirmed: same face every generation
- A/B test complete (11 shots): Z-Image 1.5x faster (5.0s vs 7.6s), 3.5x faster on triptychs
- URL: see `leviathan/visual/lora_registry.json`

**Jinx WAN 2.2 Video LoRA:**
- Trained 2026-02-06 (34 min on fal.ai)
- Dual adapters: high_noise + low_noise
- **NOT YET TESTED** — fal.ai balance was exhausted, now topped off. Ready to test.
- Use with: `fal-ai/wan/v2.2-a14b/image-to-video/lora`

**Kian T2I LoRA:**
- Trigger: `KIANCHAR`, 7 images, 1000 steps
- COMPLETE (2026-02-07): `~/Desktop/kian_lora_training/kian_lora_v1.safetensors` (317 MB)
- Solo shots look good. Dual-character shots work at 0.5/0.5 LoRA scale.

### Existing Character Assets (KEEP)
- 24 Gemini-generated multi-angle Jinx refs (`leviathan/visual/refs/characters/JINX/`)
  - 5 costume variants × ~5 angles each (front, profile, three_quarter, close_up, back)
  - Best face geometry match to hero image — better than IPAdapter output
  - White backgrounds — fine for inference reference sheets. For LoRA training, use diverse backgrounds (5-10 environments) to prevent overfitting
- Hero image: `leviathan/visual/refs/characters/heroes/JINX.png`
- Scale lineup: `leviathan/visual/refs/characters/scale_lineup.png` (Jinx 5'5", Kian 6'5", Varek 6'2")

---

## STRATEGIC PIVOT: Cloud-First via fal.ai (2026-02-06)

**Why:** fal.ai runs Flux 2 Dev at full FP16 precision with LoRA support for $0.02/image in 9 seconds. Local RunPod A100 at FP8 takes 46s and produces visibly lower quality. Cloud also handles WAN 2.2 FLF at 40 steps for $0.04-0.08/video in 138s.

**Architecture:**
- **Cloud (fal.ai):** All T2I (Flux 2+LoRA) and video (WAN 2.2 FLF+LoRA) generation
- **Local (RunPod):** Fallback only for uncensored content that hits API moderation filters
- **Alternatives:** WaveSpeedAI (WAN 2.2 Spicy + LoRA), Replicate (FLF + Lightning LoRA)

**What's superseded:**
- IPAdapter Flux → Flux 2 native multi-reference + LoRA
- Flux Kontext → solved Flux 1's consistency problem; Flux 2 solves it natively
- Local-first rendering → Cloud wins on speed, cost, and quality (FP16 vs FP8)
- Manual ComfyUI workflows → API-driven generation from storyboard JSON

---

## PRODUCTION PIPELINE WORKFLOW (v3 — February 2026)

### Overview

```
CHARACTER DESIGN
  Keystone images (7-8 neutral poses in MidJourney) → keystones/ directory
    → batch_generate_refs.py --lora-prep 50 (keystone-conditioned candidates, 2-3 refs per angle)
      → lora_picker.html (curate 15-25 best)
        → train_lora.py prepare --from-candidates (build ZIP)
          → fal.ai Z-Image/Flux 2 LoRA training (~$1.70-8/character)
          → fal.ai WAN 2.2 Video LoRA training (~$25/character)

STORYBOARD (per episode)
  /storyboard → storyboard_ep_NNN.json (v3 schema)
    → generation_approach per shot (triptych_split_flf / standard_flf / held_frame)
    → E-style hero prompts (~150-180 words) for triptych shots
    → Asset naming convention: {PRJ}_EP{NNN}_S{NN}_T{NN}_{CHAR}

KEYFRAME GENERATION (cloud via fal.ai)
  Triptych shots → Flux 2+LoRA 1536x912 strip → auto-split → Gemini upscale
  Standard FLF  → Flux 2+LoRA first+last frames → Gemini upscale
  Held frames   → Flux 2+LoRA single keyframe → Ken Burns push

VIDEO GENERATION (cloud via fal.ai)
  Triptych → Split FLF (first→hero, hero→last) → concatenate
  Standard → Standard FLF (first→last)
  Held     → Ken Burns push (no WAN)
    → Dailies review (dailies_editor.html) → approve / retake
      → Final cut
```

### Stage 1: Character Asset Preparation — PROVEN + AUTOMATED

**Goal:** Create a trained LoRA per character that locks face geometry, body proportions, and style.

**Automated Candidate Pipeline (updated 2026-02-14):**
```
1. Generate 7-8 keystone images in MidJourney (neutral poses, key angles)
2. Place in [project]/visual/lora_candidates/[CHAR]/keystones/
3. engine_shootout.py --threepass           → Three-pass: Qwen MA → NBP → SeedVR2 (recommended)
   OR: batch_generate_refs.py --hybrid      → Legacy hybrid Qwen+Gemini candidates
   OR: batch_generate_refs.py --lora-prep 50 → Gemini-only candidates (legacy)
4. lora_picker.html                         → Curate 15-25 best in browser
5. train_lora.py prepare --from-candidates  → Build training ZIP from selection
6. train_lora.py submit --type z_image      → Submit to fal.ai
```
- **Three-pass sequential pipeline (recommended):** Qwen Multi-Angle (angle geometry) → NBP / Gemini 3 Pro (background swap + expression + identity lock) → SeedVR2 (quality upscale). ~$0.10/angle, ~56s/angle, ~$2.35 for 12 angles (smart expression distribution).
  - `engine_shootout.py --threepass`: Single-angle test of the three-pass chain
  - NBP skin texture prompting: anti-airbrushing prompts produce visible pores, freckles, natural imperfections
  - Full engine specs: `docs/candidate_generation_engines.md`
- **Legacy hybrid modes** (`--hybrid parallel`, `--hybrid twopass`) still available but not recommended — Gemini Flash can't hold angles reliably.
- **Keystone-conditioned generation (Gemini):** 2-3 most relevant keystones sent per candidate (Gemini max 3 input images). Smart angle-based selection: always front keystone + closest-angle match + one more for 3D depth.
- **Qwen angle coverage:** Numeric params (horizontal_angle, vertical_angle, zoom) — accurate geometry including low/high/back views that Gemini cannot produce.
- Diversity via Cartesian grid (Gemini): angles × locations (from breakdown.json) × expressions × lighting
- Locations cross-referenced from breakdown.json — character's actual in-world environments
- **Dataset requirements:** 3-5 wardrobe variations, 5-10 distinct backgrounds (no all-white), 5-7 expressions
- Natural language captions (model-aware: Z-Image 20-50 words, Flux 2 30-80 words)
- Picker UI: 4-column grid, keyboard nav, filter/sort, count tracker (15-25 optimal), **diversity analysis panel** (angles, locations, expressions, zones — floating bottom-right)
- Output: `[project]/visual/lora_candidates/[CHARACTER]/` with manifest.json
- Best practices: `docs/lora_training_best_practices.md`
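Below is a sketch of how the Cartesian diversity grid and the keystone rule could fit together. The keystone rule is: front, then closest angle, then one more, with a maximum of 3 for Gemini. This is not the code in `batch_generate_refs.py`. The following are all assumptions:
- yaw angles in degrees
- the keystone names
- the seeded sampling
- reading "one more for 3D depth" as "next-closest remaining keystone"

```python
import itertools
import random


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def pick_keystones(target_yaw: float, keystones: dict, limit: int = 3) -> list:
    """Front keystone, then the closest-angle match, then one more (max 3 for Gemini).

    Assumption: "one more for 3D depth" = the next-closest remaining keystone.
    """
    front = min(keystones, key=lambda k: angular_distance(keystones[k], 0.0))
    rest = sorted((k for k in keystones if k != front),
                  key=lambda k: angular_distance(keystones[k], target_yaw))
    return [front] + rest[: limit - 1]


def plan_candidates(angle_yaws, keystones, locations, expressions, lighting, n, seed=0):
    """Sample n distinct cells of angles x locations x expressions x lighting."""
    grid = list(itertools.product(angle_yaws, locations, expressions, lighting))
    cells = random.Random(seed).sample(grid, min(n, len(grid)))  # seeded, no repeats
    return [
        {"angle": a, "location": loc, "expression": ex, "lighting": lit,
         "keystones": pick_keystones(angle_yaws[a], keystones)}
        for a, loc, ex, lit in cells
    ]
```

With 6 angles × 3 locations × 3 expressions × 2 lighting setups (108 cells), `plan_candidates(..., n=50)` returns 50 distinct combinations. Each one carries at most three keystones.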

**What works (tested with Jinx):**
- 15-25 images per character, diverse contextual backgrounds (not white studio)
- One LoRA per character (not per wardrobe). Wardrobe controlled by prompt + refs
- fal.ai Z-Image Turbo Trainer: ~$1.70/character, 2000 steps, roughly 5x cheaper than Flux 2 (~$8)
- fal.ai Flux 2 Trainer: ~$8/character, 1000 steps, ~30 min
- Identity lock confirmed: same face every generation across varied prompts and angles

**Video LoRA (dual adapter):**
- fal.ai WAN 2.2 Video LoRA trainer: ~$25/character, 34 min
- Produces two adapters: high_noise + low_noise (both required for inference)
- Prevents character drift during video generation

**Training pipeline tools:**
- `tools/batch_generate_refs.py --lora-prep N` — diverse candidate generation
- `editors/_standalone/lora_picker.html` — browser-based candidate curation
- `tools/train_lora.py` — prepare, submit, poll, auto-register

### Stage 2: Keyframe Generation — CLOUD-FIRST

**Goal:** Generate keyframes for every storyboard shot with locked character consistency.

**fal.ai endpoints:**
- T2I: `fal-ai/flux-2/lora` (Flux 2 Dev + LoRA, FP16, 28 steps, $0.02/image, 9s)
- Video: `fal-ai/wan/v2.2-a14b/image-to-video/lora` (WAN 2.2 FLF + LoRA, 40 steps, 138s)
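Below is a hypothetical request builder for the T2I endpoint. It combines the trigger words with the 0.5/0.5 dual-character LoRA scale noted in the LoRA status section. The argument names (`prompt`, `loras` with `path`/`scale`, `image_size`, `num_inference_steps`) and the registry shape are assumptions modeled on fal.ai's usual LoRA endpoints. Check them against the endpoint schema and `lora_registry.json` before use.

```python
ENDPOINT = "fal-ai/flux-2/lora"


def build_flux2_request(prompt, characters, registry, width=1536, height=912, steps=28):
    """Build arguments for the Flux 2 + LoRA endpoint.

    ASSUMPTIONS: argument names follow fal.ai's usual LoRA endpoint shape, and
    `registry` maps a character name to {"trigger": ..., "url": ...}.
    Defaults are the triptych strip size and the 28-step setting.
    """
    if not characters:
        raise ValueError("at least one character LoRA is required")
    # Solo: full strength. Two characters: 0.5/0.5, as tested with Kian.
    # Three or more characters is untested.
    scale = 1.0 if len(characters) == 1 else 0.5
    triggers = " ".join(registry[c]["trigger"] for c in characters)
    return {
        "prompt": f"{triggers} {prompt}",
        "loras": [{"path": registry[c]["url"], "scale": scale} for c in characters],
        "image_size": {"width": width, "height": height},
        "num_inference_steps": steps,
    }


# Submitting (requires the fal-client package and a FAL_KEY):
# import fal_client
# result = fal_client.subscribe(ENDPOINT, arguments=build_flux2_request(...))
```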

**By generation approach (from storyboard v3 schema):**

| Approach | Keyframes | Process |
|----------|-----------|---------|
| `triptych_split_flf` | 3 (first/hero/last) | 1536x912 strip → auto-split → Gemini upscale → Split FLF |
| `standard_flf` | 2 (first/last) | Individual frames → Gemini upscale → Standard FLF |
| `held_frame_push` | 1 (first only) | Single frame → Ken Burns push (no WAN) |
| `held_frame_static` | 1 (first only) | Single frame → static hold (no motion) |

**Per-shot process:**
1. Load storyboard JSON → read generation_approach + prompts per shot
2. Generate keyframe(s) via fal.ai with character LoRA
3. Auto-split triptych strips at 1/3 and 2/3 marks
4. Gemini NBP upscale all split panels (488x892 → 768x1344)
5. Save with asset naming convention: `{PRJ}_EP{NNN}_S{NN}_T{NN}_{CHAR}_{suffix}.{ext}`
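The split step is pure arithmetic on the strip size. The sketch below returns Pillow-style crop boxes `(left, upper, right, lower)`. The per-edge trim values are an assumption, picked only so that a 1536x912 strip yields the 488x892 panels quoted in step 4. Plain thirds would be 512x912.

```python
def triptych_boxes(width, height, panels=3, trim_x=12, trim_y=10):
    """Crop boxes (left, upper, right, lower) for equal-width panels.

    Cuts fall at 1/3 and 2/3 of the width. trim_x/trim_y shave each panel's
    edges to drop gutters; the 12/10 px defaults are an assumption.
    """
    panel_w = width // panels
    return [
        (i * panel_w + trim_x, trim_y, (i + 1) * panel_w - trim_x, height - trim_y)
        for i in range(panels)
    ]


# With Pillow:
# from PIL import Image
# strip = Image.open("strip.png")
# panels = [strip.crop(box) for box in triptych_boxes(*strip.size)]
```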

**Automation tool:** `tools/generate_storyboard_keyframes.py`

**Previz tool:** `tools/generate_previz.py` — cheap visual review before committing to full production. Hero frame per shot at 512x896/8 steps via z-image turbo (~$0.004/shot). Outputs `previz/` subdirectory with PNG frames, `previz_manifest.json`, and `previz_review.html` contact sheet.

### Stage 3: Dailies Review

**Goal:** Catch bad shots before expensive video rendering.

**Tool:** Production Console → Dailies tab (`editors/_standalone/dailies_editor.html`, via Editor Hub on port 8420)

**Workflow:**
1. Keyframes generated → saved to `storyboards/assets/ep_NNN/`
2. Open Dailies Editor → loads storyboard JSON + scans asset directory
3. Review each shot: timeline panel, shot card detail, asset gallery with thumbnails
4. Status per shot: `pending` → `generated` → `needs_review` → `approved` / `retake`
5. Keyboard shortcuts: Arrow keys = navigate, A = approve, R = retake, N = needs_review
6. Retake only failed shots → regenerate → re-review
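The status flow and hotkeys can be expressed as a small transition table. This is a sketch, not the editor's code. The following transitions are assumptions:
- allowing A/R straight from `generated`
- sending `approved` shots back to `retake`
- re-entering at `generated` after a retake

```python
# Shot-status transitions; edges beyond the linear flow are assumptions.
TRANSITIONS = {
    "pending": {"generated"},
    "generated": {"needs_review", "approved", "retake"},
    "needs_review": {"approved", "retake"},
    "retake": {"generated"},       # regenerated shot goes back for review
    "approved": {"retake"},        # a late retake is still allowed
}
HOTKEYS = {"a": "approved", "r": "retake", "n": "needs_review"}


def apply_hotkey(status: str, key: str) -> str:
    """Return the new status for a keypress; non-status keys leave it unchanged."""
    target = HOTKEYS.get(key.lower())
    if target is None:
        return status  # arrow keys and others only navigate
    if target not in TRANSITIONS[status]:
        raise ValueError(f"cannot go from {status!r} to {target!r}")
    return target


def regenerate(status: str) -> str:
    """Mark a shot as regenerated (first render, or after a retake)."""
    if "generated" not in TRANSITIONS[status]:
        raise ValueError(f"cannot regenerate a shot in {status!r}")
    return "generated"
```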

**Automated QC:** `visual_gate.py` (Gate 1: artifact detection, Gate 2: semantic alignment via Gemini 2.5 Flash)

**This is the cheapest review point.** Keyframe generation is ~$0.02/image, 9s. Video rendering is ~$0.04-0.08/clip, 138s. Catch problems here.

### Stage 4: Video Generation (Cloud via fal.ai)

**Goal:** Convert approved keyframe pairs into video clips.

**By generation approach:**

| Approach | FLF Type | Segments | Endpoint |
|----------|----------|----------|----------|
| `triptych_split_flf` | Split FLF | 2 (first→hero + hero→last) | `fal-ai/wan/v2.2-a14b/image-to-video/lora` |
| `standard_flf` | Standard FLF | 1 (first→last) | `fal-ai/wan/v2.2-a14b/image-to-video/lora` |
| `held_frame_push` | Ken Burns | 0 (no WAN) | CSS/ffmpeg push effect |
| `held_frame_static` | None | 0 | Static hold |

**Split FLF (for triptych shots):**
1. First→Hero (segment 1): 49 frames, 40 steps, 720p, ~39s
2. Hero→Last (segment 2): 49 frames, 40 steps, 720p, ~39s
3. Concatenate segments → full video with 3 keyframe anchors
4. Middle keyframe prevents decoherence at complex action moments
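Both segments contain the hero keyframe: it is the last frame of segment 1 and the first frame of segment 2. A naive concatenation therefore holds it for two frames. The sketch below drops the duplicate. It assumes the two hero frames are interchangeable and that output runs at 16 fps, the `frame_rate` in the local ComfyUI fallback graph. The cloud frame rate is not recorded here.

```python
FPS = 16  # assumption: matches frame_rate in the local fallback graph


def concat_split_flf(seg1: list, seg2: list) -> list:
    """Join first->hero and hero->last segments, keeping one hero frame.

    Assumes seg1[-1] and seg2[0] are the same hero keyframe.
    """
    if not seg1 or not seg2:
        raise ValueError("both segments must contain frames")
    return seg1 + seg2[1:]


def duration_s(frames: int, fps: int = FPS) -> float:
    return frames / fps
```

With two 49-frame segments this yields 97 frames, about 6.1 s at 16 fps. A standard FLF clip is 81 frames, about 5.1 s.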

**Standard FLF (for standard motion shots):**
1. First→Last: 81 frames, 40 steps, 720p, ~138s
2. Single segment, direct interpolation

**WAN 2.2 FLF test results (2026-02-06):**
- Best result: WAN 2.2 HQ (40 steps, 138s) gave the best coherence
- LoRA support confirmed for triptych+standard approaches
- Video LoRA (Jinx) trained but NOT YET TESTED with FLF

**Local ComfyUI fallback** (for uncensored content):
```
WanVideoModelLoader (wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors)
LoadWanVideoT5TextEncoder (umt5_xxl_fp8_e4m3fn_scaled.safetensors)
WanVideoVAELoader (wan2.2_vae.safetensors)
CLIPVisionLoader (clip_vision_h.safetensors)
LoadImage (approved_first_frame.png)
WanVideoTextEncode (motion prompt, negative prompt)
WanVideoClipVisionEncode + WanVideoImageToVideoEncode
WanVideoSampler (steps=30, cfg=6.0, shift=5.0)
WanVideoDecode (enable_vae_tiling=True)
VHS_VideoCombine (frame_rate=16, h264-mp4)
```
**NOTE:** Local FLF blocked by VAE mismatch (needs wan_2.1_vae.safetensors). Use cloud for now.

### Stage 4b: Extended Shots (> 5 seconds) — TO TEST

For shots longer than WAN's native ~5-second limit, two techniques to evaluate:

**SVI (Stable Video Infinity), intLeon workflow:**
- Uses `WanImageToVideoSVIPro` node (Kijai) + `ImageBatchExtendWithOverlap`
- Feeds latents (not just last frame) from previous segment into next generation
- Eliminates hard cuts, maintains motion continuity
- 3-phase sampling per segment: High noise (3 steps, LightX2V@1.5, cfg 3) → Low noise (3 steps) → Low noise (3 steps)
- Reported: 19 seconds of seamless video in 7 minutes on 4090
- Character consistency holds for ~30-45 seconds, then degrades
- CivitAI workflow: https://civitai.com/models/1866565
- Source: r/StableDiffusion — "Continuous video with wan finally works!"
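SVI itself carries latents across segments. A simpler image-space way to join two clips is to crossfade an overlapping window, which is roughly what extending a batch "with overlap" suggests. Below is a minimal linear crossfade sketch. Whether `ImageBatchExtendWithOverlap` blends linearly is an assumption, and frames here are flat lists of pixel values rather than real tensors.

```python
def extend_with_overlap(a, b, overlap):
    """Append clip b to clip a, crossfading the last `overlap` frames of a
    into the first `overlap` frames of b. Frames are flat lists of floats."""
    if not 0 <= overlap <= min(len(a), len(b)):
        raise ValueError("overlap must fit inside both clips")
    head = a[: len(a) - overlap]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of clip b ramps up toward 1
        fa, fb = a[len(a) - overlap + i], b[i]
        blended.append([(1 - w) * x + w * y for x, y in zip(fa, fb)])
    return head + blended + b[overlap:]
```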

**PainterLongVideo node:**
- GitHub: https://github.com/princepainter/ComfyUI-PainterLongVideo
- Chains WAN segments using dual-reference latent injection
- motion_amplitude parameter (1.15 default) to scale movement
- Maintains global consistency via initial reference image
- Simpler than SVI workflow but possibly less seamless

**Key models for speed (both approaches):**
- LightX2V LoRA — reduces WAN from 30 steps to 6-9 steps (3 per phase)
- High noise + Low noise model variants used in sequence

### Stage 4c: WAN 2.2 Animate — RESEARCHED (2026-02-08)

**Goal:** Character swap / motion transfer for dialogue scenes and complex motion.

**Two modes:**
- **Move:** Animate character with new background (motion reference → character on new scene)
- **Replace:** Swap character into existing video while preserving scene, camera, lighting

**Dialogue workflow (hybrid capture approach):**
1. Film actor performing dialogue scene
2. WAN Animate Replace mode swaps actor with AI character
3. Lip sync handled natively by Animate model
4. Character identity from Z-Image+LoRA hero frame used as reference

**API availability:**

| Provider | Endpoint | Cost | Notes |
|----------|----------|------|-------|
| fal.ai | `fal-ai/wan/v2.2-14b/animate/replace` | ~$0.04-0.08/s | Proven stack |
| Replicate | Various | ~$0.02-0.05/s | Cheaper |
| ComfyUI (local) | Native node | GPU time only | Needs Animate-specific 14B model (~14GB FP8) |

**Key limitation:** NO LoRA support. Character identity comes from the input reference image only. Workflow: generate hero frame with Z-Image+LoRA → feed as reference to Animate.

**RunPod:** Fits on A100 80GB (60GB peak VRAM), tight but doable.

**Cost for series:** ~$0.40/5s clip (fal.ai), ~$45-72 for all dialogue shots across 60 episodes.

### Stage 5: Post-Production

**Future (not yet implemented):**
- Frame interpolation (RIFE) for smoother motion
- Upscaling (RealESRGAN) for final resolution
- WAN 2.2 Animate Replace for dialogue scenes (lip sync from captured performance)
- Audio/music integration
- Final color grading

### RunPod as Render Farm

**Current:** Manual pod start/stop via web UI.

**Near-term automation:**
- `runpodctl` CLI for programmatic pod lifecycle
- Script: start pod → upload storyboard batch → render → download output → stop pod
- Pay only for GPU time during actual generation
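Below is a dry-run sketch of the start → upload → render → download → stop loop. `runpodctl start pod` and `runpodctl stop pod` are the CLI's pod lifecycle commands. The following are hypothetical:
- the remote directory
- the `render_batch.py` script name
- the `resolve_endpoint` callback, needed because the SSH port changes on every restart

A real script also has to wait for SSH to come up after `start`.

```python
import os
import subprocess


def run_batch(pod_id, resolve_endpoint, batch_dir, dry_run=True):
    """Start pod -> upload -> render -> download -> stop. Stop always runs.

    resolve_endpoint() -> (host, port) is a hypothetical callback that asks the
    RunPod API for the current SSH mapping, since the port changes per restart.
    With dry_run=True the commands are only collected, not executed.
    """
    executed = []

    def run(cmd):
        executed.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)

    run(["runpodctl", "start", "pod", pod_id])
    try:
        host, port = resolve_endpoint()  # a real script must also wait for SSH
        key = os.path.expanduser("~/.ssh/id_ed25519")
        target, remote = f"root@{host}", "/workspace/batch"  # hypothetical path
        run(["scp", "-P", str(port), "-i", key, "-r", batch_dir, f"{target}:{remote}"])
        run(["ssh", "-T", "-p", str(port), "-i", key, target,
             f"python3 -u {remote}/render_batch.py"])  # hypothetical render script
        run(["scp", "-P", str(port), "-i", key, "-r", f"{target}:{remote}/output", "."])
    finally:
        run(["runpodctl", "stop", "pod", pod_id])  # never leave the GPU billing
    return executed
```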

**Future scaling:**
- RunPod Serverless: deploy ComfyUI as serverless endpoint
- Auto-scale: multiple GPUs for parallel shot rendering
- API-driven: Mac sends storyboard JSON, receives rendered video
- Cost model: pay per request, no idle GPU time

---

## PRODUCTION CONSOLE (2026-02-09)

**URL:** `http://127.0.0.1:8420/production_console.html`

Unified tabbed single-page app replacing the standalone HTML editors.

### Tabs

| Tab | Content | Source |
|-----|---------|--------|
| **Grammar** | Corpus grammar targets, Two-Peak timeline, shot type bar chart, deviation warnings | Visual Grammar Bible corpus data |
| **Breakdown** | Characters, locations, props, SFX/VFX | `breakdown_editor.html` (iframe) |
| **Storyboard** | Shot cards, creative direction, gate results | `storyboard_editor.html` / `storyboard_review.html` (iframe) |
| **Shotlist** | Script-to-shot mapping with beat markers | `shotlist_editor.html` (iframe) |
| **Revision** | Script doctor annotations, revision workflow | `revision_editor.html` (iframe) |
| **Dailies** | Generated keyframes, video clips, approve/retake | `dailies_editor.html` (iframe) |

### Architecture

- **Shell:** `production_console.html` — tab navigation, project selector, episode sidebar, status bar
- **Modules:** `modules/app.js` (shared state + event bus), `modules/api.js` (fetch wrappers), per-tab JS modules
- **Styles:** `styles/console.css` — CSS custom properties dark theme
- **API endpoints (serve.py):** `/api/project/<name>/corpus-summary`, `/score/<episode>`, `/pipeline-status`
- **Backward compat:** All standalone editors still work at original URLs

### Visual Grammar Bible Research (Phase 1+2 COMPLETE)

- **Corpus:** 33 scenes, 1,955 shots — 14 microdrama (4 series), 13 anime, 5 cinema
- **FINDINGS.md:** `_research/visual_grammar_bible/FINDINGS.md` — shareable research paper
- **Two-Peak:** 93% prevalence. Spike @44.9s, Button @106.9s
- **10 Production Rules** derived from empirical corpus data
- **Grammar tab** visualizes corpus grammar targets and compares against storyboard shot distributions

---

## STORYBOARD SYSTEM (AI Cinematographer + Review)

**Status:** Built — agent spec + editor HTML + ComfyUI sketch integration + annotation workflow. Needs integration testing with real storyboard JSON.

### The Three-Layer Model

| Layer | Who | What | Output |
|-------|-----|------|--------|
| **Script** | Writer | Story, dialogue, action | `ep_XXX.md` |
| **Storyboard** | AI Cinematographer + Director review | Creative visual decisions + rough viz | `storyboards/storyboard_ep_NNN.json` |
| **Shotlist** | Machine (from approved storyboard) | Mechanical execution specs (prompts, dims, seeds) | `storyboards/shotlist_ep_NNN.json` |

### Pipeline Position (Visual Design comes FIRST)

```
Script → Breakdown → Visual Design → STORYBOARD → Shotlist → Keyframe Gen → Video
                        |                |              |
                   Character refs    AI Cinemato-   Mechanical
                   HEX palettes     grapher +      execution
                   Lens package     Director       specs
                   Location refs    review +
                                    rough sketches
```

**Visual Design feeds INTO the storyboard.** By the time the AI Cinematographer runs, character reference images, palettes, and lens packages are available. The storyboard editor can display character thumbnails alongside each shot.

### What the Storyboard Does

The AI Cinematographer agent makes a **fully autonomous first pass** over the script, producing:

1. **Rough visualizations** — sketch-quality images via ComfyUI (with LoRA/refs if available, text-only if not)
2. **Shot-by-shot visual direction** — placed next to the script text:
   - Camera placement and movement (why HERE, not there?)
   - Lens choice and its emotional function
   - Lighting design (practicals, motivated light, atmosphere)
   - Blocking (where characters are, how they move)
   - Environment/atmosphere notes
   - Whether to hold or cut (and why)
   - Extended take decisions (SVI technique for one-take sequences)
3. **Rhythm decisions** — pacing of cuts, holds, transitions across the episode
4. **Character reference images** — pulled from visual_bible.md for each shot

### Feedback Loops

**Storyboard Editor — review + re-sketch:**
1. Director reviews prepopulated storyboard (every shot has direction + rough viz)
2. ACCEPT / MODIFY / REJECT per shot
3. "Sketch" button per shot fires ComfyUI for rough viz regeneration
4. "Re-sketch Modified" button batch-regenerates all modified shots
5. Export annotations JSON → `/storyboard --revise` incorporates changes
6. Export to shotlist JSON → opens in Shotlist Editor

**Shotlist Editor — tweak + generate:**
1. Each shot has full generation specs (prompts, dimensions, refs, seeds)
2. "Generate Frame" button per shot fires ComfyUI at full resolution
3. Tweak prompt, regenerate, compare — tight iteration loop
4. Approve individual frames → batch video render on RunPod

**Target: ~80% automated accuracy.** The autonomous storyboard pass should get most shots right. The director reviews and fixes the 20% that need creative adjustment. Then the shotlist mechanical pass should produce good frames on first try for most shots.

### Conditioning Channel Separation

| Channel | Controls | Bleed risk |
|---------|----------|------------|
| **LoRA** | Character identity (face, body) | None — baked into weights |
| **Character refs** (close-cropped) | Wardrobe, angle, expression | Minimal — no background |
| **Location refs** | Environment, set design | Contained — dedicated images |
| **Text prompt** | Action, mood, camera, lighting | None — text only |
| **ControlNet** (future) | Exact pose from skeleton | None — geometric only |

### Implementation Status

- [x] Renamed `editors/storyboard_editor.html` → `shotlist_editor.html`
- [x] Created new `editors/storyboard_editor.html` — storyboard review interface with ComfyUI sketch integration
- [x] Created `agents/storyboard_agent.md` (AI Cinematographer) + `shotlist_agent.md` (mechanical JSON)
- [x] Storyboard schema v3 — `generation_approach`, `hero_frame`, `triptych_prompt`, `asset_name`, `characters_in_shot`
- [x] EP001 storyboard upgraded — all 21 shots classified, 6 E-style hero prompts, 6 triptych strip prompts
- [x] Asset naming convention: `{PRJ}_EP{NNN}_S{NN}_T{NN}_{CHAR}[_{suffix}].{ext}`
- [x] `generate_storyboard_keyframes.py` — triptych + FLF + held frame pipeline
- [x] `generate_previz.py` — GATE 2 previz runner (hero frame per shot, HTML contact sheet, manifest)
- [x] `dailies_editor.html` — asset review with timeline, shot cards, thumbnails, lightbox, keyboard shortcuts
- [x] `storyboard_review.html` — gate-aware frame review with auto-pass/reject routing
- [x] `visual_gate.py` — Gate 1 (artifacts) + Gate 2 (semantic alignment via Gemini)
- [x] Rough frames generated for EP1 — 21/21 shots via Gemini
- [x] Updated all codebase references
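Below is a sketch of building and parsing names in the `{PRJ}_EP{NNN}_S{NN}_T{NN}_{CHAR}[_{suffix}].{ext}` convention. `tools/asset_naming.py` is the real utility. This version makes several assumptions:
- project and character codes are uppercase alphanumeric
- the suffix contains no underscores
- the S/T fields get neutral names `s`/`t`, since their meaning is not spelled out here

The `LEV` project code in the checks is hypothetical.

```python
import re

ASSET_RE = re.compile(
    r"^(?P<prj>[A-Z0-9]+)_EP(?P<ep>\d{3})_S(?P<s>\d{2})_T(?P<t>\d{2})"
    r"_(?P<char>[A-Z0-9]+)(?:_(?P<suffix>[A-Za-z0-9]+))?\.(?P<ext>[A-Za-z0-9]+)$"
)


def asset_name(prj, ep, s, t, char, ext, suffix=None):
    """Format a name; numeric fields are zero-padded to the convention's widths."""
    base = f"{prj}_EP{ep:03d}_S{s:02d}_T{t:02d}_{char}"
    if suffix:
        base += f"_{suffix}"
    return f"{base}.{ext}"


def parse_asset_name(name):
    """Split a name back into fields, or raise ValueError if it doesn't match."""
    m = ASSET_RE.match(name)
    if m is None:
        raise ValueError(f"not a valid asset name: {name!r}")
    fields = m.groupdict()
    for k in ("ep", "s", "t"):
        fields[k] = int(fields[k])
    return fields
```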

---

## ARCHITECTURE DECISIONS

### Decided
- **T2I model:** Flux 2 Dev via fal.ai (FP16 + LoRA, $0.02/image, 9s)
- **Local T2I:** Qwen Image for production keyframes, Z-Image Turbo for sketches
- **Video model:** WAN 2.2 I2V via fal.ai (FLF + LoRA, 40 steps, 138s)
- **Character consistency:** LoRA (T2I + Video) — one per character, identity lock confirmed
- **LoRA training:** fal.ai (T2I: ~$1.70/char Z-Image or ~$8/char Flux 2, Video: ~$25/char)
- **Render infrastructure:** Cloud-first (fal.ai). RunPod A100 as uncensored fallback only.
- **Storyboard schema:** v3 with generation_approach, triptych prompts, asset naming
- **Innovations:** Triptych strips, split FLF, E-style prompting, decisive moment generation — all tested and working

### Pending
- **Aspect ratio:** 9:16 vertical (phone-native) vs 16:9 widescreen (cinematic). May need both.
- **WAN version:** 2.2 in production. Monitor WAN 2.6 for quality + native audio upgrade.
- **Lip sync tool:** LivePortrait vs MuseTalk vs newer options. Evaluate during Stage 5.
- **Video LoRA testing:** Jinx WAN 2.2 video LoRA trained but not yet tested with FLF.

---

## IMMEDIATE NEXT STEPS

### Completed
1. [x] Flux 2 Dev migration (models, workflows, testing)
2. [x] T2I model comparison (Qwen Image wins, Flux 2 retired from local T2I)
3. [x] Jinx T2I LoRA trained + tested (identity lock confirmed)
4. [x] Jinx WAN 2.2 video LoRA trained (high+low noise adapters)
5. [x] Cloud vs local testing (fal.ai wins for production)
6. [x] Innovation testing (triptych, grid, E-style, split FLF — all working)
7. [x] Storyboard v3 schema + EP001 upgrade (21 shots classified)
8. [x] Asset naming convention + automation tool
9. [x] Dailies editor built
10. [x] Rough frames generated for EP1 (21/21 via Gemini)
11. [x] Habitat zones defined (5 zones, 72 locations mapped)

### Next Up
1. [ ] **Re-submit Kian T2I LoRA training** — request ID lost, `submit_kian_training.py` ready
2. [ ] **Test Jinx WAN 2.2 video LoRA + FLF** — balance now topped off, ready
3. [ ] **Run EP1 previz** — `generate_previz.py leviathan/ --episode 1` (review before full production)
4. [ ] **Generate EP1 production keyframes** — use storyboard v3 + generate_storyboard_keyframes.py
5. [ ] **Test triptych + split FLF end-to-end** — strip → split → upscale → split FLF → concat
6. [ ] **Download wan_2.1_vae.safetensors** — fix local ComfyUI FLF (VAE channel mismatch)
7. [ ] **Decide aspect ratio** — 9:16 vs 16:9
8. [ ] **Train Kian video LoRA** (after T2I LoRA completes)

---

## KEY FILES

| File | Location | Purpose |
|------|----------|---------|
| LoRA training tool | `tools/train_lora.py` | Generalized: prepare, submit, poll, auto-register |
| LoRA registry | `leviathan/visual/lora_registry.json` | Per-project character LoRA URLs + training metadata |
| Asset naming | `tools/asset_naming.py` | Naming convention utility + regex |
| Storyboard keyframe gen | `tools/generate_storyboard_keyframes.py` | Triptych + FLF + held frame pipeline |
| Previz generator | `tools/generate_previz.py` | Hero frame per shot, HTML review, manifest |
| Gemini upscaler | `tools/upscale_gemini.py` | NBP upscale for split panels |
| Visual gate | `tools/visual_gate.py` | Two-gate automated QC |
| Rough frame gen | `tools/generate_rough_frames.py` | Gemini sketch generation |
| Storyboard schema | `templates/storyboard_schema.json` | v3 with generation_approach fields |
| Innovations doc | `INNOVATIONS.md` | Pipeline IP documentation |
| Dailies editor | `editors/_standalone/dailies_editor.html` | Asset review + approve/retake |
| Jinx hero ref | `leviathan/visual/refs/characters/heroes/JINX.png` | Character hero image |
| Jinx Gemini refs (24) | `leviathan/visual/refs/characters/JINX/` | Multi-angle refs (5 costumes × ~5 angles) |
| Jinx T2I LoRA | fal.ai URL (see registry) | Flux 2 character identity lock |
| Jinx Video LoRA | fal.ai URLs (see registry) | WAN 2.2 dual adapter (high+low noise) |
| Kian training scripts | `~/Desktop/kian_lora_training/` | Submit + check scripts, dataset zip |

## RUNPOD CONNECTION

```bash
# NOTE: IP and port change with each new pod. Update after creating new pod.
# Last known (may be stale):
ssh -T root@38.140.51.195 -p 18629 -i ~/.ssh/id_ed25519
```

## MODELS ON RUNPOD (Current — Pod STOPPED)

| Model | Size | Path | Role |
|-------|------|------|------|
| qwen_image_fp8_e4m3fn.safetensors | ~18.6GB | models/ | PRIMARY T2I |
| qwen_2.5_vl_7b_fp8_scaled.safetensors | ~6.5GB | models/text_encoders/ | Qwen text encoder |
| qwen_image_vae.safetensors | ~243MB | models/vae/ | Qwen VAE |
| z_image_turbo_bf16.safetensors | 12GB | models/ | Fast sketch T2I |
| qwen_3_4b.safetensors | 7.5GB | models/text_encoders/ | Z-Image encoder |
| ae.safetensors | 320MB | models/vae/ | Z-Image VAE |
| flux2_dev_fp8mixed.safetensors | 34GB | models/diffusion_models/ | LoRA testing |
| mistral_3_small_flux2_fp8.safetensors | 17GB | models/text_encoders/ | Flux 2 encoder |
| flux2-vae.safetensors | 321MB | models/vae/ | Flux 2 VAE |
| wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors | 14GB | models/unet/ | Video (high noise) |
| wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors | 14GB | models/unet/ | Video (low noise) |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | 6.3GB | models/text_encoders/ | WAN text encoder |
| wan2.2_vae.safetensors | 1.4GB | models/vae/ | WAN 2.2 VAE (48-channel; mismatches the 14B I2V models, blocks local FLF) |
| clip_vision_h.safetensors | 1.2GB | models/clip_vision/ | WAN CLIP vision |

**Pod ID:** erydfm210jz2lv | **GPU:** A100 80GB ($1.19/hr) | **Status:** STOPPED

**Custom Nodes:**
- WanVideoWrapper (built-in), VideoHelperSuite, KJNodes, ComfyUI-Manager, ComfyUI-Wan22FMLF

**Still Needed:**
- `wan_2.1_vae.safetensors` (~1.4GB) — fixes VAE channel mismatch for local FLF
- LightX2V LoRA (high+low noise variants, ~1GB each) — 5-step FLF acceleration

## LESSONS LEARNED

### Infrastructure
- RunPod proxy SSH blocks automation — always use direct TCP SSH
- RunPod port changes on restart — use API to get current port before SSH
- Network volume is essential — without it, ~50GB of models must redownload every session
- Pasting long commands into SSH breaks them — use SCP to upload scripts
- ComfyUI must restart after installing custom nodes
- `pkill -f 'python.*main.py'` can kill SSH session — use `kill PID` instead
- First generation after model load is slow (loading into VRAM); subsequent generations are fast
- serve.py stale processes: must `pkill -f serve.py` and wait before restarting
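The "use API to get current port" rule means the SSH command can be derived from the pod's port mappings. Below is a sketch. It assumes the decoded pod object mirrors RunPod's GraphQL `pod { runtime { ports { ip isIpPublic privatePort publicPort type } } }` shape. Fetching the pod (API key, HTTP call) is left out.

```python
def ssh_command_from_pod(pod: dict, key: str = "~/.ssh/id_ed25519") -> str:
    """Build a direct-TCP SSH command from a pod's runtime port mappings.

    ASSUMPTION: each entry in pod["runtime"]["ports"] has ip, isIpPublic,
    privatePort, publicPort and type, as in RunPod's GraphQL pod query.
    """
    runtime = pod.get("runtime") or {}  # runtime is empty/None for a stopped pod
    for p in runtime.get("ports") or []:
        if p.get("privatePort") == 22 and p.get("isIpPublic") and p.get("type") == "tcp":
            return f"ssh -T root@{p['ip']} -p {p['publicPort']} -i {key}"
    raise LookupError("no public TCP mapping for port 22 (pod stopped or still booting?)")
```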

### Models & Quality
- Flux 2 plastic skin is architectural (distilled model) — cannot be fixed with prompts. Retired from T2I.
- Qwen Image: CLIPLoader type `qwen_image`; KSampler steps=50, cfg=4.0, sampler `euler`, scheduler `simple`
- Z-Image Turbo: CLIPLoader type `lumina2`; KSampler steps=4, cfg=1.0, sampler `res_multistep`, scheduler `simple`
- "Triptych" prompt language keeps output photographic. "Storyboard/comic" triggers illustration mode.
- One LoRA per character (not per wardrobe). Wardrobe controlled by prompt + refs.
- Image-trained LoRAs cause flicker in video — video LoRAs (I2V-trained) required for WAN 2.2.
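The Qwen Image and Z-Image Turbo settings above can be expressed as ComfyUI API-format nodes. `CLIPLoader` (`clip_name`, `type`) and the `KSampler` inputs are ComfyUI's built-in node fields. The preset table and helper names are illustrative. Node links use the API format's `[node_id, output_index]` pairs.

```python
PRESETS = {
    "qwen_image": {"clip_type": "qwen_image", "steps": 50, "cfg": 4.0,
                   "sampler_name": "euler", "scheduler": "simple"},
    "z_image_turbo": {"clip_type": "lumina2", "steps": 4, "cfg": 1.0,
                      "sampler_name": "res_multistep", "scheduler": "simple"},
}


def clip_loader_node(model: str, clip_name: str) -> dict:
    """CLIPLoader node with the loader type each model needs."""
    return {"class_type": "CLIPLoader",
            "inputs": {"clip_name": clip_name, "type": PRESETS[model]["clip_type"]}}


def ksampler_node(model, seed, model_ref, pos_ref, neg_ref, latent_ref, denoise=1.0):
    """KSampler node; *_ref arguments are [node_id, output_index] links."""
    p = PRESETS[model]
    return {"class_type": "KSampler",
            "inputs": {"seed": seed, "steps": p["steps"], "cfg": p["cfg"],
                       "sampler_name": p["sampler_name"], "scheduler": p["scheduler"],
                       "denoise": denoise, "model": model_ref, "positive": pos_ref,
                       "negative": neg_ref, "latent_image": latent_ref}}
```

For the Qwen img2img test in the cloud-vs-local table, pass `denoise=0.55`.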

### Pipeline
- fal.ai training costs vary by model: Z-Image ~$0.85/1K steps, Flux 2 ~$3/1K steps, WAN ~$2/1K steps. Cloud beats local for most workloads.
- Cloud (fal.ai FP16) beats local (RunPod FP8) on both speed and quality for T2I.
- Gemini image model is `gemini-2.5-flash-image`, NOT `gemini-2.0-flash-exp` (the latter returns a 404 error).
- Python glob is case-sensitive on this macOS setup. Must try multiple case variants.
- Python stdout buffers in background tasks. Use `PYTHONUNBUFFERED=1` or `-u` flag.
- Images too large for Claude context — resize: `sips -s format jpeg -s formatOptions 60 -Z 512 input.png --out output.jpg`
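One workaround for the case-sensitivity note is to lowercase both the names and the pattern before matching. On Python 3.12+, `Path.glob(pattern, case_sensitive=False)` does this directly. The sketch below works on older versions and only scans one directory (no recursion).

```python
import fnmatch
from pathlib import Path


def match_ci(names, pattern):
    """Names matching a shell-style pattern, ignoring case."""
    pat = pattern.lower()
    return sorted(n for n in names if fnmatch.fnmatchcase(n.lower(), pat))


def glob_ci(directory, pattern):
    """Case-insensitive glob over the files directly inside `directory`."""
    root = Path(directory)
    names = (p.name for p in root.iterdir() if p.is_file())
    return [root / n for n in match_ci(names, pattern)]
```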