# BUILD SPEC: Visual Validation System

## Overview
Three `CriticLoop` subclasses use Gemini Flash to validate reference images, start frames, and generated video before the pipeline uses them. They catch extra limbs, wrong character details, background contamination, and missing props before these defects poison Kling generations.

## Architecture
All three critics inherit from the existing `lib/critic.py:CriticLoop` and share a common Gemini Flash vision call utility. Each returns a structured `CriticResult` containing `Dimension` objects.

---

## Phase 1: Shared Vision Check Utility

### File: `lib/vision_check.py` (NEW)

Shared utility that sends an image to Gemini Flash with structured validation questions and returns parsed results. Also handles video frame extraction + per-frame validation.

**Key functions:**
- `validate_image(image_path, checks, context_description) -> dict` — sends image + structured questions to Gemini 2.5 Flash, returns pass/fail per check
- `validate_video_frames(video_path, checks, context_description, num_frames) -> dict` — extracts N frames via ffmpeg, validates each
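One way `validate_video_frames` could pick its N frames is sketched below. This is an illustration, not the planned implementation: the helper names `sample_timestamps` and `ffmpeg_frame_cmd` are hypothetical, and sampling segment midpoints (rather than, say, evenly spaced endpoints) is an assumption. The ffmpeg flags used (`-ss`, `-i`, `-frames:v`, `-y`) are standard.

```python
from typing import List


def sample_timestamps(duration_s: float, num_frames: int) -> List[float]:
    """Return the midpoint of each of num_frames equal segments of the video.

    Midpoints avoid the very first and last frame, which are often
    fades or partially rendered. (Assumed strategy, not from the spec.)
    """
    if duration_s <= 0 or num_frames <= 0:
        return []
    step = duration_s / num_frames
    return [round(step * (i + 0.5), 3) for i in range(num_frames)]


def ffmpeg_frame_cmd(video_path: str, timestamp_s: float, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that writes one frame at timestamp_s."""
    return [
        "ffmpeg", "-y",               # overwrite the output file if it exists
        "-ss", f"{timestamp_s:.3f}",  # seek before decoding (fast input seek)
        "-i", video_path,
        "-frames:v", "1",             # emit exactly one video frame
        out_path,
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the sampling logic testable without ffmpeg installed.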

**Check format:** Each check is `{name, question, expected, severity}`. The question is sent to Gemini, and the answer is compared to `expected` using a case-insensitive substring match.
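The check format and comparison rule can be sketched as follows. The `Check` type and the `answer_matches` helper are hypothetical names used only for illustration; the four keys come from the spec, and the `"HARD"`/`"SOFT"` severity strings are an assumption about how severities are encoded.

```python
from typing import TypedDict


class Check(TypedDict):
    name: str
    question: str
    expected: str
    severity: str  # assumed to be "HARD" or "SOFT"


def answer_matches(answer: str, expected: str) -> bool:
    """Case-insensitive substring match between the model answer and expected."""
    return expected.strip().lower() in answer.strip().lower()


# Example check (question wording is illustrative).
extra_limbs: Check = {
    "name": "EXTRA_APPENDAGES",
    "question": "Are there any extra or duplicated limbs? Answer yes or no.",
    "expected": "no",
    "severity": "HARD",
}
```

Substring matching is permissive: an expected value of `"no"` also matches answers such as `"unknown"` or `"not sure"`. Prompting for a strict yes/no answer, as in the example question, is one way to limit that risk.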

**Error handling:** On API failure, return `passed=True` with an error message. A validation error must never block the pipeline.
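A minimal sketch of this fail-open behaviour, assuming the result is a dict with `passed`, `error`, and `checks` keys (the exact result shape is not specified here, and `fail_open` is a hypothetical helper name):

```python
from typing import Callable


def fail_open(call: Callable[[], dict]) -> dict:
    """Run a validation call; on any exception, report a pass with the error attached."""
    try:
        return call()
    except Exception as exc:  # deliberately broad: validation must not block the pipeline
        return {
            "passed": True,
            "error": f"{type(exc).__name__}: {exc}",
            "checks": [],
        }
```

Because a failed API call looks like a pass, callers that care about the difference should inspect the `error` field rather than `passed` alone.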

**Cost:** ~$0.01 per image check via Gemini 2.5 Flash.

### Validation: `python3 -c "from lib.vision_check import validate_image, validate_video_frames; print('OK')"`

---

## Phase 2: Reference Image Critic

### File: `lib/critics/ref_image_critic.py` (NEW)

Subclass of `CriticLoop`. Validates reference images before use as Kling elements.

**Dimensions (configurable per character type):**
- `LIMB_COUNT` (HARD) — correct number of limbs for character type. Configurable via `legs_on_ground` and `total_limbs` params.
- `EXTRA_APPENDAGES` (HARD) — no phantom/duplicate limbs
- `PROP_HELD` (HARD, if applicable) — character is holding expected prop. Configurable via `expected_props` list.
- `BACKGROUND_CLEAN` (SOFT) — background is solid neutral color, no scene contamination

**Constructor params:** `character_type` ("human"/"quadruped"/"vehicle"), `expected_props`, `legs_on_ground`, `total_limbs`

**max_attempts:** 1 (no auto-fix — ref images need regeneration, not patching)
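To illustrate how the constructor params might translate into checks, here is a sketch. Everything beyond the dimension and parameter names is an assumption: the `build_ref_image_checks` function, the default limb counts per character type, the question wording, and the `PROP_HELD:<prop>` naming for one check per prop are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical defaults per character type: (total_limbs, legs_on_ground).
DEFAULT_LIMBS: Dict[str, Tuple[int, int]] = {
    "human": (4, 2),
    "quadruped": (4, 4),
    "vehicle": (0, 0),
}


def build_ref_image_checks(
    character_type: str,
    expected_props: Sequence[str] = (),
    legs_on_ground: Optional[int] = None,
    total_limbs: Optional[int] = None,
) -> List[dict]:
    """Build the check list for a reference image; explicit params override defaults."""
    default_total, default_legs = DEFAULT_LIMBS[character_type]
    total = default_total if total_limbs is None else total_limbs
    legs = default_legs if legs_on_ground is None else legs_on_ground
    checks = [
        {"name": "LIMB_COUNT", "severity": "HARD", "expected": "yes",
         "question": f"Does the subject have exactly {total} limbs, "
                     f"with {legs} legs on the ground? Answer yes or no."},
        {"name": "EXTRA_APPENDAGES", "severity": "HARD", "expected": "no",
         "question": "Are there any phantom or duplicate limbs? Answer yes or no."},
    ]
    for prop in expected_props:  # PROP_HELD applies only when props are expected
        checks.append({"name": f"PROP_HELD:{prop}", "severity": "HARD", "expected": "yes",
                       "question": f"Is the character holding a {prop}? Answer yes or no."})
    checks.append({"name": "BACKGROUND_CLEAN", "severity": "SOFT", "expected": "yes",
                   "question": "Is the background a solid neutral color? Answer yes or no."})
    return checks
```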

### Validation: `python3 -c "from lib.critics.ref_image_critic import RefImageCritic; print('OK')"`

---

## Phase 3: Start Frame Critic

### File: `lib/critics/start_frame_critic.py` (NEW)

Subclass of `CriticLoop`. Validates start frames before Kling generation.

**Dimensions:**
- `BACKGROUND_VALID` (HARD) — no white/blank backgrounds when scene is expected (configurable: `expected_background="scene"` or `"solid color"`)
- `CHARACTER_IDENTITY` (HARD) — character matches expected description. Configurable via `character_descriptions` list of `{name, hair, facial_hair, clothing}` dicts.
- `SCENE_ELEMENTS` (SOFT) — expected elements present (car, road, power pole, etc.)
- `COMPOSITION_VALID` (HARD) — image is a proper composition, not blank/corrupted

**max_attempts:** 1 (report only)
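As an illustration of the `character_descriptions` format, the sketch below turns one description dict into a `CHARACTER_IDENTITY` question. The `identity_question` helper and its wording are hypothetical; only the dict keys (`name`, `hair`, `facial_hair`, `clothing`) come from the spec.

```python
from typing import Dict

TRAIT_KEYS = ("hair", "facial_hair", "clothing")


def identity_question(desc: Dict[str, str]) -> str:
    """Build a yes/no identity question from one character description dict."""
    # Skip traits that are missing or empty so the question stays accurate.
    traits = [f"{key.replace('_', ' ')}: {desc[key]}" for key in TRAIT_KEYS if desc.get(key)]
    return (
        f"Does the character {desc['name']} match all of these traits "
        f"({'; '.join(traits)})? Answer yes or no."
    )


example = identity_question(
    {"name": "Sam", "hair": "short black", "facial_hair": "none", "clothing": "red jacket"}
)
```

Stating absent traits explicitly (for example `facial_hair: "none"`) gives the model something concrete to contradict, which is how an unexpected beard would surface as a failure.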

### Validation: `python3 -c "from lib.critics.start_frame_critic import StartFrameCritic; print('OK')"`

---

## Phase 4: Video Frame Critic

### File: `lib/critics/video_frame_critic.py` (NEW)

Subclass of `CriticLoop`. Validates generated video by extracting and checking N frames.

**Dimensions (aggregated across all sampled frames):**
- `EXTRA_LIMBS` (HARD) — no characters with phantom/duplicate limbs in any frame
- `STYLE_CONSISTENT` (SOFT) — visual style matches expected style across frames
- `ELEMENT_PERSISTENCE` (SOFT, per element) — expected elements (car, deer, etc.) persist across frames

**Constructor params:** `character_type`, `expected_style`, `expected_elements`, `num_frames` (default 5)

**Aggregation:** A hard check fails if ANY sampled frame fails it. Report includes which timestamps failed.
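The aggregation rule can be sketched as below. The per-frame result shape (`timestamp` plus a `checks` mapping of name to pass/fail) and the helper names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def failed_timestamps(frame_results: List[dict]) -> Dict[str, List[float]]:
    """Map each failing check name to the timestamps of the frames that failed it."""
    failed: Dict[str, List[float]] = {}
    for frame in frame_results:
        for name, passed in frame["checks"].items():
            if not passed:
                failed.setdefault(name, []).append(frame["timestamp"])
    return failed


def hard_checks_pass(failed: Dict[str, List[float]], hard_names: Iterable[str]) -> bool:
    """A hard check fails overall if any sampled frame failed it."""
    return not any(name in failed for name in hard_names)


frames = [
    {"timestamp": 1.0, "checks": {"EXTRA_LIMBS": True, "STYLE_CONSISTENT": True}},
    {"timestamp": 3.0, "checks": {"EXTRA_LIMBS": False, "STYLE_CONSISTENT": True}},
]
report = failed_timestamps(frames)
```

Here a single bad frame at 3.0 s is enough to fail `EXTRA_LIMBS` for the whole video, and the report keeps the timestamp so the failure can be located.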

**max_attempts:** 1 (video can't be auto-fixed, only flagged for regeneration)

Uses `validate_video_frames()` from `lib/vision_check.py`, which handles ffmpeg extraction.

### Validation: `python3 -c "from lib.critics.video_frame_critic import VideoFrameCritic; print('OK')"`

---

## Phase 5: Tests

### File: `tests/lib/test_vision_critics.py` (NEW)

Test all three critics with mocked `validate_image` / `validate_video_frames` calls:
- `TestRefImageCritic`: init params, passing evaluation, extra limbs detection, prop check
- `TestStartFrameCritic`: init defaults, white background failure, beard detection via character_descriptions
- `TestVideoFrameCritic`: init, extra limbs in video frames with timestamp reporting

All tests mock the Gemini API calls — no real API calls in tests.
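The mocking pattern might look like the sketch below. Because the critics do not exist yet, `evaluate_with` is a stand-in for a critic's evaluation step, and the result shape returned by the fake validator is an assumption; only `unittest.mock.Mock` is real API.

```python
from unittest.mock import Mock


def evaluate_with(validator, image_path: str, checks: list) -> bool:
    """Stand-in for a critic evaluation: pass only if no HARD check failed."""
    result = validator(image_path, checks, "test context")
    return all(c["passed"] for c in result["checks"] if c["severity"] == "HARD")


def test_extra_limbs_fails():
    # The fake replaces the Gemini-backed validator, so no network call is made.
    fake_validate = Mock(return_value={
        "checks": [{"name": "EXTRA_APPENDAGES", "passed": False, "severity": "HARD"}],
    })
    assert evaluate_with(fake_validate, "ref.png", []) is False
    fake_validate.assert_called_once()


def test_soft_failure_still_passes():
    fake_validate = Mock(return_value={
        "checks": [{"name": "BACKGROUND_CLEAN", "passed": False, "severity": "SOFT"}],
    })
    assert evaluate_with(fake_validate, "ref.png", []) is True
```

In the real test file, `unittest.mock.patch` would likely replace `validate_image` instead of injecting it. The patch target should be the name as looked up inside the critic module, not the module where the function is defined.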

### Validation: `cd /Users/joeturnerlin/Dropbox/CLAUDE_PROJECTS/starsend && python3 -m pytest tests/lib/test_vision_critics.py -v`

---

## Phase 6: Simplify

Run `/simplify` on all new files to check for code reuse opportunities, quality issues, and efficiency improvements.

### Files to review:
- `lib/vision_check.py`
- `lib/critics/ref_image_critic.py`
- `lib/critics/start_frame_critic.py`
- `lib/critics/video_frame_critic.py`
- `tests/lib/test_vision_critics.py`
