# BUILD_SPEC — Mask-Free Edit Extraction

**Generated:** 2026-03-14
**Input:** `consultations/starsend/first-last-frame-extraction/SYNTHESIS.md`
**Detail level:** max
**Visual design:** no
**Phases:** 2
**Estimated build time:** 45-60 min

## Validation command
```bash
set -o pipefail  # keep pytest's exit status when its output is piped through tail
python3 -c "import ast; ast.parse(open('lib/frame_editor.py').read())" && \
python3 -c "import ast; ast.parse(open('editors/review_server.py').read())" && \
python3 -c "from lib.frame_editor import edit_hero_pose, validate_companion, build_edit_instruction; print('OK')" && \
python3 -m pytest tests/test_previz_context.py tests/test_prompt_config.py -q 2>&1 | tail -3
```

---

## Phase 1: Core Edit Function + Quality Gate

### Files to create
- `lib/frame_editor.py` — mask-free edit via `generate_content` + histogram quality gate

### Exact implementation

**Create `lib/frame_editor.py`:**

```python
"""
frame_editor.py — Mask-free image editing via Gemini generate_content.

Replaces the generate-from-scratch extraction pipeline. Instead of building
refs + prompt and generating a new image, we pass the hero frame + an edit
instruction to Gemini. The model modifies the existing image rather than
generating from scratch, preserving background pixels.

Cost: same as NBP generation (~$0.134/frame for Pro, ~$0.039 for Flash).
"""

import io
import logging
import os
from pathlib import Path
from typing import Optional

import numpy as np
from PIL import Image

logger = logging.getLogger(__name__)

# Cost estimate for a single edit call at the Pro rate (same as NBP generation).
# Reported unchanged even when a cheaper Flash model is passed via `model`.
EDIT_COST = 0.134


def edit_hero_pose(
    hero_image_path: Path,
    edit_instruction: str,
    model: Optional[str] = None,
) -> dict:
    """Edit a hero frame to change the character's pose via Gemini mask-free edit.

    Uses the same generate_content endpoint as NBP generation, but with a
    different payload structure: [image, edit_instruction] instead of
    [ref_images..., scene_prompt]. This triggers image-to-image translation
    rather than from-scratch generation.

    Args:
        hero_image_path: Path to the hero/anchor frame to edit.
        edit_instruction: Natural language description of the pose change.
        model: Override model ID. Defaults to get_model("production", "image"),
            expected to resolve to gemini-3-pro-image-preview.

    Returns:
        {"success": True, "image_data": bytes, "cost": float}
        or {"success": False, "error": str}
    """
    try:
        from google import genai
        from google.genai import types as genai_types

        api_key = os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY")
        if not api_key:
            return {"success": False, "error": "GEMINI_API_KEY or GOOGLE_API_KEY not set"}

        if not hero_image_path.is_file():
            return {"success": False, "error": f"Hero image not found: {hero_image_path}"}

        client = genai.Client(api_key=api_key)

        # Load hero image as PIL Image
        hero_image = Image.open(hero_image_path)

        # Request image output; the [image, instruction] payload passed to
        # generate_content below is what triggers edit mode.
        config = genai_types.GenerateContentConfig(
            temperature=0.3,
            response_modalities=["IMAGE", "TEXT"],
            image_config=genai_types.ImageConfig(
                aspect_ratio="9:16",
            ),
        )

        from lib.model_profiles import get_model
        edit_model = model or get_model("production", "image")

        response = client.models.generate_content(
            model=edit_model,
            contents=[hero_image, edit_instruction],
            config=config,
        )

        # Extract image from response
        image_data = None
        if response and response.candidates:
            for candidate in response.candidates:
                if candidate.content and candidate.content.parts:
                    for part in candidate.content.parts:
                        if hasattr(part, "inline_data") and part.inline_data:
                            image_data = part.inline_data.data

        if image_data:
            return {"success": True, "image_data": image_data, "cost": EDIT_COST}

        return {"success": False, "error": "No image in edit response"}

    except Exception as e:
        return {"success": False, "error": str(e)}


def validate_companion(
    hero_bytes: bytes,
    companion_bytes: bytes,
    threshold: float = 0.70,
) -> dict:
    """Validate that a companion frame preserved the hero's background.

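    Compares the concatenated RGB histograms of hero and companion using
    Pearson correlation. High correlation means a similar overall color
    distribution, which serves as a proxy for unchanged background and
    lighting. The check is global, not spatial: it cannot detect elements
    that moved while keeping the same colors.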

    Args:
        hero_bytes: Raw bytes of the hero/anchor frame.
        companion_bytes: Raw bytes of the generated companion frame.
        threshold: Minimum correlation score to pass. Pearson correlation
            ranges from -1.0 to 1.0, so any negative score fails the gate.

    Returns:
        {"passed": bool, "correlation": float}
    """
    try:
        hero_img = Image.open(io.BytesIO(hero_bytes)).convert("RGB")
        comp_img = Image.open(io.BytesIO(companion_bytes)).convert("RGB")

        hero_hist = np.array(hero_img.histogram(), dtype=np.float64)
        comp_hist = np.array(comp_img.histogram(), dtype=np.float64)

        hero_hist = hero_hist / (hero_hist.sum() + 1e-10)
        comp_hist = comp_hist / (comp_hist.sum() + 1e-10)

        hero_mean = hero_hist.mean()
        comp_mean = comp_hist.mean()
        numerator = np.sum((hero_hist - hero_mean) * (comp_hist - comp_mean))
        denominator = np.sqrt(
            np.sum((hero_hist - hero_mean) ** 2) * np.sum((comp_hist - comp_mean) ** 2)
        )
        correlation = float(numerator / (denominator + 1e-10))

        return {
            "passed": correlation >= threshold,
            "correlation": round(correlation, 4),
        }

    except Exception as e:
        logger.warning("Companion validation failed: %s", e)
        return {"passed": False, "correlation": 0.0}


def build_edit_instruction(
    target_type: str,
    pose_description: str,
) -> str:
    """Build the edit instruction string for Gemini mask-free editing.

    Args:
        target_type: "anticipation" or "aftermath"
        pose_description: Description of the target pose from Flash text call.

    Returns:
        The full edit instruction string.
    """
    # Any target_type other than "anticipation" is treated as "aftermath".
    moment = "BEFORE" if target_type == "anticipation" else "AFTER"
    # \u2014 is an em dash, kept so the prompt text matches the original wording.
    return (
        f"Edit this image: change the character's body position and pose to show "
        f"the moment {moment} the current action \u2014 {pose_description}. "
        "Keep the background, lighting, environment, camera angle, walls, floor, "
        "and all non-character elements completely identical. "
        "Keep the character's clothing, hair style, and appearance the same. "
        "Only change the character's body position, pose, and expression."
    )
```
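
The quality gate in `validate_companion` is a Pearson correlation over normalized histograms (for an RGB image, Pillow's `histogram()` returns 768 bins, 256 per band). The arithmetic can be sketched with the standard library alone on toy 4-bin histograms. The helper name `histogram_correlation` and the sample values are illustrative assumptions, not part of `lib/frame_editor.py`:

```python
import math


def histogram_correlation(hist_a, hist_b, eps=1e-10):
    """Pearson correlation between two equal-length histograms (hypothetical helper)."""
    # Normalize so that image size (total pixel count) does not affect the score.
    total_a = sum(hist_a) + eps
    total_b = sum(hist_b) + eps
    a = [v / total_a for v in hist_a]
    b = [v / total_b for v in hist_b]

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return numerator / (denominator + eps)


hero = [10, 40, 30, 20]        # toy 4-bin histogram
same_scene = [20, 80, 60, 40]  # same distribution, twice as many pixels
inverted = [40, 10, 20, 30]    # deviations from the mean are mirrored

print(histogram_correlation(hero, same_scene))  # close to 1.0
print(histogram_correlation(hero, inverted))    # close to -1.0
```

Because the score depends only on bin counts, two frames with the same colors in different places would still correlate highly. The gate is best read as a cheap check against large background or lighting drift rather than a pixel-level guarantee.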

### Scope boundary
- Do NOT modify any existing files in this phase
- Do NOT import from this module anywhere yet

### Validation
```bash
python3 -c "import ast; ast.parse(open('lib/frame_editor.py').read())" && \
python3 -c "from lib.frame_editor import edit_hero_pose, validate_companion, build_edit_instruction; print('imports OK')" && \
echo "Phase 1 OK"
```

---

## Phase 2: Rewire Extraction Endpoint + Fallback

### Files to modify
- `editors/review_server.py` — rewrite `_api_extract_frame` to use edit mode

### What already exists (from Phase 1 + prior codebase)
- Phase 1 created `lib/frame_editor.py` with `edit_hero_pose()`, `validate_companion()`, `build_edit_instruction()`, `EDIT_COST`
- `editors/review_server.py` has `_api_extract_frame()` with `_bg_extract_frames()` inner function
- `lib/keyframe_context.py` has `build_extrapolation_prompt()` — we reuse this to get the pose description text
- The current `_bg_extract_frames` calls `build_extrapolation_prompt` → `_generate_nbp_frame` with ref images
- Inside `_bg_extract_frames`, we replace the call to `_generate_nbp_frame` with `edit_hero_pose` and add quality gate + retry + fallback (`_generate_nbp_frame` itself stays untouched)

### Exact changes

**1. In `_api_extract_frame`, replace the import block.** Find:
```python
            from lib.keyframe_context import build_extrapolation_prompt, build_extrapolation_refs
            from tools.generate_previs import _generate_nbp_frame, NBP_COST
            from lib.previz_context import resolve_location_refs, resolve_all_character_refs
```
Replace with:
```python
            from lib.keyframe_context import build_extrapolation_prompt
```

**2. Replace the ref-building block.** Find the block starting with `# Build refs for NBP` (or `# Edit mode: no ref stack needed`) and ending before `# Respond immediately`. Replace with:
```python
        # Edit mode: hero image is edited directly, no ref stack needed
        print(f"  [DEBUG] {shot_id}: using edit mode (mask-free edit of hero)")
```

**3. Remove `_ref_images` closure.** Find and remove:
```python
        _ref_images = ref_images
```

**4. Replace the entire `_bg_extract_frames` function body** with the edit-mode implementation. The new function should:
- Import `edit_hero_pose`, `validate_companion`, `build_edit_instruction`, `EDIT_COST` from `lib.frame_editor`
- For each frame in `_frames_to_gen`:
  - Call `build_extrapolation_prompt()` to get pose description (existing)
  - Call `build_edit_instruction()` to wrap it as an edit instruction
  - Merge `prompt_override` as director note if provided
  - Call `edit_hero_pose()` with the hero image path + edit instruction
  - Validate with `validate_companion()` (hero bytes vs companion bytes)
  - On quality gate fail: retry once with "Preserve every background detail exactly." appended to the edit instruction
  - Track a `consecutive_failures` counter across frames (a frame counts as failed when its retry also fails)
  - On 2 consecutive failures: set `gate_results.extraction_mode = "single_frame"` and break
  - On success: save frame, update gate_results, reset consecutive_failures counter
  - Use `EDIT_COST` instead of `NBP_COST` for cost tracking
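
The control flow described in the list above can be sketched offline with injected stand-ins for `edit_hero_pose` and `validate_companion`. Everything named here is an assumption made for illustration: `run_edit_loop`, the `edit_fn`/`validate_fn` parameters, the `"edit"` default mode, the `calls` counter, the `Director note:` wording, and the choice to treat a failed edit call like a failed gate. The real `_bg_extract_frames` would also call `build_extrapolation_prompt()`, save frames to disk, and multiply the call count by `EDIT_COST`:

```python
RETRY_SUFFIX = " Preserve every background detail exactly."
MAX_CONSECUTIVE_FAILURES = 2


def run_edit_loop(frames, hero_bytes, edit_fn, validate_fn, prompt_override=None):
    """Edit each frame with one gated retry; fall back after two failed frames in a row.

    frames: list of (target_type, edit_instruction) pairs.
    edit_fn(instruction) -> {"success": bool, "image_data": bytes}
    validate_fn(hero_bytes, companion_bytes) -> {"passed": bool, "correlation": float}
    """
    gate_results = {"extraction_mode": "edit", "frames": {}, "calls": 0}
    consecutive_failures = 0

    for target_type, instruction in frames:
        if prompt_override:
            # Assumed format for merging the director note.
            instruction = f"{instruction} Director note: {prompt_override}"

        accepted = None
        for attempt in range(2):  # initial attempt plus one retry
            prompt = instruction if attempt == 0 else instruction + RETRY_SUFFIX
            result = edit_fn(prompt)
            gate_results["calls"] += 1  # every call is billable
            if not result.get("success"):
                continue  # assumption: an API error consumes an attempt
            if validate_fn(hero_bytes, result["image_data"])["passed"]:
                accepted = result["image_data"]
                break

        if accepted is None:
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                gate_results["extraction_mode"] = "single_frame"
                break
            continue

        consecutive_failures = 0
        gate_results["frames"][target_type] = accepted

    return gate_results


# Stub collaborators so the loop can be exercised without network access.
def always_succeeds(prompt):
    return {"success": True, "image_data": prompt.encode("utf-8")}


def gate_never_passes(hero_bytes, companion_bytes):
    return {"passed": False, "correlation": 0.0}


outcome = run_edit_loop(
    [("anticipation", "Pose A."), ("aftermath", "Pose B.")],
    b"hero",
    always_succeeds,
    gate_never_passes,
)
print(outcome["extraction_mode"], outcome["calls"])  # single_frame 4
```

Passing the collaborators in as arguments keeps the retry and fallback logic testable in isolation. In the actual endpoint they would simply be the functions imported from `lib.frame_editor`.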

### Scope boundary
- Do NOT modify `build_extrapolation_prompt` in keyframe_context.py
- Do NOT modify the console UI
- Do NOT modify `_generate_nbp_frame` — still used for keyframe generation
- Do NOT touch StepRunner

### Validation
```bash
python3 -c "import ast; ast.parse(open('editors/review_server.py').read())" && \
python3 -c "import ast; ast.parse(open('lib/frame_editor.py').read())" && \
python3 -c "from lib.frame_editor import edit_hero_pose, validate_companion, build_edit_instruction; print('frame_editor OK')" && \
grep -q 'edit_hero_pose' editors/review_server.py && \
grep -q 'validate_companion' editors/review_server.py && \
grep -q 'build_edit_instruction' editors/review_server.py && \
grep -q 'extraction_mode.*single_frame' editors/review_server.py && \
grep -q 'consecutive_failures' editors/review_server.py && \
echo "Phase 2 OK"
```
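
The `grep -q` checks above also match identifiers that appear only in comments or strings. Where a stricter structural check is wanted, one option is to look for the names in the parsed AST instead. This is a minimal sketch, with `missing_identifiers` and the sample source as hypothetical illustrations rather than part of the spec:

```python
import ast


def missing_identifiers(source, required):
    """Return the required names that never appear as real code in `source`."""
    seen = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            seen.add(node.id)          # variables and called functions
        elif isinstance(node, ast.Attribute):
            seen.add(node.attr)        # obj.attr accesses
        elif isinstance(node, ast.alias):
            seen.add(node.name)        # imported names
            if node.asname:
                seen.add(node.asname)  # "import x as y" aliases
    return sorted(set(required) - seen)


sample = '''
# edit_hero_pose is mentioned here only in a comment
from lib.frame_editor import validate_companion
consecutive_failures = 0
'''

print(missing_identifiers(
    sample, ["edit_hero_pose", "validate_companion", "consecutive_failures"]
))  # ['edit_hero_pose']
```

`ast.parse` only parses the text and never imports `lib.frame_editor`, so the check can run before dependencies are installed. It does not cover string keys such as `"extraction_mode"`, for which the existing `grep` remains the simpler tool.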

---

## Verification Checklist

1. **Import/export consistency:** Phase 2 imports from `lib.frame_editor` (Phase 1) and `lib.keyframe_context` (existing). ✓
2. **File conflict check:** Phase 1 creates `lib/frame_editor.py`. Phase 2 modifies `editors/review_server.py`. No conflicts. ✓
3. **Validation completeness:** Phase 1 has syntax + import checks; Phase 2 adds structural (grep) checks on top of those. ✓
4. **Scope boundaries:** Both phases have explicit scope boundaries. ✓
5. **Context check:** Phase 2 has "What already exists" section. ✓
6. **Identifier consistency:** Grep targets match spec identifiers. ✓
7. **Deliverable coverage:** mask-free edit (P1), quality gate (P1), rewire endpoint (P2), 2-failure fallback (P2), single-frame marking (P2). All covered. ✓