# Unified Reference Selection System (URSS) — Design

**Date:** 2026-03-02
**Status:** Post-consult — approved architecture, pending implementation plan
**Consultation:** `consultations/urss_architecture/` (3 rounds with Gemini 3.1 Pro)

## Problem Statement

The casting pipeline currently has separate, bespoke flows for character casting (3x3 grid → hero → turnaround), location refs (3-angle chaining), and no flows at all for wardrobe, hair/makeup, or props. These are all mechanically the same creative operation:

1. Start with a reference (mood/vibe, not identity-locked)
2. Generate candidates influenced by that reference
3. Iteratively refine: reject bad ones, keep promising ones, re-roll
4. Lock the winner as the canonical ref

The current implementation conflates "casting" (show me 9 different actors) with "turnaround" (show me this exact actor from 4 angles). These need to be cleanly separated, and the casting pattern needs to generalize to all visual asset types.

## Architecture: Approach A — Generic RefSelector with Type Descriptors

One reusable component handles all asset types. Grid layout, prompt template, and generation strategy are configured per asset type via type descriptors.

### Core Pattern (All Asset Types)

```
1. ANCHOR — Intake image or first generation
   ├── If intake image: Vision extraction → mood text (strips identity)
   └── If no intake: text-only from bible description

2. GRID — Candidates generated from mood text
   ├── Anchor displayed in center (UI composites, NOT generated by model)
   ├── Candidates surround anchor
   └── Per-candidate actions: Reject, Keep, Lock

3. RE-ROLL — Regenerate rejected/empty slots
   ├── Anchor + kept candidates persist
   ├── Optional [USER OVERRIDE: direction] appended to prompt
   └── Diegetic framing enforces cross-re-roll consistency

4. HERO LOCK — Selected candidate becomes hero
   ├── Character: NBP Beauty Pass (temp 0.2) before turnaround
   ├── Location/Wardrobe/Hair-Makeup/Props: locked as-is
   └── rembg → white bg for final production refs (characters only)
```
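A minimal sketch of the slot bookkeeping behind steps 2 to 4, assuming candidates live in a fixed list of slots. The anchor is excluded because the UI composites it. The `Slot` type, the `pending` state for generated-but-unreviewed candidates, and the helper names are hypothetical. Surplus images from a re-roll (for example, a full 2x3 composite when only two slots are open) are simply dropped here; whether to keep them is an open design choice.

```python
from dataclasses import dataclass
from typing import Optional

# Slots a re-roll refills (step 3). "pending" (generated, not yet reviewed)
# is a hypothetical state; kept, locked, and pending slots persist.
REGENERATE = {"rejected", "empty"}


@dataclass
class Slot:
    state: str = "empty"          # empty | pending | kept | rejected | locked
    path: Optional[str] = None
    re_roll_generation: int = 0


def reroll(slots: list[Slot], generation: int, new_paths: list[str]) -> list[int]:
    """Fill rejected/empty slots with new candidate paths, in slot order.

    Surplus new paths are discarded. Returns the indices that were filled.
    """
    targets = [i for i, s in enumerate(slots) if s.state in REGENERATE]
    if len(new_paths) < len(targets):
        raise ValueError(f"need {len(targets)} new images, got {len(new_paths)}")
    for i, path in zip(targets, new_paths):
        slots[i] = Slot(state="pending", path=path, re_roll_generation=generation)
    return targets


def lock_hero(slots: list[Slot], index: int) -> str:
    """Lock one candidate as the hero (step 4); at most one slot stays locked."""
    chosen = slots[index]
    if chosen.path is None or chosen.state == "rejected":
        raise ValueError("only a live, non-rejected candidate can be locked")
    for s in slots:
        if s.state == "locked":
            s.state = "kept"      # demote any previous lock
    chosen.state = "locked"
    return chosen.path
```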

### The Mood vs Identity Solution (Gemini Consultation Finding)

**Problem:** NBP's inline reference feature is trained to preserve identity. Sending a reference image always biases toward the same face.

**Solution:** Vision extraction. Send the anchor image to Gemini Vision to extract a text description of mood, lighting, wardrobe style, and energy — while explicitly stripping facial identity. Feed that text (not the image) to the casting grid generation.

**Vision Extraction Prompt:**
```
You are an expert cinematographer and production designer. Analyze this image and generate a highly detailed, comma-separated prompt for an image generation model.

FOCUS STRICTLY ON:
1. Lighting setup, direction, and color grading
2. Environmental context, background elements, and atmospheric effects
3. Wardrobe style, textures, and fit
4. Camera angle, lens focal length, and framing
5. Overall emotional tone, energy, and posture/body language

CRITICAL DIRECTIVE: DO NOT describe the subject's age, race, gender, facial features, hair color, or specific identity. Replace all references to the person with 'A subject'. Do not use any proper nouns.
```

**Exceptions:** Props and locations CAN pass the anchor directly as an inline ref (no uncanny-valley risk for objects and environments).
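The three `ref_handling` strategies differ only in what reaches the model: extracted text alone, the anchor image, or the hero image plus extracted text. A minimal sketch of that dispatch, assuming a provider-agnostic `GenRequest` container (hypothetical) and comma-joined prompt fragments; the named prompt templates (`casting_director`, etc.) are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenRequest:                      # hypothetical, provider-agnostic
    prompt: str
    inline_images: list[str] = field(default_factory=list)


def build_request(descriptor: dict, base_prompt: str, anchor_path: Optional[str],
                  mood_text: Optional[str], hero_path: Optional[str] = None) -> GenRequest:
    strategy = descriptor["ref_handling"]["strategy"]
    parts = [descriptor["diegetic_frame"], base_prompt]
    images: list[str] = []
    if strategy == "vision_extraction":
        # Identity-sensitive: never send the anchor image, only its mood text.
        if mood_text:
            parts.append(mood_text)
    elif strategy == "direct_pass":
        # Objects/environments: the anchor goes straight in as an inline ref.
        if anchor_path:
            images.append(anchor_path)
    elif strategy == "hybrid":
        # Hero image carries identity; vision-extracted text carries style.
        if hero_path is None:
            raise ValueError("hybrid strategy requires a locked hero image")
        images.append(hero_path)
        if mood_text:
            parts.append(mood_text)
    else:
        raise ValueError(f"unknown ref strategy: {strategy!r}")
    return GenRequest(prompt=", ".join(p for p in parts if p), inline_images=images)
```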

### Type Descriptors

```python
TYPE_DESCRIPTORS = {
    "character": {
        "generation_strategy": "composite_grid",
        "grid_format": "2x3",          # 6 candidates per re-roll
        "aspect_ratio": "2:3",
        "model": "flash",              # $0.039 per re-roll
        "temperature": 0.65,
        "prompt_template": "casting_director",
        "diegetic_frame": "A casting director's audition photo array, neutral 18% gray seamless backdrop, flat even studio lighting",
        "ref_handling": {
            "strategy": "vision_extraction",
            "inline_ref": False
        },
        "beauty_pass": True,
        "beauty_pass_temp": 0.2
    },
    "location": {
        "generation_strategy": "parallel_singles",
        "candidates_per_batch": 4,
        "aspect_ratio": "16:9",
        "model": "flash",              # $0.156 per re-roll (4 calls)
        "temperature": 0.7,
        "prompt_template": "location_scout",
        "diegetic_frame": "A cinematic location scout's wide-angle photograph",
        "ref_handling": {
            "strategy": "direct_pass",
            "inline_ref": True
        },
        "special_behavior": "staggered_dispatch"
    },
    "wardrobe": {
        "generation_strategy": "composite_grid",
        "grid_format": "2x3",          # 6 candidates (full body)
        "aspect_ratio": "2:3",
        "model": "flash",              # $0.039 per re-roll
        "temperature": 0.45,
        "prompt_template": "costume_designer",
        "diegetic_frame": "A costume designer's flat-lay technical photograph",
        "ref_handling": {
            "strategy": "hybrid",
            "inline_ref": "hero_image",
            "text_modifier": "vision_extraction"
        }
    },
    "hair_makeup": {
        "generation_strategy": "composite_grid",
        "grid_format": "2x2",          # 4 candidates (max face detail)
        "aspect_ratio": "1:1",
        "model": "flash",              # $0.039 per re-roll
        "temperature": 0.35,
        "prompt_template": "makeup_continuity",
        "diegetic_frame": "A makeup artist's continuity polaroid contact sheet, harsh flash photography, extreme close-up macro shot of the face",
        "ref_handling": {
            "strategy": "hybrid",
            "inline_ref": "hero_image",
            "text_modifier": "vision_extraction"
        }
    },
    "props": {
        "generation_strategy": "composite_grid",
        "grid_format": "3x3",          # 9 candidates (no uncanny valley)
        "aspect_ratio": "1:1",
        "model": "flash",              # $0.039 per re-roll
        "temperature": 0.6,
        "prompt_template": "prop_master",
        "diegetic_frame": "A prop master's archival inventory grid, shot top-down on a cutting mat",
        "ref_handling": {
            "strategy": "direct_pass",
            "inline_ref": True
        }
    }
}
```
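The cost comments above follow from two fields: a `composite_grid` re-roll is presumably one model call yielding rows × cols candidates (cropped from a single composite image, an assumption about how composites are consumed), while `parallel_singles` makes one call per candidate. A hypothetical helper that derives both numbers, which could back a startup sanity check on the descriptors:

```python
import re


def reroll_shape(descriptor: dict) -> tuple[int, int]:
    """Return (model_calls, candidates) for one re-roll of a descriptor."""
    strategy = descriptor["generation_strategy"]
    if strategy == "composite_grid":
        # One call renders a single composite; candidates = rows * cols.
        m = re.fullmatch(r"(\d+)x(\d+)", descriptor["grid_format"])
        if m is None:
            raise ValueError(f"bad grid_format: {descriptor['grid_format']!r}")
        return 1, int(m.group(1)) * int(m.group(2))
    if strategy == "parallel_singles":
        # One call per candidate, dispatched in parallel.
        n = descriptor["candidates_per_batch"]
        return n, n
    raise ValueError(f"unknown generation_strategy: {strategy!r}")
```

Multiplying the call count by the $0.039 Flash price reproduces the comments, e.g. 4 calls × $0.039 = $0.156 for locations.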

### Cost Model

| Asset Type | Cost per Re-Roll / Pass | Model | Outputs |
|-----------|------------|-------|-----------|
| Character | $0.039 | Flash | 6 |
| Location | $0.156 | Flash (4 calls) | 4 |
| Wardrobe | $0.039 | Flash | 6 |
| Hair/Makeup | $0.039 | Flash | 4 |
| Props | $0.039 | Flash | 9 |
| Beauty Pass | $0.134 | NBP | 1 |
| Turnaround | $0.134 | NBP | 4 angles |

**Full character pipeline (3 re-rolls):** $0.117 (3 × $0.039 grid) + $0.134 (beauty) + $0.134 (turnaround) = **$0.385**
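The same total can be recomputed for any number of re-rolls. A sketch using the unit prices from the table, assuming the turnaround is billed as the single $0.134 NBP call listed there; `Decimal` avoids float rounding in currency sums.

```python
from decimal import Decimal

# Unit prices from the cost table (USD).
FLASH_CALL = Decimal("0.039")
NBP_CALL = Decimal("0.134")


def character_pipeline_cost(re_rolls: int, beauty_pass: bool = True) -> Decimal:
    grid = FLASH_CALL * re_rolls                     # one composite call per re-roll
    beauty = NBP_CALL if beauty_pass else Decimal(0)
    turnaround = NBP_CALL                            # priced as one NBP call
    return grid + beauty + turnaround
```

Each extra re-roll adds only $0.039, so the two NBP steps ($0.268 together) stay the larger share until a character needs about seven re-rolls.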

### GridSession State Model

```json
{
    "session_id": "uuid",
    "asset_type": "character",
    "parent_context": {
        "character_id": "TORCH",
        "phase_id": null,
        "location_id": null
    },
    "descriptor": { ... },
    "anchor": {
        "path": "output/refs/characters/torch/hero.jpeg",
        "source": "intake",
        "mood_text": "Moody cinematic lighting, cool shadows, distressed leather jacket..."
    },
    "candidates": [
        {
            "slot": 0,
            "path": "output/refs/characters/torch/candidates/candidate_01.png",
            "state": "kept",
            "re_roll_generation": 1
        },
        {
            "slot": 1,
            "path": null,
            "state": "rejected",
            "re_roll_generation": 2
        }
    ],
    "re_roll_count": 2,
    "user_overrides": ["more angular jawline", "warmer lighting"],
    "collapsed_override": "more angular jawline, warmer lighting",
    "hero_locked": false,
    "hero_path": null,
    "beauty_pass_path": null
}
```
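One way to mirror this state model in Python, sketched with standard-library dataclasses. Field names follow the JSON above; the class names and the `add_override` helper are assumptions. `add_override` keeps the raw history in `user_overrides` and rebuilds `collapsed_override` as one comma-joined string, matching the example values.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Candidate:
    slot: int
    path: Optional[str]
    state: str                      # e.g. "kept", "rejected"
    re_roll_generation: int


@dataclass
class GridSession:
    asset_type: str
    parent_context: dict            # character_id / phase_id / location_id
    descriptor: dict
    anchor: Optional[dict] = None   # path, source, mood_text
    candidates: list[Candidate] = field(default_factory=list)
    re_roll_count: int = 0
    user_overrides: list[str] = field(default_factory=list)
    collapsed_override: str = ""
    hero_locked: bool = False
    hero_path: Optional[str] = None
    beauty_pass_path: Optional[str] = None
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def add_override(self, text: str) -> None:
        """Record a user direction and rebuild the single collapsed string."""
        text = text.strip()
        if text:
            self.user_overrides.append(text)
            self.collapsed_override = ", ".join(self.user_overrides)

    def to_json(self) -> str:
        # asdict recurses into the nested Candidate dataclasses.
        return json.dumps(asdict(self), indent=4)
```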

### API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/casting/grid-session` | POST | Create new session (with descriptor + optional anchor) |
| `/casting/grid-session/{id}` | GET | Poll session state |
| `/casting/grid-session/{id}/action` | POST | Candidate action (reject/keep/lock) |
| `/casting/grid-session/{id}/reroll` | POST | Re-roll with optional override text |
| `/casting/grid-session/{id}/lock-hero` | POST | Promote the locked candidate to hero; trigger beauty pass (characters only) |
| `/casting/grid-session/{id}/beauty-pass` | GET | Poll beauty pass status |

Existing endpoints (`generate-grid`, `select-hero`, `approve-ref`, etc.) remain as backward-compatible aliases.
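The endpoint table does not pin down request bodies. A sketch of server-side validation for the two mutating candidate endpoints, assuming hypothetical payloads `{"slot": int, "action": "reject" | "keep" | "lock"}` for `/action` and an optional `{"override": str}` for `/reroll`. The 422 status and the 200-character override cap are illustrative choices, not part of the design.

```python
from typing import Any

VALID_ACTIONS = {"reject", "keep", "lock"}
MAX_OVERRIDE_CHARS = 200  # hypothetical cap


def validate_action_body(body: dict[str, Any], n_slots: int) -> tuple[int, str]:
    """Check a POST .../action body; returns (http_status, message)."""
    action = body.get("action")
    slot = body.get("slot")
    if not isinstance(action, str) or action not in VALID_ACTIONS:
        return 422, f"action must be one of {sorted(VALID_ACTIONS)}"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(slot, bool) or not isinstance(slot, int) or not 0 <= slot < n_slots:
        return 422, f"slot must be an integer in [0, {n_slots})"
    return 200, "ok"


def validate_reroll_body(body: dict[str, Any]) -> tuple[int, str]:
    """Check a POST .../reroll body; the override text is optional."""
    override = body.get("override")
    if override is None:
        return 200, "ok"
    if not isinstance(override, str) or len(override.strip()) > MAX_OVERRIDE_CHARS:
        return 422, f"override must be a string of at most {MAX_OVERRIDE_CHARS} chars"
    return 200, "ok"
```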

### UI Component: RefSelector

One component that adapts to the descriptor:
- **Grid layout**: Anchor center, candidates around it. Grid shape varies by type: 2x3+1 (character, wardrobe), 2x2+1 (hair/makeup), 3x3+1 (props), 2x2 landscape (locations).
- **Candidate cards**: Image + action buttons (reject/keep/lock). State-dependent styling.
- **Re-roll bar**: Button + optional text input. Shows generation count and cost.
- **Override history**: Collapsed text showing accumulated user overrides.
- **Loading state**: For locations (staggered dispatch), panels reveal one by one.

### Navigation

```
Casting Tab
├── Gallery (character cards)
│   └── Click character → Asset Type selector
│       ├── Casting (RefSelector: character descriptor)
│       ├── Wardrobe (RefSelector: wardrobe descriptor, per-phase)
│       ├── Hair/Makeup (RefSelector: hair_makeup descriptor, per-phase)
│       └── Props (RefSelector: props descriptor, per-phase)
├── Locations (RefSelector: location descriptor)
├── Turnaround (existing, post-hero-lock)
└── Expressions (existing, unchanged)
```

### Key Constraints from Gemini Consultation

1. **Anchor is NEVER generated by the model**: the UI composites it; the model generates candidates only.
2. **Diegetic framing keeps re-rolls consistent**: for characters, the 18% gray seamless backdrop and flat studio lighting keep candidates from different re-rolls visually matched.
3. **User overrides are appended as one collapsed `[USER OVERRIDE: ...]` bracket**: never stack multiple brackets.
4. **Flash 3.1 enforces strict QPS limits**: stagger parallel calls 500 ms apart.
5. **3x3 grids cause latent bleed on faces** (trait homogenization across cells): 2x3 is the sweet spot for characters.
6. **Wardrobe and hair/makeup use a hybrid ref**: hero image for identity, vision-extracted text for style.
7. **NBP beauty pass before turnaround**: prevents Flash micro-artifacts from propagating into the turnaround.
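The 500 ms stagger (and the `staggered_dispatch` behavior on locations) can be met by spacing out call start times rather than serializing the calls. A minimal `asyncio` sketch, assuming each generation request is exposed as a zero-argument coroutine factory; the real Flash client is not shown and the function name is hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def staggered_gather(
    factories: list[Callable[[], Awaitable[T]]], delay_s: float = 0.5
) -> list[T]:
    """Start each call delay_s after the previous one, then await them all.

    Start times are spaced out, but calls still overlap once running.
    Results come back in the same order as `factories`.
    """
    futures = []
    for i, make_call in enumerate(factories):
        if i:
            await asyncio.sleep(delay_s)  # space out request starts for QPS
        futures.append(asyncio.ensure_future(make_call()))
    return list(await asyncio.gather(*futures))
```

With the default 0.5 s delay, the last of the four location calls starts 1.5 s after the first; calls already in flight are not slowed down.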

### Pre-Implementation Consultation: NBP Prompting Practices

Before implementing the beauty pass, run a targeted `/consult` with Gemini on NBP-specific prompting for:
- Skin texture, pores, imperfections for human faces (beauty pass)
- Material/fabric detail language for wardrobe finals
- Surface detail prompting for prop close-ups
- What prompt language actually moves NBP vs. what's ignored

Gemini knows its own model internals — let it define the quality prompts rather than guessing.

### Items Requiring Empirical Testing

1. 2x3 vs 3x3 latent bleed comparison for characters
2. Vision extraction quality (does the prompt strip identity well enough?)
3. Beauty pass necessity (are Flash artifacts visible in turnaround?)
4. Wardrobe hybrid ref quality (right person + different clothes?)
5. Temperature calibration per type
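For item 2, a cheap automated check could flag obvious identity leaks in the extracted mood text before human review. A sketch using a hypothetical, deliberately incomplete denylist; a usable list would need much broader age, ethnicity, and facial-feature vocabulary, and a clean result does not prove the text is identity-free.

```python
import re

# Hypothetical starter denylist; expand during empirical testing.
IDENTITY_TERMS = [
    "man", "woman", "boy", "girl", "male", "female",
    "he", "she", "him", "his", "her", "hers",
    "years old", "blonde", "brunette", "redhead", "beard", "freckles",
]

# Word boundaries keep "he" from matching inside "the" or "leather".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in IDENTITY_TERMS) + r")\b",
    re.IGNORECASE,
)


def identity_leaks(mood_text: str) -> list[str]:
    """Return the sorted, lowercased denylisted terms found in mood text."""
    return sorted({m.group(0).lower() for m in _PATTERN.finditer(mood_text)})
```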