# Frontier Model Playbook

*Recoil Studios — Production Reference*
*Last updated: 2026-02-13*
*Companion doc: `WORKFLOW_SPEC.md`, `PRODUCTION_PIPELINE_GUIDE.md`*

---

## How to Use This Document

This playbook covers the **frontier models** used for Tier 2 shot generation — the shots where you're working directly in each model's web UI or API, not through the automated ComfyUI pipeline. For each model, you'll find:

- What it can do (specs, limits, pricing)
- How to set up character consistency
- How to write prompts that work for *that specific model*
- A fill-in-the-blanks prompt template
- What reference images to prepare

The pipeline models (Flux 2, WAN 2.2, Z-Image Turbo) are covered in `appendix_d_ai_video.md` and `appendix_e_flux2_protocols.md`.

---

## Quick Reference: Cross-Model Comparison

| Feature | Kling 3.0 | Veo 3.1 | Seedance 2.0 | Sora 2 | Hailuo 02/2.3 |
|---------|-----------|---------|--------------|--------|---------------|
| **Provider** | Kuaishou | Google DeepMind | ByteDance | OpenAI | MiniMax |
| **Prompt style** | Director's shot list | Screenplay direction | Multimodal composition (@refs) | Natural scene description | Director mode [brackets] |
| **Identity system** | Element Library (2-4 refs) | Ingredients (3 refs) | Identity-Lock (@refs, 12 max) | Characters/Cameo (video selfie) | Subject Reference (1-3 refs) |
| **Multi-shot** | 6 shots/generation | Single clip (chain via extension) | Multi-shot via prompt | Storyboard (Pro only) | Single clip |
| **Native audio** | Dialogue + SFX + music | Dialogue + SFX + music | Dialogue + SFX + music | Dialogue + SFX + music | No audio |
| **Max duration** | 15 sec | 8 sec (extend to ~148 sec) | 15 sec | 12 sec (`sora-2`) / 25 sec (`sora-2-pro`, Pro web) | 10 sec |
| **Best resolution** | 1080p confirmed (4K marketed) | 4K (upscaled) / 1080p native | 2K (2048x1152) | 1080p (Pro only) | 1080p |
| **Frame rate** | 30fps (60fps marketed) | 24fps | 24fps | 24-60fps | 25fps |
| **Physics** | Very Good | Good (lighting excellent) | Good | Best in class | Very Good |
| **Multi-character** | Best (3+ tracked) | Good (via ingredients) | Good (via @refs) | Limited (2 max) | Limited (1 primary) |
| **Negative prompts** | Yes (dedicated field) | Yes (dedicated field) | No (in-prompt only) | No (in-prompt only) | No (in-prompt only) |
| **Camera language** | Excellent | Excellent | Good | Good | Excellent (bracket syntax) |
| **Prompt length** | 2,500 chars | 1,024 tokens max (100-150 words optimal) | 5,000 chars | 50-100 words optimal | 2,000 chars |
| **Best for Recoil** | Multi-char dialogue, action | Establishing shots, polish, B-roll | Multi-ref composition, music video | Physics-heavy shots, realism | Cost-effective single-char |
| **Cost/clip (10s, 1080p)** | ~$1.68-3.92 (fal.ai, V3 Standard to V3 Pro) | ~$4.80 per 8s clip (Standard+audio) | ~$0.60 (est.) | ~$1.00 (720p) to ~$5.00 (`sora-2-pro`, 1080p) | ~$0.48 |
| **API status** | fal.ai, official, others | Vertex AI, fal.ai, others | Launching Feb 24, 2026 | OpenAI API (Tier 2+) | fal.ai, Replicate, others |

---

## Model 1: Kling 3.0 (Kuaishou)

*Released February 5, 2026*

### A. Identity & Capabilities

| Spec | Value |
|------|-------|
| Model variants | V3 (core video), O3 Omni (multimodal + native audio), Image 3.0 |
| Max clip duration | 3-15 seconds (configurable). Extendable to ~3 minutes via chaining. |
| Resolution | 1080p confirmed via API. 4K/60fps claimed in marketing but not confirmed via third-party APIs — treat with caution. |
| Frame rate | 30fps standard. 60fps claimed but unverified in API. |
| Native audio | Yes — dialogue with lip-sync, SFX, ambient, music. Co-generated in same pass. 5 languages (EN, ZH, JP, KR, ES) with dialect/accent support and mid-sentence language switching. |
| Multi-shot | Up to 6 distinct camera cuts per generation. Each shot gets its own prompt, duration (1-12s), framing, and camera movement. |
| Aspect ratios | 16:9, 9:16, 1:1. I2V mode auto-adapts to input image proportions. |
| API | fal.ai (primary third-party), WaveSpeed, Freepik, PiAPI, KIE, official Kling API, ComfyUI nodes. |

**Pricing (fal.ai):**

| Mode | Cost/sec |
|------|----------|
| V3 Standard (no audio) | $0.168/s |
| V3 Pro (audio + voice control) | $0.392/s |
| O3 Standard (audio) | ~$0.224/s |

*Example: 5-second O3 Standard with audio = $1.12*
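
The per-second rates above reduce budget checks to simple arithmetic. The following is a minimal sketch that hard-codes the fal.ai rates from this table. The `KLING_RATES` keys are our own labels, not fal.ai model IDs. Rates change, so re-check them before quoting a budget.

```python
# Per-second fal.ai rates from the pricing table above (USD).
# The keys are our own shorthand, not official fal.ai model IDs.
KLING_RATES = {
    "v3_standard": 0.168,  # no audio
    "v3_pro": 0.392,       # audio + voice control
    "o3_standard": 0.224,  # audio (approximate rate)
}


def kling_clip_cost(mode: str, seconds: float) -> float:
    """Estimate the cost in USD of one Kling 3.0 clip, rounded to cents."""
    if not 3 <= seconds <= 15:
        raise ValueError("Kling 3.0 clips are 3-15 seconds per generation")
    return round(KLING_RATES[mode] * seconds, 2)


if __name__ == "__main__":
    for mode in KLING_RATES:
        print(f"{mode}: 5s = ${kling_clip_cost(mode, 5):.2f}, 10s = ${kling_clip_cost(mode, 10):.2f}")
```

At 10 seconds, this gives $1.68 (V3 Standard), $2.24 (O3 Standard), and $3.92 (V3 Pro).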

### B. Identity System: Element Library

The Element Library lets you define visual "elements" (characters, objects) by uploading reference images. The model creates an identity lock and maintains visual identity across shots, camera angles, lighting changes, and scene transitions.

**Image Element Setup:**

| Spec | Requirement |
|------|-------------|
| Images per element | 2-4 (3-4 from different angles recommended) |
| Formats | JPG, PNG only |
| Max file size | 10MB per image |
| Min resolution | 300x300 px |
| Aspect ratio | Between 1:2.5 and 2.5:1 |
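
A rejected upload wastes a round trip, so it can help to pre-check stills against these limits. This is a minimal sketch. The caller supplies pixel dimensions and file size; reading them from the file with an image library is omitted to keep the example stdlib-only. It assumes 10MB means 10 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits copied from the Image Element Setup table above.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10MB = 10 MiB
MIN_SIDE = 300                # px, applies to both width and height


def check_element_image(path: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be JPG or PNG")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 10MB")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append("below 300x300 px")
    ratio = width / height
    if not (1 / 2.5 <= ratio <= 2.5):
        problems.append("aspect ratio outside 1:2.5 to 2.5:1")
    return problems


if __name__ == "__main__":
    print(check_element_image("front.png", 1024, 1024, 2_000_000))
```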

**Video Element Setup (O3 Omni only):**

| Spec | Requirement |
|------|-------------|
| Videos per element | 1 |
| Formats | MP4, MOV |
| Max file size | 50MB |
| Recommended length | 3-8 seconds |
| Bonus | Extracts both visual traits AND voice characteristics |

**How to bind characters:**
1. Create element with `frontal_image_url` (primary) + `reference_image_urls` (array of angles)
2. Reference in prompt using `@element_name` (e.g., `@Character_A`)
3. For multi-shot, the element persists across all shots in the storyboard
4. Use distinctive features (unique clothing, hair, accessories) for best lock

**Persistence:** Via the API, elements are defined per request and are not saved server-side between calls, so re-submit element definitions with every generation request. On the web UI, the Element Library appears to save to your account.
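
Because API elements are not stored between calls, one option is to keep element definitions in code and attach them to every request. This sketch works under stated assumptions:

- `frontal_image_url` and `reference_image_urls` come from the binding steps above.
- The `name` key, the `elements` wrapper, and the overall request shape are hypothetical. Match them to your provider's schema, since fal.ai and the official Kling API may differ.
- The frontal image counts toward the 2-4 image limit.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Element:
    """One Kling element, kept locally because the API does not store it."""
    name: str                   # referenced in prompts as @name
    frontal_image_url: str      # primary, front-facing reference
    reference_image_urls: list[str] = field(default_factory=list)

    def to_payload(self) -> dict:
        total = 1 + len(self.reference_image_urls)  # assumes frontal counts toward the limit
        if not 2 <= total <= 4:
            raise ValueError(f"{self.name}: elements take 2-4 images, got {total}")
        return {
            "name": self.name,  # hypothetical key
            "frontal_image_url": self.frontal_image_url,
            "reference_image_urls": self.reference_image_urls,
        }


def build_request(prompt: str, elements: list[Element]) -> dict:
    """Attach every element definition to a single generation request."""
    # "elements" is a hypothetical wrapper key; match your provider's schema.
    return {"prompt": prompt, "elements": [e.to_payload() for e in elements]}


JINX = Element(
    name="JINX",
    frontal_image_url="https://example.com/asset_kits/kling3/JINX/front.png",
    reference_image_urls=[
        "https://example.com/asset_kits/kling3/JINX/three_quarter.png",
        "https://example.com/asset_kits/kling3/JINX/profile.png",
    ],
)

if __name__ == "__main__":
    print(json.dumps(build_request("@JINX sprints across the rooftop.", [JINX]), indent=2))
```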

**Multi-character:** Supports 3+ characters simultaneously with independent tracking. Three-person dialogue with accurate speaker attribution is confirmed.

**Gotchas:**
- Don't re-describe physical traits in the prompt after referencing an element — let the reference do the work
- End-frame images are incompatible with multi-shot mode
- 2D/traditional animation styles are harder for consistency
- Video elements (O3) produce better consistency than image-only elements

### C. Prompt Structure

**Formula (position matters — model weighs earlier content more heavily):**

```
[1. Context/Scene] + [2. Subject & Appearance] + [3. Action Timeline] +
[4. Camera Movement] + [5. Audio & Atmosphere] + [6. Technical Specs]
```

**Position 1 — Context/Scene:** Lead with environment and lighting. "A cyberpunk alleyway at midnight, illuminated by flickering neon signs reflecting off wet pavement."

**Position 2 — Subject & Appearance:** Be hyperspecific when not using image references. Use unique, consistent labels: `[Character A: Black-suited Agent]`. When using Elements, reference with `@element_name` and skip physical description.

**Position 3 — Action Timeline:** Sequential, not simultaneous. "First [A], then [B], finally [C]." Specify motion quality: "anxious rushing" not "walks."

**Position 4 — Camera Movement:** Use professional terms. The model excels with:
- Dolly in/out, dolly zoom (vertigo effect)
- Truck left/right
- Tracking shot, pan/tilt
- Low-angle tracking, FPV, handheld, Steadicam
- Crash zoom, whip-pan, shot-reverse-shot
- Macro close-up, profile shot, POV
- Static/locked-off

**Position 5 — Audio & Atmosphere:** Dialogue uses bracket attribution:
```
[Speaker: Man, raspy deep voice] "We need to leave."
[Speaker: Woman, clear fearful voice] "Not without her."
```

**Position 6 — Technical Specs:** Resolution, motion blur, aspect ratio, stylistic modifiers.

**Prompt limits:**
- Max positive prompt: 2,500 characters
- Max negative prompt: 2,500 characters
- CFG/Guidance Scale: 0 (creative) to 1 (strict). Default 0.5.
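
The six-position formula and the length limits can be enforced in one place. This is a minimal sketch. `build_kling_prompt` and the section keys are hypothetical names. The fixed `SECTION_ORDER` encodes the rule that earlier content is weighted more heavily.

```python
MAX_PROMPT_CHARS = 2500  # positive and negative limits from the list above

# Formula order: Kling weighs earlier content more heavily.
SECTION_ORDER = ("context", "subject", "action", "camera", "audio", "technical")


def build_kling_prompt(sections: dict[str, str], negative: str = "") -> tuple[str, str]:
    """Join prompt sections in formula order and check Kling's length limits."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise KeyError(f"unknown sections: {sorted(unknown)}")
    prompt = "\n\n".join(
        sections[name].strip() for name in SECTION_ORDER if sections.get(name)
    )
    for label, text in (("prompt", prompt), ("negative prompt", negative)):
        if len(text) > MAX_PROMPT_CHARS:
            raise ValueError(f"{label} is {len(text)} chars; limit is {MAX_PROMPT_CHARS}")
    return prompt, negative


prompt, negative = build_kling_prompt(
    {
        "camera": "Low-angle tracking shot, handheld.",
        "context": "A cyberpunk alleyway at midnight, flickering neon on wet pavement.",
        "subject": "@JINX, jaw clenched, clutching a cracked datapad.",
        "action": "First, she glances back. Then, she breaks into an anxious sprint.",
    },
    negative="Smiling, laughing, cartoonish, morphing, extra fingers",
)
```

The example passes the camera section first in the dict, but it still lands fourth in the output. Writers can therefore fill sections in any order without breaking the formula.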

**Recommended negatives:** "Smiling, laughing, cartoonish, bright colors, low resolution, morphing, blurry text, disfigured hands, extra fingers"

**What to avoid:**
- Unstructured "word salad"
- Object lists without temporal/spatial logic
- Vague camera descriptions ("the camera moves around")
- Re-describing element features that come from the reference
- Overcomplicating simultaneous actions

**Multi-shot syntax:** When using multi-shot mode, the main prompt field must be **empty**. Prompts go into the `multi_prompt` array, one per shot.
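
Here is a sketch of a multi-shot request body that follows the rule above: an empty main prompt plus one `multi_prompt` entry per shot. Two parts are assumptions. The per-shot keys (`prompt`, `duration`) are guesses about the payload schema. The 15-second total is inferred from Kling's per-generation maximum; it is not documented for multi-shot specifically.

```python
def build_multishot_request(shots: list[dict]) -> dict:
    """Build a Kling multi-shot request body (per-shot keys are hypothetical)."""
    if not 1 <= len(shots) <= 6:
        raise ValueError("Kling 3.0 allows up to 6 shots per generation")
    for i, shot in enumerate(shots, start=1):
        if not 1 <= shot["duration"] <= 12:
            raise ValueError(f"shot {i}: duration must be 1-12s")
    total = sum(shot["duration"] for shot in shots)
    if total > 15:  # assumption: shots share the 15s per-generation maximum
        raise ValueError(f"total duration {total}s exceeds 15s")
    return {
        "prompt": "",  # the main prompt must be empty in multi-shot mode
        "multi_prompt": [
            {"prompt": shot["prompt"], "duration": shot["duration"]} for shot in shots
        ],
    }


request = build_multishot_request([
    {"prompt": "Wide shot: the alley at midnight, rain on neon.", "duration": 5},
    {"prompt": "Close-up of @JINX, slow dolly in.", "duration": 4},
])
```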

**Film stock references:** The model understands "cinematic sci-fi," "film noir," "documentary style," "film grain," general cinematographic aesthetics. Specific stock names (Kodak Vision3, shot on 35mm Panavision) are **unverified** — test empirically.

### D. Strengths & Weaknesses

**Strengths:**
- Most granular multi-shot storyboarding: up to 6 shots per generation, each with its own prompt, duration, framing, and camera movement
- Native audio co-generation with multilingual lip-sync
- Best multi-character handling of any model (3+ tracked independently)
- Strong motion quality — smooth, physically plausible, less "rubbery" than predecessors
- Motion Brush for painting motion paths onto source images
- Cheaper than Veo 3.1 for audio+video clips (~$2.24 per 10s with O3 Standard audio on fal.ai vs. ~$4.80 per 8s Veo Standard clip with audio)

**Weaknesses:**
- Text rendering still glitchy
- Hands in extreme close-ups produce artifacts (better than V2.6 but not solved)
- Physics rated "Very Good" not "Best" — Sora 2 still leads
- Fluid simulation (water, smoke, fire) below Sora 2 level
- V3 cannot accept video references (only O3 Omni can)
- Cannot sync to pre-existing audio tracks
- Resolution uncertainty: marketing claims 4K/60fps but APIs deliver 720p-1080p

### E. Prompt Template

```
--- KLING 3.0 PROMPT ---

Context: [location] at [time of day], [lighting conditions], [atmosphere/weather]

Subject: @[element_name] — [only add details NOT in reference: current emotion,
temporary props, costume changes]

Action: First, [action A with motion quality]. Then, [action B]. Finally, [action C].

Camera: [shot type] — [camera movement with specific cinematic term],
[lens/DOF if relevant]

Audio: [Speaker: character name, voice quality] "[dialogue]"
[ambient sound description]

Style: [genre reference], [lighting style], [color palette]

--- NEGATIVE ---
Smiling, laughing, cartoonish, bright colors, low resolution, morphing,
blurry text, disfigured hands, extra fingers, [project-specific negatives]

--- ELEMENT LIBRARY REFS NEEDED ---
[CHARACTER]: [front-facing ref] + [3/4 angle] + [profile] (JPG/PNG, <10MB each)
[LOCATION]: [optional reference image for environment consistency]
```

### F. Asset Kit Requirements

| Asset | Format | Specs | Naming Convention |
|-------|--------|-------|-------------------|
| Character front | JPG/PNG | 300x300 min, <10MB | `asset_kits/kling3/[CHAR]/front.png` |
| Character 3/4 angle | JPG/PNG | 300x300 min, <10MB | `asset_kits/kling3/[CHAR]/three_quarter.png` |
| Character profile | JPG/PNG | 300x300 min, <10MB | `asset_kits/kling3/[CHAR]/profile.png` |
| Character video (O3) | MP4/MOV | 3-8s, <50MB | `asset_kits/kling3/[CHAR]/reference.mp4` |

**Upload workflow (web UI):**
1. Go to Element Library → Create New Element
2. Upload 3-4 reference images (front, 3/4, profile, full-body)
3. Name the element (e.g., "JINX")
4. In generation prompt, reference as `@JINX`
5. For multi-shot: set up all elements first, then build storyboard with per-shot prompts
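
Before uploading, check that each character folder matches the naming convention in the asset table; this catches missing angles early. The following is a minimal pathlib sketch. It treats the three stills as required and ignores the optional O3 reference video.

```python
from pathlib import Path

# Required stills from the Kling asset kit table (the O3 video is optional).
REQUIRED_KLING_FILES = ("front.png", "three_quarter.png", "profile.png")


def missing_kling_assets(root: str, character: str) -> list[str]:
    """Return the required asset-kit paths that are missing for one character."""
    char_dir = Path(root) / "asset_kits" / "kling3" / character
    return [
        str(char_dir / name)
        for name in REQUIRED_KLING_FILES
        if not (char_dir / name).is_file()
    ]


if __name__ == "__main__":
    for path in missing_kling_assets(".", "JINX"):
        print("missing:", path)
```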

---

## Model 2: Veo 3.1 (Google DeepMind)

*Released October 2025, major update January 13, 2026*

### A. Identity & Capabilities

| Spec | Value |
|------|-------|
| Model variants | `veo-3.1-generate-preview` (Standard), `veo-3.1-fast-generate-preview` (Fast) |
| Max clip duration | 4, 6, or 8 seconds per generation. 8s required when using reference images or first/last frame. |
| Scene extension | +7 seconds per extension call. Max 20 extensions = ~148 seconds total. Extension drops to 720p. Coherence degrades past ~60s. |
| Resolution | 720p (default), 1080p (8s clips, Standard tier), 4K upscale (January 2026 update). First mainstream AI model with 4K. |
| Frame rate | 24fps (cinema standard) |
| Native audio | Yes — joint audio-video generation in single pass. 48kHz stereo, AAC 192kbps. Dialogue (lip-synced), SFX, ambient, music. Audio only works in text-to-video mode — NOT image-to-video. |
| Multi-shot | No native multi-shot. Chain clips via extension or Flow Scene Builder (save last frame → use as first frame of next clip). |
| Aspect ratios | 16:9, 9:16 (native vertical added January 2026) |
| API | Vertex AI (Google Cloud), Gemini API, fal.ai, Replicate, Kie.ai, AIML API |
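
Extension length follows directly from the figures above: an 8-second base clip plus 7 seconds per call, capped at 20 calls (8 + 20 * 7 = 148 seconds). The planning sketch below takes the ~60-second coherence threshold and the 720p drop from the table. The function name is hypothetical.

```python
import math

BASE_SECONDS = 8       # initial generation
EXTENSION_SECONDS = 7  # added per extension call
MAX_EXTENSIONS = 20
COHERENCE_LIMIT = 60   # quality reportedly degrades past roughly this length


def plan_extensions(target_seconds: float) -> dict:
    """How many Veo 3.1 extension calls reach a target length, per the specs above."""
    needed = max(0, math.ceil((target_seconds - BASE_SECONDS) / EXTENSION_SECONDS))
    if needed > MAX_EXTENSIONS:
        raise ValueError("beyond ~148s (20 extensions); split into separate clips")
    final = BASE_SECONDS + needed * EXTENSION_SECONDS
    return {
        "extensions": needed,
        "final_seconds": final,
        "resolution": "720p" if needed else "as generated",  # extension drops to 720p
        "coherence_warning": final > COHERENCE_LIMIT,
    }


if __name__ == "__main__":
    print(plan_extensions(30))
```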

**Pricing:**

| Access | Cost |
|--------|------|
| Google AI Pro | $19.99/mo — Veo 3.1 Fast only, ~3 videos/day, 720p, watermarked |
| Google AI Ultra | $249.99/mo — Fast + Standard, ~5 videos/day, up to 1080p, no watermark |
| Vertex API: 3.1 Fast (video only) | $0.15/sec |
| Vertex API: 3.1 Fast (with audio) | ~$0.225/sec |
| Vertex API: 3.1 Standard (video only) | $0.40/sec |
| Vertex API: 3.1 Standard (with audio) | ~$0.60/sec |
| fal.ai: 3.1 Fast (no audio) | ~$0.10/sec |

*Example: 8-second Standard clip with audio via Vertex = ~$4.80*

**Rate limits (Vertex AI):** 50 RPM for GA endpoints, 10 RPM for preview. Max 4 outputs per text-to-video request, 1 per reference image request. Processing: ~1 min 13 sec (Fast), ~2 min 41 sec (Standard).

### B. Identity System: Ingredients to Video

Provide up to 3 reference images ("asset images") of a character, object, or product. The model preserves the subject's appearance in the output video.

**Reference Image Requirements:**

| Spec | Requirement |
|------|-------------|
| Max images | 3 per generation |
| Formats | JPEG, PNG |
| Max file size | 20MB per image |
| Recommended resolution | 720p (1280x720) or higher |
| Recommended aspect ratio | Match your output (16:9 or 9:16) |
| Non-standard images | Auto-resized or center-cropped |

**API syntax:**
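```python
from google.genai import types

# Wrap a character still stored in GCS as a Veo 3.1 ingredient
character_ref = types.VideoGenerationReferenceImage(
    image=types.Image(gcs_uri="gs://bucket/character.png", mime_type="image/png"),
    reference_type="asset",  # must be "asset"; style refs are not supported in 3.1
)
```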

**How to bind characters:**
1. Generate clean character stills first (can use Gemini 2.5 Flash Image)
2. Upload as `referenceImages[]` array with `reference_type="asset"`
3. In prompt, focus on describing **interactions and actions**, not appearance (appearance comes from refs)
4. Use syntax: "Using the provided images for [Character A] and [Setting], create a [shot description]..."

**First and Last Frame:**
- Provide start frame and/or end frame for controlled transitions
- Duration locked to 8 seconds when using this feature
- Useful for scene-to-scene continuity: save last frame of clip N, use as first frame of clip N+1
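
Veo does not save the last frame for you. One way to capture it is ffmpeg, which is swapped in here and is not part of the Veo toolchain. `-sseof` seeks relative to the end of the input. The image2 muxer's `-update 1` keeps overwriting a single file, so the final decoded frame is what remains. This sketch only builds the command; running it assumes ffmpeg is on your PATH. The output is named per the Veo keyframe convention in the asset kit table.

```python
import shlex
from pathlib import Path


def last_frame_command(
    clip: str, shot: str, keyframe_dir: str = "asset_kits/veo31/keyframes"
) -> tuple[list[str], Path]:
    """Build an ffmpeg command that saves the final frame of `clip` as [SHOT]_last.png."""
    out = Path(keyframe_dir) / f"{shot}_last.png"
    cmd = [
        "ffmpeg", "-y",   # overwrite an existing keyframe
        "-sseof", "-1",   # start reading 1 second before the end of the input
        "-i", clip,
        "-update", "1",   # keep overwriting one image; the last frame written remains
        str(out),
    ]
    return cmd, out


if __name__ == "__main__":
    cmd, out = last_frame_command("renders/S01.mp4", "S01")
    print(shlex.join(cmd))
```

Run the command with `subprocess.run(cmd, check=True)`, then pass the PNG as the first frame of clip N+1. If you keep both names, copy it to `[SHOT]_first.png`.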

**Persistence:** NO automatic memory between generations. Each generation is independent. Re-supply the same reference images AND repeat full character description in every prompt.

**Style references:** NOT supported in Veo 3.1 (only Veo 2.0). Describe style in text prompt instead.

**Critical limitation:** Audio does NOT generate in image-to-video mode (including when using reference images). Audio only works with pure text-to-video.
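
Duration, ingredient, and audio rules interact. A pre-flight check can catch combinations that would silently drop audio. In this sketch, `VeoSettings` and its fields are hypothetical labels, not Veo API parameter names. The rules mirror the notes in this section.

```python
from dataclasses import dataclass


@dataclass
class VeoSettings:
    duration: int = 8
    ingredient_count: int = 0           # reference ("asset") images
    uses_first_last_frame: bool = False
    audio: bool = True


def validate_veo(settings: VeoSettings) -> list[str]:
    """Return rule violations for a Veo 3.1 generation, per the notes above."""
    errors = []
    if settings.duration not in (4, 6, 8):
        errors.append("duration must be 4, 6, or 8 seconds")
    if settings.ingredient_count > 3:
        errors.append("at most 3 reference images per generation")
    image_conditioned = settings.ingredient_count > 0 or settings.uses_first_last_frame
    if image_conditioned and settings.duration != 8:
        errors.append("reference images and first/last frame require 8s")
    if image_conditioned and settings.audio:
        errors.append("audio only generates in pure text-to-video; turn it off")
    return errors


if __name__ == "__main__":
    print(validate_veo(VeoSettings(duration=6, ingredient_count=2)))
```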

### C. Prompt Structure

**Formula:**

```
[Cinematography/Camera] + [Subject] + [Action] + [Context/Setting] +
[Style/Aesthetic] + [Audio]
```

**Prompt length:** 3-6 sentences, 100-150 words optimal. Hard API limit: 1,024 tokens. Over-compressed one-liners and over-long paragraphs both perform worse.

**What it attends to most (in order of influence):**
1. Style/aesthetic — "the most powerful lever"
2. Camera framing and motion — wide vs. close-up, lens choice, DOF
3. Lighting and color — quality and direction set mood as strongly as action
4. Specific verbs and nouns — "jogs three steps and stops" beats "moves quickly"
5. Dialogue clarity — placed in dedicated block, kept concise

**Camera direction:** Strong cinematography literacy.
- Shot types: wide establishing, medium close-up, aerial, tracking, drone, Dutch angle, crane, Steadicam, handheld
- Lens language: "Wide-angle 24mm," "Anamorphic 2.0x," "85mm portrait lens"
- Motion: "slow dolly forward over 3 seconds," "tracking shot from behind"
- Rule: **One clear camera move per shot** for best results. Describe in beats or counts for precise timing.

**Dialogue format (critical — the colon prevents unwanted subtitle generation):**
```
The woman in the red dress says: 'We should leave now.'
The man with glasses replies: 'Not yet.'
```

**SFX format:** Use parentheses: `(a loud thunderclap)`, `(a key turning in a lock)`

**Structured audio lanes (for complex scenes):**
```
Ambience: distant traffic hum, building echoes
SFX: footsteps on gravel, door slamming
Dialogue: The man says: 'We need to go.'
Music: building to crescendo
```

**Timing anchors:** `sfx: door slam at 1.6s` — short temporal anchors help with tight sync.
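
The dialogue colon, parenthesized SFX, audio lanes, and timing anchors can all be generated consistently from structured notes. This is a minimal sketch. `veo_audio_block` is a hypothetical helper. Putting a parenthesized SFX and an `at 1.6s` anchor on one line is our own convention, not a documented format.

```python
from typing import List, Optional, Tuple


def veo_audio_block(
    dialogue: List[Tuple[str, str]],
    ambience: str = "",
    sfx: Optional[List[Tuple[str, Optional[float]]]] = None,
    music: str = "",
) -> str:
    """Format Veo 3.1 audio lanes in the order Ambience, SFX, Dialogue, Music."""
    lines = []
    if ambience:
        lines.append(f"Ambience: {ambience}")
    for sound, at in sfx or []:
        # Parenthesized SFX, with an optional short timing anchor for tight sync.
        anchor = f" at {at:.1f}s" if at is not None else ""
        lines.append(f"SFX: ({sound}){anchor}")
    for i, (speaker, line) in enumerate(dialogue):
        verb = "says" if i == 0 else "replies"
        # Per the note above, the colon helps prevent unwanted subtitles.
        lines.append(f"Dialogue: {speaker} {verb}: '{line}'")
    if music:
        lines.append(f"Music: {music}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(veo_audio_block(
        [("The man", "We need to go."), ("The woman", "Not yet.")],
        ambience="distant traffic hum, building echoes",
        sfx=[("footsteps on gravel", None), ("a door slamming", 1.6)],
        music="building to crescendo",
    ))
```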

**Negative prompts:** Supported via the `negativePrompt` parameter. In-prompt exclusions (`--no blurry`, `--no cloudy`, `(no subtitles)`) are also reported to work, but Google recommends listing unwanted elements in the negative field rather than writing "no X" in the main prompt.

**Film stock / style references:** The model understands specific stocks ("Kodak Vision3 500T," "bleach bypass," "grainy 16mm"), directors (Kubrick, Villeneuve, Wes Anderson, Wong Kar-wai), color grading ("teal and orange," "desaturated cold palette"). Pick 1-2 style anchors — don't stack conflicting references.

**What to avoid:**
- Dialogue longer than 8 seconds (causes rushed speech or gibberish)
- Single-word dialogue like "Hello" alone (causes silence)
- Unspecified backgrounds (triggers audio hallucinations)
- Using "no" or "don't" in the prompt body (describe what you DO want instead)
- Stacking 3+ conflicting style references

### D. Strengths & Weaknesses

**Strengths:**
- Broadcast-ready cinematic polish — best color science and lighting of any model
- Best prompt adherence in benchmarks (outperforms Sora 2 and Kling)
- Best audio generation quality (joint audio-video, not bolted-on TTS)
- 4K output via upscaling (first mainstream AI model to offer it)
- Excellent lip-sync, especially single-speaker
- Native 9:16 vertical (not cropped horizontal)
- 40-60% frame consistency improvement over Veo 3.0

**Weaknesses:**
- Physics simulation inconsistent — weight, momentum, collision are weak. Sora 2 is meaningfully better.
- Complex multi-step actions appear abrupt; objects suddenly appear/disappear
- Rapid panning introduces artifacts (35% improved but still present)
- Quality degrades beyond ~60 seconds of chained clips
- Extension drops resolution to 720p
- No audio when using reference images (critical workflow limitation)
- Expensive — ~$4.80 per 8-second Standard clip with audio
- Generation variability — results vary significantly between runs of the same prompt

### E. Prompt Template

```
--- VEO 3.1 PROMPT ---

[Camera: shot type, lens, movement]. [Subject: character description or
"Using the provided image for [Character]"]. [Action: what happens, one clear
beat]. [Setting: location, time of day, weather]. [Style: film stock or
director reference, lighting quality and direction, color palette].

Dialogue:
[Character] says: '[Line of dialogue.]'

Ambience: [ambient sound description]
SFX: [sound effect at timestamp]

--- NEGATIVE ---
blurry, watermark, text overlay, extra fingers, morphing

--- INGREDIENTS NEEDED ---
Ingredient A: [character ref image] (JPEG/PNG, 720p+, <20MB)
Ingredient B: [second character or location ref]
Ingredient C: [third ref if needed]

--- SETTINGS ---
Duration: 8s (required if using ingredients)
Resolution: 1080p
Aspect: 16:9 or 9:16
Audio: ON (text-to-video only) or OFF (if using image refs)
```

### F. Asset Kit Requirements

| Asset | Format | Specs | Naming Convention |
|-------|--------|-------|-------------------|
| Character ref (2-3 images) | JPEG/PNG | 720p+, <20MB, matching output aspect ratio | `asset_kits/veo31/[CHAR]/ingredient_01.png` |
| Location/setting ref | JPEG/PNG | 720p+, <20MB | `asset_kits/veo31/locations/[ZONE]_ref.png` |
| First frame (for transitions) | JPEG/PNG | Match output resolution | `asset_kits/veo31/keyframes/[SHOT]_first.png` |
| Last frame (for transitions) | JPEG/PNG | Match output resolution | `asset_kits/veo31/keyframes/[SHOT]_last.png` |

**Upload workflow (Vertex AI / Gemini API):**
1. Upload reference images to GCS bucket or pass as base64
2. Set `reference_type="asset"` for each image
3. In prompt, describe actions/interactions (not appearance)
4. Set duration to 8 seconds (required with ingredients)
5. Note: audio will NOT generate when using reference images

**Upload workflow (web UI — Gemini App / Flow):**
1. Open video generation in Gemini App or Flow
2. Upload up to 3 ingredient images
3. Write prompt focusing on what happens, not what characters look like
4. Select duration (4/6/8s) and aspect ratio

---

## Model 3: Seedance 2.0 (ByteDance)

*Released February 8-10, 2026. Official global API launch: February 24, 2026.*

### A. Identity & Capabilities

| Spec | Value |
|------|-------|
| Architecture | Dual-Branch Diffusion Transformer — simultaneous audio-video generation |
| Max clip duration | 4-15 seconds (user-selectable) |
| Resolution | Up to 2K (2048x1152). Standard output 1080p. |
| Frame rate | 24fps confirmed. 60fps may be available at paid tiers (unverified). |
| Native audio | Yes — dialogue with phoneme-level lip-sync in 8+ languages, SFX, ambient, music. Beat-sync for music videos (best with strong 4/4 patterns). |
| Multimodal inputs | Images (up to 9), videos (up to 3), audio (up to 3), text (1 prompt). 12 files max total. |
| Aspect ratios | 16:9, 9:16, 4:3, 3:4, 1:1, 21:9 (ultra-wide — confirmed in some docs, not all) |
| Platforms | Dreamina (international web), Jimeng AI (China), CapCut integration, Doubao app |
| API status | NOT YET PUBLIC. Expected Feb 24, 2026 via Volcengine. Third-party access via APIYI, WaveSpeed, RecCloud available. |

**Pricing:**

| Access | Cost |
|--------|------|
| Dreamina Free | ~150-225 daily credits (watermarked, limited resolution) |
| Dreamina Basic | $18/mo — 2,700 credits |
| Dreamina Standard | $42/mo — 10,800 credits |
| Dreamina Advanced | $84/mo — 29,700 credits |
| API (estimated) | ~$0.60 per 10-second clip at 1080p |
| API Pro (estimated) | ~$0.30/minute at 1080p with audio |

**Multimodal input specs:**

| Input | Max Count | Constraints |
|-------|-----------|-------------|
| Images | 9 | JPEG, PNG, WEBP, BMP, TIFF, GIF; <30MB each |
| Videos | 3 | MP4, MOV; 480p-720p recommended; <50MB; 2-15s combined |
| Audio | 3 | MP3, WAV; <15MB each; <=15s total |
| Text | 1 | Up to 5,000 characters |
| **Total files** | **12** | Across all modalities |

*Important: Video uploads exceeding 15 seconds are silently truncated to the first 15 seconds.*
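
A pre-upload check against the limits table can flag bundles that would be rejected or truncated. This sketch rests on three assumptions:

- Durations are supplied by the caller.
- MB means 1024 * 1024 bytes.
- The 6-file warning reflects the practical guidance in this section, not a platform limit.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: limits are binary megabytes

LIMITS = {
    # kind: (max count, max bytes per file, allowed extensions)
    "image": (9, 30 * MB, {".jpeg", ".jpg", ".png", ".webp", ".bmp", ".tiff", ".gif"}),
    "video": (3, 50 * MB, {".mp4", ".mov"}),
    "audio": (3, 15 * MB, {".mp3", ".wav"}),
}


@dataclass
class Upload:
    name: str
    kind: str             # "image", "video", or "audio"
    size_bytes: int
    seconds: float = 0.0  # duration for video/audio files


def check_seedance_bundle(files: list[Upload]) -> list[str]:
    """Return problems with a planned Seedance 2.0 upload bundle."""
    problems = []
    if len(files) > 12:
        problems.append(f"{len(files)} files; the hard cap is 12")
    elif len(files) > 6:
        problems.append("over 6 files; quality degrades, aim for 5-6")
    for kind, (max_count, max_bytes, exts) in LIMITS.items():
        group = [f for f in files if f.kind == kind]
        if len(group) > max_count:
            problems.append(f"{len(group)} {kind} files; max is {max_count}")
        for f in group:
            ext = "." + f.name.rsplit(".", 1)[-1].lower()
            if ext not in exts:
                problems.append(f"{f.name}: unsupported {kind} format")
            if f.size_bytes > max_bytes:
                problems.append(f"{f.name}: over {max_bytes // MB}MB")
    video_total = sum(f.seconds for f in files if f.kind == "video")
    audio_total = sum(f.seconds for f in files if f.kind == "audio")
    if video_total > 15:
        problems.append(f"videos total {video_total:.0f}s; over the 15s limit (excess is truncated)")
    if audio_total > 15:
        problems.append(f"audio totals {audio_total:.0f}s; limit is 15s")
    return problems
```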

### B. Identity System: Identity-Lock (@ Reference System)

Seedance's character consistency system uses the `@` syntax to tag uploaded reference files in your prompt. The model locks onto facial features, outfit details, and physical characteristics from tagged images and preserves them throughout generation.

**The @ Syntax:**
```
@Image1 as the main character
@Image2 as the environment reference
@Video1 for camera movement style
@Audio1 for background rhythm
@Image3 is Character B
```

**Two reference modes:**

1. **First/Last Frame Mode:** Upload 1-2 images as start/end keyframes. Simple image-to-video.
2. **All-Round Reference Mode:** Full multimodal — combine images + videos + audio + text. This is where the `@` syntax and 12-file capacity come into play.

**How to bind characters:**
1. Upload a high-resolution, clean-background still showing features head-on
2. In prompt, assign the role: `@Image1 is the main character`
3. For subsequent generations, reuse the exact same reference image
4. Multi-character: `@Image1 is Character A, @Image3 is Character B`

**Consistency quality:**
- Strong within a single generation and across 2-3 extensions
- Extension 3 shows minor color drift
- Extension 4+ suitable only for rough previews
- 70-80% first-try success rate (vs. 40% for Seedance 1.5 Pro)
- Maintains identity across 20+ shots per marketing claims

**Practical file limits:** Quality degrades past 6-7 reference files. At 10-12, random element mixing occurs. For production: keep to 5-6 files max.
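
The web UI assigns tags in upload order within each modality (`@Image1`, `@Image2`, `@Video1`, ...). Generating the role sentences from the same ordered list keeps tags and roles from drifting apart. This sketch assumes per-modality numbering, as described in the upload workflow. `assign_tags` is hypothetical. Its default `max_files=6` enforces the practical limit above, not the 12-file cap.

```python
from collections import Counter

TAG_PREFIX = {"image": "Image", "video": "Video", "audio": "Audio"}


def assign_tags(
    uploads: list[tuple[str, str, str]], max_files: int = 6
) -> tuple[dict[str, str], str]:
    """Turn (filename, kind, role) tuples, in upload order, into @tags and role sentences."""
    if len(uploads) > max_files:
        raise ValueError(f"{len(uploads)} files; keep to {max_files} for reliable identity lock")
    counts: Counter = Counter()
    tags: dict[str, str] = {}
    sentences = []
    for filename, kind, role in uploads:
        counts[kind] += 1  # numbering restarts per modality: @Image1, @Video1, ...
        tag = f"@{TAG_PREFIX[kind]}{counts[kind]}"
        tags[filename] = tag
        sentences.append(f"{tag} {role}.")
    return tags, " ".join(sentences)


tags, role_text = assign_tags([
    ("jinx_identity_ref.png", "image", "is the main character"),
    ("alley_ref.png", "image", "is the environment"),
    ("dolly_ref.mp4", "video", "for camera movement"),
])
```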

**CRITICAL: Real human face upload ban.** As of Feb 10, 2026, ByteDance suspended all real human face uploads as reference material after a privacy incident (model reconstructed voice from facial photo). **Workaround:** Use 3D avatars, stylized characters, or oil-painting-style images. This ban's duration is unknown.

### C. Prompt Structure

**Formula:**

```
Subject + Action + Camera + Style + Constraints
```

**The @ syntax in practice:**

| What you write | What it does |
|----------------|-------------|
| `@Image1 as the first frame` | Sets opening visual |
| `@Image2 as the last frame` | Sets ending visual |
| `@Image3 is the main character` | Locks character identity |
| `@Video1 for camera movement` | Copies camera language from reference |
| `@Audio1 for background music` | Uses audio for rhythm/pacing |
| `Extend @Video1 by 5s` | Extends existing clip |

**Key rules:**
- Specific `@` references succeed ~90% of the time
- Vague references succeed only ~33%
- Keep overall prompt shorter when using `@` tags — images carry the visual info, text carries the action
- Trim video references to their strongest 2-5 seconds
- Name files to match roles before upload

**Prompt length:** 5,000-character max. Optimal: 30-100 words; aim for under 60 words of description, plus constraints.

**Camera direction:** Use explicit shot language.
- Shot sizes: Wide (establish), Medium (subject + context), Close (detail/emotion)
- Movement: Dolly/track, Pan, Handheld, Gimbal
- Speed: Pair with distance ("slow dolly-in, 1-2 feet")
- Angle: Eye level, low angle, high angle
- Lens: Wide (24-28mm), normal (35-50mm), telephoto (85mm+)
- Simple movements achieve ~80% first-try success. Compound simultaneous motions cause the model to pick 1-2 and ignore the rest.

**Multi-shot in a single prompt:**
```
Shot 1 is a wide view of the city. Shot 2 is a close-up of the character
from @Image1. Shot 3 is an over-the-shoulder tracking shot.
```

**Negative prompts:** No dedicated field. Include in main prompt as constraints (max 3-5):
- "No text overlays, no watermarks, no extra characters, no snap zooms, no extra fingers"

**What to avoid:**
- Compound camera moves in a single clause (model picks 1-2, ignores rest). Write as beats: `Start: slow dolly-in. Then: gentle pan right.`
- Mood words as camera directions ("dynamic," "emotional")
- Stacking excessive negative constraints (dulls the image)
- Abstract "feelings" instead of specific movements
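
The formula and the beat-style camera rule can be combined into a small builder. This sketch uses hypothetical names. It writes camera moves as `Start:`/`Then:` beats and caps in-prompt negatives at 5. It only warns, rather than failing, when the description passes about 60 words.

```python
import warnings


def build_seedance_prompt(
    subject: str,
    action: str,
    camera_beats: list[str],
    style: str,
    exclusions: list[str],
    duration_s: int,
) -> str:
    """Assemble Subject + Action + Camera + Style + Constraints for Seedance 2.0."""
    if not 4 <= duration_s <= 15:
        raise ValueError("Seedance 2.0 clips are 4-15 seconds")
    if len(exclusions) > 5:
        raise ValueError("keep in-prompt negatives to 3-5; more dulls the image")
    # One move per beat: compound moves in one clause get partially ignored.
    labels = ["Start"] + ["Then"] * (len(camera_beats) - 1)
    camera = " ".join(f"{label}: {beat}." for label, beat in zip(labels, camera_beats))
    body = " ".join(p.strip() for p in (subject, action, camera, style) if p.strip())
    words = len(body.split())
    if words > 60:
        warnings.warn(f"description is {words} words; Seedance favors under ~60")
    constraint_parts = []
    if exclusions:
        constraint_parts.append("No " + ", no ".join(exclusions) + ".")
    constraint_parts.append(f"Duration {duration_s}s.")
    return f"{body}\n\nConstraints: {' '.join(constraint_parts)}"


prompt = build_seedance_prompt(
    subject="@Image1 is the main character.",
    action="She steps out of the rain into a neon-lit doorway.",
    camera_beats=["slow dolly-in, 1-2 feet", "gentle pan right"],
    style="Grainy 16mm, cold teal palette.",
    exclusions=["text overlays", "watermarks", "extra fingers"],
    duration_s=8,
)
```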

### D. Strengths & Weaknesses

**Strengths:**
- Only model accepting 12 files across 4 modalities — biggest differentiator
- Native audio-video co-generation with 8+ language lip-sync
- Fastest generation speed (~30% faster than v1.5, 5-second video in under 60 seconds)
- Lowest estimated cost among audio-capable models (~$0.60/10s vs. ~$1.00 for Sora 2 at 720p and ~$4.80 per 8s Veo 3.1 Standard clip)
- 2K native resolution (higher than Sora 2's max 1080p)
- Most aspect ratio options (6, including 21:9 ultra-wide, though 21:9 is not confirmed in all docs)
- Excellent template replication and video remixing
- Multi-shot storytelling from one prompt with automatic storyboarding

**Weaknesses:**
- Complex physics (water, smoke, fire) unrealistic — "waves didn't break naturally"
- Hand/finger close-ups produce impossible positions
- Text rendering: small text blurs/warps, large text readable only ~60% of the time
- Quality degrades past 6-7 reference files
- Audio can become garbled in longer sequences
- Compound camera motions ignored beyond 1-2
- **Real human face uploads currently banned** (major workflow impact)
- API not yet publicly available (Feb 24, 2026 target)

### E. Prompt Template

```
--- SEEDANCE 2.0 PROMPT ---

@Image1 is the main character. @Image2 is the environment.

Subject: [character description — keep brief if using @ref]
Action: [what happens, present tense, plain language]
Camera: [shot size] — [one movement verb with speed] — [lens bucket]
Style: [visual anchor — film/process/artist], [lighting], [color treatment]

Constraints: No [2-3 specific exclusions]. Duration [X]s.

Audio: [dialogue attribution or ambient description]

--- FILES TO UPLOAD ---
@Image1: [character front-facing ref] (JPEG/PNG, <30MB)
@Image2: [environment ref]
@Video1: [camera movement reference] (MP4/MOV, <50MB, trimmed to 2-5s)
@Audio1: [rhythm/music reference] (MP3/WAV, <15MB)

--- SETTINGS ---
Duration: [4-15]s
Resolution: 1080p or 2K
Aspect: [16:9 / 9:16 / 4:3 / 3:4 / 1:1 / 21:9]
```

### F. Asset Kit Requirements

| Asset | Format | Specs | Naming Convention |
|-------|--------|-------|-------------------|
| Character ref (front, clean BG) | JPEG/PNG | <30MB, clear lighting | `asset_kits/seedance/[CHAR]/identity_ref.png` |
| Environment ref | JPEG/PNG | <30MB | `asset_kits/seedance/locations/[ZONE]_ref.png` |
| Camera movement ref | MP4/MOV | 2-5s, 480-720p, <50MB | `asset_kits/seedance/motion/[STYLE]_ref.mp4` |
| Audio rhythm ref | MP3/WAV | <15MB, <=15s | `asset_kits/seedance/audio/[MOOD]_ref.mp3` |
| Style ref image | JPEG/PNG | <30MB | `asset_kits/seedance/style/[LOOK]_ref.png` |

**Upload workflow (Dreamina web UI):**
1. Select "All-Round Reference" mode (not First/Last Frame)
2. Upload files — system assigns `@Image1`, `@Image2`, `@Video1`, etc.
3. Write prompt using `@` tags to assign each file's role
4. Set duration, resolution, aspect ratio
5. Note: keep to 5-6 files max for reliable quality

**Current restriction:** No real human face uploads. Use stylized/avatar-based character references.

---

## Model 4: Sora 2 (OpenAI)

*Released September 30, 2025*

### A. Identity & Capabilities

| Spec | Value |
|------|-------|
| Model variants | `sora-2` (standard), `sora-2-pro` (higher quality/resolution) |
| Max clip duration (API) | `sora-2`: 4, 8, or 12 seconds. `sora-2-pro`: 10, 15, or 25 seconds. |
| Max clip duration (web) | Plus: 15s. Pro: 25s via storyboard. |
| Resolution (API) | `sora-2`: 1280x720. `sora-2-pro`: 1280x720, 720x1280, 1024x1792, 1792x1024. |
| Resolution (web) | Plus: 720p. Pro: 1080p. |
| Frame rate | 24-60fps (configurable) |
| Native audio | Yes — dialogue with lip-sync, multi-speaker conversations, ambient SFX, background music, spatial audio. Generated in same pass as visuals. |
| Multi-shot | Storyboard feature (Pro only, sora.com) — build multi-shot sequences up to 25s. |
| Extensions | Yes — continue any clip with new prompt, preserves characters/settings. |
| Aspect ratios | 16:9, 9:16 (API confirmed). 1:1 mentioned in some sources but not in official API docs. |

**Pricing:**

| Access | Cost |
|--------|------|
| ChatGPT Plus | $20/mo — 1,000 credits/mo, 720p, ~30 daily limit |
| ChatGPT Pro | $200/mo — 10,000 credits + unlimited Relaxed mode, 1080p, Storyboard |
| API `sora-2` (720p) | $0.10/sec |
| API `sora-2-pro` (720p) | $0.30/sec |
| API `sora-2-pro` (1080p) | $0.50/sec |

*Example: 10-second `sora-2-pro` at 1080p = $5.00*

**Credit consumption (web):** 720p = 16 credits/sec. 1080p = 40 credits/sec. A 5-second 720p video = 80 credits.
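
Web credits convert directly into seconds of footage. This minimal sketch uses the rates above (16 credits/sec at 720p, 40 at 1080p) and the monthly allowances from the pricing table. It ignores Pro's Relaxed mode.

```python
CREDITS_PER_SECOND = {"720p": 16, "1080p": 40}
MONTHLY_CREDITS = {"plus": 1_000, "pro": 10_000}


def credits_for(seconds: float, resolution: str) -> float:
    """Credits consumed by one web generation."""
    return seconds * CREDITS_PER_SECOND[resolution]


def seconds_per_month(plan: str, resolution: str) -> float:
    """Seconds of footage a plan's monthly credits cover (excluding Relaxed mode)."""
    if plan == "plus" and resolution == "1080p":
        raise ValueError("Plus renders at 720p on the web")
    return MONTHLY_CREDITS[plan] / CREDITS_PER_SECOND[resolution]
```

By these figures, Plus's 1,000 credits cover 62.5 seconds of 720p video per month, and Pro's 10,000 credits cover 250 seconds at 1080p.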

**API rate limits:** Tier 1: 25 RPM (`sora-2`) / 10 RPM (`sora-2-pro`). Tier 2 minimum ($10 top-up) required to unlock API access.

**Generation times:** Plus: 5-8 min. Pro: 3-5 min. Peak hours (10 AM - 4 PM PST) = longer queues.

### B. Identity System: Characters (formerly Cameo)

Sora 2's identity system is based on **video selfie enrollment**, not reference images.

**How it works:**
1. Record a short verification video (3-10 seconds) reading numbers displayed on screen
2. System extracts facial features, body type, voice characteristics, movement patterns
3. Saved as a persistent **Character ID**
4. Reference in prompts using `@character` + index number

**Character requirements:**
- For best results, also generate character reference videos from 3-5 different angles (in addition to the verification recording above)
- Clip each character reference video to 3 seconds max
- The system learns appearance, movements, and voice from these recordings

**Persistence:** Character ID persists indefinitely across unlimited generations. System remembers character from different angles. Reported 95%+ consistency across shots.

**Multi-character limit:** **Maximum 2 characters per video** — this is a hard technical limitation. For group scenes, split into multiple two-person shots and join in post.
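For shot planning, the 2-character cap can be checked before anything is submitted. A minimal sketch; character labels are plain strings here (hypothetical, not Sora Character IDs), and generating every pairing is just one way to cover a group scene.

```python
from itertools import combinations

MAX_CHARACTERS_PER_VIDEO = 2  # Sora 2 hard limit noted above


def check_cast(character_ids: list[str]) -> None:
    """Raise if a single Sora 2 generation would exceed the character limit."""
    count = len(set(character_ids))
    if count > MAX_CHARACTERS_PER_VIDEO:
        raise ValueError(
            f"{count} characters requested; Sora 2 allows "
            f"{MAX_CHARACTERS_PER_VIDEO}. Split into two-person shots."
        )


def two_person_coverage(cast: list[str]) -> list[tuple[str, ...]]:
    """Every pairing of the cast, as candidate two-person shots.

    3 people yield 3 shots and 4 yield 6, so trim to the pairings the
    scene actually needs before generating.
    """
    if len(cast) <= MAX_CHARACTERS_PER_VIDEO:
        return [tuple(cast)]
    return list(combinations(cast, 2))
```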

**Remix mode:** Take any generated clip and modify it — change characters, swap styles, extend scenes. Change one element at a time for best results.

**Storyboard (Pro only):** Lay out multiple shots with per-shot prompts. Characters and settings carry across shots within the storyboard. Export as single continuous video.

**vs. competitors:**
- Kling Elements: More explicit, reference-image-driven, handles 3+ characters
- Seedance Identity-Lock: More compositional control via multimodal @refs
- Sora Characters: Best internal model understanding, but 2-character limit is a hard constraint

**Note:** As of Feb 2026, Characters may not be available via API — confirmed on sora.com only.

### C. Prompt Structure

**Core philosophy:** Prompt like you're briefing a cinematographer. Detailed prompts = control and consistency. Lighter prompts = creative freedom. Same prompt generates different results each time — iterate and collaborate.

**Recommended structure:**
1. **Prose scene description** — characters, costumes, scenery, weather, environment
2. **Cinematography** — shot framing, mood, lens choice, DOF, lighting
3. **Actions** — specific beats or gestures (one clear action per shot)
4. **Dialogue** — brief, natural lines labeled by speaker, in a separate block below prose

**Optimal prompt length:** 50-100 words in 2-4 sentences. Single-sentence prompts lack specificity. Prompts over 150 words often introduce conflicting instructions.
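A rough lint for the length guidance above, applied to the prose part of the prompt (leave the dialogue block out). Sentence splitting is a simple punctuation heuristic, and the thresholds are the ones stated in this section; the function name is hypothetical.

```python
import re


def lint_prompt_length(prose: str) -> list[str]:
    """Flag Sora 2 prose outside the 50-100 word / 2-4 sentence guidance."""
    warnings = []
    words = len(prose.split())
    # Heuristic: a sentence ends at ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", prose.strip()) if s]
    if words < 50:
        warnings.append(f"{words} words: likely too vague (target 50-100)")
    elif words > 150:
        warnings.append(f"{words} words: high risk of conflicting instructions")
    elif words > 100:
        warnings.append(f"{words} words: above the 50-100 target")
    if not 2 <= len(sentences) <= 4:
        warnings.append(f"{len(sentences)} sentences: target 2-4")
    return warnings
```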

**What it attends to most (in order):**
1. Style/aesthetic
2. Camera framing and motion
3. Lighting and color
4. Specific verbs and nouns
5. Dialogue clarity

**Camera direction:** Strong cinematography literacy.
- Specify shot type: wide establishing, medium close-up, aerial, tracking
- Specify lens: "Wide-angle 24mm," "Anamorphic 2.0x," "85mm portrait lens"
- Specify motion: "slow dolly forward over 3 seconds"
- **One clear camera move per shot**

**Dialogue format:** Place in a separate block below prose description. 4-second clips = 1-2 short exchanges. 8-second clips allow more. Use consistent speaker labels.

**Physics/motion:** Keep motion simple — one clear subject action per shot. Use temporal markers: "in the final second," "three times," "takes four steps, pauses." Complex multi-beat sequences work better split across separate 4-second clips.

**Negative prompts:** No formal parameter. Phrasing exclusions in the prompt works: "avoid Dutch angles; no on-screen text; no lens flare."
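The four-part structure, the separate dialogue block, and the in-prompt exclusions can be assembled mechanically. A minimal sketch; `SoraShot` and `build_sora_prompt` are hypothetical names, and the output is plain prompt text.

```python
from dataclasses import dataclass, field


@dataclass
class SoraShot:
    scene: str           # prose: setting, characters, atmosphere
    cinematography: str  # shot type, lens, DOF, one camera move, lighting
    action: str          # one clear beat, with temporal markers
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)
    avoid: list[str] = field(default_factory=list)  # in-prompt exclusions


def build_sora_prompt(shot: SoraShot) -> str:
    """Prose sections first, then a separate dialogue block, then exclusions."""
    sections = [shot.scene, shot.cinematography, shot.action]
    prompt = "\n\n".join(s.strip() for s in sections if s.strip())
    if shot.dialogue:
        # Consistent speaker labels, one line per speaker turn.
        lines = "\n".join(f"{speaker}: '{line}'" for speaker, line in shot.dialogue)
        prompt += "\n\nDialogue:\n" + lines
    if shot.avoid:
        # No negative-prompt parameter exists, so exclusions are written as text.
        prompt += "\n\nAvoid: " + ", ".join(shot.avoid)
    return prompt
```

Keeping each part a separate field makes it easy to change one element at a time between iterations, which matches the Remix advice above.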

**Strong vs. weak prompts:**

| Weak | Strong |
|------|--------|
| "Beautiful street at night" | "Wet asphalt, zebra crosswalk, neon signs reflecting in puddles" |
| "Person moves quickly" | "Cyclist pedals three times, brakes, stops at crosswalk" |
| "Cinematic look" | "Anamorphic 2.0x, shallow DOF, volumetric light" |

### D. Strengths & Weaknesses

**Strengths:**
- **Best physics simulation** of any model — realistic weight, momentum, collision, fluid dynamics
- **Best temporal consistency**: characters generally keep the same face between shots (dramatic lighting shifts can still cause identity drift; see Weaknesses)
- Native audio generated in same pass — lip-sync, ambient, SFX all temporally aligned
- Strong cinematographic understanding — lens, framing, lighting highly controllable
- Storyboard feature enables multi-shot sequences with character persistence
- Extensions carry scenes forward with visual continuity

**Weaknesses:**
- **2-character limit per video** — hard constraint, biggest limitation for Recoil
- Text rendering nearly impossible — letters substituted with similar shapes
- Hands still fragile: fine hand-object interactions such as buttons and zippers are problematic
- Physics edge cases: liquid pouring, cloth draping, small object interactions fail
- Identity drift under dramatic lighting shifts
- Fast camera moves (whip pans, fast spins) trigger warping artifacts
- Aggressive content moderation can block legitimate creative prompts
- Characters feature may not be available via API (web only as of Feb 2026)
- Expensive at Pro tier ($200/mo or $0.50/sec API)
- Free tier removed January 2026

### E. Prompt Template

```
--- SORA 2 PROMPT ---

[Scene prose: 2-3 sentences describing setting, characters, atmosphere.
Use specific sensory details — wet asphalt, neon reflections, visible breath
in cold air. Describe character by distinctive visual features.]

[Cinematography: shot type, lens choice ("Anamorphic 2.0x"), DOF,
camera movement ("slow dolly forward over 3 seconds"), lighting direction
and quality.]

[Action: one clear beat. "She turns, sees the figure, freezes."
Use temporal markers for timing.]

Dialogue:
Speaker A: 'Line of dialogue.'
Speaker B: 'Response.'

[One subtle ambient sound as rhythm anchor: "distant traffic hiss" or
"rain on metal."]

Avoid: Dutch angles, on-screen text, lens flare

--- CHARACTER SETUP (sora.com only) ---
1. Record 3-10s verification video reading on-screen numbers
2. Generate refs from 3-5 angles, clip to 3s each
3. Reference in prompt as @character + index

--- SETTINGS ---
Duration: [4/8/12]s (API: sora-2 and sora-2-pro) or [10/15/25]s (sora.com; 25s with Pro Storyboard)
Resolution: 720p (sora-2); 720p or 1080p (sora-2-pro)
Aspect: 16:9 or 9:16
```

### F. Asset Kit Requirements

Sora 2's identity system is video-selfie-based, not reference-image-based. Asset preparation is different from other models.

| Asset | Format | Specs | Notes |
|-------|--------|-------|-------|
| Character verification video | Video | 3-10s, reading on-screen numbers | Recorded in sora.com UI |
| Character angle refs | Video | 3s each, 3-5 different angles | Generated from initial enrollment |

**Workflow:**
1. Go to sora.com → Characters
2. Record verification video (3-10s, reading displayed numbers)
3. Generate character reference videos from multiple angles
4. In new generations, reference as `@character` + index number
5. For multi-character: max 2 Character IDs per video
6. For group scenes: generate separate 2-person shots and composite in post

---

## Model 5: Hailuo 02 / 2.3 (MiniMax)

*Hailuo 02 released late 2025. Hailuo 2.3 (latest) released early 2026.*

### A. Identity & Capabilities

| Spec | Value |
|------|-------|
| Model variants | `T2V-02` (text-to-video), `I2V-02` (image-to-video), `S2V-01` (subject reference), `T2V-01-Director` / `I2V-01-Director` (camera control). Hailuo 2.3 is the latest successor. |
| Max clip duration | 6 or 10 seconds per generation |
| Resolution | 512p or 768p (Standard), 1080p (Pro) |
| Frame rate | 25fps |
| Native audio | **No.** All output is silent video. Audio must be added in post. |
| Multi-shot | No. Single clips only. Use Subject Reference (S2V-01) for consistency across separate generations. |
| Start & End Frame | Yes — define first and last keyframes, model interpolates between them |
| Aspect ratios | Between 2:5 and 5:2 (very flexible range) |
| API | fal.ai, Replicate, AI/ML API, Atlas Cloud, WaveSpeed, getimg.ai, VEED |

**Pricing:**

| Access | Cost |
|--------|------|
| hailuoai.video Free | Daily bonus credits (watermarked, limited) |
| hailuoai.video Standard | $7.99/mo (limited-time, reg $14.99) — 1,000 credits |
| hailuoai.video Pro | $24.99/mo (limited-time, reg $54.99) — 4,500 credits |
| hailuoai.video Max | $199.99/mo — 20,000 credits, unlimited 01 & 02 in Relax Mode |
| fal.ai 768p | $0.045/sec (~$0.27 per 6s clip) |
| fal.ai 1080p | $0.08/sec (~$0.48 per 6s clip) |

*Cheapest per-clip cost of any frontier model. A 6-second 1080p clip on fal.ai = ~$0.48.*

**Generation times:** ~4-5 min for 6s standard. ~8-9 min for 10s pro. Not real-time.

### B. Identity System: Subject Reference (S2V-01)

Uses a proprietary identity reference network built on diffusion transformers. Extracts core identity features from a reference image (facial structure, hairstyle, skin tone, distinguishing features) and enforces them across all frames.

**Reference image specs:**

| Spec | Requirement |
|------|-------------|
| Images | 1 (primary workflow). Up to 3 of same person may be supported. |
| Best practice | Clear, well-lit headshot with distinct features. Avoid heavy filters, dim lighting, occlusions. |
| Formats | JPG, JPEG, PNG |
| Min input resolution | 300px on shorter side |
| Max file size | 20MB |
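The specs above can be checked before upload. A minimal sketch; width and height are passed in (for example from Pillow's `Image.open(path).size`), and "20MB" is read as 20,000,000 bytes, the stricter of the two interpretations. The function name is hypothetical.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}
MIN_SHORT_SIDE_PX = 300
MAX_BYTES = 20_000_000  # "20MB" read as decimal megabytes (stricter reading)


def check_subject_reference(path: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return problems with a Subject Reference image; an empty list means OK."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("format must be JPG, JPEG, or PNG")
    short_side = min(width, height)
    if short_side < MIN_SHORT_SIDE_PX:
        problems.append(f"shorter side is {short_side}px (minimum {MIN_SHORT_SIDE_PX}px)")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes / 1_000_000:.1f}MB (maximum 20MB)")
    return problems
```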

**How it works:**
- Upload reference image each time you generate (no persistent character ID)
- Model preserves facial structure while allowing modification of posture, expressions, lighting, clothing via text prompt
- Re-upload same image across generations for consistency

**Multi-character:** Limited. S2V-01 works best with **one primary character**. Multi-character identity preservation is unreliable. Multi-subject references are a planned future feature.

**Limitations:**
- May follow prompts less precisely than standard T2V or I2V modes
- Background morphing (environments shifting unexpectedly)
- Best with simple, clear prompts focused on one character

### C. Prompt Structure

**Formula:**

```
[Camera Movement] + [Character Description] + [Action] + [Scene Description] +
[Lighting/Mood] + [Style]
```

**Optimal length:** 40-60 words.
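A sketch that joins the formula's parts in order (mirroring the prompt template in section E) and flags length problems against the 40-60 word target and the 2,000-character limit listed under this Prompt Structure section. Function and parameter names are hypothetical.

```python
MAX_PROMPT_CHARS = 2000
TARGET_WORDS = (40, 60)


def build_hailuo_prompt(camera: str, character: str, action: str,
                        scene: str, lighting: str, style: str) -> tuple[str, list[str]]:
    """Join the formula parts in order; return the prompt and any warnings."""
    prompt = f"{camera} {character} {action} in {scene}, {lighting}, {style}."
    warnings = []
    words = len(prompt.split())
    lo, hi = TARGET_WORDS
    if not lo <= words <= hi:
        warnings.append(f"{words} words (target {lo}-{hi})")
    if len(prompt) > MAX_PROMPT_CHARS:
        warnings.append(f"{len(prompt)} characters exceeds the {MAX_PROMPT_CHARS}-character limit")
    return prompt, warnings
```

A short prompt like `[Push in] A tired courier in a rain-soaked jacket slowly lowers her hood in a narrow neon-lit alley, ...` comes in around 26 words, so the warning is a cue to add specific detail rather than more actions.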

**Director Mode — bracket camera commands:**

| Category | Commands |
|----------|----------|
| Truck | `[Truck left]`, `[Truck right]` |
| Pan | `[Pan left]`, `[Pan right]` |
| Push/Pull | `[Push in]`, `[Pull out]` |
| Pedestal | `[Pedestal up]`, `[Pedestal down]` |
| Tilt | `[Tilt up]`, `[Tilt down]` |
| Zoom | `[Zoom in]`, `[Zoom out]` |
| Shake | `[Shake]` |
| Follow | `[Tracking shot]` |
| Static | `[Static shot]` |

**Bracket rules:**
- Multiple commands in one `[]` execute simultaneously: `[Pan left, Pedestal up]`
- Max 3 combined simultaneous movements recommended
- Sequential: place commands in order in prompt text
- `[Static shot]` cannot be combined with other movements
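The bracket rules lend themselves to a pre-flight lint. A minimal sketch, assuming the command table above is complete; matching is case-insensitive for linting purposes only, and any other `[...]` text in the prompt is checked too, so strip template placeholders first. The function name is hypothetical.

```python
import re

KNOWN_COMMANDS = {
    "truck left", "truck right", "pan left", "pan right", "push in", "pull out",
    "pedestal up", "pedestal down", "tilt up", "tilt down", "zoom in", "zoom out",
    "shake", "tracking shot", "static shot",
}
MAX_SIMULTANEOUS = 3


def check_director_brackets(prompt: str) -> list[str]:
    """Return rule violations for every [...] group in a Hailuo prompt."""
    problems = []
    for group in re.findall(r"\[([^\[\]]+)\]", prompt):
        commands = [c.strip().lower() for c in group.split(",")]
        unknown = [c for c in commands if c not in KNOWN_COMMANDS]
        if unknown:
            problems.append(f"[{group}]: unrecognized command(s) {unknown}")
        if len(commands) > MAX_SIMULTANEOUS:
            problems.append(f"[{group}]: {len(commands)} simultaneous moves "
                            f"(max {MAX_SIMULTANEOUS} recommended)")
        if "static shot" in commands and len(commands) > 1:
            problems.append(f"[{group}]: Static shot cannot be combined with other moves")
    return problems
```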

**Prompt enhancer:** The API exposes a boolean prompt-optimizer flag (named `prompt_optimizer` or `enhance_prompt` depending on the provider; default true) that automatically rewrites your text before generation.

**What to avoid:**
- Overly complex prompts with too many simultaneous actions
- Vague action descriptions (causes morphing and blurring)
- Combining `[Static shot]` with other camera movements
- Too many subjects in one generation

**Negative prompts:** No dedicated field. In-prompt exclusion ("no motion blur," "no grainy textures") works inconsistently.

**Max prompt length:** 2,000 characters.

**Audio/dialogue:** Not applicable. No audio generation. Add in post.

### D. Strengths & Weaknesses

**Strengths:**
- Excellent physics simulation (among the best, close to Sora 2)
- Richer, smoother camera movements than most competitors
- Excellent body movement rendering, micro-expressions
- Best price-to-quality ratio of any model (~$0.27-0.48 per 6s clip via fal.ai)
- Ranked #2 globally on Artificial Analysis Video Arena (I2V category) at the time of research
- True 1080p native output
- Excellent start/end frame interpolation
- Strong prompt adherence with Director Mode brackets

**Weaknesses:**
- **No audio** — major gap for dialogue scenes
- 10-second max duration
- Athletic/fast action distortion at impact frames
- Hand-drawn/anime style jitter with complex motion
- Background blur with rapid movement
- Environmental morphing in Subject Reference mode
- Multi-character identity consistency unreliable
- Slower generation (4-9 minutes per clip)

**Hailuo 2.3 improvements over 02:**
- Better human motion fluidity and micro-expressions
- Expanded stylization (anime, ink-wash, game CG)
- Improved prompt following
- Enhanced lighting and color tones
- Better text rendering
- Same pricing
- **Regression note:** Some physics scenarios (football, fight scenes) actually perform worse in 2.3 than 02

### E. Prompt Template

```
--- HAILUO 02/2.3 PROMPT ---

[Camera: bracket command, e.g., [Push in]] A [character description]
[action with specific verb and adverb] in [scene/location description],
[lighting quality and direction], [style keyword: cinematic/photorealistic/etc.]

--- SUBJECT REFERENCE (S2V-01) ---
Upload: [clear headshot of character] (JPG/PNG, <20MB)

--- DIRECTOR MODE CAMERA ---
[Primary movement], [optional secondary: max 3 combined]
Sequential: describe in order within prompt text

--- SETTINGS ---
Duration: 6s or 10s
Resolution: 768p (Standard) or 1080p (Pro)
Prompt optimizer: ON (default)

--- NO AUDIO ---
Audio must be added in post-production.
```
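A settings check for the template above, covering the listed durations and resolution tiers plus the open 1080p/10s question from the verification table at the end of this document. The function name is hypothetical.

```python
VALID_DURATIONS_S = {6, 10}
RESOLUTION_TIER = {"512p": "Standard", "768p": "Standard", "1080p": "Pro"}


def check_hailuo_settings(duration_s: int, resolution: str) -> list[str]:
    """Return notes on a Hailuo 02/2.3 settings combo; empty means no known issue."""
    notes = []
    if duration_s not in VALID_DURATIONS_S:
        notes.append(f"{duration_s}s is not offered (use 6 or 10)")
    if resolution not in RESOLUTION_TIER:
        notes.append(f"{resolution} is not offered (use 512p, 768p, or 1080p)")
    elif resolution == "1080p" and duration_s == 10:
        # Open verification item: some sources limit 1080p to 6s clips.
        notes.append("1080p at 10s is unverified; some sources limit 1080p to 6s")
    return notes
```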

### F. Asset Kit Requirements

| Asset | Format | Specs | Naming Convention |
|-------|--------|-------|-------------------|
| Character headshot | JPG/PNG | Clear, well-lit, <20MB, 300px+ | `asset_kits/hailuo/[CHAR]/headshot.png` |
| First frame keyframe | JPG/PNG | Match output resolution, <20MB | `asset_kits/hailuo/keyframes/[SHOT]_first.png` |
| Last frame keyframe | JPG/PNG | Match output resolution, <20MB | `asset_kits/hailuo/keyframes/[SHOT]_last.png` |
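A sketch for generating paths in the naming convention above. `[CHAR]` and `[SHOT]` are filled from caller-supplied IDs (the example IDs in usage are hypothetical), and the `.png` extension follows the table.

```python
from pathlib import PurePosixPath

KIT_ROOT = PurePosixPath("asset_kits/hailuo")


def _safe_id(value: str) -> str:
    # IDs become single path segments, so reject empty values and separators.
    if not value or "/" in value or "\\" in value:
        raise ValueError(f"invalid ID: {value!r}")
    return value


def headshot_path(char: str) -> PurePosixPath:
    return KIT_ROOT / _safe_id(char) / "headshot.png"


def keyframe_paths(shot: str) -> tuple[PurePosixPath, PurePosixPath]:
    shot = _safe_id(shot)
    folder = KIT_ROOT / "keyframes"
    return folder / f"{shot}_first.png", folder / f"{shot}_last.png"
```

Usage: `headshot_path("MAYA")` gives `asset_kits/hailuo/MAYA/headshot.png`.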

**Upload workflow (web UI — hailuoai.video):**
1. Go to Subject Reference to Video section
2. Upload character headshot (clear, well-lit, distinct features)
3. Write prompt with Director Mode brackets for camera control
4. Select duration (6s or 10s) and resolution
5. Note: re-upload same headshot for each generation to maintain consistency

**Upload workflow (fal.ai API):**
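1. Use the `fal-ai/minimax/hailuo-02/pro/image-to-video` endpoint. This is an image-to-video endpoint, so the input image is treated as the first frame rather than as an S2V-01 subject reference; confirm which endpoint your provider exposes for Subject Reference before relying on it for identity consistency
2. Pass the character image as the input image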
3. Set resolution, duration, aspect ratio
4. Enable/disable prompt optimizer
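A minimal sketch of the call using the `fal-client` Python package's `subscribe` helper, which reads the API key from the `FAL_KEY` environment variable. The argument names (`prompt`, `image_url`, `prompt_optimizer`) are assumptions to confirm against the endpoint's schema on fal.ai; duration and resolution fields are left out for the same reason.

```python
import os

ENDPOINT = "fal-ai/minimax/hailuo-02/pro/image-to-video"  # from the workflow above


def build_arguments(prompt: str, image_url: str, optimizer: bool = True) -> dict:
    """Request body for the endpoint. Field names are assumptions; verify them
    against the endpoint's input schema before use."""
    return {
        "prompt": prompt,
        "image_url": image_url,         # hosted URL of the character image
        "prompt_optimizer": optimizer,  # Hailuo's prompt enhancer (default on)
    }


def generate(prompt: str, image_url: str) -> dict:
    """Submit one job and block until the result is ready."""
    import fal_client  # pip install fal-client

    if "FAL_KEY" not in os.environ:
        raise RuntimeError("set FAL_KEY before calling fal.ai")
    return fal_client.subscribe(ENDPOINT, arguments=build_arguments(prompt, image_url))
```

Re-send the same `image_url` on every generation; there is no persistent character ID to reuse.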

---

## Appendix: Model Selection Guide for Recoil

### By Shot Type

| Shot Type | First Choice | Why | Fallback |
|-----------|-------------|-----|----------|
| Multi-character dialogue (2-3 people) | **Kling 3.0** | Only model with multi-shot + 3+ character tracking + audio | Seedance 2.0 (via @refs) |
| Establishing/environment shot | **Veo 3.1** | Best cinematic polish, color science, lighting | Kling 3.0 |
| Physics-heavy action (explosions, fluids) | **Sora 2** | Best physics simulation, temporal consistency | Kling 3.0 |
| Single character dramatic moment | **Veo 3.1** | Broadcast-quality polish, excellent lip-sync | Hailuo 02 (budget) |
| Music video / rhythm-synced | **Seedance 2.0** | Multimodal audio input, beat-sync capability | Kling 3.0 (native audio) |
| Shot requiring video/audio references | **Seedance 2.0** | Only model accepting video + audio + image refs simultaneously | Kling O3 (video refs) |
| Budget single-character shots | **Hailuo 02/2.3** | $0.27-0.48/clip, strong quality | Seedance 2.0 ($0.60/clip) |
| Over-the-shoulder / shot-reverse-shot | **Kling 3.0** | Multi-shot mode with per-shot camera control | Sora 2 (Storyboard) |
| B-roll / atmosphere | **Veo 3.1** | Best ambient audio generation, cinematic quality | Hailuo 02 (no audio, add in post) |

### By Priority

| Priority | Model | Why |
|----------|-------|-----|
| Lowest cost per shot | Hailuo 02/2.3 | $0.27-0.48/clip via fal.ai |
| Best overall quality | Veo 3.1 Standard | Broadcast-ready polish, but expensive ($4.80/8s) |
| Best character consistency | Kling 3.0 | Element Library with multi-angle refs, 3+ characters |
| Most flexible workflow | Seedance 2.0 | 12-file multimodal input, @ref system |
| Best physics/realism | Sora 2 | World-leading physics simulation |

### Cost Comparison (10-second clip at 1080p with audio where available)

| Model | Approximate Cost | Audio Included |
|-------|-----------------|----------------|
| Hailuo 02 (fal.ai) | $0.80 | No |
| Seedance 2.0 (estimated) | $0.60 | Yes |
| Kling 3.0 O3 (fal.ai) | $1.12-2.24 | Yes |
| Sora 2 (API, sora-2, 720p only) | $1.00 | Yes |
| Sora 2 Pro (API, 1080p) | $5.00 | Yes |
| Veo 3.1 Standard (Vertex) | $6.00 | Yes |
| Veo 3.1 Fast (fal.ai) | $1.00 | No |
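To budget a whole episode from this table, scale the 10-second figures linearly. A minimal sketch; labels are abbreviated from the table, Kling's range is kept as low/high bounds, and per-clip duration limits (for example Veo's 8-second clips) are not modeled.

```python
# Approximate 10 s clip cost (USD low, high) and whether audio is included,
# copied from the table above. The Seedance figure is an estimate.
COST_10S = {
    "Hailuo 02": ((0.80, 0.80), False),
    "Seedance 2.0 (est.)": ((0.60, 0.60), True),
    "Kling 3.0 O3": ((1.12, 2.24), True),
    "Sora 2 (sora-2)": ((1.00, 1.00), True),
    "Sora 2 Pro (1080p)": ((5.00, 5.00), True),
    "Veo 3.1 Standard": ((6.00, 6.00), True),
    "Veo 3.1 Fast": ((1.00, 1.00), False),
}


def budget(model: str, total_seconds: float) -> tuple[float, float]:
    """Linear low/high estimate for `total_seconds` of finished footage."""
    (low, high), _ = COST_10S[model]
    scale = total_seconds / 10
    return round(low * scale, 2), round(high * scale, 2)


def cheapest(require_audio: bool) -> list[str]:
    """Model labels sorted by low-end 10 s cost, optionally audio-capable only."""
    rows = [(cost[0], name) for name, (cost, audio) in COST_10S.items()
            if audio or not require_audio]
    return [name for _, name in sorted(rows)]
```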

---

## Appendix: Universal Negative Prompts for Recoil

Use these as a starting point and customize per model:

**Base negatives (all models):**
```
disfigured hands, extra fingers, morphing, blurry, low resolution,
watermark, text overlay, logo, cartoon style (unless intended),
floating objects, jittery motion
```

**Kling 3.0 additions:**
```
smiling, laughing, cartoonish, bright colors
```
*(Kling defaults to optimistic/smiling faces — counter with negatives)*

**Veo 3.1 additions:**
```
subtitle text, on-screen graphics
```
*(Use colon dialogue format to prevent subtitle generation)*

**Sora 2 in-prompt exclusions:**
```
avoid Dutch angles; no on-screen text; no lens flare
```
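The per-model lists above can be merged from one source of truth. A minimal sketch; the model keys are hypothetical labels, Sora 2 returns its in-prompt exclusion string because it has no negative-prompt field, and the "unless intended" caveat is modeled as a flag that drops cartoon terms.

```python
BASE_NEGATIVES = [
    "disfigured hands", "extra fingers", "morphing", "blurry", "low resolution",
    "watermark", "text overlay", "logo", "cartoon style",
    "floating objects", "jittery motion",
]
MODEL_ADDITIONS = {
    "kling-3.0": ["smiling", "laughing", "cartoonish", "bright colors"],
    "veo-3.1": ["subtitle text", "on-screen graphics"],
}
SORA_EXCLUSIONS = "avoid Dutch angles; no on-screen text; no lens flare"


def negatives_for(model: str, cartoon_intended: bool = False) -> str:
    """Comma-separated negatives for a model (or Sora's in-prompt exclusions)."""
    if model == "sora-2":
        # No negative-prompt field: this text goes inside the prompt itself.
        return SORA_EXCLUSIONS
    terms = BASE_NEGATIVES + MODEL_ADDITIONS.get(model, [])
    if cartoon_intended:
        terms = [t for t in terms if "cartoon" not in t]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ", ".join(dict.fromkeys(terms))
```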

---

## Appendix: Information Gaps & Verification Needed

The following items could not be confirmed during research and should be verified during EP001 testing:

| Model | Uncertainty | Impact |
|-------|-------------|--------|
| Kling 3.0 | 4K/60fps via API — marketing claims vs. 720p/1080p in practice | Resolution planning |
| Kling 3.0 | Max simultaneous elements — 3 confirmed, upper bound unknown | Multi-character scenes |
| Kling 3.0 | Film stock emulation (e.g., "shot on 35mm") — unverified | Style prompting |
| Veo 3.1 | 4K native vs. upscaled 1080p — unclear | Resolution planning |
| Veo 3.1 | Audio with first/last frame feature — may not work | Transition workflow |
| Veo 3.1 | Exact API pricing — no durable public rate card from Google | Budget planning |
| Seedance 2.0 | Real human face ban duration — temporary or permanent? | Character workflow |
| Seedance 2.0 | API launch date Feb 24 — may slip | Integration timing |
| Seedance 2.0 | 21:9 ultra-wide aspect ratio — confirmed in some docs, not all | Aspect ratio planning |
| Sora 2 | Characters feature via API — confirmed on web only | Automation potential |
| Sora 2 | 1:1 aspect ratio — mentioned but not in official API docs | Aspect ratio planning |
| Hailuo 02/2.3 | 1080p at 10s — some sources suggest 1080p limited to 6s | Duration planning |
| Hailuo 2.3 | Regression vs. 02 in some physics scenarios | Shot routing decisions |

---

*This document should be updated after the EP001 testing sprint with confirmed specs, verified workflows, and Recoil-specific learnings.*
