# Voice Direction Guide

Style rules for generating emotional parentheticals in fountain screenplays. Used by the voice-direct phase of `/listen`.

---

## What Voice Direction Is

A **parenthetical** sits between a character cue and their dialogue line in fountain format:

```
JINX
(choking, words squeezed out between compressed breaths — forcing dark humor through a closed throat)
Can't really talk with the hand situation, chrome-boy.
```

The parenthetical becomes the `instruct` field in the Qwen3 TTS CustomVoice pipeline, directing the model to adjust delivery (pace, breath, tension, volume) while keeping voice identity locked. For now, parentheticals are stored in the fountain file, ready for use once custom voices are fine-tuned.
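
As a rough illustration of that mapping, the sketch below splits one cue / parenthetical / dialogue block into its parts and copies the parenthetical into an `instruct` value. Only the `instruct` field name comes from this guide. `DialogueBlock`, `parse_dialogue_block`, `to_tts_request`, and the `speaker` and `text` keys are hypothetical names chosen for the example, not the pipeline's real API.

```python
from typing import NamedTuple, Optional


class DialogueBlock(NamedTuple):
    character: str
    parenthetical: Optional[str]  # direction text without the outer parentheses
    dialogue: str


def parse_dialogue_block(block: str) -> DialogueBlock:
    """Split one cue / parenthetical / dialogue block into its parts."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty dialogue block")
    character, rest = lines[0], lines[1:]
    parenthetical = None
    # A parenthetical, when present, is the first line after the cue.
    if rest and rest[0].startswith("(") and rest[0].endswith(")"):
        parenthetical = rest[0][1:-1].strip()
        rest = rest[1:]
    return DialogueBlock(character, parenthetical, " ".join(rest))


def to_tts_request(block: DialogueBlock) -> dict:
    # Hypothetical payload shape: only the `instruct` key is named in this guide.
    return {
        "speaker": block.character,
        "instruct": block.parenthetical or "",
        "text": block.dialogue,
    }
```

Calling `to_tts_request(parse_dialogue_block(block_text))` on the JINX block above would put the full direction in `instruct` and the spoken line in `text`.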

**Better parentheticals = better audio.**

---

## Style Rules

### 1. Physical, Not Emotional

Describe what the body is doing, not what the character "feels."

| Bad | Good |
|-----|------|
| (angrily) | (slamming metal between words, fury making her jaw tight) |
| (sadly) | (voice dropping to nothing, the last word barely leaving her mouth) |
| (scared) | (hearing boots getting closer, the tremor in her voice is real) |
| (excited) | (eyes wide, already doing the math, greed and wonder fighting for the same breath) |

### 2. Specific to the Line

The direction must be about THIS line in THIS moment. Reference the physical situation, what just happened, and what the character is doing.

| Bad (generic) | Good (specific) |
|---------------|-----------------|
| (sarcastically) | (bored, clinical — rattling off death odds the way someone reads a grocery list) |
| (commanding) | (hands-up diplomacy, deliberately calm, managing a room about to explode) |

### 3. The Em-Dash Structure

Use the em-dash (`—`) to separate the surface delivery from the underlying truth:

```
(flat deadpan, barely above a whisper — the sigh of someone watching their day get immeasurably worse, too tired to scream)
```

Pattern: `(what it sounds like — what's actually happening underneath)`
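
A minimal sketch of how this pattern could be checked mechanically, assuming the first em-dash is the dividing point. `split_direction` is a hypothetical helper name, and the em-dash is written as the escape `\u2014` so the code stays unambiguous in any editor.

```python
from typing import Optional, Tuple

EM_DASH = "\u2014"


def split_direction(parenthetical: str) -> Tuple[str, Optional[str]]:
    """Return (surface delivery, underlying truth); truth is None without an em-dash."""
    text = parenthetical.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # Assumption: only the first em-dash separates surface from truth.
    surface, sep, underneath = text.partition(EM_DASH)
    if not sep:
        return text.strip(), None
    return surface.strip(), underneath.strip()
```

A direction such as `(angry)` comes back with `None` as its second element, which flags it as missing the "underneath" half.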

### 4. Read the Project's Character Bible

Character voice info lives in `[project]/bible/characters.md`, not here. Before voice-directing, read the character bible for the project and use each character's established vocal patterns, physicality, and behavioral DNA to inform the parentheticals.

The parenthetical should describe **acoustic qualities** the TTS model can render: vocal tension, breath control, volume, pace, roughness, cracking. It should not describe writing style or thematic metaphors.

### 5. One Parenthetical per Line

Never stack multiple parentheticals. Combine them into one rich direction.

### 6. Length

Aim for 15-40 words. Long enough to give the TTS model real direction, short enough to be a single emotional gesture.
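
The range can be checked with a simple word count. The sketch below is one possible way to do it. It assumes words are whitespace-separated tokens and that a free-standing em-dash counts as punctuation rather than a word. Both are choices made for this example, and `word_count` and `length_ok` are hypothetical names.

```python
EM_DASH = "\u2014"
MIN_WORDS, MAX_WORDS = 15, 40


def word_count(parenthetical: str) -> int:
    """Count whitespace-separated words, ignoring the outer parentheses."""
    text = parenthetical.strip().strip("()")
    # Assumption: a free-standing em-dash is punctuation, not a word.
    return sum(1 for token in text.split() if token != EM_DASH)


def length_ok(parenthetical: str) -> bool:
    """True when the direction falls inside the 15-40 word target."""
    return MIN_WORDS <= word_count(parenthetical) <= MAX_WORDS
```

Under these assumptions the deadpan example from rule 3 counts as 20 words, while a one-word direction like `(angrily)` fails the check.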

### 7. Avoid These Words

These are too abstract for TTS models to interpret:
- emotional, intense, dramatic, passionate
- with feeling, meaningfully, significantly
- loudly, quietly (use physical descriptions instead: "barely above a whisper", "voice filling the corridor")
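
A small lint pass can catch these before they reach the TTS model. This is a sketch, not part of the existing tooling: `AVOID` mirrors the list above, and `find_abstract_words` is a hypothetical helper name.

```python
import re
from typing import List

# Mirrors the list in this section.
AVOID = [
    "emotional", "intense", "dramatic", "passionate",
    "with feeling", "meaningfully", "significantly",
    "loudly", "quietly",
]

# Whole-word, case-insensitive match on any listed word or phrase.
_AVOID_RE = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in AVOID) + r")\b",
    re.IGNORECASE,
)


def find_abstract_words(parenthetical: str) -> List[str]:
    """Return the listed words found in the direction, in order of appearance."""
    return [m.group(1).lower() for m in _AVOID_RE.finditer(parenthetical)]
```

Because the match is on whole words, derived forms such as "intensely" pass through. Add them to `AVOID` if they should be flagged too.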

### 8. Always Include Physical Context

Ground the direction in what the character's body is doing:
- Breathing state (gasping, holding breath, exhaling slowly)
- Posture (stepping forward, backing away, frozen)
- Physical activity (running, dragging, fighting, climbing)
- Injuries/state (hoarse, throat raw, exhausted, bleeding)

---

## Elaborating Existing Parentheticals

When a line already has a parenthetical, **elaborate** it — don't replace it. Build on what the writer intended.

### Rules for Elaboration

1. **Keep the core intent.** If the writer wrote `(angry)`, the elaboration must still be angry.
2. **Add the physical.** Turn the emotion into a body.
3. **Add the em-dash.** Reveal what's underneath the surface.
4. **Preserve any specific references.** If the original mentions a prop or action, keep it.

### Examples

| Original | Elaborated |
|----------|------------|
| (angry) | (teeth clenched, barely controlling volume — the kind of anger that makes hands shake, not fists) |
| (whispering) | (whispering against the wall, lips almost touching metal — checking over her shoulder between words) |
| (desperate) | (gripping his collar, pulling him close — voice cracking on every other word, dignity abandoned) |
| (V.O., calm) | (V.O., surface calm like still water — but the pauses between sentences are too long, choosing words carefully to hide the tremor) |

---

## Generating New Parentheticals

When a dialogue line has no parenthetical at all:

1. **Read the scene context.** What just happened? What's the physical situation? What's the character doing?
2. **Read the dialogue text.** What tone does the content imply? Sarcasm? Fear? Command?
3. **Read the character bible.** What are this character's vocal patterns and physical tendencies?
4. **Write the acoustic direction.** Surface delivery + em-dash + underlying truth. Focus on what the TTS model can render: breath, tension, pace, volume, roughness.

### Scene Context Window

To generate a good parenthetical, read:
- The scene heading (location, time)
- The 2-3 action lines immediately before the dialogue
- The dialogue line itself
- What happens immediately after (if it reveals the emotional state)
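
The window above could be gathered along these lines. The sketch assumes the script has already been parsed into a list of elements shaped like `{"type": ..., "text": ...}` with types `scene_heading`, `action`, and `dialogue`. That shape, and the name `context_window`, are assumptions for the example. It also assumes the nearest action lines in the current scene are the ones wanted, even if other dialogue sits between them and the target line.

```python
from typing import Dict, List, Optional

Element = Dict[str, str]  # hypothetical shape: {"type": ..., "text": ...}


def context_window(elements: List[Element], index: int, max_actions: int = 3) -> dict:
    """Collect heading, preceding action lines, the line itself, and what follows."""
    target = elements[index]
    if target["type"] != "dialogue":
        raise ValueError("index must point at a dialogue element")

    heading: Optional[str] = None
    actions: List[str] = []
    # Walk backwards to the scene heading, keeping only the nearest action lines.
    for el in reversed(elements[:index]):
        if el["type"] == "scene_heading":
            heading = el["text"]
            break
        if el["type"] == "action" and len(actions) < max_actions:
            actions.append(el["text"])
    actions.reverse()  # restore script order

    after: Optional[str] = None
    if index + 1 < len(elements) and elements[index + 1]["type"] != "scene_heading":
        after = elements[index + 1]["text"]  # stay inside the current scene

    return {
        "heading": heading,
        "actions_before": actions,
        "dialogue": target["text"],
        "after": after,
    }
```

Whether the "after" element actually reveals the emotional state is still a judgment call for whoever writes the direction. The function only fetches it.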

---

## Narrator Lines

**Do NOT add parentheticals to narrator/action lines.** The narrator uses identity-only cloning for consistency. Parentheticals on narrator lines would be ignored by the TTS pipeline.

---

## Quality Check

After voice-directing, every dialogue line in the target episodes should have a parenthetical that:
- [ ] Contains a physical description
- [ ] Is specific to the moment (not generic)
- [ ] Reflects the character's established voice
- [ ] Uses the em-dash structure (surface — truth)
- [ ] Is 15-40 words
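
The first requirement, that every dialogue line has a parenthetical at all, can be checked before the manual review. The sketch below uses a simplified cue heuristic: an all-caps line that follows a blank line and is immediately followed by text. That is an assumption for this example, not a full fountain parser, and `is_cue` and `missing_parentheticals` are hypothetical names.

```python
from typing import List, Tuple


def is_cue(lines: List[str], i: int) -> bool:
    """Simplified cue test: all-caps line after a blank line, followed by text."""
    line = lines[i].strip()
    prev_blank = i == 0 or not lines[i - 1].strip()
    has_next = i + 1 < len(lines) and bool(lines[i + 1].strip())
    has_letters = any(c.isalpha() for c in line)
    return has_letters and line == line.upper() and prev_blank and has_next


def missing_parentheticals(script: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, cue) for cues not followed by a parenthetical."""
    lines = script.splitlines()
    missing = []
    for i in range(len(lines)):
        if is_cue(lines, i) and not lines[i + 1].strip().startswith("("):
            missing.append((i + 1, lines[i].strip()))
    return missing
```

Scene headings and transitions are skipped only because they are normally followed by a blank line. A script that omits those blank lines would produce false positives.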
