---
name: listen
description: Generate audio narration of screenplay episodes via TTS (ElevenLabs or Qwen3).
allowed-tools: Read, Write, Edit, Bash, Glob, Grep
argument-hint: "[project] [init | episode N | episodes N-M | budget | voices | cast] [--dialogue-only] [--force] [--concat] [--skip-voice-direct]"
---

# /listen - Screenplay Audio Reader

Generate MP3 narration of compiled .fountain screenplays using ElevenLabs or Qwen3 TTS.
Distinct voices per character, credit tracking, episode-by-episode output.
**Automatic voice direction** enhances every dialogue line with physically grounded emotional parentheticals before generation.

## Usage

```
/listen [project]                         # Generate all episodes
/listen [project] episodes 1-5            # Generate episode range
/listen [project] episode 3               # Single episode
/listen [project] init                    # Set up voice config for new project
/listen [project] budget                  # Check credit usage this month
/listen [project] voices                  # List available ElevenLabs voices
/listen [project] cast                    # Open voice casting page in browser
/listen [project] --dialogue-only         # Skip narration to save credits
/listen [project] --force                 # Regenerate existing episodes
/listen [project] --concat                # Also output single combined file
/listen [project] --skip-voice-direct     # Skip the voice direction pass
```

**Examples:**
```
/listen tartarus episodes 1-5                # Voice-direct + generate first 5
/listen tartarus episodes 6-10 --force       # Re-direct + regenerate 6-10
/listen tartarus episode 1 --skip-voice-direct --force  # Just regenerate, keep existing directions
/listen leviathan --dialogue-only            # Dialogue only (saves ~60% credits)
/listen tartarus budget                      # How many credits left this month
```

## Prerequisites

- **Compiled .fountain file** in project root (run `/compile` first if needed)
- **Voice config** at `[project]/audio/voice_config.yaml` (run `init` first)
- **For ElevenLabs:** `ELEVEN_API_KEY` env var, `pip install elevenlabs pydub pyyaml audioop-lts`
- **For Qwen3:** `pip install qwen-tts soundfile pydub pyyaml`, local GPU
- **ffmpeg** on `PATH` (already installed); pydub needs it to export MP3

## Execution

When this skill is invoked:

1. **Parse arguments** for project name, episode range, and flags
2. **Resolve project path** to `projects/[project]/`
3. **Run the appropriate command** (see below)
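
The parsing step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual parser: `ListenArgs` and `parse_listen_args` are hypothetical names, and the only things taken from this document are the subcommand names, the `episode N` / `episodes N-M` forms, and the `projects/[project]/` path.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Set, Tuple

# Subcommands from the Usage section; anything else falls through to "generate".
SUBCOMMANDS = {"init", "budget", "voices", "cast"}


@dataclass
class ListenArgs:  # hypothetical container, not part of fountain_reader.py
    project: str
    command: str = "generate"
    episodes: Optional[Tuple[int, int]] = None  # inclusive (first, last)
    flags: Set[str] = field(default_factory=set)

    @property
    def project_path(self) -> Path:
        # Step 2: resolve the project name to projects/[project]/
        return Path("projects") / self.project


def parse_listen_args(raw: str) -> ListenArgs:
    tokens = raw.split()
    if not tokens:
        raise ValueError("a project name is required")
    args = ListenArgs(project=tokens[0])
    rest = iter(tokens[1:])
    for tok in rest:
        if tok.startswith("--"):
            args.flags.add(tok)
        elif tok in SUBCOMMANDS:
            args.command = tok
        elif tok in ("episode", "episodes"):
            spec = next(rest, "")  # the token after "episode(s)" is N or N-M
            m = re.fullmatch(r"(\d+)(?:-(\d+))?", spec)
            if m is None:
                raise ValueError(f"'{tok}' needs a number or range, got {spec!r}")
            first = int(m.group(1))
            last = int(m.group(2) or first)
            args.episodes = (first, last)
        else:
            raise ValueError(f"unrecognised argument: {tok!r}")
    return args
```

For example, `parse_listen_args("tartarus episodes 1-5 --force")` yields the project `tartarus`, the default `generate` command, the range `(1, 5)`, and the flag set `{"--force"}`.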

### Generate (default) — Two-Phase Pipeline

**Phase 0: Voice Direction** (automatic, unless `--skip-voice-direct`)

Before ANY audio generation, enhance the fountain file's emotional parentheticals.
This phase modifies the .fountain file on disk so changes persist across runs.

**Steps:**

1. Read the compiled .fountain file
2. Read the voice direction style guide at `/docs/voice_direction_guide.md`
3. Identify all dialogue lines in the TARGET EPISODE RANGE (not the whole series)
4. For each dialogue line, read the surrounding scene context (scene heading + 2-3 action lines before + the dialogue itself)
5. Apply one of two operations:

   **A) No parenthetical exists** — Generate one:
   - Use the scene context gathered in step 4 together with the dialogue text
   - Read the project's `bible/characters.md` for the character's vocal patterns and physicality
   - Generate an acoustic direction the TTS model can render: breath, tension, pace, volume, roughness, cracking
   - Use the em-dash structure: `(surface delivery — underlying truth)`
   - Write the parenthetical into the fountain file between character cue and dialogue

   **B) Parenthetical already exists** — Elaborate it:
   - Read the existing parenthetical
   - Preserve the core intent and any specific references
   - Add physical/acoustic grounding if missing
   - Add the em-dash underlying-truth layer if missing
   - Expand generic emotion words into specific physical descriptions
   - If the parenthetical is already rich and visceral (20+ words, has physical detail + em-dash), leave it alone

6. Write the enhanced fountain file back to disk
7. Report: "Voice direction complete: X new, Y elaborated, Z unchanged"
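
The A/B decision and the final report line could be prototyped like this. It is a sketch under stated assumptions: the cue detection is a simplified reading of Fountain (an uppercase line after a blank line that is not a scene heading or transition), the function names are hypothetical, and the "has physical detail" check is left to the model because it cannot be reduced to a word count.

```python
from collections import Counter

SCENE_PREFIXES = ("INT", "EXT", "EST", "I/E")  # simplified scene-heading test
RICH_MIN_WORDS = 20                            # "20+ words" threshold from the steps above
EM_DASH = "\u2014"


def is_character_cue(lines, i):
    """Simplified Fountain rule: uppercase line, blank line before, text after."""
    line = lines[i].strip()
    if not line or line != line.upper() or not any(c.isalpha() for c in line):
        return False
    if line.startswith(SCENE_PREFIXES) or line.endswith("TO:"):
        return False
    prev_blank = i == 0 or not lines[i - 1].strip()
    next_filled = i + 1 < len(lines) and bool(lines[i + 1].strip())
    return prev_blank and next_filled


def classify(parenthetical):
    """Return 'new' (operation A), 'elaborate' (operation B), or 'unchanged'."""
    if parenthetical is None:
        return "new"
    words = [w for w in parenthetical.strip("()").split()
             if any(c.isalnum() for c in w)]
    if len(words) >= RICH_MIN_WORDS and EM_DASH in parenthetical:
        return "unchanged"  # physical detail still needs a human or model check
    return "elaborate"


def survey(fountain_text):
    lines = fountain_text.splitlines()
    counts = Counter()
    for i in range(len(lines)):
        if is_character_cue(lines, i):
            nxt = lines[i + 1].strip()
            paren = nxt if nxt.startswith("(") and nxt.endswith(")") else None
            counts[classify(paren)] += 1
    return counts


def report(counts):
    return (f"Voice direction complete: {counts['new']} new, "
            f"{counts['elaborate']} elaborated, {counts['unchanged']} unchanged")
```

Running `survey` before the pass gives a rough estimate of how much work Phase 0 will do for a given range.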

**Rules for voice direction:**
- NEVER add parentheticals to narrator/action lines (narrator uses identity-only cloning)
- NEVER replace a good existing parenthetical — only elaborate
- If a parenthetical is already 20+ words with physical detail and em-dash structure, skip it
- Keep parentheticals to 15-40 words
- Focus on what the TTS model can render: breath state, vocal tension, pace, volume, roughness, cracking
- Avoid abstract words: emotional, intense, dramatic, passionate, meaningful
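
The mechanical parts of these rules (length, em-dash structure, banned abstract words) can be checked automatically. The following is a minimal sketch with a hypothetical `lint_parenthetical` helper; it does not judge whether the direction is acoustically renderable, which still needs the style guide.

```python
ABSTRACT_WORDS = {"emotional", "intense", "dramatic", "passionate", "meaningful"}
MIN_WORDS, MAX_WORDS = 15, 40  # length range from the rules above
EM_DASH = "\u2014"


def lint_parenthetical(text):
    """Return a list of rule violations; an empty list means the checks pass."""
    problems = []
    inner = text.strip().strip("()")
    # Normalise tokens: drop surrounding punctuation, ignore the dash itself.
    words = [w.strip(".,;:!?\"'").lower() for w in inner.split()]
    words = [w for w in words if any(c.isalnum() for c in w)]
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        problems.append(
            f"length is {len(words)} words, expected {MIN_WORDS}-{MAX_WORDS}")
    if EM_DASH not in inner:
        problems.append("missing em-dash between surface delivery and underlying truth")
    found = sorted(ABSTRACT_WORDS.intersection(words))
    if found:
        problems.append("abstract words: " + ", ".join(found))
    return problems
```

A bare `(intense)` fails all three checks, while a direction that describes breath, pace, and tension on either side of the dash passes.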

**Phase 1: TTS Generation**

After voice direction is complete, run the TTS pipeline:

```bash
python3 /tools/fountain_reader.py /path/to/project generate [--episode N] [--episodes N-M] [--dialogue-only] [--concat] [--force]
```

Then open the first generated MP3 for playback (or the concat file if `--concat` was used).
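
One way to assemble that command from parsed options is shown below. The flag spellings and the script path are copied from this document; `build_reader_command` itself is a hypothetical helper, and it assumes (from the Dry Run usage line) that `dry-run` does not accept `--concat` or `--force`.

```python
from typing import List, Optional, Tuple

READER = "/tools/fountain_reader.py"  # path as written in this document


def build_reader_command(
    project_dir: str,
    mode: str = "generate",
    episodes: Optional[Tuple[int, int]] = None,  # inclusive (first, last)
    dialogue_only: bool = False,
    concat: bool = False,
    force: bool = False,
) -> List[str]:
    if mode not in ("generate", "dry-run"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["python3", READER, str(project_dir), mode]
    if episodes is not None:
        first, last = episodes
        if first == last:
            cmd += ["--episode", str(first)]
        else:
            cmd += ["--episodes", f"{first}-{last}"]
    if dialogue_only:
        cmd.append("--dialogue-only")
    if mode == "generate":  # assumption: these two flags apply to generate only
        if concat:
            cmd.append("--concat")
        if force:
            cmd.append("--force")
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with project paths that contain spaces.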

### Init
```bash
python3 /tools/fountain_reader.py /path/to/project init
```
Report characters found and next steps. No voice direction needed.

### Budget
```bash
python3 /tools/fountain_reader.py /path/to/project budget
```

### Voices
```bash
python3 /tools/fountain_reader.py /path/to/project voices
```

### Cast
```bash
open -a "Google Chrome" /editors/voice_casting.html
```

### Dry Run
```bash
python3 /tools/fountain_reader.py /path/to/project dry-run [--episode N] [--episodes N-M] [--dialogue-only]
```

## Voice Config

Located at `[project]/audio/voice_config.yaml`. Created by `init`, editable by hand or via the casting page.

```yaml
engine: qwen3_clone          # or "elevenlabs", "qwen3", or "qwen3_custom"
voice_refs: "/path/to/refs"  # For qwen3_clone: directory with ref_*.wav files

narrator:
  voice_id: "..."              # ElevenLabs voice ID
  ref_wav: "ref_narrator.wav"  # Qwen3 clone ref (Rainbow Passage WAV)

characters:
  JINX:
    voice_id: "..."
    ref_wav: "ref_jinx.wav"
    instruct: "Dry, sardonic..."   # Fallback instruct (parenthetical overrides)

side_voice_pool:
  - voice_id: "..."
    ref_wav: "ref_pool_julian.wav"

silence_ms:
  between_segments: 200
  section_break: 500
  episode_break: 1000
```

### Engine Modes

| Engine | Identity | Emotional Direction | Notes |
|--------|----------|---------------------|-------|
| `elevenlabs` | ElevenLabs voice IDs | Via ElevenLabs API | Requires `ELEVEN_API_KEY`; billed per character |
| `qwen3` | Built-in Qwen3 speakers | Static instruct per character | Local, free |
| `qwen3_clone` | ICL voice cloning from ref WAVs | Not yet supported (identity-only) | Local, free, best identity consistency |
| `qwen3_custom` | Fine-tuned custom speakers | `instruct` parameter per line | Requires fine-tuning (CUDA GPU), then local inference |

**Current status (Feb 2026):** `qwen3_clone` uses ICL (in-context learning) mode —
the model conditions on the full reference audio + Rainbow Passage transcript,
preserving voice identity far better than the older x-vector blending approach.
Parentheticals are stored in the fountain file for future use with `qwen3_custom`
(fine-tuned speakers that support the `instruct` parameter for per-line emotional direction).

See `01_Tasks/backlog/recoil-custom-voice-finetune.md` for the fine-tuning plan.

## Output

```
[project]/audio/
    voice_config.yaml       # Voice assignments
    credit_log.json         # Per-episode credit tracking
    episodes/
        ep_001.mp3
        ep_002.mp3
        ...
    [project]_complete.mp3  # Combined file (--concat mode)
```
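
Given this layout, the skip-unless-`--force` behaviour can be sketched as follows. The three-digit zero padding is inferred from the `ep_001.mp3` example, and both helper names are hypothetical rather than functions from `fountain_reader.py`.

```python
from pathlib import Path
from typing import List


def episode_path(project_dir, episode: int) -> Path:
    # Padding width inferred from the ep_001.mp3 / ep_002.mp3 examples.
    return Path(project_dir) / "audio" / "episodes" / f"ep_{episode:03d}.mp3"


def episodes_to_generate(project_dir, first: int, last: int,
                         force: bool = False) -> List[int]:
    """Episodes in the inclusive range that still need audio.

    Without force, episodes whose MP3 already exists are skipped.
    """
    return [n for n in range(first, last + 1)
            if force or not episode_path(project_dir, n).exists()]
```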

## Budget

ElevenLabs Starter plan: 60,000 chars/month.
Qwen3 (local): Free (GPU time only).

Typical episode: ~2,200 chars (full narration) or ~230 chars (dialogue-only).
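
As a rough worked example of what those figures imply, the snippet below estimates how many typical episodes fit in a month. It uses only the numbers stated above; actual episode lengths vary, so treat the result as an estimate.

```python
MONTHLY_CHARS = 60_000        # ElevenLabs Starter plan, from above
FULL_NARRATION_CHARS = 2_200  # typical episode, full narration
DIALOGUE_ONLY_CHARS = 230     # typical episode, dialogue-only


def episodes_remaining(chars_per_episode: int, used_chars: int = 0) -> int:
    """Whole episodes that still fit in this month's allowance."""
    remaining = max(0, MONTHLY_CHARS - used_chars)
    return remaining // chars_per_episode


print(episodes_remaining(FULL_NARRATION_CHARS))          # 27
print(episodes_remaining(DIALOGUE_ONLY_CHARS))           # 260
print(episodes_remaining(FULL_NARRATION_CHARS, 11_000))  # 22
```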

## Notes

- Voice direction modifies the .fountain file permanently. This is intentional — the parentheticals are part of the script.
- Each `generate` call costs credits (ElevenLabs) or GPU time (Qwen3). Use `dry-run` first.
- `--force` regenerates existing files. Without it, existing MP3s are skipped.
- `--skip-voice-direct` skips Phase 0. Use when parentheticals are already good and you just want to regenerate audio.
- Side characters share voices from the pool via deterministic hashing.
- Adjust `silence_ms` in config to tune pacing. Regenerate with `--force` after changes.
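
The deterministic pool assignment mentioned above might look like the sketch below. The hash function actually used by `fountain_reader.py` is not documented here, so SHA-256 is an assumption, as is the `pool_voice_for` name. Python's built-in `hash()` is unsuitable for this because string hashes are salted per process by default, so the same character could get a different voice on each run.

```python
import hashlib
from typing import Dict, List


def pool_voice_for(character: str, pool: List[Dict[str, str]]) -> Dict[str, str]:
    """Pick a side_voice_pool entry that stays stable across runs."""
    if not pool:
        raise ValueError("side_voice_pool is empty")
    # Normalise the cue so "guard" and "GUARD " map to the same voice.
    key = character.strip().upper().encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]
```

Note that any scheme of this kind reshuffles assignments when the pool size changes, so adding a pool voice mid-series may change which voice a side character gets.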