# Adversarial Review

> Stage 4 of the Evaluation Pipeline. Advocate vs. Advocate with independent Judge.

For high-stakes decisions where reliability is paramount.

---

## Why Adversarial Works

Single-evaluator reasoning has blind spots:
- Confirmation bias after initial lean
- Failure to steelman alternatives
- Overlooking weaknesses in the preferred option

Adversarial structure forces:
- Best possible case for EACH option
- Explicit attack on weaknesses
- Neutral judgment of arguments

---

## Core Methodology

### Role 1: Advocate A
Make the strongest possible case FOR Option A.
- Steelman all strengths
- Address potential weaknesses proactively
- Explain why this option serves the objective better

### Role 2: Advocate B
Make the strongest possible case FOR Option B.
- Same approach, opposite option
- Cannot simply rebut Advocate A; must make an independent case

### Role 3: Judge
Evaluate both arguments:
- Which advocate made the stronger case?
- What arguments were most compelling?
- What weaknesses were unaddressed?
- Final verdict
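The three roles map onto three independent model calls. Below is a minimal Python sketch of that flow. It assumes a hypothetical `call_model(prompt) -> str` function supplied by whatever model client the pipeline uses. The prompt strings are abbreviated placeholders, not the full templates in Prompt Templates.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: takes a prompt string, returns the model's text reply.
ModelFn = Callable[[str], str]


@dataclass
class ReviewTranscript:
    advocate_a: str
    advocate_b: str
    judge: str


def run_adversarial_review(
    call_model: ModelFn,
    objective: str,
    option_a: str,
    option_b: str,
) -> ReviewTranscript:
    # Roles 1 and 2: each advocate argues for one option only. Neither sees
    # the other's argument, so each must build an independent case.
    case_a = call_model(
        f"Advocate for Option A.\nOBJECTIVE: {objective}\n"
        f"OPTION A: {option_a}\nOPTION B (context only): {option_b}"
    )
    case_b = call_model(
        f"Advocate for Option B.\nOBJECTIVE: {objective}\n"
        f"OPTION B: {option_b}\nOPTION A (context only): {option_a}"
    )
    # Role 3: the Judge receives the objective and both arguments only.
    verdict = call_model(
        f"Judge.\nOBJECTIVE: {objective}\n"
        f"ADVOCATE A's CASE:\n{case_a}\nADVOCATE B's CASE:\n{case_b}"
    )
    return ReviewTranscript(case_a, case_b, verdict)
```

The two advocate calls do not depend on each other, so they could also run concurrently. Only the Judge has to wait for both.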

---

## Prompt Templates

### Advocate A Prompt

```
ADVERSARIAL REVIEW: Advocate for Option A

You are the advocate for Option A. Your job is to make the STRONGEST
POSSIBLE case for why Option A should be selected.

OBJECTIVE: {development_objective}

OPTION A: {option_a_name}
{option_a_full_content}

OPTION B (for context only—do NOT argue for this):
{option_b_summary}

YOUR TASK:

1. STEELMAN OPTION A
   Present Option A's strengths in the best possible light.
   What does it accomplish that no other option can?

2. ADDRESS WEAKNESSES
   Acknowledge Option A's potential weaknesses.
   Explain why they are manageable or acceptable.

3. DIFFERENTIATE FROM OPTION B
   Without attacking Option B, explain why A is the better choice.
   What does A provide that B cannot?

4. CLOSING ARGUMENT
   In 2-3 sentences, make your strongest case for Option A.

FORMAT:

STRENGTHS OF OPTION A:
[comprehensive steelman]

ADDRESSING CONCERNS:
[weakness acknowledgment and mitigation]

WHY A OVER B:
[differentiation without attack]

CLOSING ARGUMENT:
[strongest 2-3 sentence case]
```

### Advocate B Prompt
(Same structure, with Option A and Option B swapped.)
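You can avoid maintaining a second copy of the prompt by turning the option labels into placeholders and rendering one template twice, with the roles swapped. Here is a minimal sketch. The abbreviated template and the placeholder names `{me}`, `{other}`, `{my_content}`, and `{other_summary}` are hypothetical and do not appear in the Advocate A template above.

```python
# Abbreviated, hypothetical advocate template: the option labels are
# placeholders, so one template serves both advocates.
ADVOCATE_TEMPLATE = """ADVERSARIAL REVIEW: Advocate for Option {me}

You are the advocate for Option {me}. Your job is to make the STRONGEST
POSSIBLE case for why Option {me} should be selected.

OBJECTIVE: {objective}

OPTION {me}: {my_name}
{my_content}

OPTION {other} (for context only; do NOT argue for this):
{other_summary}
"""


def build_advocate_prompt(side: str, objective: str, options: dict) -> str:
    """Render the prompt for side "A" or "B"; the other side becomes context."""
    if side not in ("A", "B"):
        raise ValueError(f"side must be 'A' or 'B', got {side!r}")
    other = "B" if side == "A" else "A"
    mine, theirs = options[side], options[other]
    return ADVOCATE_TEMPLATE.format(
        me=side,
        other=other,
        objective=objective,
        my_name=mine["name"],
        my_content=mine["content"],        # full content for the argued option
        other_summary=theirs["summary"],   # summary only for the opposing option
    )
```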

### Judge Prompt

```
ADVERSARIAL REVIEW: Judge

You are the impartial judge. Two advocates have presented cases for
different options. Your job is to evaluate their arguments and
determine which advocate made the stronger case.

OBJECTIVE: {development_objective}

ADVOCATE A's CASE:
{advocate_a_full_argument}

ADVOCATE B's CASE:
{advocate_b_full_argument}

YOUR TASK:

1. EVALUATE ADVOCATE A's ARGUMENT
   What were Advocate A's strongest points?
   What weaknesses did they fail to address?
   How compelling was their case overall?

2. EVALUATE ADVOCATE B's ARGUMENT
   What were Advocate B's strongest points?
   What weaknesses did they fail to address?
   How compelling was their case overall?

3. COMPARE ARGUMENTS
   Which advocate made the more convincing case?
   What was the deciding factor?
   Were there arguments that neither advocate made that should have been considered?

4. VERDICT
   Declare the winner based on strength of argument.
   State your confidence level.

FORMAT:

ADVOCATE A EVALUATION:
  Strongest points: [list]
  Unaddressed weaknesses: [list]
  Overall strength: [weak/moderate/strong]

ADVOCATE B EVALUATION:
  Strongest points: [list]
  Unaddressed weaknesses: [list]
  Overall strength: [weak/moderate/strong]

DECIDING FACTOR:
[what swayed the decision]

VERDICT: [A or B wins]

CONFIDENCE: [High/Medium/Low]

REASONING:
[summary of why this verdict]
```
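The Judge's FORMAT block uses fixed labels, so the verdict and confidence can be extracted mechanically before the result is passed on. Below is a minimal sketch using Python's `re` module. It assumes the Judge follows the `VERDICT: ...` and `CONFIDENCE: ...` lines of the FORMAT block. Anything else is treated as a parse failure rather than guessed at.

```python
import re

# Accepts "VERDICT: B wins" as well as "VERDICT: Option B (The Mirror) WINS".
VERDICT_RE = re.compile(r"^VERDICT:\s*(?:Option\s+)?([AB])\b",
                        re.IGNORECASE | re.MULTILINE)
CONFIDENCE_RE = re.compile(r"^CONFIDENCE:\s*(High|Medium|Low)\b",
                           re.IGNORECASE | re.MULTILINE)


def parse_judge_output(text: str) -> tuple[str, str]:
    """Return (winner, confidence), e.g. ("B", "High")."""
    verdict = VERDICT_RE.search(text)
    confidence = CONFIDENCE_RE.search(text)
    if verdict is None or confidence is None:
        raise ValueError("Judge output is missing a VERDICT or CONFIDENCE line")
    return verdict.group(1).upper(), confidence.group(1).capitalize()
```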

---

## When to Use Adversarial Review

### Always Use For:
- **Protagonist definition** — Character identity is irreversible
- **Ending direction** — Sets destination for entire arc
- **Major structural decisions** — Act breaks, key plot points
- **Thematic question refinement** — Foundation for everything else

### Use When:
- Pairwise comparison confidence is Medium or Low
- Rubric scores are within 10% of each other
- Decision is flagged as "locked" (🔒) in episode_arc.md
- User requests additional validation

### Skip When:
- Pairwise comparison confidence is High
- Rubric scores show clear winner (>20% gap)
- Decision is flagged as flexible (🎲)
- Time/cost constraints apply
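The trigger rules can be expressed as a single routing check. Below is a minimal Python sketch. It rests on several assumptions that the lists above leave open:

- The Always Use categories override every skip condition.
- An explicit user request also overrides skip conditions.
- Time/cost constraints veto all remaining triggers.
- Any other Use trigger beats the score-gap and flexible-flag skip conditions, since reliability is the premise of this stage.
- "Within 10%" is measured relative to the higher rubric score.

The category keys and parameter names are hypothetical.

```python
# Hypothetical keys for the "Always Use For" categories.
ALWAYS_USE = {"protagonist", "ending", "structure", "theme"}


def should_run_adversarial(
    category: str,
    pairwise_confidence: str,        # "High", "Medium", or "Low"
    score_a: float,
    score_b: float,
    flag: str = "",                  # "locked", "flexible", or "" if unflagged
    user_requested: bool = False,
    time_or_cost_constrained: bool = False,
) -> bool:
    # Always-use categories override every skip condition.
    if category in ALWAYS_USE:
        return True
    # Assumption: an explicit user request also overrides skip conditions.
    if user_requested:
        return True
    # Assumption: time/cost constraints veto the remaining optional triggers.
    if time_or_cost_constrained:
        return False
    # Assumption: "within 10%" is relative to the higher of the two scores.
    top = max(score_a, score_b)
    gap = abs(score_a - score_b) / top if top else 0.0
    # Any remaining Use trigger runs the review; otherwise skip.
    return (
        pairwise_confidence in ("Medium", "Low")
        or gap <= 0.10
        or flag == "locked"
    )
```

Under these assumptions, a High-confidence comparison with a 10-20% score gap falls through to skip, because the lists above do not cover that band.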

---

## Output Format

```
═══════════════════════════════════════════════════════════════
ADVERSARIAL REVIEW: Anchor Type Selection
═══════════════════════════════════════════════════════════════

ADVOCATE A (The Cub):
"The Cub anchor provides proven emotional stakes. Audiences
universally respond to protection dynamics—it's hardwired.
While less thematically unique, this reliability ensures the
emotional engine runs consistently across 60 episodes. The
Mirror anchor risks feeling cold if execution falters. With
The Cub, we have a floor of 'good enough' that The Mirror
cannot guarantee."

ADVOCATE B (The Mirror):
"The Mirror anchor makes ASI-BRIDGE unique. Every thriller has
a protector/protected dynamic—none have a human learning to
trust their reflection in an alien mind. Yes, it's harder to
execute, but this story is ABOUT different minds connecting.
Choosing The Cub would be choosing safety over purpose. The
difficulty of The Mirror IS the point—if it were easy, it
wouldn't be worth telling."

───────────────────────────────────────────────────────────────

JUDGE EVALUATION:

ADVOCATE A (The Cub):
  Strongest points: Reliability, emotional floor, proven pattern
  Unaddressed weaknesses: Doesn't explain how The Cub serves THIS theme
  Overall strength: Moderate (safe but uninspired)

ADVOCATE B (The Mirror):
  Strongest points: Thematic necessity, uniqueness, "difficulty is the point"
  Unaddressed weaknesses: Execution-risk mitigation strategies
  Overall strength: Strong (compelling vision, acknowledged risk)

DECIDING FACTOR:
Advocate B's argument that "the difficulty IS the point" reframed
the risk as feature rather than bug. The Cub advocate never
explained how a generic anchor type serves this specific story.

───────────────────────────────────────────────────────────────
VERDICT: Option B (The Mirror) WINS

CONFIDENCE: High

REASONING:
The Mirror anchor's direct embodiment of the thematic question
makes it the right choice for this story, even with execution
risk. Advocate B successfully argued that choosing safety would
undermine the story's purpose. The execution risk should be
addressed through careful development, not option selection.
═══════════════════════════════════════════════════════════════
```

---

## Reliability Boosters

### Multiple Judge Runs
Run the Judge step 3 times with slightly varied prompts, then take the majority verdict.
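With two possible verdicts and an odd number of runs, a strict majority always exists. Below is a minimal sketch of the tally using Python's `collections.Counter`. It assumes each Judge run has already been reduced to a `(winner, confidence)` pair. Reporting the share of agreeing runs alongside the verdict is an addition here; this section does not specify it.

```python
from collections import Counter


def majority_verdict(runs: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the majority winner and the fraction of runs that agreed.

    `runs` holds (winner, confidence) pairs, one per Judge run. An odd run
    count rules out ties as long as every winner is "A" or "B".
    """
    if not runs or len(runs) % 2 == 0:
        raise ValueError("use an odd number of Judge runs so a tie is impossible")
    tally = Counter(winner for winner, _ in runs)
    winner, votes = tally.most_common(1)[0]
    return winner, votes / len(runs)
```

One possible policy (an assumption, not part of this stage) is to lower the reported confidence when the split is 2-1 rather than 3-0.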

### Devil's Advocate Addition
After both advocates present, have a third role identify arguments NEITHER made:
```
DEVIL'S ADVOCATE:
Arguments neither advocate made:
1. What about a hybrid—ASI that protects a Cub-like figure?
2. Could The Mirror evolve INTO a Cub dynamic over 60 episodes?
3. Risk of Mirror: audience may not empathize with non-human...
```

### Cross-Examination
An optional step in which each advocate responds to the other's case:
```
ADVOCATE A RESPONDS TO B:
"The argument that 'difficulty is the point' assumes we can
execute well. What if we can't? The Cub provides a fallback."

ADVOCATE B RESPONDS TO A:
"The fallback argument assumes we should optimize for failure.
If we're not confident we can execute The Mirror, why are we
telling this story at all?"
```
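Cross-examination adds one round in which each advocate sees the opposing case. Below is a minimal sketch using the same hypothetical `call_model(prompt) -> str` convention. The rebuttal prompt wording is an assumption. So are two design choices: both rebuttals respond only to the opening cases, and each rebuttal is appended to its advocate's case before judging.

```python
from typing import Callable


def cross_examine(
    call_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    case_a: str,
    case_b: str,
) -> tuple[str, str]:
    """Run one rebuttal round and return the extended cases for the Judge."""
    # Both rebuttals respond to the opening cases only, so neither advocate
    # gets the last word.
    reply_a = call_model(
        "You are Advocate A. Respond to Advocate B's case:\n" + case_b
    )
    reply_b = call_model(
        "You are Advocate B. Respond to Advocate A's case:\n" + case_a
    )
    extended_a = f"{case_a}\n\nADVOCATE A RESPONDS TO B:\n{reply_a}"
    extended_b = f"{case_b}\n\nADVOCATE B RESPONDS TO A:\n{reply_b}"
    return extended_a, extended_b
```

This round adds two model calls to the three-call base review.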

---

## Integration

**Called by:** Showrunner Agent when:
- Pairwise comparison confidence < High
- Decision is marked as high-stakes
- User requests additional validation

**Input:**
- Two options to compare
- Development objective
- Prior analysis (rubric scores, pairwise reasoning)

**Output:**
- Winner with adversarial analysis
- Confidence level
- Summary of winning argument

**Next step:** Present to user with full adversarial context.
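The Input/Output contract above can be expressed as typed records. Below is a minimal sketch using Python's `dataclasses`. The field names are hypothetical and simply mirror the bullet lists. The validation only checks the value sets used elsewhere in this stage.

```python
from dataclasses import dataclass, field


@dataclass
class AdversarialInput:
    option_a: str
    option_b: str
    objective: str
    # Prior analysis from earlier stages (hypothetical field names).
    rubric_scores: dict[str, float] = field(default_factory=dict)
    pairwise_reasoning: str = ""


@dataclass
class AdversarialOutput:
    winner: str              # "A" or "B"
    confidence: str          # "High", "Medium", or "Low"
    winning_argument: str    # summary of the winning advocate's case
    analysis: str            # full advocate and Judge transcript

    def __post_init__(self) -> None:
        if self.winner not in ("A", "B"):
            raise ValueError(f"winner must be 'A' or 'B', got {self.winner!r}")
        if self.confidence not in ("High", "Medium", "Low"):
            raise ValueError(f"unexpected confidence {self.confidence!r}")
```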

---

## Cost Considerations

The base adversarial review requires 3 model calls:
- Advocate A
- Advocate B
- Judge

For maximum reliability, use Opus for all three roles.

**Estimated cost:** $0.15-0.30 per adversarial review

Reserve for decisions where the cost is justified by stakes.
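The call count grows with each reliability booster. Below is a minimal sketch of the arithmetic. It assumes every call costs roughly the same, so the $0.15-0.30 estimate for three calls works out to about $0.05-0.10 per call. The per-booster counts follow from Reliability Boosters: two extra Judge runs, one devil's advocate call, and two cross-examination replies.

```python
# Assumption: per-call cost is uniform, derived from the 3-call estimate above.
PER_CALL_LOW = 0.15 / 3    # about $0.05
PER_CALL_HIGH = 0.30 / 3   # about $0.10


def review_calls(judge_runs: int = 1, devils_advocate: bool = False,
                 cross_examination: bool = False) -> int:
    """Count model calls for one adversarial review."""
    calls = 2 + judge_runs          # two advocates plus each Judge run
    if devils_advocate:
        calls += 1                  # one extra role
    if cross_examination:
        calls += 2                  # each advocate responds once
    return calls


def cost_range(calls: int) -> tuple[float, float]:
    """Estimated (low, high) dollar cost, rounded to cents."""
    return round(calls * PER_CALL_LOW, 2), round(calls * PER_CALL_HIGH, 2)
```

Under the same assumption, enabling all three boosters (three Judge runs) brings a review to 8 calls, or roughly $0.40-0.80.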
