# Pairwise Comparison

> Stage 3 of the Evaluation Pipeline. Head-to-head comparison with reasoning-before-judgment.

The key insight: **forcing analysis before stating a preference improves consistency.**

---

## Why Pairwise Works

Asking "Rate this option 1-10" is unreliable:
- Scores drift between runs: yesterday's 7 is today's 6
- Anchoring effects distort scores
- There is no consistent baseline to score against

Asking "Which of these two options better achieves X?" is more reliable:
- Direct comparison reduces drift
- Reasoning first prevents gut reactions
- Specific criteria constrain the judgment

---

## Core Methodology

### Step 1: Present Both Options
Include the full content of each option, side by side.

### Step 2: Force Analysis of Each
Require explicit analysis of each option's strengths BEFORE stating a preference.

### Step 3: Identify Trade-off
Make the evaluator articulate what is gained and lost with each choice.

### Step 4: State Choice with Reasoning
Only after the analysis, state which option wins and why.

---

## Prompt Template

```
PAIRWISE COMPARISON

OBJECTIVE: {development_objective}

EVALUATION CRITERIA:
{specific_criteria_for_this_decision}

───────────────────────────────────────────────────────────────

OPTION A: {option_a_name}
{option_a_full_content}

───────────────────────────────────────────────────────────────

OPTION B: {option_b_name}
{option_b_full_content}

───────────────────────────────────────────────────────────────

TASK (Complete in order):

1. ANALYZE OPTION A
   What are Option A's specific strengths relative to the objective?
   What does Option A do particularly well?

2. ANALYZE OPTION B
   What are Option B's specific strengths relative to the objective?
   What does Option B do particularly well?

3. IDENTIFY THE TRADE-OFF
   What is the core trade-off between these options?
   What do you gain with A that you lose with B, and vice versa?

4. STATE YOUR CHOICE
   Which option better accomplishes the objective?
   Explain your reasoning with specific reference to the criteria.

5. CONFIDENCE
   How confident are you in this choice? (High/Medium/Low)
   If Low or Medium, what additional information would help?

───────────────────────────────────────────────────────────────

OUTPUT FORMAT:

OPTION A STRENGTHS:
[analysis]

OPTION B STRENGTHS:
[analysis]

CORE TRADE-OFF:
[what you gain/lose with each]

WINNER: [A or B]

REASONING:
[specific justification with reference to criteria]

CONFIDENCE: [High/Medium/Low]
[if not High, what would help]
```
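
A minimal sketch of filling in the template programmatically, assuming Python. The `Option` dataclass and `build_pairwise_prompt` are hypothetical names, not part of the pipeline, and `TEMPLATE` is an abbreviated copy of the template above; in practice the full text would be used.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    content: str


# Abbreviated copy of the prompt template (assumption: placeholders keep
# the same names as in the full template).
TEMPLATE = """PAIRWISE COMPARISON

OBJECTIVE: {development_objective}

EVALUATION CRITERIA:
{specific_criteria_for_this_decision}

OPTION A: {option_a_name}
{option_a_full_content}

OPTION B: {option_b_name}
{option_b_full_content}

TASK (Complete in order):
1. ANALYZE OPTION A
2. ANALYZE OPTION B
3. IDENTIFY THE TRADE-OFF
4. STATE YOUR CHOICE
5. CONFIDENCE
"""


def build_pairwise_prompt(objective, criteria, a, b):
    """Fill the template; `criteria` is a list of strings rendered as bullets."""
    return TEMPLATE.format(
        development_objective=objective,
        specific_criteria_for_this_decision="\n".join(f"- {c}" for c in criteria),
        option_a_name=a.name,
        option_a_full_content=a.content,
        option_b_name=b.name,
        option_b_full_content=b.content,
    )
```

Passing the option text as a format value (rather than concatenating it into the template) means braces inside an option's content are left untouched.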

---

## Tournament Structure

When comparing more than two options, use bracket elimination:

### 3 Options
```
Round 1: A vs B → Winner 1
Round 2: Winner 1 vs C → Final Winner
```

### 4+ Options
```
Bracket elimination until final winner.
Consider seeding by rubric scores (highest vs lowest first).
```
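
The bracket can be sketched as follows, assuming Python and a hypothetical `compare(a, b)` callback that runs one pairwise comparison and returns the winning option. `seed_by_score` is one possible reading of "highest vs lowest first": it pairs the top-ranked option with the bottom-ranked one in the first round. Both function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence


def seed_by_score(scores: Dict[str, float]) -> List[str]:
    """Order options so the first round pairs highest with lowest."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    seeded = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        seeded += [ranked[lo], ranked[hi]]
        lo += 1
        hi -= 1
    if lo == hi:  # odd count: the middle option is left unpaired
        seeded.append(ranked[lo])
    return seeded


def run_bracket(options: Sequence[str], compare: Callable[[str, str], str]) -> str:
    """Pair adjacent options each round; winners advance until one remains."""
    if not options:
        raise ValueError("need at least one option")
    current = list(options)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            next_round.append(compare(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # unpaired option gets a bye
        current = next_round
    return current[0]
```

With three options this reproduces the two-round structure above: A vs B first, then the winner vs C.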

### Tie or Close Call
If confidence is Low, or the margin seems thin:
1. Run ensemble voting (3-5 iterations; an odd count avoids ties)
2. Take the majority winner
3. If still tied, escalate to adversarial review

---

## Ensemble Voting

For critical decisions, run the same pairwise comparison multiple times:

```
Run 1: A wins
Run 2: B wins
Run 3: A wins
Run 4: A wins
Run 5: B wins

Result: A wins (3-2)
```

### Implementation
- Use different prompt phrasing each run (same content, varied framing)
- OR use temperature variation
- OR use different model instances
- Majority vote determines winner
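
A minimal sketch of the majority vote, assuming Python. `run_once` is a hypothetical callback that performs one pairwise comparison (with whatever phrasing, temperature, or model instance is chosen for run `i`) and returns `"A"` or `"B"`.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def ensemble_vote(
    run_once: Callable[[int], str], runs: int = 5
) -> Tuple[Optional[str], Counter]:
    """Run the comparison `runs` times and return (majority winner, tally).

    The winner is None when the top two counts are equal, which can only
    happen with an even number of runs when the outcomes are A/B.
    """
    tally = Counter(run_once(i) for i in range(runs))
    ranked = tally.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, tally
    return ranked[0][0], tally
```

Returning the tally alongside the winner keeps the margin visible, so a 3-2 result can be treated differently from 5-0 if desired.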

### When to Use
- High-stakes decisions (protagonist choice, ending direction)
- Close calls from single run
- User-requested verification

---

## Output Format

### Clear Winner
```
═══════════════════════════════════════════════════════════════
PAIRWISE COMPARISON: Anchor Type Selection
═══════════════════════════════════════════════════════════════

OPTIONS COMPARED:
  A: The Cub (protection dynamic)
  B: The Mirror (ASI as anchor)

OPTION A STRENGTHS:
  • Proven pattern—mentor/protector relationships work reliably
  • Clear emotional beats (danger → protection → sacrifice)
  • Lower execution risk

OPTION B STRENGTHS:
  • Directly embodies thematic question (trust across minds)
  • Creates unique relationship not seen in other stories
  • Higher dramatic potential (non-human connection)

CORE TRADE-OFF:
  A: Safety and reliability
  B: Thematic depth and uniqueness (with higher risk)

───────────────────────────────────────────────────────────────
WINNER: B (The Mirror)

REASONING:
While Option A is safer, the thematic alignment of Option B makes
it the stronger choice for this specific story. ASI-BRIDGE's core
question—can different minds trust each other—is directly embodied
when the protagonist's most important relationship IS with a
different kind of mind. The risk is worth the potential payoff.

CONFIDENCE: High
═══════════════════════════════════════════════════════════════
```

### Close Call / Needs Escalation
```
═══════════════════════════════════════════════════════════════
PAIRWISE COMPARISON: Anchor Type Selection
═══════════════════════════════════════════════════════════════

[analysis sections...]

WINNER: B (The Mirror) - MARGINAL

REASONING:
Both options have strong cases. A provides emotional reliability;
B provides thematic depth. This decision hinges on risk tolerance.

CONFIDENCE: Medium

RECOMMENDATION: Run ensemble voting or escalate to adversarial
review for high-stakes validation.
═══════════════════════════════════════════════════════════════
```
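
To act on these results downstream, the `WINNER` and `CONFIDENCE` lines can be extracted from the text. The following is a sketch, assuming Python and assuming the output follows the layouts shown above; `Verdict` and `parse_verdict` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    winner: str      # "A" or "B"
    marginal: bool   # True when the WINNER line is flagged MARGINAL
    confidence: str  # "High", "Medium", or "Low"


WINNER_RE = re.compile(r"^WINNER:\s*([AB])\b(.*)$", re.MULTILINE)
CONFIDENCE_RE = re.compile(r"^CONFIDENCE:\s*(High|Medium|Low)\b", re.MULTILINE)


def parse_verdict(text: str) -> Verdict:
    """Extract winner, marginal flag, and confidence from a comparison result."""
    winner = WINNER_RE.search(text)
    confidence = CONFIDENCE_RE.search(text)
    if winner is None or confidence is None:
        raise ValueError("missing WINNER or CONFIDENCE line")
    return Verdict(
        winner=winner.group(1),
        marginal="MARGINAL" in winner.group(2).upper(),
        confidence=confidence.group(1),
    )
```

An unfilled template line such as `WINNER: [A or B]` does not match and raises `ValueError`, which is a cheap way to catch incomplete responses.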

---

## Common Pitfalls

### Analysis Too Short
Require substantive analysis before judgment. A one-sentence analysis is not trustworthy.

**Bad:**
```
OPTION A STRENGTHS: It's a proven pattern.
OPTION B STRENGTHS: It's more thematic.
```

**Good:**
```
OPTION A STRENGTHS:
The Cub anchor creates an immediate emotional hook—audiences
naturally invest in protecting the vulnerable. This pattern has
worked in countless stories (Leon: The Professional, The Road,
The Mandalorian). It also provides clear dramatic beats: danger
to the cub forces protagonist action, protection creates tension,
and sacrifice pays off the relationship. Execution is well-understood.
```

### Judgment Before Analysis
If the winner is stated before the analysis is complete, the analysis is suspect: it may be rationalizing a choice already made. Insist on the order.
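
The ordering can be checked mechanically. This is a sketch, assuming Python and assuming the response uses the section headers from the prompt template's OUTPUT FORMAT; `sections_in_order` is a hypothetical helper.

```python
REQUIRED_ORDER = [
    "OPTION A STRENGTHS:",
    "OPTION B STRENGTHS:",
    "CORE TRADE-OFF:",
    "WINNER:",
    "REASONING:",
    "CONFIDENCE:",
]


def sections_in_order(text: str) -> bool:
    """True only if every header is present and first appears in the required order."""
    # Use the FIRST occurrence of each header, so an early "WINNER:" fails
    # the check even if the winner is restated later in the right place.
    positions = [text.find(header) for header in REQUIRED_ORDER]
    if any(p == -1 for p in positions):
        return False
    return all(a < b for a, b in zip(positions, positions[1:]))
```

This only verifies the order of the sections, not the quality of the analysis inside them.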

### Vague Trade-off
"A is better at X, B is better at Y" is not a trade-off. A trade-off explains what you LOSE by choosing each option.

### Ignoring Criteria
The choice must reference the specific evaluation criteria, not general quality.

---

## Integration

**Called by:** Showrunner Agent after rubric scoring

**Input:**
- Top 2-3 options from rubric scoring
- Development objective
- Specific evaluation criteria for this decision

**Output:**
- Winner with reasoning
- Confidence level
- Optional: Escalation recommendation

**Next step:** If confidence is High, present to user. If Medium or Low, run ensemble voting or escalate to adversarial review, as described under Tie or Close Call.
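
One possible reading of this routing, combined with the Tie or Close Call steps, sketched in Python. The function name, the step labels, and the `votes` dictionary (an A/B tally from ensemble voting) are assumptions for illustration, not the Showrunner Agent's actual interface.

```python
from typing import Dict, Optional


def next_step(confidence: str, votes: Optional[Dict[str, int]] = None) -> str:
    """Decide what happens after a pairwise comparison.

    `votes` is the ensemble tally, or None if ensemble voting has not run yet.
    """
    level = confidence.strip().lower()
    if level not in {"high", "medium", "low"}:
        raise ValueError(f"unknown confidence level: {confidence!r}")
    if level == "high":
        return "present_to_user"
    if votes is None:
        return "ensemble_voting"      # close call: gather more votes first
    if votes.get("A", 0) == votes.get("B", 0):
        return "adversarial_review"   # still tied after voting
    return "present_to_user"          # majority winner stands
```

For example, `next_step("Medium")` returns `"ensemble_voting"`, and `next_step("Low", {"A": 2, "B": 2})` returns `"adversarial_review"`.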
