# SYNTHESIS: Client-Side StepRunner Architecture

## Consultation: Dual (Gemini 3.1 Pro + Claude Opus 4.6), 3 rounds each
## Date: 2026-03-27
## Topic: How to extend the visual pipeline for client video projects

---

## Executive Decision

**Build a `ClientSequenceRunner` orchestrator** (Option B: Lightweight Wrapper). Both engines selected this option in all 6 rounds (3 each). StepRunner stays untouched except for one optional parameter added to `execute_multi_shot()` (~3 lines).

---

## Agreed Decisions (HIGH CONFIDENCE — both engines converge)

### 1. Architecture: Wrapper, Not Fork
- `ClientSequenceRunner` is a new orchestrator class that calls existing StepRunner primitives
- It is the client-video equivalent of `pipeline.py` for series work
- StepRunner is the shared execution engine — it doesn't know or care whether it's running series or client work
- Zero risk to the existing series pipeline (additive only)

### 2. No New StepRunner Methods
- Grid exploration reuses `execute_keyframe()` with a grid-specific prompt and `SEQ01_grid` shot_id
- No `generate_grid()` method needed — the orchestrator composes existing primitives
- The only StepRunner change: add optional `shot_prompts` parameter to `execute_multi_shot()` (~3 lines)
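
The size of that change can be illustrated with a minimal sketch. `StepRunnerSketch`, its `takes` list, and the `shot_id`/`prompt` arguments are hypothetical stand-ins, since the real `execute_multi_shot()` signature is not shown in this document. Only the optional `shot_prompts` parameter comes from the decision above.

```python
from __future__ import annotations

from typing import Optional


class StepRunnerSketch:
    """Hypothetical stand-in for StepRunner, reduced to the relevant method."""

    def __init__(self) -> None:
        self.takes: list[dict] = []  # stand-in for persisted take records

    def execute_multi_shot(
        self,
        shot_id: str,
        prompt: str,
        shot_prompts: Optional[list[str]] = None,  # the new optional parameter
    ) -> dict:
        take = {"shot_id": shot_id, "prompt": prompt}
        # The additive change: record per-shot prompts only when supplied,
        # so existing series callers produce identical take records.
        if shot_prompts is not None:
            take["shot_prompts"] = list(shot_prompts)
        self.takes.append(take)
        return take
```

Because the parameter defaults to `None`, callers that never pass it are unaffected, which is what keeps the change additive.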

### 3. No ExecutionStore Changes
- No new states. No `state_profile` concept. No `grid_exploring` state in ExecutionStore.
- Grid exploration state lives entirely in the orchestrator's `sequences.json`
- ExecutionStore tracks shot-level execution (video_submitted → video_complete)
- Orchestrator tracks workflow state (e.g., not_started → grid_review → generating → approved)

### 4. Sequence State in Separate File
- `projects/{project}/state/client/sequences.json` — owned by ClientSequenceRunner
- 8 workflow states: not_started, grid_exploring, grid_review, start_frame_approved, generating, review, approved, final
- ExecutionStore and sequences.json are complementary, not overlapping
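
A minimal sketch of how the orchestrator might guard these states follows. The eight state names come from the list above. The transition table, the flat `{sequence_id: state}` layout of `sequences.json`, and all function names are assumptions made for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path

# The eight workflow states, in the order listed above.
STATES = [
    "not_started", "grid_exploring", "grid_review", "start_frame_approved",
    "generating", "review", "approved", "final",
]

# Assumed transitions: forward steps plus two retry loops.
ALLOWED = {
    "not_started": {"grid_exploring"},
    "grid_exploring": {"grid_review"},
    "grid_review": {"grid_exploring", "start_frame_approved"},  # re-roll or pick
    "start_frame_approved": {"generating"},
    "generating": {"review"},
    "review": {"generating", "approved"},  # another take or accept
    "approved": {"final"},
    "final": set(),
}


def apply_transition(states: dict[str, str], seq_id: str, new_state: str) -> dict[str, str]:
    """Return a copy of `states` with seq_id moved to new_state, or raise ValueError."""
    current = states.get(seq_id, "not_started")
    if new_state not in ALLOWED[current]:
        raise ValueError(f"{seq_id}: {current} -> {new_state} is not allowed")
    return {**states, seq_id: new_state}


def load_states(path: Path) -> dict[str, str]:
    """Read sequences.json, treating a missing file as an empty mapping."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_states(path: Path, states: dict[str, str]) -> None:
    """Write the mapping back, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(states, indent=2))
```

Keeping `apply_transition` free of file I/O makes the workflow rules testable without touching ExecutionStore or the disk.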

### 5. Mixed-Mode Sequences
- Default: `execute_multi_shot()` for the whole sequence (one API call)
- Fallback: individual `execute_video()` per shot when specific shots need their own start frames
- Per-take mode selection (same sequence can switch between multi-shot and individual across takes)
- Individual shots use real shot_ids in ExecutionStore (e.g., `SEQ08_shot03`)
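
The per-take mode choice could be sketched roughly as below. `plan_calls` and its arguments are hypothetical. The method names and the `SEQ08_shot03` ID style come from the bullets above, while the two-digit zero padding is an assumption.

```python
from __future__ import annotations


def plan_calls(seq_id: str, shot_count: int, needs_own_start_frame: set[int]) -> list[tuple[str, str]]:
    """Return (StepRunner method name, shot_id) pairs for one take.

    Default is a single multi-shot call for the whole sequence. If any shot
    needs its own start frame, this take falls back to one call per shot.
    """
    if not needs_own_start_frame:
        return [("execute_multi_shot", seq_id)]
    return [
        ("execute_video", f"{seq_id}_shot{n:02d}")
        for n in range(1, shot_count + 1)
    ]
```

Because the decision is made per call, the same sequence can use multi-shot for one take and individual shots for the next.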

### 6. Shot-Level Prompt Tracking
- Store the per-shot prompt array on each multi-shot take record via `shot_prompts` field
- Passed through StepRunner (Option a) — single writer, clean data flow
- Enables cherry-picking prompts across takes during iteration

### 7. Element Cap Handling
- Orchestrator enforces the 3-element cap when a start frame is present (vs 4 without)
- Priority order from the plan's element array (first 3 win)
- Loud warning log when elements are dropped
- ElementManager itself is unchanged — it already works correctly for client projects
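
The cap rule above is small enough to sketch directly. The function name, logger name, and the use of plain strings for elements are assumptions. The limits of 3 and 4 and the "first N win" ordering come from this section.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("client_sequence_runner")  # hypothetical logger name

MAX_ELEMENTS_WITH_START_FRAME = 3
MAX_ELEMENTS_WITHOUT_START_FRAME = 4


def cap_elements(elements: list[str], has_start_frame: bool) -> list[str]:
    """Keep the first N elements in plan order and warn about any that are dropped."""
    cap = MAX_ELEMENTS_WITH_START_FRAME if has_start_frame else MAX_ELEMENTS_WITHOUT_START_FRAME
    kept, dropped = elements[:cap], elements[cap:]
    if dropped:
        logger.warning(
            "ELEMENT CAP (%d): keeping %s, DROPPING %s", cap, kept, dropped
        )
    return kept
```

For example, `cap_elements(["hero", "rival", "car", "dog"], has_start_frame=True)` keeps the first three and logs a warning naming `dog`.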

### 8. Client Plan Format
- Read natively. Do NOT convert to series format.
- `sequences[].shots[]` structure is correct for the client workflow
- No adapter pattern needed — ClientSequenceRunner understands this format directly

### 9. CLI Tool
- New `tools/client_generate.py` wrapping ClientSequenceRunner
- Commands: `grid`, `pick`, `generate`, `status`, `approve`
- Do NOT extend `test_via_steprunner.py` — that's the low-level exerciser
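
One possible shape for the CLI is sketched here with `argparse` subcommands. The five command names come from the list above. The `--project`, `--row`, `--col`, and `--mode` flags and the positional `sequence` argument are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per workflow action."""
    parser = argparse.ArgumentParser(prog="client_generate")
    parser.add_argument("--project", required=True, help="client project name")
    sub = parser.add_subparsers(dest="command", required=True)

    grid = sub.add_parser("grid", help="generate an exploration grid")
    grid.add_argument("sequence")

    pick = sub.add_parser("pick", help="crop one grid cell as the start frame")
    pick.add_argument("sequence")
    pick.add_argument("--row", type=int, required=True)
    pick.add_argument("--col", type=int, required=True)

    generate = sub.add_parser("generate", help="generate video for a sequence")
    generate.add_argument("sequence")
    generate.add_argument("--mode", choices=["multi-shot", "individual"], default="multi-shot")

    status = sub.add_parser("status", help="show workflow state")
    status.add_argument("sequence", nargs="?")  # omit to list every sequence

    approve = sub.add_parser("approve", help="mark a sequence approved")
    approve.add_argument("sequence")
    return parser
```

A call such as `build_parser().parse_args(["--project", "demo", "pick", "SEQ01", "--row", "0", "--col", "1"])` yields a namespace the entry point can dispatch on via `args.command`.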

### 10. Console Integration — Phased
- **Day 1:** No Console changes. CLI is the interface.
- **Day 1:** Dailies tab is expected to work as-is: video takes with SEQ IDs show up for review (pending the shot_id format check listed in the Risk Register)
- **Week 2:** Build a Sequence Board view showing the 12 sequences mapped to song structure

---

## Disagreements Resolved Through Dialogue

| Topic | Gemini R1 | Opus R1 | Final (R3) |
|-------|-----------|---------|------------|
| Grid exploration | StepRunner method | Separate GridExplorer class | **Orchestrator calls execute_keyframe()** |
| Grid state | Map to keyframe_* states | Orchestrator-only | **Orchestrator-only (sequences.json)** |
| State machine | Don't touch | Add state_profile | **Don't touch** |
| generate_grid() | Yes, add to StepRunner | Yes, add to StepRunner | **No — reuse execute_keyframe()** |
| Prompt tracking | Not addressed until R2 | shot_prompts on takes | **shot_prompts via StepRunner param** |
| Mixed-mode | SEQ01_SH03 naming | SEQ08_shot02 naming | **Both valid — orchestrator picks** |
| Biggest risk | ElementManager schema | Grid prompt inconsistency | **Grid prompt inconsistency** (Opus more insightful) |

---

## Files to Create

| Path | Description |
|------|-------------|
| `starsend/tools/client_generate.py` | CLI entry point — argparse, dispatches to runner |
| `starsend/tools/client_sequence_runner.py` | Orchestrator — sequence lifecycle, grid workflow, StepRunner composition |
| `starsend/tools/image_utils.py` | Grid cell extraction, image crop utilities |

## Files to Modify

| Path | Change |
|------|--------|
| `starsend/orchestrator/step_runner.py` | Add `shot_prompts` optional param to `execute_multi_shot()` — ~3 lines |
| `starsend/lib/client_bridge.py` | Flesh out data loaders, add sequence state read/write |

## Files NOT to Modify

- `execution_store.py` — no state machine changes
- `elements.py` — works as-is
- `api_client.py` — works as-is
- `pipeline.py` — DO NOT TOUCH (series orchestrator)
- `recoil_bridge.py` — irrelevant to client work
- `starsend_config.json` — no config changes

---

## Day 1 Implementation Order

1. **`image_utils.py`** — `extract_grid_cell(image_path, row, col, grid_size)`. Pure function, no deps. (10 min)
2. **`step_runner.py` mod** — Add `shot_prompts` param to `execute_multi_shot()`. (5 min)
3. **`client_sequence_runner.py` — grid workflow** — `explore_grid()` calls `execute_keyframe("SEQ01_grid", grid_prompt)`, `pick_cell()` crops and saves start frame, state management in `sequences.json`. (60-90 min)
4. **`client_sequence_runner.py` — generate workflow** — `generate()` resolves elements (with 3-cap), builds prompt array, calls `execute_multi_shot()` with `shot_prompts`. (45-60 min)
5. **`client_generate.py` CLI** — Wire up argparse: `grid`, `pick`, `generate`, `status`. (20 min)
6. **End-to-end test** — SEQ10: grid → pick cell → generate video. Verify sequences.json state, cost tracking, shot_prompts on takes, video on disk.

**Estimated time to first client video generation: 3-4 hours.**
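
Step 1 can be sketched as follows. The `extract_grid_cell(image_path, row, col, grid_size)` signature is from the list above. The `grid_cell_box` helper is hypothetical, and using Pillow for the actual crop is an assumption, so the import is kept inside the function and the box arithmetic stays dependency-free.

```python
from __future__ import annotations


def grid_cell_box(width: int, height: int, row: int, col: int, grid_size: int = 2) -> tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) pixel box of one grid cell."""
    if not (0 <= row < grid_size and 0 <= col < grid_size):
        raise ValueError(f"cell ({row}, {col}) is outside a {grid_size}x{grid_size} grid")
    cell_w, cell_h = width // grid_size, height // grid_size
    left, upper = col * cell_w, row * cell_h
    return (left, upper, left + cell_w, upper + cell_h)


def extract_grid_cell(image_path: str, row: int, col: int, grid_size: int = 2):
    """Crop one cell from a grid image (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so the box math has no dependencies

    with Image.open(image_path) as img:
        box = grid_cell_box(img.width, img.height, row, col, grid_size)
        return img.crop(box)
```

Rows and columns are zero-indexed here, so the bottom-right cell of a 2x2 grid is `row=1, col=1`.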

---

## Risk Register

| Risk | Severity | Likelihood | Mitigation |
|------|----------|------------|------------|
| Grid prompt produces non-grid output | Medium | High | Retry with stronger prompt in orchestrator. Fallback: 4 separate images composited. Do NOT push retry into StepRunner. |
| Sequence state / ExecutionStore split brain | High | Low | `reconcile()` method cross-checks before state transitions. Never mark sequence complete without verifying StepResults. |
| `execute_multi_shot` batch ID format assumptions | Medium | Medium | Verify batch ID is treated as opaque string. Test with `SEQ01` before building all 12. |
| Console Dailies parsing expects series shot_id format | Low | Medium | Quick test: create a mock shot with ID `SEQ01` and verify Console renders it. |
| `ProjectPaths` assumes series directory structure | Medium | Low | Audit `ProjectPaths.for_episode()` — verify it works with client project structure. |
| Element cap silently drops characters | Medium | High | Loud warning log + CLI confirmation prompt when elements exceed cap. |
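
The split-brain mitigation could look roughly like the sketch below. The `reconcile` name and the `video_complete` status come from this document. The function signature, the set of guarded states, and the `{shot_id: status}` input shape are assumptions about how ExecutionStore data would be handed to the orchestrator.

```python
from __future__ import annotations

# Assumed: these sequence states require every recorded shot to be complete.
REQUIRES_COMPLETE_VIDEO = {"review", "approved", "final"}


def reconcile(target_state: str, shot_statuses: dict[str, str]) -> tuple[bool, list[str]]:
    """Cross-check ExecutionStore shot statuses before a sequence transition.

    Returns (ok, blocking_shot_ids). An empty status map never satisfies a
    state that requires completed video.
    """
    if target_state not in REQUIRES_COMPLETE_VIDEO:
        return True, []
    blocking = sorted(
        shot_id for shot_id, status in shot_statuses.items()
        if status != "video_complete"
    )
    ok = bool(shot_statuses) and not blocking
    return ok, blocking
```

Returning the blocking shot IDs lets the CLI report exactly which shots are holding a sequence back instead of failing silently.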

---

## Model Behavior Assumptions

| Decision | Depends On | Model Behavior |
|----------|-----------|----------------|
| Grid prompt producing 4-panel output | Gemini image model | Responds to "2x2 grid" instruction — inconsistent, needs retry logic |
| Kling O3 element cap of 3 with start frame | fal.ai Kling O3 API | Hard limit — 3 elements + start frame, or 4 elements without |
| Multi-shot prompt durations as strings | fal.ai Kling API | Duration must be string ("3"), not int (3) — already known |
| Grid cell extraction quality | Image resolution | 2x2 grids at 1024x1024 yield 512x512 cells — sufficient for start frames |
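
The string-duration requirement is easy to get wrong when plan files store numbers, so a small normalizer may help. This is a sketch only: the `prompt` and `duration` keys are hypothetical field names rather than the confirmed fal.ai payload schema, and whole-second durations are assumed.

```python
from __future__ import annotations


def build_shot_prompts(shots: list[dict]) -> list[dict]:
    """Normalize plan shots into prompt entries whose duration is a string."""
    entries = []
    for shot in shots:
        seconds = int(shot["duration"])  # accepts 3 or "3"; raises on values like "3s"
        if seconds <= 0:
            raise ValueError(f"duration must be positive, got {shot['duration']!r}")
        entries.append({"prompt": shot["prompt"], "duration": str(seconds)})
    return entries
```

Converting through `int` first means both `3` and `"3"` in the plan end up as `"3"` in the request.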

---

## Scoring Summary

**Winner: Opus (marginal, 87 vs 84 weighted)**

**Confidence: high** (consistent across position swap)

Both engines performed excellently. Gemini was more practical/ship-focused; Opus was more thorough in risk analysis and state design. The architecture recommendation is identical from both engines.
