# Architecture Consultation: Client-Side StepRunner

## Executive Summary

**Option B (Lightweight Wrapper) is the correct choice.** It is the only option that lets you ship now, avoids forking battle-tested code, and creates a clean architectural precedent for future non-series projects. But it needs to be done with discipline — not as a throwaway shim, but as a proper `ClientSequenceRunner` that becomes the canonical way to run any non-series project through the pipeline.

Below is the full analysis.

---

## 1. Architecture Review: What's Actually Reusable

Before evaluating options, let's be precise about the coupling layers in the current system.

### Truly Generic (model-agnostic, workflow-agnostic)
- `api_client.py` / `FalAiKlingClient` — pure API transport
- `ElementManager` — element resolution and payload building
- `StepRunner.execute_video()` — single-shot I2V/T2V, parameterized
- `StepRunner.execute_multi_shot()` — multi-prompt sequence, parameterized
- `ExecutionStore` — state persistence (the state *machine* is coupled, but the *store* is generic)
- Take management, cost logging, save logic
- Visual validation critics
- `lib/slicer.py`

### Recoil-Coupled (series workflow assumptions baked in)
- `pipeline.py` — the 13-step orchestrator. This is 100% series-specific. No client project will ever call it.
- `recoil_bridge.py` — reads Recoil narrative engine output. Dead code for client projects.
- `scene_planner.py` / routing — tier classification for series shots. Irrelevant to client work.
- The ExecutionStore *state machine* (the 25+ states and their transitions) — ~60% of states are previz/keyframe pipeline stages that client video skips entirely.

### Awkwardly In-Between
- `client_bridge.py` — exists but is read-only. It loads data but doesn't execute anything.
- Grid exploration — proven workflow with zero formalization. Lives entirely in ad-hoc console interactions.
- Sequence concept — the plan format supports it, but nothing in StepRunner or ExecutionStore understands sequences as a unit.

**Key finding:** The reusable surface is large and well-factored. The coupling is concentrated in `pipeline.py` and `recoil_bridge.py`, which are orchestration layers you'd never call anyway. StepRunner itself is already surprisingly generic — its `execute_video` and `execute_multi_shot` methods accept all their configuration as parameters. The problem isn't that StepRunner is coupled; it's that there's no orchestration layer between "raw StepRunner methods" and "full 13-step series pipeline."

---

## 2. Option Evaluation

### Option A: Fork StepRunner — REJECT

This is the worst option. Here's why:

- **StepRunner is already generic.** The methods you'd call (`execute_video`, `execute_multi_shot`) take all their inputs as parameters. There's nothing to fork away — you'd be duplicating working code to remove methods you simply wouldn't call.
- **Fork divergence is guaranteed.** When you fix a bug in save logic, take management, cost tracking, or validation hooks, you'll need to fix it in both runners. As a solo developer, you will forget. One runner will drift.
- **It misidentifies the problem.** The issue isn't that StepRunner has the wrong interface. The issue is that there's no *sequence-level orchestrator* that sits between the human and StepRunner. Forking StepRunner doesn't create that orchestrator.

### Option B: Lightweight Wrapper — RECOMMENDED

This correctly identifies the architectural gap: you need a **sequence orchestrator** that knows how to read a client plan, resolve its dependencies, and drive StepRunner's existing methods.

**Why it works:**
- StepRunner stays the single execution engine. One place for save logic, take management, cost logging, validation.
- The wrapper handles the *workflow differences* (sequence ordering, grid exploration, element resolution, plan format) without touching the *execution mechanics*.
- Future client projects get the same wrapper with a different plan file. The pattern scales.
- Zero risk to the series pipeline. You're adding code, not modifying code.

**The concern about "two levels of abstraction" is a feature, not a bug.** `pipeline.py` is already a wrapper around StepRunner for series work. `ClientSequenceRunner` is the equivalent wrapper for client work. They share the same execution engine but implement different workflows. This is correct layering.

### Option C: Build New — REJECT

The "massive duplication of battle-tested code" con is the whole story. You'd be rebuilding save logic, take management, cost tracking, validation hooks, API retry handling, and error recovery — all of which have been debugged in production over months. For a solo developer under time pressure, this is a trap. You'd ship faster initially, then spend weeks re-discovering edge cases that StepRunner already handles.

---

## 3. Answers to the Six Questions

### Q1: Which option for a solo developer who needs to ship NOW?

**Option B.** Concrete reasoning:

- Time to first client video generation: ~2-4 hours of implementation (the wrapper is thin by design).
- Time to production-ready: ~1-2 days (add grid exploration, sequence state, the ExecutionStore state profile, reconciliation, and integration tests; Console integration follows in week 2).
- Risk to existing pipeline: zero (additive only).
- Maintenance burden: minimal (one execution engine, two workflow orchestrators).

### Q2: Should grid exploration be formalized?

**Yes, but not as a StepRunner method.** Grid exploration is a *creative workflow step*, not an execution primitive. It belongs in `ClientSequenceRunner`, not in StepRunner.

Here's the formalization I recommend:

```python
from __future__ import annotations

from pathlib import Path

# GridResult (grid image path + quadrant coordinates) is defined alongside this class.


class GridExplorer:
    """Manages the grid exploration → pick → crop workflow.

    Approving a start frame is sequence state, so it belongs to
    ClientSequenceRunner.approve_start_frame(), not to this class.
    """

    def generate_grid(self, prompt: str, style: str = "2x2",
                      model: str = "gemini", aspect_ratio: str = "16:9") -> GridResult:
        """Generate a grid of options. Returns grid image + metadata."""

    def pick_quadrant(self, grid_result: GridResult, quadrant: int) -> Path:
        """Crop the selected quadrant to a clean start frame."""
```

This is a standalone utility class that `ClientSequenceRunner` calls when a sequence needs a start frame. It could later be used by the series pipeline too (e.g., for location exploration). But it should not be bolted onto StepRunner, which is an *execution* class, not an *exploration* class.

**Where it saves output:** `projects/driver-beware/output/grids/{sequence_id}/` for grids, and the approved crop goes to `output/frames/ep_001/clean/` as it does now.
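One detail `pick_quadrant()` has to get right is mapping a quadrant number to a pixel box. The sketch below assumes quadrants are numbered 1 to 4 in row-major order (1 top-left, 4 bottom-right) and that the grid image has no gutters or borders between cells; neither convention is specified here, and `quadrant_box` is a hypothetical helper name.

```python
def quadrant_box(width: int, height: int, quadrant: int,
                 rows: int = 2, cols: int = 2) -> tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) pixel box for one grid cell.

    Assumes cells are numbered 1..rows*cols in row-major order and that
    the grid image has no gutters between cells.
    """
    if not 1 <= quadrant <= rows * cols:
        raise ValueError(f"quadrant must be in 1..{rows * cols}, got {quadrant}")
    row, col = divmod(quadrant - 1, cols)
    # Integer boundaries computed from the full size, so adjacent cells
    # tile the image exactly even when width/height are odd.
    left = col * width // cols
    right = (col + 1) * width // cols
    upper = row * height // rows
    lower = (row + 1) * height // rows
    return left, upper, right, lower
```

With Pillow, the returned tuple can be passed directly to `Image.crop()`, which takes a `(left, upper, right, lower)` box. If the grid model draws borders between cells, the box would need an inset on each side.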

### Q3: How should sequence-level state be tracked?

**In a dedicated sequence state file, managed by `ClientSequenceRunner`.** Not in ExecutionStore.

Here's why: ExecutionStore tracks *shot-level execution state* (pending → generating → complete → approved). A sequence is a *workflow unit*, not an execution unit. One sequence maps to one `execute_multi_shot` call, but the sequence also has pre-execution state (element resolution, start frame approval) and post-execution state (review, iteration).

Proposed state file: `projects/driver-beware/state/starsend/sequence_state.json`

```json
{
  "EP001": {
    "SEQ01": {
      "status": "complete",
      "start_frame": "output/frames/ep_001/clean/seq01_start.png",
      "start_frame_approved": true,
      "elements_resolved": true,
      "video_path": "output/video/seq01_take03.mp4",
      "takes": 3,
      "current_take": 3,
      "completed_at": "2026-03-27T14:30:00Z",
      "notes": "Final version approved by JT"
    },
    "SEQ02": {
      "status": "start_frame_pending",
      "grid_path": "output/grids/SEQ02/grid_001.png",
      "start_frame": null,
      "start_frame_approved": false,
      "elements_resolved": true,
      "video_path": null,
      "takes": 0,
      "current_take": null
    },
    "SEQ03": {
      "status": "not_started"
    }
  }
}
```

**Sequence states (minimal):**
1. `not_started` — in the plan but no work done
2. `elements_pending` — elements need resolution
3. `start_frame_pending` — needs grid exploration or start frame selection
4. `start_frame_approved` — start frame locked, ready to generate
5. `generating` — multi-shot API call in flight
6. `review` — video generated, awaiting JT review
7. `complete` — approved
8. `iterating` — re-generating with adjusted prompts

That's 8 states vs. 25+. Clean, purpose-built, no dead states.
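The state list names the states but not the allowed moves between them. Below is a sketch of one plausible transition table for what `_transition_sequence()` might enforce against the nested `sequence_state.json` layout (episode, then sequence). It assumes a failed generation falls back to `start_frame_approved` for a retry, and that `iterating` can either re-generate or go back for a new start frame. `SEQUENCE_TRANSITIONS`, `InvalidSequenceTransition`, and `transition_sequence` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical transition table for the 8 sequence states.
SEQUENCE_TRANSITIONS: dict[str, set[str]] = {
    "not_started": {"elements_pending"},
    "elements_pending": {"start_frame_pending"},
    "start_frame_pending": {"start_frame_approved"},
    "start_frame_approved": {"generating"},
    "generating": {"review", "start_frame_approved"},  # failed call -> ready to retry
    "review": {"complete", "iterating"},
    "iterating": {"generating", "start_frame_pending"},  # new prompts, or new start frame
    "complete": set(),  # terminal
}


class InvalidSequenceTransition(ValueError):
    """Raised when a status change is not in SEQUENCE_TRANSITIONS."""


def transition_sequence(state: dict, episode: str, seq_id: str, to_state: str) -> None:
    """Validate and apply a status change in the nested sequence_state dict."""
    entry = state.setdefault(episode, {}).setdefault(seq_id, {"status": "not_started"})
    current = entry.get("status", "not_started")
    if to_state not in SEQUENCE_TRANSITIONS.get(current, set()):
        raise InvalidSequenceTransition(f"{seq_id}: {current} -> {to_state} is not allowed")
    entry["status"] = to_state
```

A rejected transition leaves the stored status untouched, so a failed call cannot half-update the file before `_save_sequence_state()` runs.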

**Individual shots within a sequence** still go through ExecutionStore for take management and output tracking — but they use a minimal state subset (see Q4).

### Q4: Should client projects use a simplified ExecutionStore state machine?

**Yes.** But don't create a second ExecutionStore class. Instead, add a `state_profile` concept.

The ExecutionStore already has a `transition()` method that validates state transitions. Add a profile that defines which states and transitions are valid:

```python
STATE_PROFILES = {
    "series": {
        # Full 25+ state machine (existing VALID_TRANSITIONS)
    },
    "client_video": {
        # Minimal: just video generation + approval
        "states": ["planned", "elements_ready", "video_pending",
                    "video_generating", "video_complete", "approved", "rejected"],
        "transitions": {
            "planned": ["elements_ready"],
            "elements_ready": ["video_pending"],
            "video_pending": ["video_generating"],
            "video_generating": ["video_complete", "video_pending"],  # retry
            "video_complete": ["approved", "rejected"],
            "rejected": ["video_pending"],  # re-generate
        }
    }
}
```

The profile is set at the project level via `project_config.json`'s `project_type` field (already `"client_video"` for Driver Beware). ExecutionStore reads it once at init and enforces accordingly.

**This is a small, safe change to ExecutionStore.** The default profile remains `"series"`. Client projects opt into the reduced profile. No existing behavior changes.
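A minimal sketch of the profile lookup and enforcement, assuming `project_config.json` sits directly in the project directory and that the series profile keeps using the existing `VALID_TRANSITIONS` table, passed in here rather than reproduced. `read_project_type`, `transitions_for`, and `check_transition` are hypothetical helpers, not existing ExecutionStore methods.

```python
import json
from pathlib import Path

# client_video transitions, matching the profile table.
CLIENT_VIDEO_TRANSITIONS: dict[str, list[str]] = {
    "planned": ["elements_ready"],
    "elements_ready": ["video_pending"],
    "video_pending": ["video_generating"],
    "video_generating": ["video_complete", "video_pending"],  # retry
    "video_complete": ["approved", "rejected"],
    "rejected": ["video_pending"],  # re-generate
}


def read_project_type(project_dir: Path) -> str:
    """Return project_type from project_config.json, defaulting to "series"."""
    config_file = project_dir / "project_config.json"  # location is an assumption
    if not config_file.exists():
        return "series"
    config = json.loads(config_file.read_text(encoding="utf-8"))
    return config.get("project_type", "series")


def transitions_for(project_type: str,
                    series_transitions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Pick the transition table once at init; unknown types fail loudly."""
    if project_type == "client_video":
        return CLIENT_VIDEO_TRANSITIONS
    if project_type == "series":
        return series_transitions
    raise ValueError(f"unknown project_type {project_type!r}")


def check_transition(transitions: dict[str, list[str]], current: str, target: str) -> None:
    """Raise ValueError unless target is a valid next state under the profile."""
    if target not in transitions.get(current, []):
        raise ValueError(f"invalid transition {current!r} -> {target!r}")
```

Raising on an unknown `project_type` (instead of silently falling back to `"series"`) keeps a typo in the config from putting a client project under the full series state machine.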

### Q5: Production Console integration story

**Phase the integration. Don't try to make the full Console work on day one.**

**Immediate (ship now):**
- Run client sequences from the Python console through `ClientSequenceRunner`; the Console itself stays unchanged.
- The Dailies tab already works: it reads from the output directory, and client video outputs land in the same structure. No changes needed.

**Near-term (next week):**
- Teach the Board tab to read `sequence_state.json` instead of (or in addition to) ExecutionStore for client projects. Add a `project_type` check: if `client_video`, render sequences as the primary unit instead of individual shots.
- Add a "Client Sequences" panel that shows the 12 sequences and their states and lets JT trigger the workflow (resolve elements → explore grid → approve start frame → generate → review).
- This replaces the ad-hoc Python console workflow used for SEQ08-09.

**Don't do:**
- Don't try to make the full pipeline steps (Script Lock through Export) work for client projects. They don't apply. The Console should detect `project_type` and show the appropriate UI.

### Q6: How to handle the different plan format?

**Do NOT convert on load. Do NOT create an adapter that pretends client plans are series plans.**

The formats are different because the workflows are different. Forcing a conversion creates a leaky abstraction — you'll constantly be translating between "sequence with shots" and "flat shots with routing," and every translation will lose information (song timestamps, director notes, sequence-level element assignments).

**Instead:** `ClientSequenceRunner` reads the client plan format natively. It understands `sequences[].shots[]`. It uses that structure directly when building multi-shot calls.

The client plan format is good as-is. It captures what client video needs. Keep it.
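Reading the format natively can be as small as a few dataclasses that mirror `sequences[].shots[]`. The sketch below uses the `ClientPlan`, `ClientSequence`, and `ClientShot` names from the proposed `models/client_plan.py`, but the JSON field names (`id`, `prompt`, `duration`, `elements`, `director_notes`) and the shape of each multi-prompt entry are assumptions, not the actual plan or Kling payload format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClientShot:
    shot_id: str
    prompt: str
    duration_s: float | None = None


@dataclass
class ClientSequence:
    seq_id: str
    shots: list[ClientShot]
    elements: list[str] = field(default_factory=list)  # sequence-level element IDs
    notes: str = ""  # director notes survive because nothing is flattened


@dataclass
class ClientPlan:
    sequences: list[ClientSequence]

    def sequence(self, seq_id: str) -> ClientSequence:
        for seq in self.sequences:
            if seq.seq_id == seq_id:
                return seq
        raise KeyError(seq_id)


def parse_plan(raw: dict) -> ClientPlan:
    """Build the dataclasses straight from sequences[].shots[]."""
    sequences = []
    for raw_seq in raw.get("sequences", []):
        shots = [ClientShot(s["id"], s["prompt"], s.get("duration"))
                 for s in raw_seq.get("shots", [])]
        sequences.append(ClientSequence(raw_seq["id"], shots,
                                        raw_seq.get("elements", []),
                                        raw_seq.get("director_notes", "")))
    return ClientPlan(sequences)


def build_multi_prompt(seq: ClientSequence) -> list[dict]:
    """One entry per shot, in plan order, for a single multi-shot call."""
    return [{"prompt": shot.prompt, "duration": shot.duration_s} for shot in seq.shots]
```

Because the sequence stays a first-class object, its element list and notes travel with it into `execute_sequence()` instead of being re-derived from flat shot records.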

---

## 4. Proposed Architecture

### New Files

```
starsend/
├── client_runner.py              # ClientSequenceRunner — the wrapper
├── grid_explorer.py              # GridExplorer — formalized grid workflow
├── models/
│   └── client_plan.py            # ClientPlan, ClientSequence, ClientShot dataclasses
```

### Modified Files

```
starsend/
├── execution_store.py            # Add state_profile support (small, safe change)
├── client_bridge.py              # Extend: add sequence_state read/write
├── console/                      # Add project_type branch for client UI (phased)
```

### Untouched Files

```
starsend/
├── pipeline.py                   # DO NOT TOUCH
├── step_runner.py                # DO NOT TOUCH
├── recoil_bridge.py              # DO NOT TOUCH
├── scene_planner.py              # DO NOT TOUCH
├── api_client.py                 # DO NOT TOUCH
├── element_manager.py            # DO NOT TOUCH
```

### `client_runner.py` — Core Design

```python
from __future__ import annotations

from pathlib import Path

# ProjectPaths, ExecutionStore, StepRunner, StepResult, ElementManager,
# GridExplorer, GridResult, ClientPlan, and load_client_project_config come
# from the existing starsend modules; their import lines are omitted here.


class ClientSequenceRunner:
    """Orchestrates client video workflow at the sequence level.

    This is the client-video equivalent of pipeline.py for series.
    It reads client plans, manages sequence state, and drives StepRunner.
    """

    def __init__(self, project: str, episode: str = "EP001"):
        self.project = project
        self.episode = episode
        # Resolve paths before loading the plan and sequence state,
        # since both loaders are expected to read files under them.
        self.paths = ProjectPaths(project)
        self.project_config = load_client_project_config(project)
        self.plan = self._load_plan()
        self.sequence_state = self._load_sequence_state()
        self.store = ExecutionStore(self.paths, state_profile="client_video")
        self.step_runner = StepRunner(self.store, self.paths)
        self.element_manager = ElementManager(project)
        self.grid_explorer = GridExplorer(self.paths)

    # --- Sequence Lifecycle ---

    def get_status(self) -> dict:
        """Return status of all sequences."""

    def execute_sequence(self, seq_id: str,
                         start_frame: Path | None = None) -> list[StepResult]:
        """Full sequence execution: resolve elements → build prompts → multi-shot."""

    def iterate_sequence(self, seq_id: str,
                         prompt_overrides: dict[str, str] | None = None,
                         start_frame: Path | None = None) -> list[StepResult]:
        """Re-run a sequence with adjustments. Increments take number."""

    # --- Grid Exploration ---

    def explore_grid(self, seq_id: str, prompt: str | None = None,
                     style: str = "2x2") -> GridResult:
        """Generate start frame options for a sequence."""

    def pick_start_frame(self, seq_id: str, grid_result: GridResult,
                         quadrant: int) -> Path:
        """Select and crop a start frame from a grid."""

    def approve_start_frame(self, seq_id: str) -> None:
        """Lock the start frame for a sequence."""

    # --- Element Management ---

    def resolve_elements(self, element_ids: list[str]) -> dict:
        """Resolve element IDs from plan to API payload."""

    # --- Plan and State Management ---

    def _load_plan(self) -> ClientPlan: ...
    def _load_sequence_state(self) -> dict: ...
    def _save_sequence_state(self) -> None: ...
    def _transition_sequence(self, seq_id: str, to_state: str) -> None: ...
```

### Call Flow: Execute a Sequence

```
JT (via Console or Python):
  runner.explore_grid("SEQ04", prompt="Suburban street, golden hour")
  → GridExplorer.generate_grid() → Gemini API → saves grid image
  → Returns GridResult with grid path + quadrant coordinates

JT reviews grid, picks quadrant 3:
  runner.pick_start_frame("SEQ04", grid_result, quadrant=3)
  → GridExplorer.pick_quadrant() → crops image → saves clean frame
  → sequence_state["EP001"]["SEQ04"]["status"] stays "start_frame_pending" (frame picked, awaiting approval)

JT approves:
  runner.approve_start_frame("SEQ04")
  → sequence_state["EP001"]["SEQ04"]["status"] = "start_frame_approved"

JT triggers generation:
  runner.execute_sequence("SEQ04")
  → resolve_elements(["blue_car", "driver"]) via ElementManager
  → build multi_prompt_sequence from plan shots
  → self.step_runner.execute_multi_shot(batch, prompts, "kling-o3", start_frame, "16:9", elements)
  → sequence_state["EP001"]["SEQ04"]["status"] = "generating" → "review"
  → Returns StepResults

JT reviews, wants adjustment:
  runner.iterate_sequence("SEQ04", prompt_overrides={"shot_3": "Closer angle..."})
  → Increments take, re-runs with modified prompts
```
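`iterate_sequence()` has two small bookkeeping jobs in this flow: apply `prompt_overrides` and bump the take. The sketch below assumes override keys are `shot_N` with N counting from 1 in plan order (as the `"shot_3"` example suggests, though the format is not defined here), and that output files follow the `seq01_take03.mp4` naming from the state file example. `apply_prompt_overrides` and `next_take` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def apply_prompt_overrides(prompts: list[str], overrides: dict[str, str]) -> list[str]:
    """Return a copy of prompts with "shot_N" keys (1-indexed) replaced."""
    updated = list(prompts)
    for key, new_prompt in overrides.items():
        prefix, _, index = key.partition("_")
        if prefix != "shot" or not index.isdigit():
            raise KeyError(f"unrecognized override key {key!r}")
        position = int(index) - 1
        # Reject out-of-range keys instead of silently ignoring a typo.
        if not 0 <= position < len(updated):
            raise KeyError(f"{key!r} is outside this sequence's {len(updated)} shots")
        updated[position] = new_prompt
    return updated


def next_take(video_dir: Path, seq_id: str,
              current_take: int | None) -> tuple[Path, int]:
    """Next take number and output path, following the seqNN_takeNN.mp4 pattern."""
    take = (current_take or 0) + 1  # current_take is null before the first take
    return video_dir / f"{seq_id.lower()}_take{take:02d}.mp4", take
```

Writing each take to a new file, rather than overwriting, keeps earlier takes available for JT to compare during review.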

---

## 5. Risk Analysis

### Risk 1: Sequence State / ExecutionStore Split Brain
**What could go wrong:** Sequence state says SEQ04 is "complete" but ExecutionStore says the underlying shots are "video_generating" (API call failed silently).

**Mitigation:** `ClientSequenceRunner._transition_sequence()` must verify downstream state before transitioning. Never mark a sequence "complete" without checking that all StepResults returned successfully. Add a `reconcile()` method that cross-checks sequence state against ExecutionStore.
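A sketch of what `reconcile()` might check. Because ExecutionStore's query API is not described here, the sketch takes plain dicts as input: the nested sequence state, a map of shot ID to ExecutionStore state (using the `client_video` profile names), and a map of sequence ID to its shot IDs. All of these input shapes, like the `Mismatch` record, are assumptions. The function only reports mismatches; deciding how to repair them stays a human call.

```python
from __future__ import annotations

from dataclasses import dataclass

# Shot states (client_video profile) that mean a video actually exists.
SHOT_DONE_STATES = {"video_complete", "approved"}
# Sequence statuses that claim generation has finished.
SEQUENCE_DONE_STATUSES = {"review", "complete"}


@dataclass
class Mismatch:
    seq_id: str
    sequence_status: str
    shot_states: dict[str, str]


def reconcile(sequence_state: dict, shot_states: dict[str, str],
              shots_by_sequence: dict[str, list[str]],
              episode: str = "EP001") -> list[Mismatch]:
    """Flag sequences whose status claims more progress than their shots show."""
    mismatches = []
    for seq_id, entry in sequence_state.get(episode, {}).items():
        status = entry.get("status", "not_started")
        if status not in SEQUENCE_DONE_STATUSES:
            continue
        states = {shot: shot_states.get(shot, "missing")
                  for shot in shots_by_sequence.get(seq_id, [])}
        # No known shots means nothing backs the claim, so flag that too.
        if not states or any(s not in SHOT_DONE_STATES for s in states.values()):
            mismatches.append(Mismatch(seq_id, status, states))
    return mismatches
```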

### Risk 2: StepRunner API Changes Break Client Runner
**What could go wrong:** A StepRunner refactor changes `execute_multi_shot`'s signature, breaking `ClientSequenceRunner` without anyone noticing (no shared test suite).

**Mitigation:** Write 3-5 integration tests that call `ClientSequenceRunner.execute_sequence()` with mocked API responses. These tests exercise the actual StepRunner interface and will fail immediately if it changes. This is a 1-hour investment that pays for itself on the first refactor.
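One cheap way to get the "fails immediately if the signature changes" property is `unittest.mock.create_autospec`: the mock copies the real method's signature, so a call with the wrong arguments raises `TypeError` instead of silently succeeding. In the sketch below, the `StepRunner` class is a local stand-in whose parameter names are assumed from the call flow; a real test would import the actual class from `step_runner.py`. `run_sequence` is a hypothetical stand-in for `execute_sequence()`.

```python
import unittest
from unittest import mock


class StepRunner:
    """Stand-in mirroring the execute_multi_shot call shape; names assumed."""

    def execute_multi_shot(self, batch_id, prompts, model, start_frame,
                           aspect_ratio, elements):
        raise NotImplementedError


def run_sequence(step_runner, seq_id, prompts, start_frame, elements):
    """Hypothetical stand-in for ClientSequenceRunner.execute_sequence()."""
    return step_runner.execute_multi_shot(
        f"EP001_{seq_id}", prompts, "kling-o3", start_frame, "16:9", elements)


class ExecuteSequenceContractTest(unittest.TestCase):
    def test_call_matches_step_runner_signature(self):
        # The autospecced method enforces the real signature on every call.
        runner = mock.create_autospec(StepRunner, instance=True)
        runner.execute_multi_shot.return_value = ["step_result"]
        result = run_sequence(runner, "SEQ04", ["wide", "close"], "start.png", {})
        self.assertEqual(result, ["step_result"])
        runner.execute_multi_shot.assert_called_once_with(
            "EP001_SEQ04", ["wide", "close"], "kling-o3", "start.png", "16:9", {})

    def test_signature_drift_is_caught(self):
        runner = mock.create_autospec(StepRunner, instance=True)
        with self.assertRaises(TypeError):
            runner.execute_multi_shot("EP001_SEQ04", ["wide"])  # missing arguments
```

Run it with `python -m unittest`. Only the API transport needs mocking in the real suite; StepRunner itself should stay real so the test exercises its actual interface.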

### Risk 3: Grid Exploration Scope Creep
**What could go wrong:** GridExplorer grows to handle every possible start frame workflow (Gemini grids, DALL-E grids, manual upload, video frame extraction, inpainting). It becomes its own mini-pipeline.

**Mitigation:** Keep GridExplorer focused on exactly two operations: generate grid, crop quadrant. Manual upload and video frame extraction are just "set start frame path" — they don't need GridExplorer.

### Risk 4: Console Integration Becomes a Rewrite
**What could go wrong:** The Console is so deeply structured around the 13-step pipeline that adding client support requires touching every component.

**Mitigation:** Phase it. Day 1: don't touch the Console at all. Run client workflow from Python console using `ClientSequenceRunner` directly. Week 2: add a single "Client Sequences" tab.

### Risk 5: Plan Format Proliferation
**What could go wrong:** Each new client project invents its own plan format. You end up with 5 formats and 5 parsers.

**Mitigation:** Formalize the current client plan format as `ClientPlanV1` with a JSON schema. Future client projects use the same format.
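A full JSON Schema could be checked with the third-party `jsonschema` package. As a stdlib-only sketch of the same structural rules, the validator below collects every problem instead of stopping at the first. The field names it checks (`schema_version`, `id`, `shots`, `prompt`) are assumptions about the plan format, and `validate_client_plan` is a hypothetical helper.

```python
SCHEMA_VERSION = "ClientPlanV1"


def validate_client_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan passed."""
    errors = []
    if plan.get("schema_version") != SCHEMA_VERSION:
        errors.append(f"schema_version must be {SCHEMA_VERSION!r}")
    sequences = plan.get("sequences")
    if not isinstance(sequences, list) or not sequences:
        errors.append("sequences must be a non-empty list")
        return errors
    seen_ids = set()
    for i, seq in enumerate(sequences):
        if not isinstance(seq, dict):
            errors.append(f"sequences[{i}] must be an object")
            continue
        seq_id = seq.get("id")
        if not isinstance(seq_id, str):
            errors.append(f"sequences[{i}].id must be a string")
        elif seq_id in seen_ids:
            errors.append(f"duplicate sequence id {seq_id!r}")
        else:
            seen_ids.add(seq_id)
        shots = seq.get("shots")
        if not isinstance(shots, list) or not shots:
            errors.append(f"sequences[{i}].shots must be a non-empty list")
            continue
        for j, shot in enumerate(shots):
            if not isinstance(shot, dict) or not isinstance(shot.get("prompt"), str):
                errors.append(f"sequences[{i}].shots[{j}].prompt must be a string")
    return errors
```

Running this at `_load_plan()` time means a malformed plan from a new client project fails before any API spend.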

### Hidden Coupling Point: `ProjectPaths`
If `ProjectPaths` assumes series directory structure, `ClientSequenceRunner` will hit path errors on init. Audit `ProjectPaths` before starting implementation.

### Hidden Coupling Point: `execute_multi_shot` Batch ID Format
StepRunner may use the batch ID to derive file paths or state keys. If it expects `EP001_SC003` and you pass `EP001_SEQ04`, path collisions or lookup failures could occur. Verify the batch ID is treated as an opaque string.

---

## 6. Implementation Sequence

### Day 1: Ship Path (4 hours)
1. Create `models/client_plan.py` — dataclasses for ClientPlan, ClientSequence, ClientShot
2. Create `client_runner.py` — `ClientSequenceRunner` with `execute_sequence()` and `iterate_sequence()`
3. Create `grid_explorer.py` — `GridExplorer` with `generate_grid()` and `pick_quadrant()`
4. Create `sequence_state.json` for Driver Beware, pre-populate with current progress (SEQ08-09 complete)
5. Test: run SEQ10 through `ClientSequenceRunner` end-to-end

### Day 2: Polish (4 hours)
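6. Add state profile support to ExecutionStore (small, safe change). Until this lands, the Day 1 runner should construct ExecutionStore without the `state_profile` argument, which the current class does not yet accept.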
7. Extend `client_bridge.py` to read/write sequence state
8. Add `reconcile()` method to cross-check sequence state vs ExecutionStore
9. Write 3-5 integration tests with mocked API responses

### Week 2: Console Integration
10. Add `project_type` detection to Console
11. Build "Client Sequences" status board
12. Wire up sequence actions (explore → pick → approve → generate → review)

---

## 7. Final Recommendation

**Build `ClientSequenceRunner` as a thin, disciplined wrapper around StepRunner.** It fills the exact architectural gap that exists: there's no sequence-level orchestrator for non-series projects. The series pipeline has `pipeline.py` for this role; client video gets `client_runner.py`.

The key principles:
- **StepRunner is the execution engine.** It doesn't know or care whether it's running series or client work. It takes parameters and generates output.
- **Orchestrators are workflow-specific.** `pipeline.py` orchestrates the 13-step series workflow. `ClientSequenceRunner` orchestrates the sequence-based client workflow. Both call the same StepRunner.
- **State lives at the right level.** Shot execution state in ExecutionStore. Sequence workflow state in `sequence_state.json`. Project configuration in `project_config.json`. No mixing.
- **Don't abstract prematurely.** If a third workflow type emerges (e.g., music video, documentary), THEN consider whether `ClientSequenceRunner` and `pipeline.py` should share a base class. Not before.

This gives you a working client pipeline in a day, doesn't risk the series pipeline, and creates a clean pattern for future non-series projects.