Here is my architectural review and concrete action plan for the Recoil Studios visual pipeline, focusing on the immediate needs of the "Driver Beware" client project.

---

# 1. Architectural Review & The Core Misunderstanding

The fundamental issue here is a conflation of **Execution** and **Orchestration**. 

*   **`StepRunner` is your Execution Engine.** It is actually incredibly well-designed for reuse. It handles atomic operations: calling the API, managing takes, saving files, logging costs, and updating state. It doesn't care *why* it's generating a video, only *how*.
*   **`pipeline.py` is your Orchestrator.** It is deeply, hopelessly coupled to the Recoil scripted series workflow. 

Your current pain points aren't coming from `StepRunner`; they are coming from trying to shove a sequence-based, multi-shot client workflow through a shot-based, 13-step Recoil orchestrator.

**Conclusion:** `StepRunner` is highly reusable. `pipeline.py`, `recoil_bridge.py`, and `scene_planner.py` are completely useless for client work and should be bypassed entirely.

---

# 2. Evaluating the Options

### Option A: Fork StepRunner (`ClientStepRunner`)
**Verdict: Hard No.**
Forking the execution engine means you now have two places to fix API bugs, update Kling SDKs, manage cost logging, and tweak take logic. They *will* diverge, and as a solo developer, you will drown in maintenance debt. 

### Option C: Build New (`ClientPipeline`)
**Verdict: No.**
Building from scratch discards battle-tested take management, validation critics, and cost tracking. It violates the "ship NOW" constraint.

### Option B: Lightweight Wrapper (`ClientSequenceRunner`)
**Verdict: YES. This is the only viable path.**
You keep `StepRunner` exactly as it is. You write a new orchestrator (`client_pipeline.py`) that acts as the bridge between your client JSON plan and the `StepRunner`. It translates sequences into the inputs `StepRunner` expects.

---

# 3. Answers to Your 6 Specific Questions

**1. Which option is best for a solo developer shipping NOW?**
Option B. Keep `StepRunner` as your execution engine. Build a `ClientSequenceRunner` to handle the sequence-level orchestration.

**2. Should the grid exploration workflow be formalized?**
**Yes.** Because it hits the Gemini API, costs money, and generates artifacts, it needs take management. 
*Add* `execute_grid_exploration(batch_id, prompt, aspect_ratio="16:9")` to `StepRunner`. 
*Add* a utility method `extract_grid_quadrant(grid_take_path, quadrant_1_to_4)` to your client orchestrator to slice the image and save it to the `clean/` start frames folder.
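
The slicing math behind `extract_grid_quadrant` is small enough to sketch. This is a minimal version that assumes the Gemini grid is an even 2x2 with no gutters or borders, and that quadrants are numbered in reading order. Both are assumptions to check against a real grid take; `quadrant_box` is a hypothetical helper name.

```python
from typing import Tuple

# (left, upper, right, lower): the box order Pillow's Image.crop() expects
Box = Tuple[int, int, int, int]


def quadrant_box(width: int, height: int, quadrant: int) -> Box:
    """Return the crop box for one cell of an even 2x2 grid.

    Quadrants are numbered in reading order (an assumption):
    1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right.
    """
    if quadrant not in (1, 2, 3, 4):
        raise ValueError(f"quadrant must be 1-4, got {quadrant}")
    half_w, half_h = width // 2, height // 2
    col = (quadrant - 1) % 2   # 0 = left column, 1 = right column
    row = (quadrant - 1) // 2  # 0 = top row, 1 = bottom row
    left, upper = col * half_w, row * half_h
    return (left, upper, left + half_w, upper + half_h)
```

With Pillow installed, `extract_grid_quadrant` could open the grid take, call `img.crop(quadrant_box(*img.size, quadrant))`, and save the result into the `clean/` folder.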

**3. How should sequence-level state be tracked?**
For client projects using multi-shot, **the Sequence *is* the Shot.** 
In `ExecutionStore`, treat `SEQ01` as the `shot_id`. The individual shots inside the sequence (`Extreme wide...`, `Tracking...`) are just an array of prompts passed to `execute_multi_shot`. Do not try to track state for individual prompts within a Kling multi-shot call; the API doesn't support it anyway.

**4. Should client projects use a simplified ExecutionStore state machine?**
**Do not rewrite the state machine.** Instead, define a **Client State Subset**. 
Just use: `pending` → `video_generating` → `video_complete` → `approved`. 
Skip the previz and keyframe states entirely. The `ExecutionStore` doesn't care if you skip states, as long as the terminal states (`video_complete`, `approved`) match so your UI knows when to render the video player.
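
If you want the subset written down rather than implied, a small transition table is enough. This is a sketch that lives beside `ExecutionStore`, not inside it; `CLIENT_TRANSITIONS` and `can_transition` are hypothetical names, and it encodes only the four states listed above.

```python
# Allowed client transitions, keyed by current state.
# Only the four-state client subset is modeled here.
CLIENT_TRANSITIONS = {
    "pending": {"video_generating"},
    "video_generating": {"video_complete"},
    "video_complete": {"approved"},
    "approved": set(),  # terminal state
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the client subset allows moving current -> target."""
    return target in CLIENT_TRANSITIONS.get(current, set())
```

The client orchestrator could call `can_transition` before `store.transition(...)` to catch accidental jumps into previz or keyframe states.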

**5. What's the right integration point with the Production Console?**
Because you are mapping `SEQ01` to `shot_id`, the **Dailies tab will work out of the box** (it just looks for video takes associated with an ID).
The **Board tab will break** because it expects the Recoil flat-shot format. 
*Fix:* Write a quick adapter in the Console backend that flattens `sequences` into UI cards, or simply hide the Board tab for client projects via a feature flag (`if project_config.type == 'client_video'`) and just use the Dailies tab for now.
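
If you go the adapter route, the flattening itself is a short loop. The sketch below assumes the plan shape used elsewhere in this review (`sequences`, each with an `id` and a list of `shots` carrying a `prompt`). The card fields (`card_id`, `sequence_id`, `prompt`) are hypothetical; match them to whatever the Board tab actually reads.

```python
from typing import Any, Dict, List


def flatten_sequences_to_cards(plan: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a sequence-based client plan into one UI card per shot prompt."""
    cards: List[Dict[str, Any]] = []
    for seq in plan.get("sequences", []):
        for index, shot in enumerate(seq.get("shots", []), start=1):
            cards.append(
                {
                    # Hypothetical card ID: sequence ID plus a 2-digit position
                    "card_id": f"{seq['id']}_{index:02d}",
                    "sequence_id": seq["id"],
                    "prompt": shot["prompt"],
                }
            )
    return cards
```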

**6. How should we handle the different plan format?**
**Convert on load (Adapter Pattern).** Do not change your JSON file; it makes sense for humans. When `client_bridge.py` loads `ep_001_plan.json`, it should yield a list of `SequenceExecutionUnit` objects. The orchestrator iterates over these units, not "shots".
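
A rough sketch of that adapter, assuming the plan JSON has a top-level `sequences` list whose entries carry `id`, `shots` (each with a `prompt`), and optional `model` and `elements` keys. The field names on `SequenceExecutionUnit` and the helper `iter_execution_units` are illustrative, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass(frozen=True)
class SequenceExecutionUnit:
    """One multi-shot call: a sequence ID plus its ordered prompts."""

    sequence_id: str
    prompts: List[str]
    model: str = "kling-o3"
    elements: List[Any] = field(default_factory=list)


def iter_execution_units(plan: Dict[str, Any]) -> Iterator[SequenceExecutionUnit]:
    """Convert the human-friendly plan dict into execution units on load."""
    for seq in plan["sequences"]:
        yield SequenceExecutionUnit(
            sequence_id=seq["id"],
            prompts=[shot["prompt"] for shot in seq["shots"]],
            model=seq.get("model", "kling-o3"),
            elements=list(seq.get("elements", [])),
        )
```

The JSON on disk stays untouched; only the in-memory representation changes.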

---

# 4. Proposed Concrete Architecture

Here is exactly what you should build to ship this week.

### 1. New File: `client_pipeline.py`
This replaces `pipeline.py` for client work. 

```python
class ClientSequenceRunner:
    def __init__(self, project_id, episode_id="EP001"):
        self.paths = ProjectPaths(project_id)
        self.store = ExecutionStore(self.paths)
        self.step_runner = StepRunner(self.store, self.paths)
        
        # Load client specific data
        self.config = load_client_project_config(project_id)
        self.plan = load_client_storyboard(project_id, episode_id)
        self.bible = load_client_bible(project_id)
        self.element_manager = ElementManager(self.bible)

    def run_sequence_video(self, sequence_id):
        """Orchestrates a single sequence from plan to multi-shot video"""
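        # 1. Find sequence in plan (fail loudly on an unknown ID instead of a bare StopIteration)
        seq_data = next((s for s in self.plan['sequences'] if s['id'] == sequence_id), None)
        if seq_data is None:
            raise KeyError(f"Sequence {sequence_id!r} not found in plan")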
        
        # 2. Resolve Elements
        fal_elements = self.element_manager.build_fal_elements(seq_data.get('elements', []))
        
        # 3. Extract prompts for multi-shot
        prompts = [shot['prompt'] for shot in seq_data['shots']]
        
        # 4. Find Start Frame (if it exists in clean dir)
        start_frame_path = self._get_approved_start_frame(sequence_id)
        
        # 5. Execute! (Mapping sequence_id -> shot_id)
        self.store.transition(sequence_id, "video_generating")
        
        result = self.step_runner.execute_multi_shot(
            batch=sequence_id, # Using sequence_id as the tracking ID
            multi_prompt_sequence=prompts,
            model=seq_data.get('model', 'kling-o3'),
            start_frame=start_frame_path,
            aspect_ratio=self.config.get("aspect_ratio", "16:9"),
            elements_payload=fal_elements
        )
        
        if result.success:
            self.store.transition(sequence_id, "video_complete")
        return result

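    def generate_start_frame_grid(self, sequence_id, prompt):
        """Hits Gemini for a 2x2 grid via the new StepRunner method"""
        return self.step_runner.execute_grid_exploration(
            batch_id=sequence_id,
            prompt=prompt,
            aspect_ratio=self.config.get("aspect_ratio", "16:9"),
        )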
```

### 2. Updates to `StepRunner` (in `pipeline.py` or wherever it lives)
Add *one* method for your grid workflow.

```python
    def execute_grid_exploration(self, batch_id, prompt, aspect_ratio="16:9"):
        """Generates a 4-up grid via Gemini for start frame selection"""
        # Standard take management, cost logging, Gemini API call
        # Returns StepResult with path to the grid image
        raise NotImplementedError("TODO: wire up the Gemini call and take management")
```

### 3. Updates to `client_bridge.py`
Flesh this out to act as the true data layer for the `ClientSequenceRunner`. Ensure `load_client_bible` transforms the manual JSON into the exact dictionary shape that `ElementManager` expects.

---

# 5. Risks & Hidden Coupling Points

Watch out for these landmines as you implement this:

1. **`ElementManager` Schema Mismatch:** The `ElementManager` was built for `global_bible.json` generated by your Narrative Engine. If your manual `client_bible.json` is missing fields (like trigger words, negative prompts, or specific image paths), `build_fal_elements` will raise a `KeyError`. *Mitigation: Print the output of `load_client_bible` and compare it directly to a Recoil `global_bible.json`.*
2. **Hardcoded Aspect Ratios:** Deep inside `StepRunner` or the `api_client`, there may be hardcoded `"9:16"` strings or resolution mappings (e.g., 720x1280). *Mitigation: Audit `api_client.py` to ensure it properly translates `"16:9"` to `1280x720` for Kling.*
3. **Console UI Regex/Assumptions:** If the Production Console expects `shot_id` to look like `EP01_SC04_SH02`, feeding it `SEQ01` might break parsing logic in the frontend. *Mitigation: Check the Console frontend code for strict shot ID formatting. If it exists, either relax the regex or name your sequences `EP01_SC01_SEQ01`.*
4. **Take Management Overwrites:** Ensure that `execute_multi_shot` inside `StepRunner` respects the `sequence_id` when generating output filenames, so `SEQ01` takes don't accidentally overwrite each other or save into a generic folder.
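
The first three checks above can be scripted as a quick preflight before you spend money on a render. Everything here is a sketch under assumptions: the resolution table only restates the two mappings mentioned above, the ID regex is a guess at what the Console might enforce (read the real frontend code first), and all three helper names are hypothetical.

```python
import re
from typing import Any, Dict, Set, Tuple

# Only the two mappings named in this review; extend as needed.
RESOLUTIONS: Dict[str, Tuple[int, int]] = {
    "16:9": (1280, 720),
    "9:16": (720, 1280),
}

# Assumed Console ID shape, e.g. EP01_SC04_SH02 or EP01_SC01_SEQ01.
SHOT_ID_PATTERN = re.compile(r"^EP\d+_SC\d+_(SH|SEQ)\d+$")


def resolution_for(aspect_ratio: str) -> Tuple[int, int]:
    """Map an aspect ratio string to (width, height), failing on unknowns."""
    try:
        return RESOLUTIONS[aspect_ratio]
    except KeyError:
        raise ValueError(f"No resolution mapping for {aspect_ratio!r}") from None


def missing_fields(client_entry: Dict[str, Any], reference_entry: Dict[str, Any]) -> Set[str]:
    """Return top-level keys present in a Recoil bible entry but absent from the client one."""
    return set(reference_entry) - set(client_entry)


def is_console_safe_id(shot_id: str) -> bool:
    """Check an ID against the assumed Console format."""
    return SHOT_ID_PATTERN.match(shot_id) is not None
```

Run `missing_fields` on one element from each bible before the first `build_fal_elements` call. An empty set only means the top-level keys line up, not that the values are valid.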

**Final Recommendation:** Build `client_pipeline.py`, write a simple CLI script to trigger `run_sequence_video("SEQ01")`, and watch it flow through `StepRunner`. You will have a working client pipeline in less than 4 hours.
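
The CLI trigger can be a thin `argparse` wrapper. This sketch assumes `ClientSequenceRunner` is importable from `client_pipeline` and that the returned result exposes `.success`, as in the code above. The `runner_factory` parameter is a hypothetical hook so the script can be exercised without touching the real APIs.

```python
import argparse
from typing import Callable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run one client sequence through StepRunner."
    )
    parser.add_argument("project_id", help="Client project identifier")
    parser.add_argument("sequence_id", help='Sequence to render, e.g. "SEQ01"')
    parser.add_argument("--episode-id", default="EP001")
    return parser


def main(argv: Optional[List[str]] = None, runner_factory: Optional[Callable] = None) -> int:
    """Parse arguments, run the sequence, and return a process exit code."""
    args = build_parser().parse_args(argv)
    if runner_factory is None:
        # Assumed module path; adjust to wherever the class actually lives.
        from client_pipeline import ClientSequenceRunner as runner_factory
    runner = runner_factory(args.project_id, episode_id=args.episode_id)
    result = runner.run_sequence_video(args.sequence_id)
    return 0 if result.success else 1
```

Call it from the usual `if __name__ == "__main__":` guard with `raise SystemExit(main())`, so a failed render produces a non-zero exit code.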