# Gemini Architecture Consultation Log

## Overview

A five-round architecture review between Claude (builder) and Gemini 3.1 Pro Preview (consultant) for the Starsend visual production platform, conducted on Feb 26, 2026.

## How to Re-Run

When models change (new Gemini version, new capabilities, pricing shifts), re-consult:

```bash
# Start a new consultation with updated context
python3 gemini_consult.py --round 1

# After reviewing Gemini's response, write Claude's reply to:
#   gemini_consultation/claude_response_round_1.md
# Then continue:
python3 gemini_consult.py --round 2
# ... up to round N
```

The script automatically loads all prior round transcripts as context for continuity.
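
The context-loading step might look roughly like the sketch below. It is not taken from `gemini_consult.py`: the function name `load_prior_transcripts` and its parameters are hypothetical, and it assumes only the file naming convention listed under Files (`gemini_response_round_N.md`, `claude_response_round_N.md`).

```python
from pathlib import Path


def load_prior_transcripts(log_dir: Path, upto_round: int) -> list[tuple[str, str]]:
    """Return (filename, text) pairs for every round before `upto_round`.

    Hypothetical helper: assumes transcripts follow the naming scheme
    documented in this README and live directly inside `log_dir`.
    """
    transcripts: list[tuple[str, str]] = []
    for n in range(1, upto_round):
        # Within a round, Gemini's response comes first, then Claude's reply.
        for speaker in ("gemini", "claude"):
            path = log_dir / f"{speaker}_response_round_{n}.md"
            if path.exists():  # skip files that have not been written yet
                transcripts.append((path.name, path.read_text(encoding="utf-8")))
    return transcripts
```

Calling `load_prior_transcripts(Path("gemini_consultation"), 3)` would return the round 1 and round 2 files in conversation order, ready to be prepended to the round 3 prompt.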

## Re-Consultation Triggers

Re-run the consultation when any of these change:
- Gemini model version (e.g., Gemini 4, new image generation architecture)
- API capabilities (new aspect ratios, higher reference-image limits, native multi-image output)
- Pricing changes that affect tier economics
- New competitor models worth evaluating (Kling 3, Seedance 3, etc.)
- Production findings that contradict consultation assumptions

## Round Index

| Round | Date | Topic | Key Decisions |
|-------|------|-------|---------------|
| 1 | 2026-02-26 | Initial analysis | Aspect ratio flaw (27% crop loss), recency bias discovery, grid templates rejected, ENV sanitization validated |
| 2 | 2026-02-26 | Pushback + merge | Expression Transfer via grayscale refs, wide-shot prompt branching, reference ordering corrected, color contamination mitigation |
| 3 | 2026-02-26 | Convergence | Final architecture locked, complexity tiers (simple/standard/complex), EP001 cost model ($9.20), no output chaining, 7-ref cap for two-character shots |
| 4 | 2026-02-26 | Grid method pushback | Grids reinstated at 4K (1365x1365 sub-panels), text-only prompting validated, shared-seed panels preferred over independent candidates, grids as planning engine |
| 5 | 2026-02-26 | Structured grids | Visual Anchors pattern, 4 grid types (Scene Coverage, Director's Take, Action Burst, Skip), storyboard-mapped grid positions, exact Shot 2 test prompt |
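
The 1365x1365 sub-panel figure from Round 4 is consistent with a 3x3 grid on a 4096-pixel square canvas. That canvas size and grid shape are assumptions here, since the log only says "4K". A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def sub_panel_size(canvas_px: int, grid: int) -> int:
    """Edge length of one panel when a square canvas is split into grid x grid."""
    return canvas_px // grid


def panel_box(canvas_px: int, grid: int, row: int, col: int) -> tuple[int, int, int, int]:
    """Crop box (left, top, right, bottom) for the panel at zero-based (row, col)."""
    size = sub_panel_size(canvas_px, grid)
    left, top = col * size, row * size
    return (left, top, left + size, top + size)


# Assumed 4096 px canvas with a 3x3 grid.
print(sub_panel_size(4096, 3))   # 1365
print(panel_box(4096, 3, 2, 2))  # (2730, 2730, 4095, 4095)
```

Integer division leaves one unused pixel along the right and bottom edges (3 x 1365 = 4095).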

## Files

```
gemini_consultation/
├── README.md                      ← This file
├── gemini_response_round_1.md     ← Gemini's initial analysis
├── claude_response_round_1.md     ← Claude's pushback on ordering, expression bug, color contamination
├── gemini_response_round_2.md     ← Expression Transfer, wide-shot branching, final ordering
├── claude_response_round_2.md     ← Confirmations + remaining concerns (expressions, two-char, cost)
├── gemini_response_round_3.md     ← Final architecture, risk mitigations, EP001 test protocol
├── claude_response_round_3.md     ← Grid method pushback (4K resolution, shared-seed argument)
├── gemini_response_round_4.md     ← Gemini concedes grids, approves hybrid pipeline
├── claude_response_round_4.md     ← Structured grid prompting discovery (community techniques)
└── gemini_response_round_5.md     ← Visual Anchors, 4 grid types, exact test prompt
```

## Architecture Decisions by Model Assumption

Each decision below is tagged with the model behavior it depends on. When re-consulting after a model change, check these first:

| Decision | Depends On | Model Behavior |
|----------|-----------|----------------|
| Recency bias ordering | Gemini attention mechanism | Last image Part before text gets highest weight |
| Expression Transfer (grayscale) | Cross-attention modularity | Muscle geometry transfers without identity bleed |
| Color contamination from white-bg | Global illumination sampling | Model samples lighting from ref image backgrounds |
| Text-only grid prompting | Semantic vs pixel interpretation | Text "3x3 grid" activates layout concept; uploaded gridlines become scene geometry |
| Shared-seed grid consistency | Single-call denoising | All grid panels share one noise latent when generated in one call |
| Wide-shot face degradation | Latent space resolution | Faces smaller than ~40x40 latent pixels are hallucinated |
| Blank Stare Bug | Visual token dominance | Ref image expression overrides text emotion prompts |
| Positive > negative constraints | Diffusion process mechanics | "Flawless hands" works better than "no deformed hands" |
| No output chaining | Generation drift | Sequential ref→gen→ref causes progressive degradation |
| Visual Anchors block | Instruction-following weights | Explicit "MUST REMAIN CONSTANT" triggers continuity policing |
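
The recency bias row implies a simple rule when assembling a request: put the most important reference image last, immediately before the text prompt. The sketch below illustrates only that ordering rule. The `Ref` class, the `priority` scale, and `order_parts` are hypothetical and not part of any Gemini SDK.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    name: str      # label or path of the reference image
    priority: int  # hypothetical scale: higher means more important


def order_parts(refs: list[Ref], prompt: str) -> list[str]:
    """Order request parts so the highest-priority ref sits last before the text."""
    # sorted() is stable, so refs with equal priority keep their input order.
    ordered = sorted(refs, key=lambda r: r.priority)
    return [r.name for r in ordered] + [prompt]


parts = order_parts(
    [Ref("face.png", 3), Ref("env.png", 1), Ref("costume.png", 2)],
    "Medium shot, character turns toward camera.",
)
print(parts[:-1])  # ['env.png', 'costume.png', 'face.png']
```

If a later model version weights images differently, only the sort key needs to change, which is why this decision is listed as model-dependent.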

## Cost Assumptions (verify on model change)

| Model | Cost/Image | Used For |
|-------|-----------|----------|
| gemini-3-pro-image-preview | $0.134 | Final renders, ENV anchors |
| gemini-3.1-flash-image-preview | ~$0.039 | Grid exploration, planning |
| gemini-2.5-flash-image | $0.039 | Fallback exploration |
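
To re-check tier economics after a pricing change, a small estimator over the table above can help. This is a sketch with a hypothetical function name. The prices are copied from the table (the approximate Flash figure is used as-is), and it does not attempt to reproduce the EP001 cost model.

```python
# Per-image prices from the table above; update these when pricing changes.
COST_PER_IMAGE = {
    "gemini-3-pro-image-preview": 0.134,
    "gemini-3.1-flash-image-preview": 0.039,  # listed as approximate
    "gemini-2.5-flash-image": 0.039,
}


def estimate_cost(image_counts: dict[str, int]) -> float:
    """Total USD cost for a batch, given the number of images per model.

    Raises KeyError for a model that is not in COST_PER_IMAGE.
    """
    total = sum(COST_PER_IMAGE[model] * n for model, n in image_counts.items())
    return round(total, 3)


# Example batch (illustrative counts, not from the consultation):
# 10 final renders plus 20 exploration grids.
print(estimate_cost({
    "gemini-3-pro-image-preview": 10,
    "gemini-3.1-flash-image-preview": 20,
}))  # 2.12
```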
