Architecture
The principles and systems that power every workflow.
Why structured workflows beat vibe coding
"Vibe coding" — giving an AI agent a loose prompt and hoping it figures things out — works for throwaway scripts. It falls apart for production code. Here's why:
Vibe coding skips discovery
An ad-hoc prompt jumps straight to code. No blast radius analysis, no existing test inventory, no knowledge base lookup. The agent doesn't know what it doesn't know — and neither do you until the PR breaks something.
Workflows enforce discovery first
5-7 discovery phases run before a single line of code changes. The agent maps the blast radius, checks the knowledge base, inventories existing tests, and defines requirements — THEN codes.
Vibe coding produces no tests
Ask an AI to "fix this bug" and you get a code change. No unit tests, no E2E tests, no regression check. You're shipping untested code and calling it "AI-assisted development."
Workflows auto-generate all test layers
TEST_GEN auto-writes unit + E2E tests. BASELINE captures pre-change health. VERIFY runs the new tests. BLAST_RADIUS_RUN catches regressions. Four mandatory test phases — none skippable.
Vibe coding has no memory
Every prompt starts from zero. The agent doesn't know that this module has a quirk, that this API times out under load, or that the last three people who touched this file introduced the same regression.
Workflows learn and remember
RECALL reads KNOWLEDGE_BASE.md before discovery begins. LEARN writes back after every workflow. CODEBASE_INSIGHTS.md captures architecture patterns. The system gets smarter with every run.
Vibe coding is untraceable
When the PR is reviewed, there's no evidence trail. Why was this approach chosen? What alternatives were considered? What tests validate it? The reviewer has to trust the AI — or re-investigate everything themselves.
Workflows produce a document chain
16+ linked documents from INTAKE to CLOSE. Every decision traces back to evidence. The PR reviewer can follow DIAGNOSIS → FIX_PLAN → TEST_GEN → VERIFY and understand exactly why every change was made.
The framework doesn't slow down AI-assisted development — it makes it trustworthy. Same speed, but with evidence, tests, and traceability that production code demands.
1. Orchestrator Loop
The orchestrator implements a state machine loop that drives the agent through each phase until a terminal signal is emitted or the agent is blocked.
- Load RULES.md
- Read STATE_FILE — current phase, subphase, attempt counter
- Load phase prompt
- Execute phase — reads upstream docs, produces output
- Process signal
- Update STATE_FILE
- Route — advance, loopback, block, or complete
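The loop above can be sketched in shell. Everything here is illustrative: the `run_phase` and `next_phase` stubs stand in for the real agent invocation and phase ordering, and the state file holds only the `phase` key for brevity.

```shell
#!/bin/sh
# Sketch of the orchestrator's state-machine loop. run_phase and
# next_phase are stubs for the real agent call and phase order.
STATE_FILE=".workflow/STATE_FILE.md"
mkdir -p .workflow
printf 'phase: DIAGNOSIS\n' > "$STATE_FILE"

run_phase() {                       # stub: execute a phase, emit its signal
  case "$1" in
    DIAGNOSIS) echo PHASE_COMPLETE ;;
    *)         echo BUG_FIXED ;;
  esac
}

next_phase() {                      # stub: fixed phase order for the sketch
  case "$1" in
    DIAGNOSIS) echo FIX_PLAN ;;
    *)         echo CLOSE ;;
  esac
}

orchestrate() {
  phase=$(sed -n 's/^phase: //p' "$STATE_FILE")   # read current state
  while :; do
    signal=$(run_phase "$phase")                  # execute, capture signal
    case "$signal" in
      PHASE_COMPLETE)                             # advance: full state rewrite
        phase=$(next_phase "$phase")
        printf 'phase: %s\n' "$phase" > "$STATE_FILE" ;;
      BLOCKED_NEEDS_HUMAN) return 1 ;;            # block: hand off to a human
      BUG_FIXED|ENHANCEMENT_SHIPPED|FEATURE_SHIPPED)
        return 0 ;;                               # terminal: workflow done
    esac
  done
}

orchestrate && echo "workflow complete"
```

With these stubs the run prints `workflow complete` and leaves `phase: FIX_PLAN` in the state file, having advanced once and then hit a terminal signal.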
2. Ralph Loops
Idempotent retry pattern that makes every phase safe to re-execute.
Idempotency Check
If output doc exists and is complete, emit PHASE_COMPLETE without re-doing work.
Attempt Counter
On failure: increment counter, rewrite state, re-run. At attempt >= 3: emit BLOCKED_NEEDS_HUMAN.
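A minimal sketch of the idempotency check and the three-strike counter together. The output path, the `status: complete` marker, and the `do_phase_work` stub (which fails once, then succeeds) are assumptions for illustration; the real completeness checks are richer.

```shell
#!/bin/sh
# Sketch: skip completed work, retry failures, block at attempt 3.
OUT_DOC=".workflow/DIAGNOSIS.md"
mkdir -p .workflow
rm -f "$OUT_DOC"

phase_complete() {                  # Rule 6: check completion before work
  [ -f "$OUT_DOC" ] && grep -q '^status: complete$' "$OUT_DOC"
}

do_phase_work() {                   # stub: fails on attempt 1, succeeds on 2
  if [ "$1" -ge 2 ]; then
    printf 'status: complete\n' > "$OUT_DOC"
  fi
}

attempt=1
until phase_complete; do
  if [ "$attempt" -ge 3 ]; then     # Rule 5: blocker at 3 attempts
    echo "BLOCKED_NEEDS_HUMAN"
    break
  fi
  do_phase_work "$attempt"
  attempt=$((attempt + 1))          # increment, then state is rewritten
done
phase_complete && echo "PHASE_COMPLETE"
```

Re-running this script after success emits PHASE_COMPLETE immediately without redoing any work, which is what makes every phase safe to re-execute.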
Loopback Signals
When later phases discover earlier analysis was wrong: REDIAGNOSE, REDESIGN, REARCHITECT, or RESLICE.
State Rewrite
Always a full `cat > STATE_FILE << 'EOF'` replacement. Never `sed` or partial in-place edits.
```shell
# Example state rewrite — always atomic, never incremental
cat > .workflow/STATE_FILE.md << 'EOF'
phase: DIAGNOSIS
subphase: root_cause_analysis
attempt: 2
completed_phases:
- INTAKE: 2026-02-27T10:00:00Z
- REPRODUCE: 2026-02-27T10:05:00Z
- RECALL: 2026-02-27T10:06:00Z
- CODE_TRACE: 2026-02-27T10:15:00Z
auto_pr: true
EOF
```
3. LISA — Layered Information State Architecture
Information organized into three layers with different lifecycles and scopes.
State File
Current phase, subphase, attempt counter, completed phases with timestamps. Single file, rewritten in full on every transition.
Phase Documents
One document per phase. Each reads upstream docs and produces exactly one output. Once written, immutable.
Knowledge Stores
Three shared files that persist across all workflow runs and grow smarter over time:
CODEBASE_INSIGHTS.md
The enterprise context document for all workflow agents. Deeply investigated (not surface-scanned) from the survey — 10 sections purpose-built for what agents need: architecture with versions and build details, every module/component/hook listed by name, complete initialization chain in execution order, test config with path mappings and commands, dependency graph with blast-radius rules, override/resolution mechanism, and 8–12 real import examples. Quality-validated by the VALIDATE phase. Adapts to any tech stack. Read by RECALL (all workflows). Written by LEARN (appends fragile areas, component consumers, API quirks). Refreshed by /refresh-insights (re-surveys repo and regenerates without a full generator run).
KNOWLEDGE_BASE.md
Starts empty. Populated by LEARN phases with reusable patterns — diagnostic approaches that worked, architecture decisions, slice strategies, known anti-patterns. Read by RECALL, DIAGNOSIS, DESIGN, ARCHITECTURE.
TOOL_RETRO.md
Seeded by the generator with known-working commands from the survey (dev server, test runners, build checks). LEARN appends commands that failed and their workarounds. Read by RECALL so downstream phases never retry known-broken commands.
Compounding effect
Run 1 starts with survey-seeded insights. LEARN writes new discoveries. Run 2 reads survey + Run 1 findings and skips known areas. After 5-10 runs, the knowledge stores contain deep institutional knowledge — fragile modules, API quirks, test gaps — that no single engineer carries in their head. If your codebase changes significantly (new modules, updated deps), run /refresh-insights to re-survey without re-running the full generator.
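The LEARN-write / RECALL-read cycle can be sketched as a pair of helpers. The entry format and the sample finding are illustrative, not the phases' real output:

```shell
#!/bin/sh
# Sketch of the compounding knowledge cycle: LEARN appends, RECALL reads.
KB=".workflow/KNOWLEDGE_BASE.md"
mkdir -p .workflow
: > "$KB"                                        # starts empty

learn() {   # LEARN: append one reusable finding (Rule 16: real timestamps)
  printf -- '- [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$KB"
}

recall() {  # RECALL: surface prior findings on a topic before discovery
  grep -i -- "$1" "$KB" || echo "no prior findings for: $1"
}

learn "auth module: token refresh races under parallel E2E runs"
recall "token refresh"
```

Append-only writes plus a read at the start of every run are what give the stores their compounding effect: nothing learned in run 1 has to be rediscovered in run 2.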
4. Signal-Based Routing
Every phase emits exactly one signal. Four types: advance, block, loopback, terminal.
| Signal | Type | Meaning |
|---|---|---|
| PHASE_COMPLETE | advance | Phase finished, proceed to next |
| BLOCKED_NEEDS_HUMAN | block | Cannot proceed, human intervention required |
| SCOPE_ESCALATION | block | Discovered scope larger than expected |
| REDIAGNOSE | loopback | Root cause wrong, retry DIAGNOSIS (bug-fix) |
| REDESIGN | loopback | Design failed, retry DESIGN (feature-enhance) |
| REARCHITECT | loopback | Architecture invalid, retry (feature-build) |
| RESLICE | loopback | Slice plan needs restructuring (feature-build) |
| DEPENDENCY_BLOCKED | block | External dependency unavailable (feature-build) |
| BUG_FIXED | terminal | Bug-fix workflow complete |
| ENHANCEMENT_SHIPPED | terminal | Feature-enhance workflow complete |
| FEATURE_SHIPPED | terminal | Feature-build workflow complete |
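The table's classification amounts to one dispatch. The signal names are the framework's; the `route_signal` helper itself is a sketch, not its real routing code:

```shell
#!/bin/sh
# Classify a signal into its routing type: advance, block, loopback, terminal.
route_signal() {
  case "$1" in
    PHASE_COMPLETE)                                          echo advance  ;;
    BLOCKED_NEEDS_HUMAN|SCOPE_ESCALATION|DEPENDENCY_BLOCKED) echo block    ;;
    REDIAGNOSE|REDESIGN|REARCHITECT|RESLICE)                 echo loopback ;;
    BUG_FIXED|ENHANCEMENT_SHIPPED|FEATURE_SHIPPED)           echo terminal ;;
    *)                                                       echo unknown  ;;
  esac
}

route_signal RESLICE          # prints: loopback
route_signal FEATURE_SHIPPED  # prints: terminal
```

Because every phase emits exactly one signal and every signal maps to exactly one type, the orchestrator's routing decision is always deterministic.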
5. Document Chain
Upstream phases produce documents that downstream phases read, creating a traceable path from problem to solution.
6. Test Architecture
Tests categorized into three groups with distinct ownership and execution timing.
New Tests (Category A)
Written in TEST_GEN for new behavior. Run in IMPLEMENT and VERIFY.
Updated Tests (Category B)
Existing tests with changed assertions. Run in IMPLEMENT and VERIFY.
Existing Tests (Category C)
Unchanged regression tests. Run in BASELINE and BLAST_RADIUS.
| Phase | Purpose | Categories |
|---|---|---|
| BASELINE | Capture health before changes | C |
| IMPLEMENT | Red-green cycle | A, B |
| VERIFY | Confirm fix/feature end-to-end | A, B |
| BLAST_RADIUS | Check for regressions | C |
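The phase-to-category mapping in the table can be expressed as a small lookup. The helper name and output format are illustrative (A = new, B = updated, C = existing):

```shell
#!/bin/sh
# Sketch: which test categories each phase runs.
tests_for_phase() {
  case "$1" in
    BASELINE|BLAST_RADIUS) echo "C"   ;;  # regression suite only
    IMPLEMENT|VERIFY)      echo "A B" ;;  # new and updated tests
    *)                     echo ""    ;;  # non-test phases run nothing
  esac
}

tests_for_phase BASELINE   # prints: C
tests_for_phase VERIFY     # prints: A B
```

Note the symmetry: the phases that bracket the change (BASELINE, BLAST_RADIUS) run only untouched tests, so any new failure there is a regression by definition.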
7. Universal Rules
Rules loaded from RULES.md at the start of every orchestrator iteration. Development workflows use Rules 1-17. Story readiness workflows add Rules 17s and 18.
- Rule 1: Verify Every Action
- Rule 2: No Vague Language
- Rule 3: Read Human Guidance First
- Rule 4: Rewrite State, Don't Edit
- Rule 5: Blocker at 3 Attempts
- Rule 6: Check Completion Before Work
- Rule 7: Stay Scoped
- Rule 8: Mandatory Phase Order
- Rule 9: Both Test Layers Always
- Rule 10: Scope Defined by Requirements
- Rule 11: Recall Before Discovery
- Rule 12: App Lifecycle for E2E
- Rule 13: Feature Flag Gate
- Rule 13b: Three Test Categories
- Rule 14: Design Context Non-Blocking
- Rule 15: Dependency Decisions Explicit
- Rule 16: Real Timestamps
- Rule 17: Ticket Gate
- Rule 17s: Read All Linked Confluence Pages (Story)
- Rule 18: PRD Traceability (Story)
8. Integrations
External tools that enrich context and enforce quality gates. All designed to degrade gracefully.
Ticket Gate
Dedicated TICKET_GATE phase after INTAKE. Workflow-specific checklists (6/6/8 required items). For features: fetches linked stories to check dependency maturity, interface contracts, and transitive blockers. Three verdicts: READY, NEEDS_REVIEW, INSUFFICIENT.
Figma
Non-blocking. If design context is unavailable, the workflow proceeds with `design_available: false`.
Feature Flags
Feature-build workflow: all code behind flags (default OFF). Feature-enhance: flags conditional on category and risk.