@clawhub-anderskev-60f39d7981
Use when the user wants to pressure-test a product, internal-tool, or OSS concept against Amazon's Working Backwards PRFAQ gauntlet before committing to a sp...
---
name: prfaq-beagle
description: "Use when the user wants to pressure-test a product, internal-tool, or OSS concept against Amazon's Working Backwards PRFAQ gauntlet before committing to a spec. Triggers on: \"work backwards\", \"write a PRFAQ\", \"press release first\", \"is this idea worth building\", \"pressure-test this concept\", \"filter this before brainstorm\", \"is this a real product\". Also catches solution-first pitches (\"I want to build X that does Y\") and technology-first pitches (\"use AI to...\") that need customer-first filtering. Produces a binary pass/fail verdict, not a polished doc. Hardcore coaching — direct, skeptical, concrete. On pass, hands off to brainstorm-beagle with a concept brief. Does NOT write code, plan implementation, scaffold projects, or draft specs."
user-invocable: true
---
# PRFAQ: The Concept Filter (Working Backwards)
A hardcore Working Backwards coach. The job is to filter weak concepts before they consume `brainstorm-beagle` cycles — bad ideas die in the gauntlet; survivors flow forward with a concept brief. Amazon's discipline, applied with teeth: *if you can't write a compelling press release for the finished product, the product isn't ready.*
<hard_gate>
This skill is a filter, not a refinement tool. Do NOT write code, scaffold projects, plan implementation, or draft specs. Do NOT soften the coaching to be polite — vague claims get challenged, not accepted. The gauntlet IS the filter; skipping steps destroys the filter. Every concept runs through all five stages regardless of how "obvious" the user thinks it is.
</hard_gate>
## When to use
- The user has a product, internal-tool, or OSS idea and wants to know if it's worth committing to a spec.
- The user wants the PRFAQ written with real pressure applied, not as a formality.
- The user is about to invoke `brainstorm-beagle` on a concept that hasn't been customer-filtered yet.
## When NOT to use
- The user has a concrete spec already and wants to start building → `brainstorm-beagle` or implementation planning.
- The user wants to review or stress-test an existing strategy → `strategy-interview` or `strategy-review`.
- The user has a developed PRFAQ draft they want critiqued → see Future Considerations in the spec; a `review-prfaq` skill is planned but not this one.
## Workflow
Five stages, in order. Each stage has a transition gate; no skipping forward.
```
Ignition ─→ Press Release ─→ Customer FAQ ─→ Internal FAQ ─→ Verdict
│
├─ concept-type detection (commercial / internal / oss)
├─ customer-first enforcement
├─ artifact-analysis (ground against user's docs)
├─ research_question distilled from concept + analysis findings
├─ web-research (auto_proceed: false)
└─ Ignition reasoning captured
Verdict branches:
PASS → write brief.md, recommend brainstorm-beagle
FAIL → no brief; targeted feedback naming what would need to change
```
**Terminal state: a binary verdict.** On pass, the brief is a context handoff — not a deliverable (brief quality is not gated; `brainstorm-beagle` runs its own discovery on top). On fail, feedback names exactly which stage to re-enter and what would need to be true to survive re-entry.
## Gates (objective pass conditions)
Do not advance until the **Pass when** line is satisfied (these restate critical transitions as checkable stops—see stage sections for full coaching).
| Step | Pass when |
|------|-----------|
| **Resume fork** | If `.beagle/concepts/<slug>/prfaq.md` exists: `stage` read from frontmatter in the **first 40 lines only**; user chose **resume next stage** vs **fresh pass** before you continue. |
| **After artifact-analysis** | Invocation finished; `analysis/report.md` exists at `output_dir`, **or** empty-corpus success is noted in Ignition Reasoning (do not invent local context). |
| **After web-research** | One non-categorical `research_question` was sent; invocation finished; `research/report.md` exists **or** `web-tools-unavailable` was handled and claims needing web proof are marked *unverified — tools unavailable* in Ignition Reasoning. |
| **Ignition → Press Release (1e)** | `prfaq.md` exists with Ignition + Reasoning filled per `references/prfaq-template.md`; user **explicitly confirmed** the recap matches (or you fixed `prfaq.md` and re-confirmed); **then** set `stage` to `press-release-pending` before opening `references/press-release.md`. |
| **Final verdict** | PASS: Stage 5 rubric in `references/verdict.md` met → `brief.md` written + `stage: pass`. FAIL: Verdict section complete + `stage: fail` + no `brief.md`—no middle outcomes. |
## Concept folder layout
All artifacts for a concept live under `.beagle/concepts/<slug>/`:
```
.beagle/concepts/<slug>/
├── prfaq.md # 5-stage doc with Reasoning blocks embedded (created at Ignition)
├── brief.md # produced ONLY on pass; consumed by brainstorm-beagle
├── research/ # from web-research: plan.md, findings/, report.md
└── analysis/ # from artifact-analysis: plan.md, findings/, report.md
```
The folder is shared across the concept-forging pipeline. `brainstorm-beagle`, when run after PRFAQ, writes its spec to `.beagle/concepts/<slug>/spec.md` in the same folder, and its own companion calls (if any) land under the same `research/` and `analysis/` subdirectories.
### Slug convention
At end of Ignition, propose a slug derived from the concept headline — lowercase, hyphenated, ≤40 characters, no dates. Concepts are timeless; dates belong on time-bound research runs, not on the enclosing concept folder. Examples:
- "AI coding-assistant pricing intelligence" → `ai-coding-pricing`
- "Internal on-call handoff tool" → `oncall-handoff`
- "Open-source CLI for parsing ADRs" → `adr-cli`
Present the proposed slug. The user can accept or override with their own string.
## Resume-from-stage
On activation, check for `.beagle/concepts/<slug>/prfaq.md`. If it exists:
1. Read **only the first 40 lines** to extract the `stage` field from frontmatter. Do NOT re-read the full doc.
2. Offer resumption: *"I see `<slug>` is at stage `<N>: <name>`. Resume from stage `<N+1>`, or start a fresh pass?"*
3. On resume: load the next stage's reference file and pick up from there.
4. Prior `research/` and `analysis/` outputs are **reused by default**. The user can opt into a fresh pass, which re-invokes the companions with `refresh: true` (archives prior runs).
If no prfaq.md exists, start at Ignition.
## Stage 1: Ignition
Ignition is the forge. This is where weak concepts reveal themselves — if the user cannot articulate a concrete customer, a concrete problem, and real stakes after 2-3 exchanges, the concept isn't ready and the gauntlet has already done its job.
### 1a. Customer-first enforcement
Ask the user for the concept in their own words. Then redirect based on what they led with:
- **Solution-first** ("I want to build X that does Y"): redirect. *"Set the tool down for a second. Whose problem are you solving, and what are they doing today instead?"*
- **Technology-first** ("use AI / blockchain / LLMs to..."): challenge harder. *"Technology is a* how*, not a* why*. Who has a problem bad enough that they'd pay attention to a new solution?"*
- **Vague customer** ("developers", "users", "teams"): demand specificity. *"Which developer? Name a person you've talked to who has this problem. What do they do on Monday morning?"*
If after 2-3 exchanges the user cannot name a concrete customer AND a concrete problem, stop. Tell them:
> "This idea isn't ready for PRFAQ — it needs brainstorming first. Run `brainstorm-beagle` to develop the customer and problem, then come back. PRFAQ filters concepts; it doesn't manufacture them from nothing."
Do NOT proceed to Press Release on vapor. No prfaq.md is written in the redirect path. This is not a FAIL verdict — it is a "not ready to filter" hand-back.
### 1b. Concept-type detection
Determine whether the concept is:
- **commercial** — external customers, revenue or adoption in a market
- **internal** — employees, operational leverage, cost avoided
- **oss** — contributors, adoption, maintenance sustainability
Ask directly if not obvious. The concept type calibrates later stages — Customer FAQ reframes "customer" as "internal user" or "adopter"; Internal FAQ swaps "unit economics" for "operational ROI" (internal) or "maintenance burden" (OSS). Record the type in prfaq.md frontmatter.
### 1c. Ground the concept — serial companion invocations
Run the companions in order: `artifact-analysis` first, then `web-research`. Serial, not parallel. Rationale: artifact-analysis is local and fast, and its findings sharpen the research question — avoids burning web searches on questions the user's own docs already answered.
**Step 1 — artifact-analysis** (ground against the user's own documents):
```yaml
intent: "<one string PRFAQ derives from the concept and stakes>"
paths: [] # empty → auto-discover .beagle/concepts/, .planning/, docs/, root briefs
output_dir: "/abs/path/.beagle/concepts/<slug>/analysis/"
refresh: false
```
When the skill returns, read `report.md`. Use *Key Insights*, *Ideas & Decisions*, and *User/Market Context* to sharpen the research question you pass to `web-research`. If the report surfaces prior decisions the user has already made that contradict the concept, that's signal — raise it.
**Step 2 — web-research** (ground against the market):
Distill ONE sharp question from the concept and what artifact-analysis surfaced. `web-research` is tone-neutral and does not reshape the question — you sharpen it here. Good research questions name a specific user, market, or comparable. Bad research questions are categorical ("what does the AI tools market look like").
```yaml
research_question: "<one sharp question>"
output_dir: "/abs/path/.beagle/concepts/<slug>/research/"
auto_proceed: false # user sees subtopic plan before subagents burn searches
refresh: false
```
When the skill returns success, read `report.md`. Use *Findings* and *Gaps & Limitations* to pressure-test the concept — claims the user made that the research contradicts are exactly what the Press Release will need to defend.
See `references/companion-contract.md` for the full invocation shape, error handling, and resume rules.
### 1d. Write the PRFAQ shell
Create `.beagle/concepts/<slug>/prfaq.md` using the skeleton in `references/prfaq-template.md`. Fill in:
- Frontmatter (slug, concept type, `stage: ignition-complete`).
- Ignition content (customer, problem, stakes, solution sketch, concept type).
- Ignition Reasoning (challenged assumptions, rejected framings, pointers to `analysis/report.md` and `research/report.md` and how each shaped the framing).
### 1e. Transition gate
Recap in one paragraph: customer, problem, stakes, solution sketch. Ask:
> "Does this match the concept in your head? If yes, we go to Press Release. If no, we fix it here — we don't carry a miscommunication into the drill."
Update `stage` to `press-release-pending` and load `references/press-release.md` for Stage 2.
## Stages 2-5
Each stage has its own reference file. Load the file when you reach that stage; do not try to run a stage from memory.
- **Stage 2 — Press Release**: `references/press-release.md`
- **Stage 3 — Customer FAQ**: `references/customer-faq.md`
- **Stage 4 — Internal FAQ**: `references/internal-faq.md`
- **Stage 5 — Verdict + brief**: `references/verdict.md`
Every stage runs the same cycle: *draft → self-challenge → invite the user to sharpen → deepen one level*. Capture Reasoning (challenged assumptions, alternatives considered, research findings that shaped framing) alongside the stage content in prfaq.md so the PRFAQ reads as a decision artifact, not just a final doc.
## Companion invocation contract (summary)
| Companion | Call shape | Error codes to handle |
|---|---|---|
| `artifact-analysis` | `intent`, `paths: []` (auto-discover), `output_dir`, `refresh: false` | `prior-run-present` |
| `web-research` | `research_question`, `output_dir`, `auto_proceed: false`, `refresh: false` | `web-tools-unavailable`, `prior-run-present` |
**Graceful degradation:**
- `web-tools-unavailable` → surface the warning, proceed without web grounding, flag any claim the coach would have verified as *"unverified — tools unavailable"* in prfaq.md.
- `prior-run-present` → reuse existing `report.md` by default (resume semantics). Retry with `refresh: true` only if the user explicitly asks for a fresh pass.
- `artifact-analysis` success with empty corpus → not an error. Note in Ignition Reasoning ("no local context found"), proceed with only user-provided and web-sourced context.
Surface companion output paths (`research/report.md`, `analysis/report.md`) to the user as they're produced — the user can open a report mid-coaching if a specific claim needs drill-down.
Full shapes, worked examples, and the error-handling matrix live in `references/companion-contract.md`.
## Output: on PASS
Produce `.beagle/concepts/<slug>/brief.md` per the template in `references/verdict.md`. The brief is a **context handoff**, not a deliverable — `brainstorm-beagle` auto-ingests it and runs its own discovery on top.
Tell the user:
> "PRFAQ passed. Brief written to `.beagle/concepts/<slug>/brief.md`. Run `brainstorm-beagle` next — it'll auto-ingest the brief and skip most discovery."
Update `stage: pass` in prfaq.md frontmatter.
## Output: on FAIL
**No brief is produced.** Write the Verdict section in prfaq.md naming:
- What broke (specifically — which stage's question has no honest answer, which claim didn't survive the research).
- What would need to change for re-entry.
- Which stage to re-enter from.
Tell the user:
> "PRFAQ failed at `<stage>`. `<one-paragraph reason>`. See `.beagle/concepts/<slug>/prfaq.md` Verdict section for what to fix. When you've developed those, re-run and we'll resume from `<stage>`."
Update `stage: fail` in frontmatter. Keep `research/` and `analysis/` in place for the re-run.
## Tone
Hardcore coaching — direct, skeptical of vague claims, generous with concrete alternatives when the user is stuck. Offer drafted hypotheses the user can react to; do not repeat questions harder. "Tough love, not tough silence."
Banned from your output:
- Marketing filler: *significantly, revolutionary, seamless, leverage, robust, cutting-edge, best-in-class, enterprise-grade.*
- Hedging softeners: *"maybe we could consider...", "it might be worth thinking about...".*
- Soft verdicts. A polite verdict defeats the filter.
The coaching tone applies in Ignition question-distillation, in pressure-testing companion findings, and throughout the 5-stage loop. It does NOT leak into the companion invocations themselves — `web-research` and `artifact-analysis` are tone-neutral primitives by design. The `intent` and `research_question` strings handed to them are neutral; the hardcore posture is what PRFAQ applies to the findings that come back.
## Key principles
- **Five stages, in order.** Customer before press release before FAQs before verdict. Skipping forward destroys the filter.
- **Challenge every vagueness.** "Users" → which users. "Better" → better than what, measured how. "Simple" → simple for whom doing what.
- **Draft alternatives when the user is stuck.** Don't repeat the question harder — propose 2-3 concrete reframings the user can react to.
- **Ground before coaching.** Ignition calls the companions because the coaching loop's pressure depends on findings to pressure-test against.
- **Binary verdict.** Pass or fail — no "promising, needs work" middle ground. That's what FAIL-with-targeted-feedback is for.
- **Capture reasoning inline.** Every stage carries a Reasoning block. The PRFAQ is readable as a decision artifact, not just a final doc.
## Reference files
- `references/prfaq-template.md` — prfaq.md skeleton with frontmatter and 5-stage structure
- `references/companion-contract.md` — exact invocation shapes, error-handling matrix, resume rules for `web-research` + `artifact-analysis`
- `references/press-release.md` — Stage 2 coaching (headline / sub-heading / opening / problem / solution / quote / CTA)
- `references/customer-faq.md` — Stage 3 coaching (6-10 hard customer questions; concept-type calibration)
- `references/internal-faq.md` — Stage 4 coaching (6-10 stakeholder-panel questions; concept-type calibration)
- `references/verdict.md` — Stage 5 pass/fail rubric, brief template for pass, targeted-feedback template for fail
FILE:references/companion-contract.md
# Companion Invocation Contract (PRFAQ side)
How `prfaq-beagle` calls `web-research` and `artifact-analysis` during Ignition. Those skills are standalone with their own contracts — PRFAQ honors them verbatim. This file is the PRFAQ-side cheat sheet: exact shapes, serial-order rationale, error-handling matrix, resume rules.
Authoritative sources:
- `plugins/beagle-analysis/skills/web-research/references/companion-contract.md`
- `plugins/beagle-analysis/skills/artifact-analysis/references/companion-contract.md`
Both files include worked examples that name `prfaq-beagle` as a caller. The shapes in this document mirror those examples verbatim — if a drift appears, the companion contracts win.
## Serial order, not parallel
Run `artifact-analysis` first, read its `report.md`, then sharpen the `research_question` using its findings, then run `web-research`.
Why serial:
- `artifact-analysis` is local scanning — fast, no external cost.
- Its findings (prior decisions, technical context, user/market context) let PRFAQ ask `web-research` a sharper question.
- Sharper research question = less wasted web search, tighter synthesis.
- The one-step serialization is cheap insurance against burning web searches on questions the user's own docs already answer.
Parallel invocation is tempting for latency but fragments the critical path and re-introduces the problem this sequencing avoids.
## artifact-analysis call
```yaml
intent: "<one string PRFAQ derives from the concept and stakes>"
paths: [] # empty → auto-discover .beagle/concepts/, .planning/, docs/, root briefs
output_dir: "/abs/path/.beagle/concepts/<slug>/analysis/"
refresh: false
```
**Intent distillation — good vs bad:**
| Concept | Good intent | Bad intent |
|---|---|---|
| AI coding-assistant pricing | "competitive and technical grounding for PRFAQ on AI coding-assistant pricing" | "analyze my docs" |
| Internal on-call tool | "prior decisions and pain points around on-call handoffs" | "find anything about oncall" |
| OSS ADR CLI | "existing ADR tooling referenced in briefs and docs; competing formats" | "check the docs folder" |
The `intent` string is tone-neutral. PRFAQ's hardcore posture lives *before* distilling the intent (when you're figuring out what to scan for) and *after* receiving the report (when you pressure-test claims against the findings) — never inside the string handed to the companion.
**Empty `paths` is intentional.** Auto-discovery is the right default for Ignition because PRFAQ shouldn't need to know where the user keeps briefs. Only override with explicit paths if the user tells you "don't scan `<X>`" or "only look at `<Y>`."
## web-research call
```yaml
research_question: "<one sharp question PRFAQ distilled, informed by artifact-analysis findings>"
output_dir: "/abs/path/.beagle/concepts/<slug>/research/"
auto_proceed: false # user sees subtopic plan before subagents burn searches
refresh: false
```
**Sharpening the research question — good vs bad:**
| Good (specific) | Bad (categorical) |
|---|---|
| "What AI coding-assistant pricing tiers exist for enterprise teams in 2026, and what features differentiate them?" | "What does the AI tools market look like?" |
| "How do task-tracking tools handle sub-tasks that span multiple top-level projects?" | "What do task trackers do?" |
| "Which open-source ADR tools support per-project templates and which don't?" | "Are there ADR tools?" |
If the user's own docs (from artifact-analysis) already answered a dimension, do not re-ask the web. Aim the research question at the *gap* — the thing no local doc speaks to.
**`auto_proceed: false` is intentional.** The plan-review gate forces the user to see the subtopic plan before subagents burn. PRFAQ's hardcore posture applies here: bad subtopic framing is one way a PRFAQ goes wrong quietly; the plan-gate is cheap insurance.
The one exception: if the user explicitly says "just run it" or "don't stop me for review" during Ignition, you can pass `auto_proceed: true`. Surface the choice — don't silently skip gates.
## Error-handling matrix
Handle every combination explicitly. PRFAQ does not silently drop grounding.
| Companion returns | PRFAQ response |
|---|---|
| `artifact-analysis` success (normal corpus) | Read `report.md`. Use Key Insights, Ideas & Decisions, and User/Market Context to sharpen the `research_question`. Point user at the file path. |
| `artifact-analysis` success (empty corpus) | Not an error. Note in Ignition Reasoning: *"no local context found — <reason, e.g. fresh repo>."* Proceed to web-research with the concept-level question; no fallback needed. |
| `artifact-analysis` error `prior-run-present` | **Resume default: reuse.** Surface: *"reusing prior analysis from `<output_dir>`."* If the user explicitly asks for a fresh pass, retry with `refresh: true`. |
| `web-research` success | Read `report.md`. Use Findings and Gaps & Limitations to pressure-test Press Release and FAQ stages. Point user at the file path. |
| `web-research` error `web-tools-unavailable` | Surface to user: *"web research unavailable — proceeding without web grounding."* Flag any claim the coach would have verified as *"unverified — tools unavailable"* in prfaq.md Ignition Reasoning. Continue the gauntlet. Do NOT abort. |
| `web-research` error `prior-run-present` | Same resume default as artifact-analysis. Reuse prior `report.md`; retry with `refresh: true` only on explicit user request. |
**What PRFAQ never does:**
- Silently retry without telling the user.
- Silently reuse a stale report when the user intended a fresh pass.
- Treat `web-tools-unavailable` as a concept-level failure. It is a tooling limitation; the concept is unrelated.
- Copy findings inline into the coaching stream. Cite ("the research shows X") and point at the file path; the user opens the report if they need drill-down.
## Resume rules
Prior companion runs are **reused by default**. This is consistent with PRFAQ's own resume-from-stage semantics — re-running PRFAQ on an existing concept slug picks up where the user left off, including `research/` and `analysis/` outputs.
To force a fresh pass:
1. The user explicitly says so (*"re-research that", "refresh the analysis", "run it fresh"*).
2. PRFAQ re-invokes the companion with `refresh: true`.
3. The companion archives the prior run to `<output_dir>/.archive-<timestamp>/` and runs fresh.
Do NOT silently overwrite. Do NOT silently reuse stale findings when the user intended a refresh. If unsure, ask.
## Non-obligations
The standalone contracts already enumerate these — repeating the ones that bite PRFAQ specifically:
- **No question reshaping.** `web-research` and `artifact-analysis` do not reshape `research_question` or `intent`. Whatever PRFAQ hands in is what runs. Sharpen before you call.
- **No coaching posture inside the companions.** They are tone-neutral by design. Do not try to sneak hardcore coaching into the strings you hand them.
- **No inline findings.** The companions write to disk. PRFAQ summarizes and cites; it does not dump report.md content into the conversation.
- **No cross-run caching beyond the output_dir.** Each invocation is self-contained. PRFAQ's caching strategy is reuse-by-default on the concept's own `research/` and `analysis/` folders — nothing more.
## Extending the contract
If PRFAQ needs behavior not covered by the standalone contracts (new parameter, new error code), extend the companion's `companion-contract.md` first — not this file. Parallel invocation styles fragment the contract and re-introduce the reason these skills were extracted as standalone primitives.
FILE:references/customer-faq.md
# Stage 3: Customer FAQ
The devil's advocate stage. You ARE the most skeptical customer. Ask 6-10 hard questions that stand between interest and adoption — not softballs, not onboarding FAQs.
Load this file when prfaq.md `stage` is `customer-faq-pending`. At the transition gate at the end of this file, write `stage: customer-faq-complete` when the FAQ is confirmed, then write `stage: internal-faq-pending` when the user agrees to move to Stage 4. Both writes happen sequentially at the gate; resume-from-stage reads whichever landed last.
## Approach
1. **Read the Press Release out loud.** Identify the claims and omissions a skeptical customer would probe. Mark the soft spots.
2. **Generate 6-10 hard questions.** Draft them in a batch, then walk through them one at a time with the user. Every question must be something a real skeptical customer would ask.
3. **Draft honest answers.** For each question, draft what the honest answer would be. Then challenge the answer:
- Vague? Demand specificity.
- Handwavy? Demand a concrete mechanism.
- *"We don't do that yet"*? Fine — but make it explicit and force a trade-off decision.
4. **Force trade-off decisions when gaps surface.** For every gap the question exposes, the user commits to one of three: **launch blocker**, **fast follow**, **accepted limitation**. That commit goes in the FAQ answer AND in the Reasoning block.
## What counts as a hard question
Hard customer questions target:
- **Trust.** *"Why should I believe you can deliver this?"*
- **Risk.** *"What happens to my data / workflow / team if this breaks?"*
- **Switching cost.** *"I'm already using `<X>` — what does moving cost me?"*
- **Edge cases.** *"What about `<non-standard scenario>`?"*
- **Comparison.** *"How is this different from `<incumbent or adjacent tool>`?"*
- **The hard question they're afraid of.** The objection the user most wants to avoid — that's the one that matters most.
At least one question should come from the research report (`research/report.md` Findings or Gaps & Limitations). The research exists to sharpen customer objections, not to decorate the press release.
## What doesn't count
- **Onboarding-as-FAQ.** *"How do I get started?"* is a CTA, not a FAQ.
- **Softballs.** *"Does this integrate with everything?"* — too easy; real customers don't ask this.
- **Handwavy positioning.** *"What's your moat?"* — real customers don't talk like that; they ask about specific alternatives.
## Coaching on answers
| User's answer | Coach response |
|---|---|
| *"We have enterprise-grade security."* | *"Name the specific certifications, or say 'not yet.' Which one?"* |
| *"It just works."* | *"Walk me through the path when it doesn't. What's the recovery?"* |
| *"We'll figure it out."* | *"This is Stage 3, not Stage 1. What's the concrete plan, or is this launch-blocker / fast-follow / accepted?"* |
| *"We don't do that yet."* | Fine. Commit: launch-blocker, fast-follow, or accepted. Don't hedge. |
| *"Our competitors can't do this."* | *"Which competitors? What specifically can't they do? Is that true in their current release or just historically?"* |
## Concept-type calibration
The categories of skepticism shift by concept type. Keep the Q&A in the customer's vocabulary.
| Question category | Commercial | Internal | OSS |
|---|---|---|---|
| Trust | track record, SLAs, customer references | who owns it when it breaks at 2am | who maintains it, bus factor, release cadence |
| Risk | data exfil, vendor lock-in, price escalation | operational dependency on one team | project abandonment, fork risk, license change |
| Switching cost | migration off incumbent, retraining | retraining, workflow disruption | adoption effort, integration with existing stack |
| Comparison | named commercial competitors | existing internal tools, buy-vs-build | named upstream or alternative OSS projects |
| Edge cases | scale, compliance, specific customer segments | non-standard workflows, edge teams | non-standard environments, less-common use cases |
## Coaching-notes capture
After Customer FAQ is drafted and confirmed, write the Reasoning block in prfaq.md:
- **Gaps revealed.** What the questions surfaced that the Press Release glossed over. Be specific — which question, which claim.
- **Trade-offs decided.** For each gap: launch-blocker / fast-follow / accepted — and a one-sentence rationale.
- **Competitive intelligence.** Comparisons that came up, with pointers to `research/report.md` sections where relevant.
- **Scope signals.** MVP-in and MVP-out claims made during the Q&A. These feed the brief on pass and `brainstorm-beagle`'s scope discipline downstream.
## Transition gate
Read the full FAQ back. Ask:
> "If I handed this to the most skeptical customer you know, would any of these answers make them close the tab? If yes, we fix those now. If not, Internal FAQ is next — the drill moves inside."
Move to Stage 4. Update `stage: internal-faq-pending` in prfaq.md and load `references/internal-faq.md`.
## When the user is stuck
- **Offer three draft questions, different angles.** *"Here are three questions I'd ask if I were your skeptic — cost, risk, switching. Which feels hardest to answer honestly?"*
- **Pose the question you think they're avoiding.** *"The question I'd expect from your skeptic is `<X>`. Do you have an answer, or is this a gap?"*
- **Return to research.** *"`research/report.md` Findings flagged `<gap>`. What's the customer question that gap implies?"*
- **Invoke the research comparison.** *"The research surfaced these alternatives: `<A>`, `<B>`, `<C>`. Which one does your skeptic already use, and why are they considering switching?"*
FILE:references/internal-faq.md
# Stage 4: Internal FAQ
The stakeholder panel. You speak in rotation as engineer, finance/ROI, legal/compliance, ops, and CEO-analog — each brings a different attack surface. 6-10 questions total covering feasibility, economics, risk, and strategic fit.
Load this file when prfaq.md `stage` is `internal-faq-pending`. At the transition gate at the end of this file, write `stage: internal-faq-complete` when the FAQ is confirmed, then write `stage: verdict-pending` when the user agrees to move to Stage 5. Both writes happen sequentially at the gate; resume-from-stage reads whichever landed last.
## The panel
Rotate stakeholder voices across the 6-10 questions. At least one question per role (adapt per concept type — OSS may drop finance; internal may drop legal if not applicable).
- **Engineer.** *"Can we actually build this? What's the hardest technical problem, and do we know how to solve it?"*
- **Finance / ROI / Sustainability.** *"What does this cost, and when does it pay back?"* — Concept-type-dependent (see calibration).
- **Legal / Compliance.** *"What regulatory, privacy, or licensing exposure does this create?"*
- **Ops.** *"Who runs this at 2am? What's the support cost?"*
- **CEO-analog.** *"Why this, and why this instead of `<alternative use of the same resources>`?"*
## Approach
Same as Customer FAQ — generate 6-10 questions, draft honest answers, challenge vagueness. Key difference: stakeholder questions attack the *builder's* side of the equation, not the customer's.
1. **Draft the panel.** 6-10 questions covering the roles above. Rotate voices.
2. **Draft honest answers.** Challenge them line by line.
3. **Force action on unknowns.** Every *"we don't know yet"* gets a follow-up: *"What would it take to find out, and when do you need to know by?"* Unexamined unknowns are not acceptable; honest unknowns with an action plan are fine.
4. **Watch for over-optimism.** Resources and timeline are the two most commonly hand-waved answers. Demand breakdowns.
## What counts as a hard question
- **Feasibility.** *"What's the hardest technical problem, and do we know how to solve it?"*
- **Unit economics / ROI / sustainability.** Calibrated by concept type. See table below.
- **Resource reality.** *"Who builds this? Over how many weeks? At what cost to other work?"*
- **Risk.** *"What's the worst-case failure mode? How do we detect it before customers do?"*
- **Strategic fit.** *"Why us? Why now? What specific bet does this support?"*
- **The question that keeps them up at night.** The thing that hasn't been said out loud. Name it.
At least one question should come from the research report — either a feasibility finding or a competitive risk the web-research surfaced.
## Watch for
- **Hand-waving on resources.** *"A couple of weeks"* is not a timeline. Break it down: discovery, build, test, rollout. Which is longest?
- **Hand-waving on strategic fit.** *"It aligns with our direction"* means nothing. Which specific direction? Measured how?
- **Unexamined unknowns.** *"We'll figure out pricing later"* → *"launch-blocker or fast-follow? What research decides?"*
- **Legal assumed.** *"Legal is fine"* → *"Did you ask, or are you assuming? What changes if they say no?"*
## Concept-type calibration
Economics, moat, success metric, and failure mode all shift by concept type. Keep the panel questions in the right vocabulary.
| Question | Commercial | Internal | OSS |
|---|---|---|---|
| Economics | unit economics, CAC / LTV, first 100 customers | operational ROI, hours saved, cost avoided | maintenance burden, contributor pipeline, sustainability funding |
| Moat | differentiators, defensibility, network effects | build-vs-buy vs existing internal tools | differentiation from upstream / alternatives, ecosystem fit |
| Success metric | revenue, retention, NPS | adoption rate, outcome metric, time saved | stars, downloads, contributor count, downstream usage |
| Failure mode | customers churn | employees revert to old tools | project abandonment, fork, license dispute |
| Resources | team hire plan, vendor budget | engineering time, team reallocation | maintainer time, contributor onboarding effort |
## Coaching on answers
| User's answer | Coach response |
|---|---|
| *"A couple of weeks."* | *"Break it down: discovery, build, test, rollout. Which phase is longest, and why?"* |
| *"We'll figure out the pricing later."* | *"Launch-blocker or fast-follow? What's the research you'd do to decide?"* |
| *"Legal is fine."* | *"Did you ask legal, or are you assuming? If assuming, what changes if they say no?"* |
| *"It strategically aligns with `<X>`."* | *"Name the specific strategic bet. What does that bet lose if we don't do this?"* |
| *"Engineering can handle it."* | *"Who on engineering, at what opportunity cost to what other work?"* |
| *"Customers will love this."* | *"Stage 3 covered that. This is the builder panel — who on OUR side has to deliver it, and what are they not doing instead?"* |
## Coaching-notes capture
After Internal FAQ is drafted and confirmed, write the Reasoning block in prfaq.md:
- **Feasibility risks.** What survived scrutiny, what didn't.
- **Resource and timeline estimates.** With honesty markers: *guessed / based on similar work / costed in detail.*
- **Unknowns flagged with action.** For each: what it would take to find out, by when.
- **Strategic positioning decisions.** Named bets, named alternatives rejected.
- **Constraints surfaced.** Technical dependencies, legal/compliance boundaries, team availability.
## Transition gate
Read the full FAQ back. Ask:
> "If you walked this into a skeptical steering committee tomorrow, what would they reject it for? If that's already addressed, we move to Verdict. If not, we fix it — the verdict is not the place to find out."
Move to Stage 5. Update `stage: verdict-pending` in prfaq.md and load `references/verdict.md`.
## When the user is stuck
- **Role-play the panel.** *"Speaking as the engineer on call: here's the question I'd ask — `<X>`. What's your answer?"*
- **Apply the research.** *"`research/report.md` Gaps flagged `<X>`. What does the stakeholder panel do with that?"*
- **Force the timeline.** *"If you had to ship this in 4 weeks, what cuts first? In 16 weeks, what do you add?"*
- **Invoke the opportunity cost.** *"If we pour engineering into this, what's the other thing we're not doing? Is that trade worth it?"*
FILE:references/press-release.md
# Stage 2: Press Release
The forge. The user drafts the announcement of the finished product — in the voice of the customer's world, not the builder's. If this stage feels easy, it hasn't been pressured enough.
Load this file when prfaq.md `stage` is `press-release-pending`. At the transition gate at the end of this file, write `stage: press-release-complete` when the draft is confirmed, then write `stage: customer-faq-pending` when the user agrees to move to Stage 3. Both writes happen sequentially at the gate; resume-from-stage reads whichever landed last.
## Approach — one section at a time
For each section of the press release, run the cycle:
1. **You draft first.** Write a rough version based on Ignition's customer, problem, stakes, and solution sketch. Modeling the register gives the user something to react to — faster than asking them to draft cold.
2. **Self-challenge out loud.** Identify the weakest line in your own draft. *"The headline says 'revolutionary' — what does that actually mean to the customer?"* This models the critical posture you want the user to bring to their own version.
3. **Invite the user to sharpen.** Ask for their version or their edit. Hand them the pen.
4. **Deepen one level.** If they give you a generality, demand the specific. If they give you marketing speak, demand customer language.
Cycle per section: *draft → self-challenge → invite → deepen.* Don't move to the next section until the current one clears the quality bars.
## Press release structure
1. **Headline.** `<CITY, DATELINE>` — one-line announcement. Concrete benefit, no jargon.
2. **Sub-heading.** One sentence customer benefit. *Most excited customer gets what.*
3. **Opening paragraph.** Who's announcing, what it is, who it's for, why now. Plain English.
4. **Problem paragraph.** The status quo this replaces. What the customer does today, how it fails them.
5. **Solution paragraph.** What the product does — still WHAT, not HOW. No architecture, no technology names unless they're part of the customer-visible value.
6. **Customer quote.** A real-sounding quote from a named persona. The quote should be something a skeptical friend would send without rolling their eyes.
7. **How to get started.** Concrete first action — link, install command, CTA.
## Quality bars — embodied, not listed
Challenge every draft against these. Do NOT read them aloud to the user. Catch violations as they happen.
- **Mom test.** Would a smart non-expert understand what this is and why it matters? If not, there's jargon to kill or abstraction to concretize.
- **So-what test.** For every claim, ask: *and? so what? why does the customer care?* Each line has to survive the drill.
- **No weasel words.** Ban *significantly, revolutionary, seamless, cutting-edge, leverage, enterprise-grade, best-in-class, robust.* If the user reaches for one, ask: *measured how, compared to what?*
- **No technology-for-its-own-sake.** *"Uses AI"* is not a benefit. *"Writes your status update in 10 seconds"* is. Only name technology when it's the customer-visible value.
- **Real quote.** *"This saves our team hours"* is a placeholder, not a quote. A real quote sounds like a real person complaining about a specific frustration or celebrating a specific outcome. Name the persona specifically (not "Jane, IT manager").
## Concept-type calibration
| Section | Commercial | Internal | OSS |
|---|---|---|---|
| Headline frame | *"Company X launches Y"* | *"Team X rolls out Y"* | *"Project X releases Y"* |
| Sub-heading | customer benefit, market position | operational leverage, cost avoided | adopter benefit, ecosystem fit |
| Problem | what paying customers put up with today | what employees waste hours on today | what contributors / adopters re-solve today |
| Customer quote | paying customer persona | internal user with a specific role | early adopter, maintainer, or downstream user |
| How to get started | *"Sign up at..."* | *"Available in `<internal tool>` now"* | *"`pip install X`"*, *"`git clone ...`"* |
## Coaching-notes capture
After Press Release is drafted and confirmed, write the Reasoning block in prfaq.md:
- **Rejected headlines.** Drafts considered and why dropped. One-liners — just enough to trace the decision.
- **Weasel words caught.** The phrases the coach pushed back on and what replaced them.
- **Differentiators explored.** Positioning choices made; alternatives discussed but not taken.
- **Out-of-scope that surfaced.** Technical constraints, timeline, team context that came up but don't belong in a press release.
## Transition gate
Read the full press release back to the user as a single block. Ask:
> "If I sent this to the most skeptical customer you know, would they read past the headline? If yes, we move to Customer FAQ. If you're hesitating, we fix it now — skepticism compounds in Stage 3."
Only move to Stage 3 after the user confirms. Update `stage: customer-faq-pending` in prfaq.md frontmatter and load `references/customer-faq.md`.
## When the user is stuck
Do not repeat questions harder. Offer concrete alternatives:
- **Two headline drafts, different frames.** *"Headline A leads with the benefit; headline B leads with the pain relieved. Which matches how your customer would describe it?"*
- **A counter-example.** *"Here's what the bad version sounds like: `<weasel-word-heavy draft>`. What would your customer say that version misses?"*
- **A forced constraint.** *"Write the opening paragraph in 40 words or fewer. What survives the cut is the actual value."*
- **Borrow from research.** *"The research report's Key Insights #2 said `<X>`. Does that belong in the sub-heading, the problem paragraph, or neither?"*
The goal is always to give the user something concrete to react to. A blank prompt in Stage 2 is a failure of coaching.
FILE:references/prfaq-template.md
# PRFAQ Document Template
Use this skeleton when creating `.beagle/concepts/<slug>/prfaq.md` at the end of Ignition. Fill in the Ignition section immediately; leave Stages 2-5 as empty headings to be completed as each stage transitions. Update the `stage` frontmatter field at every transition.
## The `stage` field
Valid values, in order:
- `ignition-pending` — only written if you pause mid-Ignition (rare)
- `ignition-complete` — Ignition done; Press Release not started
- `press-release-pending` — about to start Press Release
- `press-release-complete` — Press Release drafted and confirmed
- `customer-faq-pending`
- `customer-faq-complete`
- `internal-faq-pending`
- `internal-faq-complete`
- `verdict-pending`
- `pass` — terminal success
- `fail` — terminal failure
Resume-from-stage reads only the first 40 lines of prfaq.md to find this field. Keep the frontmatter at the top of the file.
## Reasoning blocks
Each stage has a visible `### <Stage> Reasoning` subsection — prose, not HTML comments. This keeps the PRFAQ readable as a decision artifact: anyone opening the file in the future sees what was challenged, which framings were rejected, and how research findings shaped the framing. The Reasoning block is part of the stage's output, not metadata.
## Skeleton
Copy everything below the horizontal rule into the new prfaq.md, then fill in.
---
```markdown
---
name: <concept headline — human-readable one line>
slug: <kebab-case slug>
concept_type: commercial | internal | oss
stage: ignition-complete
created: YYYY-MM-DD
---
# <concept name> — PRFAQ
## 1. Ignition
**Customer:** <specific persona — not "developers">
**Problem:** <what the customer does today and why it fails them>
**Stakes:** <what happens if this works; what happens if it doesn't — both sides>
**Solution sketch:** <one paragraph — high-level WHAT, never HOW>
**Concept type:** <commercial / internal / oss — one-sentence rationale>
### Ignition Reasoning
- **Challenged assumptions:** <what the coach pressure-tested during the opening exchanges>
- **Rejected framings:** <alternatives considered and why dropped>
- **Companion findings:** <pointers to `analysis/report.md` and `research/report.md`, and one sentence each on how they shaped the framing>
- **Unverified claims:** <only if web-tools-unavailable was hit — claims that would have been checked>
## 2. Press Release
**Headline:** `<CITY, DATELINE>` — <one-line announcement>
**Sub-heading:** <concrete customer benefit in one sentence>
**Opening paragraph:**
<who's announcing, what it is, who it's for, why now. Plain English.>
**Problem paragraph:**
<the status quo this replaces — what the customer does today, how it fails them>
**Solution paragraph:**
<what the product does — WHAT, not HOW. No architecture, no technology names unless part of customer value.>
**Customer quote:**
> "<real-sounding quote — a specific frustration relieved or a specific outcome achieved>"
> — <named persona, title, company/context>
**How to get started:**
<concrete first action — link, URL, install command, whatever's real>
### Press Release Reasoning
- **Rejected headlines:** <drafts considered and why dropped>
- **Weasel words caught:** <phrases the coach pushed back on>
- **Differentiators explored:** <positioning choices made; alternatives discussed but not taken>
- **Out-of-scope that surfaced:** <technical constraints, timeline, team context that came up but don't belong in the PR>
## 3. Customer FAQ
### Q1: <hard skeptic question>
<honest answer — specific, committed, no hedging>
### Q2: <hard skeptic question>
<honest answer>
### Q3: <hard skeptic question>
<honest answer>
<... 6-10 questions total. At least one should be "the hard question they're afraid of" — the objection the user most wants to avoid.>
### Customer FAQ Reasoning
- **Gaps revealed:** <what the questions surfaced that the Press Release glossed over>
- **Trade-offs decided:** <for each gap: launch blocker / fast follow / accepted limitation — and why>
- **Competitive intelligence:** <comparisons that came up, with pointers to research/report.md where relevant>
- **Scope signals:** <MVP-in and MVP-out claims made during the Q&A>
## 4. Internal FAQ
### Q1 (Engineer): <feasibility question>
<honest answer>
### Q2 (Finance / ROI / Sustainability): <economics question, calibrated by concept type>
<honest answer>
### Q3 (Legal / Compliance): <regulatory / privacy / licensing question>
<honest answer>
### Q4 (Ops): <on-call / support / maintenance question>
<honest answer>
### Q5 (CEO-analog): <strategic fit question>
<honest answer>
<... 6-10 questions total across the panel. Rotate stakeholder voices. At least one should be "the thing that keeps them up at night" — the unspoken risk.>
### Internal FAQ Reasoning
- **Feasibility risks:** <what survived scrutiny, what didn't>
- **Resource / timeline estimates:** <with honesty markers: guessed / based-on-similar-work / costed-in-detail>
- **Unknowns with action:** <for each unknown: what it would take to find out, by when>
- **Strategic positioning decisions:** <named bets, named alternatives rejected>
- **Constraints surfaced:** <technical dependencies, legal/compliance boundaries, team availability>
## 5. Verdict
**Result:** PASS | FAIL
### Forged in steel
<parts of the concept that are clear, compelling, and defensible — these survived pressure>
### Cracks in the foundation
<for PASS: minor gaps worth watching; for FAIL: the lethal cracks>
- <crack> — to address: <what it would take>
- <crack> — to address: <what it would take>
<on PASS, add:>
### Handoff
Brief written to `.beagle/concepts/<slug>/brief.md`. Consumed by `brainstorm-beagle` to produce the spec.
<on FAIL, add:>
### What would need to change
1. <concrete change>
2. <concrete change>
### Where to re-enter
Stage `<N>: <name>`. <One sentence on why starting there, not from scratch — the earlier stages that succeeded are reusable.>
```
FILE:references/verdict.md
# Stage 5: Verdict, Brief, Fail Feedback
Binary outcome. No three-tier middle ground — a concept that is "promising but needs work" returns **FAIL** with targeted feedback naming exactly what would need to change. That is the filter's whole job.
Load this file when prfaq.md `stage` is `verdict-pending`. Terminal states are `pass` or `fail`.
## Rubric
The concept **PASSES** when ALL of these are true after the full gauntlet:
1. **Customer is named.** A specific persona, not a category. *"Enterprise platform engineers at companies with >200 engineers and a dedicated developer-productivity team"* beats *"developers."*
2. **Problem is concrete.** Described in terms of what the customer does today and why it fails them.
3. **Stakes are clear on both sides.** What happens if this works AND what happens if it doesn't.
4. **Solution survived the Press Release drill.** No weasel words, no technology-for-its-own-sake, passes the mom test and the so-what test.
5. **Customer FAQ has no lethal gaps.** Trade-offs are committed (launch-blocker / fast-follow / accepted). Comparisons are honest.
6. **Internal FAQ has no lethal gaps.** Feasibility, economics (calibrated per concept type), and risk have honest answers. Unknowns have actions and timelines.
The concept **FAILS** when one or more of:
- Customer remained vague through all four prior stages.
- Problem is a solution-in-search-of-a-problem (the user leads with what they want to build, not what the customer needs).
- Stakes are missing on one side (*"this would be cool"* without *"and if we don't, here's what breaks"*).
- Press Release still required weasel words to sound compelling.
- Customer FAQ has a *skeptic-closes-the-tab* question with no plan for it.
- Internal FAQ has feasibility / economic / legal red flags with no mitigation.
- Research findings directly contradict a load-bearing claim and the user did not reconcile the contradiction.
## Framing: "forged in steel" vs "cracks in the foundation"
Present the verdict as specific findings, not a score.
- **Forged in steel.** Parts of the concept that are clear, compelling, and defensible — these survived pressure.
- **Cracks in the foundation.** Genuine risks or unresolved contradictions. For every crack, name what it would take to address.
Present the verdict directly. Do NOT soften it to be polite. A soft verdict defeats the filter. Constructive framing ≠ soft framing — name the cracks specifically and name what closing them looks like.
## On PASS: produce the brief
Write `.beagle/concepts/<slug>/brief.md`. The brief is a **context handoff** for `brainstorm-beagle`, not a deliverable — brief quality is not gated; `brainstorm-beagle` runs its own discovery on top. Keep it dense and factual; no coaching notes in the brief itself (those live in prfaq.md Reasoning blocks).
### brief.md template
```markdown
# <concept name> — Concept Brief
**Status:** PASSED PRFAQ on YYYY-MM-DD
**Slug:** <slug>
**Concept type:** commercial | internal | oss
## Customer
<specific persona — one paragraph, straight from prfaq.md Ignition>
## Problem
<what the customer does today and why it fails them — one paragraph>
## Solution concept
<one-paragraph distillation from the Press Release solution section — WHAT, not HOW>
## Stakes
- **If this works:** <outcome>
- **If it doesn't:** <outcome>
## Forged decisions
<bulleted list of decisions committed during the gauntlet:
- MVP-in vs MVP-out choices
- Launch-blocker vs fast-follow vs accepted-gap calls
- Concept-type-specific scope bounds
- Named alternatives rejected and why>
## Open questions
<bulleted list — things the PRFAQ surfaced but did not close. Each with a one-sentence "what it would take to answer" note so brainstorm-beagle knows whether to resolve inline or defer.>
## Research pointers
- Artifact analysis: `.beagle/concepts/<slug>/analysis/report.md`
- Web research: `.beagle/concepts/<slug>/research/report.md`
## PRFAQ reference
`.beagle/concepts/<slug>/prfaq.md` — full drill transcript including Reasoning blocks per stage.
```
### Update prfaq.md Verdict section
```markdown
## 5. Verdict
**Result:** PASS
### Forged in steel
- <finding>
- <finding>
- <finding>
### Cracks worth watching
- <minor gap> — to close: <what it would take>
- <minor gap> — to close: <what it would take>
### Handoff
Brief written to `.beagle/concepts/<slug>/brief.md`. Consumed by `brainstorm-beagle` to produce the spec.
```
Set `stage: pass` in frontmatter.
### Tell the user
> "PRFAQ passed. Brief written to `.beagle/concepts/<slug>/brief.md`. Run `brainstorm-beagle` next — it'll auto-ingest the brief and skip most of its normal discovery."
## On FAIL: targeted feedback
**No brief is produced.** The PRFAQ failed its filter; manufacturing a brief would defeat the purpose.
### Update prfaq.md Verdict section
```markdown
## 5. Verdict
**Result:** FAIL
### What broke
<one paragraph — the specific breakage. Not "it needs more work." Name which stage's question has no honest answer, or which claim doesn't survive the research findings, or which lethal gap was not closed.>
### Cracks in the foundation
- <crack> — to address: <what it would take>
- <crack> — to address: <what it would take>
- <crack> — to address: <what it would take>
### What would need to change
1. <concrete change>
2. <concrete change>
3. <concrete change>
### Where to re-enter
Stage `<N>: <stage name>`. <One sentence on why starting there. Stages earlier than N are reusable — prfaq.md, research/, and analysis/ stay in place; the user can pick up from <N> after addressing the cracks.>
```
Set `stage: fail` in frontmatter. Keep prfaq.md, `research/`, and `analysis/` in place for the re-run. Do NOT delete or archive.
### Tell the user
> "PRFAQ failed at `<stage>`. `<one-paragraph reason>`. See `.beagle/concepts/<slug>/prfaq.md` Verdict section for what to fix. When you've developed those, re-run — we'll resume from `<stage>`."
## Graceful redirect (handled at Ignition, referenced here)
Not every "fail" is a PRFAQ verdict. If during **Ignition** the user cannot articulate a customer or a problem after 2-3 exchanges, issue the *not-ready-to-filter* redirect instead:
> "This idea isn't ready for PRFAQ — it needs brainstorming first. Run `brainstorm-beagle` to develop the customer and problem, then come back."
In the redirect path, NO prfaq.md is written. No research or analysis runs. This is not a FAIL verdict — it is a pre-gauntlet hand-back. Recording it as a fail would pollute the PRFAQ file with non-verdicts.
The verdict rubric above applies only to concepts that made it through all four prior stages. If Ignition bounced the concept, the decision was already made — no Stage 5 is run.
## Why binary
A three-tier verdict ("forged / needs heat / cracked") sounds gentler but defeats the filter. The user walks away feeling graded, not filtered. PRFAQ exists to answer ONE question: *is this worth committing spec cycles to?* The answer is yes or no. Cracks worth watching go into the brief on pass; cracks worth NOT proceeding go into the fail feedback. Either way, the user leaves with a specific next action.
Use when the user wants a cited, structured read of local documents and project knowledge. Triggers on: "analyze these docs", "scan my project for context",...
---
name: artifact-analysis
description: "Use when the user wants a cited, structured read of local documents and project knowledge. Triggers on: \"analyze these docs\", \"scan my project for context\", \"read the docs folder\", \"summarize what's in .beagle/concepts/\", \"extract context from docs/\", \"what's in this folder\", \"go read everything in X and tell me what's there\". Also invoked programmatically by other beagle skills (prfaq-beagle Ignition, brainstorm-beagle reference points, strategy-interview context grounding) via the companion contract. Does NOT trigger on codebase lookups (\"find this function\", \"search the repo\"), web research (use web-research), LLM-as-judge evaluation (use llm-judge), or document editing (use humanize-beagle). Produces a written scan plan, parallel-subagent findings, and a cited synthesis report on disk — never inline prose, never unsourced claims."
---
# Artifact Analysis
Turn a set of local paths (or a beagle project's conventional knowledge locations) into a cited, structured extraction of insights, context, decisions, and raw detail.
The deliverable is always on disk: a written scan plan the caller can audit, one findings file per slice, and a synthesized report with path-anchored citations. Nothing returns as inline prose, and no claim ships without a source path + verbatim excerpt behind it.
## When to use
- A user asks for a local-document read — "analyze the docs folder", "scan the project for context", "extract what's in .beagle/concepts/".
- Another beagle skill invokes this one programmatically as a grounding companion (see `references/companion-contract.md`).
- The caller wants auditable output: a plan written before extraction, findings files per slice, and a citation-backed synthesis report.
## When NOT to use
- Codebase lookups ("where is this function defined", "grep for this symbol"). Use Grep/Glob.
- Web research. Use `web-research`.
- Comparative evaluation of two implementations or source credibility adjudication. Use `llm-judge`.
- Rewriting or editing the scanned documents. Use `humanize-beagle` or the file tools.
- PDF / image OCR / format conversion. First version reads plain text and markdown only. `beagle-core:docling` is the future path.
- Paywalled or authentication-gated remote sources. This is a local-filesystem primitive.
- Coaching, challenge, or reshaping of the caller's question. That belongs to the caller.
## Workflow
Four steps, in order. No step is skippable.
### Hard gates
Advance to the next step only when the **pass condition** is true—confirm using files under `output_dir` (and tool output), not memory.
| After | Pass condition |
| --- | --- |
| `plan.md` written | `plan.md` exists and includes intent, resolved paths, slices, per-slice briefs, skip patterns, budgets applied, and synthesis approach (same fields as **The scan plan (`plan.md`)**). |
| Subagent dispatch | Either the **empty corpus** path was taken (no subagents; `plan.md` documents zero readable documents) **or** every slice listed in `plan.md` has `findings/<slice-slug>.md` on disk. |
| `report.md` written | `report.md` exists; headings match `references/report-template.md` (seven sections plus `## Sources`). |
| Before return to caller | Every row of `references/failure-modes.md` → **Verification checklist (orchestrator runs at end)** is checked off, *or* any failed check is recorded under `## Gaps & Limitations` in `report.md` as that failure-modes file prescribes. |
1. **Write `plan.md`** — resolved paths (with any auto-discovery applied), intent summary (when provided), per-slice briefs, skip patterns, and how findings will be synthesized.
2. **Dispatch subagents** — spawn 1-3 parallel subagents over non-overlapping slices of the resolved paths. Each writes `findings/<slice-slug>.md` under `output_dir`.
3. **Synthesize `report.md`** — fold findings into the seven fixed sections with path-anchored citations.
4. **Verify before returning** — satisfy the last **Hard gates** row; execute the numbered checklist in `references/failure-modes.md` (**Verification checklist (orchestrator runs at end)**). Any check that fails becomes an entry in `Gaps & Limitations` per that file—do not return a deliverable with silent checklist failures.
```
Receive paths + optional intent ──→ Auto-discover if paths empty
↓
Write plan.md (no user-confirmation pause)
↓
Dispatch subagents (up to 3 parallel)
↓
Collect findings/<slice>.md files
↓
Synthesize report.md
↓
Return paths to caller
```
Unlike `web-research`, artifact-analysis does **not** pause for a plan review gate. Local scanning is cheap; `plan.md` is written for auditability so a reader weeks later can tell what each subagent was told. Unlike web-research, there is no fail-fast on missing tools — filesystem tooling (Read, Glob, Grep) is assumed present in the Claude Code environment.
## Inputs
| Field | Type | Required | Default | Purpose |
| ------------ | ------------------ | -------- | ------------- | ----------------------------------------------------------------------------------- |
| `intent` | string | no | — | What the caller is looking for / why. When absent, the skill extracts anything structurally important. |
| `paths` | list of strings | no | auto-discover | Directories and/or explicit files. When absent, auto-discover (see below). |
| `output_dir` | absolute path | no | derived | Where `plan.md`, `findings/`, and `report.md` land. |
| `refresh` | bool | no | `false` | When true, allow overwriting a prior run in the same `output_dir`. |
The skill does not parse caller-specific structures. Callers pass an intent string and/or a path list.
### Auto-discovery
When `paths` is absent or empty, scan beagle's conventional knowledge locations:
- `.beagle/concepts/` — concept specs and analysis folders.
- `.planning/` — roadmap, state, and phase artifacts.
- `docs/` — project documentation.
- Top-level files matching `README*`, `BRIEF*`, `OVERVIEW*`, `CONTEXT*`, `CLAUDE.md`, `AGENTS.md`.
Resolved paths (including any auto-discovery) are listed verbatim in `plan.md` so the caller can see exactly which files were included.
### Intent modes
- **`intent` present** — extraction is targeted to what is relevant to the intent. Off-topic material goes into `Raw Detail Worth Preserving` only when it is a specific quote or metric worth keeping.
- **`intent` absent** — generic-salient-extraction mode. Subagents extract anything structurally important (insights, decisions, technical constraints, user/market context) without an interpretive filter.
## Output location
If the caller passes `output_dir`, use it verbatim. Otherwise derive the default:
```
.beagle/analysis/<slug>/
```
**Slug derivation** (stable so re-running the same input on the same day lands on the same folder):
1. If `intent` is present, slug from the intent string: lowercase, strip punctuation, collapse whitespace to single hyphens, truncate to 60 characters on a word boundary (cut at the last hyphen before 60; if no hyphen exists before position 60, hard-cut at 60).
2. If `intent` is absent, slug from the first scanned path's basename using the same rules.
3. Prepend `YYYY-MM-DD-`.
**Re-run protection.** Before writing anything, check whether `output_dir` already contains `plan.md` or `report.md`. If it does and `refresh` is not `true`, refuse with a message naming the existing folder. When `refresh: true`, archive the prior contents into `<output_dir>/.archive-<timestamp>/` before starting fresh. See `references/failure-modes.md`.
Callers embedding artifact-analysis in a concept-folder convention (e.g. `prfaq-beagle`) pass `output_dir` explicitly so the analysis sits next to its consumer: `.beagle/concepts/<concept-slug>/analysis/`.
## The scan plan (`plan.md`)
The plan is written before any subagent runs. It is the audit trail, not a review gate — the skill does not pause for user confirmation.
`plan.md` contains:
- **Intent** — the input string, verbatim, or `"generic salient extraction"` when intent is absent.
- **Resolved paths** — every path that will actually be scanned, with a note next to any entry that came from auto-discovery vs. caller-specified.
- **Slices** — how the resolved paths are partitioned across 1-3 subagents. Slices are non-overlapping.
- **Per-slice briefs** — one paragraph per slice summarizing what that subagent is told to extract. Derived mechanically from the spec so a reader of `plan.md` can predict what each subagent was told.
- **Skip patterns** — the denylist applied to this run (see `references/skip-patterns.md`).
- **Budgets applied** — subagent count for this run (1-3) and the skim threshold in effect (see Budget defaults below).
- **Synthesis approach** — how the per-slice findings will combine into `report.md`.
Report: `Wrote plan.md` and proceed to dispatch. No pause, no gate.
## Subagent dispatch
Up to 3 subagents run concurrently over non-overlapping slices. Each gets a mechanically-derived brief built from `plan.md` — no interpretation drift between the plan and the briefs. The brief template lives in `references/subagent-brief.md`.
Each subagent:
- Scans its assigned slice of paths, honoring skim strategies (sharded-doc: read index first; large-doc: TOC/headings first) and skip patterns.
- Writes `findings/<slice-slug>.md` under `output_dir`.
- Returns one terse status line to the orchestrator (path + status), never inline findings.
The orchestrator waits for all subagents to finish, then verifies every expected findings file exists before synthesis. A missing file is a silent failure, recorded in `Gaps & Limitations` — see `references/failure-modes.md`.
### Skim strategies
Subagents do not read everything end-to-end. They apply:
- **Sharded documents** (folder with `index.md` plus multiple sub-files) — read `index.md` first, then only the sub-files the index points to as relevant.
- **Large documents** (single file > ~50 pages or > ~2000 lines) — read the TOC, executive summary, and section headings first; pull full content only from sections relevant to the intent (or structurally important when intent is absent). The findings file records which sections were skimmed vs. read fully.
- **Short documents** (single file, moderate size) — read end-to-end.
### Skip patterns
Sensitive, binary, and vendor/build paths are skipped silently without the caller re-specifying them. The default denylist lives in `references/skip-patterns.md`. Each subagent records the paths it skipped under the `paths_skipped` frontmatter field so the audit trail shows exactly what was excluded.
## Citations
Every claim in a findings file and in `report.md` carries a citation. The shape lives in `references/citation-schema.md`. At a glance:
- **Required fields:** `path` (relative to the scanned root when possible), `excerpt` (a verbatim quoted string from the document).
- **Optional fields:** `lines` (line range or single line, only when the subagent naturally has them — never synthesized), `heading` (the nearest enclosing heading), `document_type`.
Inline references use `[^n]` footnotes; the full citation sits in the numbered `Sources` section at the bottom of the report.
## Synthesis (`report.md`)
The report has a fixed seven-section layout, in this order. Every section is required, every time — even when a section is thin, the report includes a bullet saying so.
1. `## Documents Found` — each included path with a one-line note on its relevance.
2. `## Key Insights` — the highest-signal observations from the corpus, grouped by theme.
3. `## User / Market Context` — users, customers, competition, market data surfaced from the documents.
4. `## Technical Context` — platforms, constraints, integrations, dependencies.
5. `## Ideas & Decisions` — each tagged `accepted`, `rejected`, or `open` with rationale. Rejected ideas are preserved deliberately so future work does not re-propose them.
6. `## Raw Detail Worth Preserving` — specific quotes, data points, metrics, and other detail that would be lost to summarization.
7. `## Gaps & Limitations` — what the corpus could not establish; which paths were empty, skipped, or unreadable; which subagents failed.
`Gaps & Limitations` is required even when the scan looks complete. Honest accounting of what was and was not in the corpus is part of the product. The full literal skeleton the skill copies from lives in `references/report-template.md`.
## Failure modes
- **Partial success** — one or more subagents fail. The skill continues with what succeeded and enumerates each failed slice under `Gaps & Limitations`, including the last-known brief and the stub-file reason.
- **Empty corpus** — path resolution (auto-discovery + explicit paths) yields zero readable documents. The skill writes `plan.md` and a minimal `report.md` with a single "no documents found" bullet under `Gaps & Limitations`, does not spawn subagents, and returns cleanly to the caller. Callers decide how to proceed.
- **Silent-failure detection** — every subagent writes at least a stub `findings/<slice-slug>.md` with `status:` frontmatter (`ok`, `empty`, `failed`) before returning. Missing file after dispatch = silent failure, recorded in `Gaps & Limitations`.
- **Re-run protection** — covered under "Output location" above; details in `references/failure-modes.md`.
Unlike `web-research`, artifact-analysis does **not** fail-fast on missing tools. Filesystem tooling (Read, Glob, Grep) is assumed present in the Claude Code environment. If it is somehow absent, the skill will surface that as a subagent failure under partial-success rather than aborting the whole run.
Full rules and the structured error shape live in `references/failure-modes.md`.
## Budget defaults
Tunable knobs, not hard-coded invariants:
| Knob | Default |
| ----------------------- | -------------- |
| Parallel subagents | 1-3 |
| Slice overlap | none (enforced) |
| Skim threshold (large) | > ~2000 lines or > ~50 pages |
A caller that needs narrower or broader scope overrides by passing a more specific `paths` list — one path per subagent forces narrower slicing; a single folder lets the skill partition internally.
## Companion invocation contract
Other beagle skills invoke this one via a small, documented contract. The minimal call passes only `intent`; the full call adds `paths`, `output_dir`, and `refresh`.
Worked examples for the three known callers (`prfaq-beagle`, `brainstorm-beagle`, `strategy-interview`) plus the success and refusal return shapes live in `references/companion-contract.md`. Callers are expected to honor the contract verbatim rather than invent parallel invocation styles.
## Tone
This skill is a tone-neutral primitive. It does not:
- Coach the caller on which documents matter.
- Adjudicate document quality or claim credibility (that is `llm-judge`'s job).
- Reshape the caller's intent into a different question.
- Adopt a posture (hardcore, Socratic, warm) — that is the caller's job.
- Editorialize in findings or the report.
If the caller is a coaching skill (`prfaq-beagle`, `brainstorm-beagle`), the coaching happens before and after this skill runs. Inside this skill, the intent is treated as final.
## Out of scope
- Scanning paywalled or authentication-gated remote sources. Use the file tools to extract content first, then pass paths in.
- LLM-as-judge evaluation of document quality or claim credibility. Use `llm-judge`.
- Coaching, challenge, or opinionated reshaping of the intent.
- Rewriting or editing the scanned documents. Read-only by design.
- Binary / image OCR, PDF text extraction, or format conversion. First version reads plain text and markdown only; `beagle-core:docling` is the future path.
- Multi-language analysis. English-only today.
- Caching or re-use of prior findings across invocations.
- Long-running or scheduled scans.
## Reference files
- `references/subagent-brief.md` — template the orchestrator mechanically fills from `plan.md` when dispatching each subagent.
- `references/citation-schema.md` — required and optional citation fields, footnote convention, and a well-formed example.
- `references/report-template.md` — literal `report.md` skeleton with all seven fixed sections.
- `references/failure-modes.md` — partial-success, empty-corpus, silent-failure detection, and re-run protection rules.
- `references/companion-contract.md` — programmatic invocation shape with worked examples for `prfaq-beagle`, `brainstorm-beagle`, and `strategy-interview`.
- `references/skip-patterns.md` — default denylist (sensitive, binary/media, vendor/build) applied to every run.
FILE:references/citation-schema.md
# Citation Schema
Every claim in a findings file and in `report.md` carries a citation. The shape is small on purpose — enough metadata to verify without re-reading the full document, not so much that subagents start fabricating fields.
## Required fields
- **`path`** — the document the claim was drawn from. Relative to the scanned root when possible (e.g. `docs/architecture.md`); absolute only when the path lives outside the scanned tree.
- **`excerpt`** — a verbatim quoted string from the document that supports the claim. Keep it short enough to read at a glance (typically 1-3 sentences), long enough to stand on its own.
If either required field is missing or would have to be fabricated, do not include the citation. Drop the claim or mark it as unverified in `Gaps & Limitations`.
## Optional fields
Include only when the subagent naturally has them. Never synthesize.
- **`lines`** — line range (`L42-L58`) or single line (`L42`) where the excerpt appears. Include when reading a text/markdown file with stable line numbers. Omit for documents where line granularity is ambiguous (prose documents without visible line numbers, transcluded markdown, rendered output). Never guess.
- **`heading`** — the nearest enclosing heading, copied verbatim. Useful for anchoring the reader in long documents.
- **`document_type`** — one of:
- `spec` — specification, requirements doc, or formal brief.
- `adr` — architecture decision record.
- `readme` — README, overview, or project-root context doc.
- `planning` — roadmap, state, phase, or plan artifact.
- `concept` — beagle concept spec or analysis.
- `transcript` — meeting notes, interview transcript, or chat log.
- `other` — anything that does not fit.
Omit fields you do not have. An incomplete citation with two real fields beats a five-field citation with three guessed values.
## Footnote convention
In findings and in `report.md`, claims use `[^n]` inline footnote markers:
```markdown
The skill must not pause for user confirmation before spawning subagents[^3].
```
The numbered `Sources` section at the bottom of `report.md` lists each citation in order, matching the footnote number. Numbering is global across the report, not per-section.
## Example — well-formed citation block
In `report.md`, the `Sources` section entries look like this:
```markdown
[^1]: **Path**: .beagle/concepts/artifact-analysis/spec.md
**Excerpt**: "Partial-success behavior: if a subagent fails, the skill continues with the remaining findings and records the failure explicitly under Gaps & Limitations."
**Lines**: L29
**Heading**: Requirements
**Document type**: spec
[^2]: **Path**: docs/architecture.md
**Excerpt**: "All long-running jobs run on the worker queue, never the request thread."
**Heading**: Background Processing
**Document type**: readme
```
Citation `[^2]` omits `Lines` because the subagent did not have a stable line-number anchor. That is correct behavior — omit rather than guess.
## Incomplete-metadata policy
If a claim is genuinely load-bearing but the document lacks a clear line number or heading, keep the citation with the required fields (`path` + `excerpt`) only. Do not invent a line number to satisfy a schema slot. If the absence makes the claim hard to verify, note the uncertainty in `Gaps & Limitations` rather than citing confidently.
FILE:references/companion-contract.md
# Companion Invocation Contract
Other beagle skills invoke `artifact-analysis` via this contract. It is small on purpose — no required inputs (!), three optional parameters for scoping and one for overwrite control, two return shapes.
Callers are expected to honor the contract verbatim rather than invent parallel invocation styles. If a new caller needs behavior that the contract does not support, extend the contract here first, not in the calling skill.
## Minimal call
```yaml
# No parameters at all — auto-discover every beagle knowledge location, generic-salient-extraction mode.
```
With no inputs, the skill scans `.beagle/concepts/`, `.planning/`, `docs/`, and top-level README/brief/overview files, applies the default skip denylist, and extracts anything structurally important. Output lands at `.beagle/analysis/<YYYY-MM-DD>-<first-path-basename>/`.
## Full call
```yaml
intent: "competitive and technical grounding for PRFAQ on AI coding-assistant pricing"
paths:
- .beagle/concepts/ai-coding-pricing/
- docs/
output_dir: "/abs/path/.beagle/concepts/ai-coding-pricing/analysis/"
refresh: false
```
## Input semantics
- **`intent`** — one string describing what the caller wants. When present, extraction is weighted toward relevance to the intent. When absent, the skill runs in generic-salient-extraction mode. Tone-neutral; the skill does not reshape the intent.
- **`paths`** — directories and/or explicit files. When absent or empty, the skill auto-discovers beagle's conventional knowledge locations (see SKILL.md §Auto-discovery). When present, only the listed paths are scanned — auto-discovery does not run.
- **`output_dir`** — absolute path. Callers embedding artifact-analysis in their own concept folder (e.g. `prfaq-beagle`) pass this explicitly so the analysis sits next to its consumer.
- **`refresh`** — when `true`, a prior run in `output_dir` is archived to `<output_dir>/.archive-<timestamp>/` before the new run starts. See `failure-modes.md` for the archive rule.
## Return shapes
This skill returns one of two shapes. Callers that invoke multiple beagle companions should handle the union of error codes across all companions they call — sibling companions (e.g. `web-research`) may return different error codes (notably `web-tools-unavailable`).
- **success** — artifacts written. An empty-corpus run still returns this shape with a minimal `report.md`; the caller detects empty-corpus by reading the report, not by catching an error.
- **error: `prior-run-present`** — `output_dir` already holds a prior run and `refresh` is false; nothing written.
### Success
```yaml
plan: "<output_dir>/plan.md"
report: "<output_dir>/report.md"
findings_dir: "<output_dir>/findings/"
```
The caller receives absolute paths. All evidence lives on disk — nothing returns inline.
### Success with empty corpus
Empty-corpus is **not** an error. When path resolution yields zero readable documents, the skill still returns the Success shape above: `plan.md` exists (with an empty `Resolved paths` list), `report.md` exists (with every section present plus a `Gaps & Limitations` entry explaining no readable documents were found), and `findings_dir` exists (empty). Callers detect empty-corpus by reading `report.md`, not by catching an error.
### Refused (prior run present, no refresh)
```yaml
error: "prior-run-present"
detail: "<output_dir> already contains plan.md or report.md. Pass refresh: true to archive and overwrite."
```
The caller decides whether to retry with `refresh: true`, pick a different `output_dir`, or surface the refusal to its user.
## Worked examples
### `prfaq-beagle` — Ignition grounding
The PRFAQ's Ignition phase needs competitive and technical grounding from the user's own documents (PRDs, briefs, prior decisions). PRFAQ passes:
```yaml
intent: "competitive and technical grounding for PRFAQ on AI coding-assistant pricing"
paths: [] # empty → auto-discover .beagle/concepts/, .planning/, docs/
output_dir: "/abs/path/.beagle/concepts/ai-coding-pricing/analysis/"
refresh: false
```
Empty `paths` triggers auto-discovery so PRFAQ does not have to know where the user keeps briefs. `output_dir` lands inside the PRFAQ concept folder so the audit trail travels with the concept.
### `brainstorm-beagle` — reference-point grounding
Mid-brainstorm, the user says "go read the docs folder and tell me what we already decided." Brainstorm calls:
```yaml
intent: "prior decisions and reference points relevant to the current idea"
paths:
- docs/
- .beagle/concepts/
output_dir: "/abs/path/.beagle/concepts/task-sub-tasks/analysis/"
refresh: false
```
Explicit `paths` keeps the scan narrow. `output_dir` lands inside the brainstorm concept folder so the reference points can link straight to `report.md`.
### `strategy-interview` — context grounding
During strategy interview Phase 1 discovery, the user needs a structured read of prior strategy artifacts. Strategy-interview calls:
```yaml
intent: "background context for platform-team H1 2026 strategy discovery"
paths:
- .beagle/strategy/
- docs/
output_dir: "/abs/path/.beagle/strategy/platform-team-h1-2026/analysis/"
refresh: false
```
`output_dir` lands inside the strategy interview's working-state folder so the analysis sits alongside `state.md`, `evidence.md`, and `composition.md`.
### Standalone user — "analyze these docs"
A user types "read everything in docs/ and tell me what's there" with no prior context. The skill is triggered directly and runs:
```yaml
# Triggered by user chatter. No intent passed → generic-salient-extraction mode.
paths:
- docs/
# output_dir defaults to .beagle/analysis/<YYYY-MM-DD>-docs/
```
The user gets a `report.md` they can open immediately — no caller-specific wrapping.
## Non-obligations
The contract is explicit about what this skill does **not** do:
- **No intent reshaping.** The caller hands in an intent (or nothing). The skill does not argue with it, sharpen it, or reframe it.
- **No coaching posture.** `artifact-analysis` is tone-neutral. Callers that need a coaching tone (`prfaq-beagle`'s hardcore coach, `brainstorm-beagle`'s thinking partner) apply that tone before and after, not inside.
- **No inline findings.** Every deliverable is a file. Callers that want inline prose should summarize from `report.md` themselves.
- **No cross-run caching.** Each invocation stands alone. Callers that need caching build it themselves at the call site.
- **No doc editing or rewriting.** Read-only by design.
## Extending the contract
If a new caller needs behavior not covered here, add a field to the input table in SKILL.md first, document it in this file with a worked example, then update caller skills to use it. Parallel-invocation styles fragment the contract and re-introduce the reason this skill exists.
FILE:references/failure-modes.md
# Failure Modes
Four failure cases the skill handles explicitly. Silent failures are the worst kind — every rule below exists to make a failure visible to the caller and preserve what succeeded.
Unlike `web-research`, artifact-analysis does **not** fail-fast on missing tools. Filesystem tooling (Read, Glob, Grep) is assumed present in the Claude Code environment. If it is somehow absent, that surfaces as a per-subagent failure under partial-success rather than aborting the whole run.
## Partial success
One or more subagents fail; others return valid findings.
**Behavior:** continue with the successful findings. Do not abort the run.
In `report.md`, under `Gaps & Limitations`, enumerate every failed slice:
- Name the slice.
- Include the subagent's last-known brief (or a one-line summary of what it was asked to scan).
- Include the `reason` line from the stub findings file (see "Silent-failure detection" below).
Example:
```markdown
## Gaps & Limitations
- **Slice "planning-folder"** (status: failed) — subagent returned "Read timeout on .planning/phase-3/PLAN.md after 3 retries". Caller may retry this slice alone, or re-run with `refresh: true` after investigating the file.
```
## Empty corpus
Path resolution (auto-discovery + explicit paths) yields zero readable documents after the skip denylist is applied.
**Behavior:** do not spawn subagents. Return cleanly.
The skill writes:
- `plan.md` with an empty `Resolved paths` list and a note that no readable documents were found.
- `report.md` with every section present but containing a single bullet each, plus a `Gaps & Limitations` entry:
```markdown
## Gaps & Limitations
- No readable documents found under the resolved paths. The caller passed <paths> and auto-discovery surfaced <auto-discovered paths>; every entry was either absent, empty, or matched the skip denylist. Pass a broader `paths` list, or point at a different folder.
```
Returning cleanly lets callers handle "no documents found" gracefully (e.g. fall back to asking the user to paste content) rather than crashing on an expected-but-uncommon state.
## Silent-failure detection (stub-file rule)
Context exhaustion and tool errors can cause a subagent to return without producing any output file. The orchestrator has no way to distinguish that from "the subagent finished but the file is missing for some other reason" — so the contract requires every subagent to write at least a stub file before returning.
**Contract (enforced by `subagent-brief.md`):**
- Every subagent writes `findings/<slice-slug>.md` with a `status:` frontmatter field: `ok`, `empty`, or `failed`.
- On `empty` or `failed`, the file includes a one-line `reason:` field.
- On `ok`, `reason` is omitted.
**Orchestrator check, post-dispatch:**
For every expected slice, test that the findings file exists. For any missing file, record a silent-failure entry under `Gaps & Limitations`:
```markdown
- **Slice "<name>"** — subagent returned without producing a findings file (likely context exhaustion or tool error). Last known brief: "<brief summary>".
```
This is why "legitimately empty" results must use `status: empty` with a reason rather than writing nothing — so empty-but-ok is never confused with silent context loss.
## Re-run protection
Each run is supposed to be self-contained and auditable. Silently overwriting a prior run destroys the audit trail; silently appending produces incoherent findings.
**Rule:** before writing anything to `output_dir`, check whether it already contains `plan.md` or `report.md`.
- **If it does and `refresh` is not `true`:** refuse with a message naming the existing folder.
```
Refusing to write: <output_dir> already contains a prior analysis run. Pass `refresh: true` to archive and overwrite, or choose a different output_dir.
```
- **If it does and `refresh: true`:** move the existing contents to `<output_dir>/.archive-<YYYYMMDD-HHMMSS>/` first, then proceed with a fresh run. The archive preserves the audit trail.
- **If it does not:** proceed normally.
This rule applies even when the default slug matches a prior run on the same day — stable slugs are a feature (callers can re-derive the folder), but the user must explicitly opt in to overwriting.
## Verification checklist (orchestrator runs at end)
Before returning success to the caller, verify:
- [ ] `plan.md` exists at `<output_dir>/plan.md`.
- [ ] `findings/<slice-slug>.md` exists for every slice in `plan.md` (unless the empty-corpus path was taken).
- [ ] Every findings file has `status:` frontmatter.
- [ ] Every `status: empty` or `status: failed` file has a `reason:` line.
- [ ] `report.md` exists at `<output_dir>/report.md`.
- [ ] `report.md` has all seven top-level sections in order: `Documents Found`, `Key Insights`, `User / Market Context`, `Technical Context`, `Ideas & Decisions`, `Raw Detail Worth Preserving`, `Gaps & Limitations`.
- [ ] `report.md` has a `Sources` section and every `[^n]` footnote in the body has a matching entry.
Any check that fails becomes an entry in `Gaps & Limitations` — the run does not silently produce a broken deliverable.
FILE:references/report-template.md
# Synthesis Skeleton
The synthesis document, saved as `report.md` under the run's output directory, uses a fixed seven-section layout. Sections appear in this order, every time, even when one is short. `Gaps & Limitations` is required even when findings look complete — honest accounting of what could not be established is part of the product.
Copy the skeleton below into the synthesis file and fill each section.
## Layout
```markdown
# Analysis: <intent, verbatim from plan.md — or "Generic extraction of <slug>" when intent is absent>
## Documents Found
- `<relative path 1>` — <one-line note on what this document covers and why it was included>
- `<relative path 2>` — <...>
- `<relative path N>` — <...>
Include every path that was actually scanned (end-to-end or skimmed). Paths that were under the skip denylist go in `Gaps & Limitations`, not here.
## Key Insights
Group by theme. Every claim carries a [^n] footnote.
### <Theme 1>
<Bullets or short paragraphs. Each factual claim has a footnote.>
### <Theme 2>
<...>
### <Theme N>
<...>
## User / Market Context
<Users, customers, competition, market data surfaced from the documents. Footnote every specific claim. If the corpus does not contain user/market content, write a single bullet: "No user/market content surfaced by this scan." — do not delete the section.>
## Technical Context
<Platforms, constraints, integrations, dependencies, operational considerations. Footnote every specific claim. If the corpus does not contain technical content, write a single bullet noting so — do not delete the section.>
## Ideas & Decisions
Every idea carries a tag — `accepted`, `rejected`, or `open` — and rationale. Preserve rejected ideas so future work does not re-propose them.
- **[accepted]** <idea> — <rationale from the document> [^n]
- **[rejected]** <idea> — <rationale for rejection> [^n]
- **[open]** <idea> — <what is still undecided> [^n]
## Raw Detail Worth Preserving
Specific quotes, metrics, and data points that would be lost to summarization. Each entry is a verbatim excerpt (or near-verbatim with explicit ellipsis) plus a footnote.
- "<quote>" [^n]
- "<data point>" [^n]
## Gaps & Limitations
- <What the corpus could not establish, and why. One bullet per gap.>
- <Any path that was empty, unreadable, or skipped via the denylist — name the path and the reason.>
- <Any subagent that failed or returned empty — name the slice and the reason from its stub file.>
- <Claims that were dropped because the citation would have had to be fabricated.>
This section is never empty. If the scan was clean, include at minimum a bullet naming what follow-up reading would sharpen the picture.
## Sources
[^1]: **Path**: <relative path>
**Excerpt**: "<verbatim quote>"
**Lines**: <optional, omit if absent>
**Heading**: <optional, omit if absent>
**Document type**: <optional, omit if absent>
[^2]: **Path**: <...>
**Excerpt**: "<...>"
[^n]: <...>
```
## Rules
- **Title line** uses the intent verbatim from `plan.md`. When intent is absent, use `Generic extraction of <slug>`.
- **`Documents Found`** lists every included path (read or skimmed). Skipped paths go in `Gaps & Limitations`.
- **`Key Insights`** groups by theme, not by source document. If one theme only came from one document, that is fine — one bullet, one footnote.
- **`User / Market Context`, `Technical Context`, `Raw Detail Worth Preserving`** — never delete these sections. If the corpus has nothing to say, include a single bullet saying so. Prefer `"No <section> content surfaced by this scan."` over synthesized content — an empty placeholder is correct when the corpus is thin; fabricated content is not.
- **`Ideas & Decisions`** preserves rejected ideas with the rationale for rejection, not just accepted ones. This is the feature, not a nice-to-have.
- **`Gaps & Limitations`** is never empty. At minimum, name what follow-up reading would sharpen the answer.
- **`Sources`** uses global numbering — `[^1]` through `[^n]` across the whole document, not per-section. Citation shape per `citation-schema.md`.
## Sourcing discipline
- Every bullet in `Key Insights`, `User / Market Context`, `Technical Context`, and `Ideas & Decisions` that makes a specific factual claim carries a footnote. Broad synthesis statements can stand without a footnote only if they summarize cited claims elsewhere in the document.
- `Raw Detail Worth Preserving` is 100% cited — every entry is an excerpt with a footnote.
- If a claim cannot be cited, either drop it or move it to `Gaps & Limitations` as something the corpus did not establish.
- `Sources` entries match every `[^n]` used in the body. Orphan citations (listed in `Sources` but never referenced) should be removed.
FILE:references/skip-patterns.md
# Default Skip Patterns
The subagent's default denylist. Applied without the caller re-specifying. Every skipped path is recorded under the `paths_skipped` frontmatter field so the audit trail shows exactly what was excluded.
Three groups: sensitive, binary/media, vendor/build.
## Sensitive
Secrets, credentials, and user-level config. Never scanned.
- `.env`, `.env.*`
- `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.crt`, `*.cer`
- `.aws/`, `.ssh/`, `.gnupg/`
- `credentials.*`, `secrets.*`
- `*.htpasswd`, `*.netrc`
## Binary / media
Formats the first version does not parse. Passed over without an error. `beagle-core:docling` is the future path for richer parsing (PDF / DOCX / PPTX) when a caller needs it.
- Images: `*.png`, `*.jpg`, `*.jpeg`, `*.gif`, `*.webp`, `*.ico`, `*.svg`, `*.bmp`, `*.tiff`
- Documents: `*.pdf`, `*.docx`, `*.doc`, `*.pptx`, `*.ppt`, `*.xlsx`, `*.xls`
- Archives: `*.zip`, `*.tar`, `*.tar.gz`, `*.tgz`, `*.gz`, `*.bz2`, `*.xz`, `*.7z`, `*.rar`
- Audio/video: `*.mp3`, `*.wav`, `*.flac`, `*.ogg`, `*.mp4`, `*.mov`, `*.mkv`, `*.avi`, `*.webm`
- Databases: `*.sqlite`, `*.sqlite3`, `*.db`
- Binaries: `*.exe`, `*.dll`, `*.so`, `*.dylib`, `*.a`, `*.o`, `*.class`, `*.jar`
## Vendor / build
Generated output and vendored dependencies. Noise, not signal.
- `.git/`
- `node_modules/`
- `.venv/`, `venv/`, `env/`
- `__pycache__/`, `*.pyc`
- `dist/`, `build/`, `out/`, `target/`
- `coverage/`, `.coverage`
- `.next/`, `.nuxt/`, `.svelte-kit/`
- `.cache/`, `.parcel-cache/`
- `vendor/`, `Pods/`
- `.DS_Store`, `Thumbs.db`
## Rules
- **Silent skip, audited record.** Subagents do not stop to explain why a path was skipped. Every skipped path lands in `paths_skipped`.
- **Case-insensitive extension matching.** `.ENV` is skipped just like `.env`.
- **Directory patterns match as prefixes.** `node_modules/` skips everything under it.
- **No runtime extension.** The denylist is read-only during a run. A caller that wants to scan a normally-skipped path passes it explicitly in `paths` — the brief records it, and the subagent reads it. (Explicit caller intent overrides the default.)
- **Sensitive list is non-overridable.** `.env`, `*.pem`, `*.key`, and credential files are always skipped even when explicitly passed. If a caller needs those contents, they paste them in directly rather than routing through this skill.
FILE:references/subagent-brief.md
# Subagent Brief Template
The orchestrator builds one brief per slice, mechanically, from `plan.md`. Every subagent gets the same shape so a caller reading `plan.md` can predict what each subagent was told.
Fill the template verbatim — no paraphrasing, no interpretation drift. Subagents return one terse status line to the orchestrator; all findings land in the output file.
## Template
```
You are one of up to 3 parallel artifact-analysis subagents. Scan a single slice of paths and write findings to disk.
Slice: <slice name, copied from plan.md>
Intent: <intent string, verbatim from plan.md — or "generic salient extraction" if absent>
Paths to scan:
- <path 1 from plan.md for this slice>
- <path 2 from plan.md for this slice>
- <...>
What to extract:
- Key insights — structurally important observations, decisions, and themes
- User / market context — users, customers, competition, market data
- Technical context — platforms, constraints, integrations, dependencies
- Ideas & decisions — tagged accepted / rejected / open (preserve rejected ideas with rationale)
- Raw detail worth preserving — specific quotes, metrics, and data points
- Gaps — what the corpus could not establish; empty/unreadable paths
When intent is present, weight extraction toward what is relevant to that intent. When intent is absent, extract anything structurally important without an interpretive filter.
Skim strategy:
- Sharded documents (folder with index.md plus multiple files): read index.md first, then only the sub-files the index points to as relevant.
- Large documents (single file > ~2000 lines or > ~50 pages): read the TOC, executive summary, and section headings first; pull full content only from relevant sections. Record skimmed-vs-read status per file.
- Short documents: read end-to-end.
Skip patterns: apply the default denylist from references/skip-patterns.md (sensitive, binary/media, vendor/build). Record every skipped path under the paths_skipped frontmatter field.
Output path: <output_dir>/findings/<slice-slug>.md
Citation rules: every claim carries a [^n] footnote. Citations use source path (relative to the scanned root when possible) + verbatim excerpt. Include lines and heading only when the subagent naturally has them; never synthesized or guessed. See references/citation-schema.md for the full shape.
Required frontmatter on the output file:
---
status: ok | empty | failed
slice: <slice name>
brief_hash: <hash of this brief, supplied by the orchestrator>
started_at: <ISO timestamp>
finished_at: <ISO timestamp>
paths_read: [<paths read end-to-end>]
paths_skimmed: [<paths where only TOC/index/headings were consumed>]
paths_skipped: [<paths skipped via the default denylist>]
reason: <one line — required when status is empty or failed, omit otherwise>
---
Partial-failure protocol: always write the output file, even if the subagent fails or finds nothing. Use status: failed with a one-line reason on tool errors or exhaustion. Use status: empty with a reason when the slice is legitimately empty ("all paths under this slice were under the skip denylist" or "all files were unreadable"). Never exit without writing the file — absence of the file is treated as a silent failure.
Return: a single status line in this exact shape, and nothing else:
<path-to-findings-file> <status>
Example: /abs/path/findings/planning-folder.md ok
```
## Orchestrator responsibilities
- **Build `<slice-slug>`** by the same rule as the run slug in SKILL.md (lowercase, punctuation stripped, whitespace → hyphens, truncated to 60 chars on a word boundary). Keep it stable across runs so `refresh: true` can match archived prior files.
- **Compute `brief_hash`** over the filled-in brief text before dispatch so the findings file's provenance is verifiable.
- **Guarantee non-overlapping slices.** The orchestrator partitions resolved paths before building briefs; no path appears in more than one slice.
- **Verify every expected file exists** after all subagents return. Any missing file = silent failure per `failure-modes.md`.
- **Never merge findings into one file.** Synthesis happens later, in `report.md`.
## Subagent responsibilities
- **Write the output file even on failure.** Absence is treated as silent context exhaustion.
- **Do not reshape the intent or the slice.** If the brief is wrong, flag it in `reason` and return `status: failed` — do not silently pivot to a different scope.
- **Apply skim strategies.** Do not read a 2000-line file end-to-end when the TOC tells you only three sections are relevant. Record the skim/read decision in the frontmatter path lists.
- **Apply skip patterns silently.** Every skipped path goes in `paths_skipped` — no need to note it in the findings body.
- **No inline returns.** The only thing that crosses back to the orchestrator is the status line. All evidence lives in the findings file.
Use as the follow-up to brainstorm-beagle when a spec has an Open Questions section (or quietly carries latent gaps) that need closing before planning or imp...
---
name: resolve-beagle
description: "Use as the follow-up to brainstorm-beagle when a spec has an Open Questions section (or quietly carries latent gaps) that need closing before planning or implementation can begin. Triggers on: \"resolve the open questions\", \"close the gaps in this spec\", \"research the open items\", \"finalize my spec\", \"make this spec implementation-ready\", \"answer the TBDs\". Also triggers whenever the user points at a brainstorm-beagle spec and asks for research, proposals, or answers to unresolved items. Orchestrates parallel research subagents when available (falls back to inline sequential research otherwise), proposes answers one at a time for user approval, then rewrites the spec in place so it arrives at planning with no known gaps. Does NOT write code, design implementation, or create plans — it only produces a complete spec."
---
# Resolve: Close Spec Gaps
Take a spec produced by `brainstorm-beagle` and close its remaining gaps — both the explicit Open Questions and the latent ones the self-review missed — by researching, proposing answers, and rewriting the spec in place.
The terminal state is a spec with no known open questions and no placeholder requirements. Planning can start immediately after.
<hard_gate>
This skill does not write code, scaffold projects, design architecture, or create implementation plans. It only edits the spec document. "Answering an open question" means proposing a WHAT/WHY answer with rationale — never a HOW. If a question turns out to require implementation design, defer it with a note and move on.
</hard_gate>
## Gates (pass before next step)
Objective pass conditions so steps are not skippable by assertion alone:
1. **Spec located** — The target file path is known and `Read` succeeds (or the user supplied a valid path after you listed 3–5 recent `docs/specs/` candidates).
2. **Gap list published** — One message lists every explicit Open Question bullet **and** each latent gap you will treat as in-scope. **Do not dispatch research** until the user adjusts the list (add/remove/defer) **or** explicitly tells you to proceed with that list.
3. **Research artifact per gap** — Before you present a proposal for gap *G*, you have a structured note for *G*: recommended answer, 1–2 rejected alternatives with reasons, and evidence (file:line, URL, or in-spec citation). No proposal without that artifact.
4. **One proposal queue** — Do not open the next gap’s proposal until the current gap has a clear outcome (accepted, revised wording applied, rejected with what happens next, or deferred with reason).
5. **Rewrite reconciled** — After editing, you read the full spec once and resolve cross-section contradictions; then run the self-review checklist from `../brainstorm-beagle/references/spec-reviewer.md` with failures fixed in-session unless the user opts for a follow-up pass.
6. **Commit** — No `git commit` unless the user answered yes to the commit prompt in § Committing.
## Workflow
1. **Locate the spec** — explicit path from the user, or the most recent file in `docs/specs/`
2. **Extract gaps** — parse Open Questions *and* audit for latent issues (placeholders, vague requirements, missing rationale, contradictions)
3. **Show the gap list** — present everything you plan to close in one summary so the user can add or remove items before research starts
4. **Dispatch research** — one task per gap, in parallel via subagents when available; otherwise sequentially inline
5. **Propose answers** — one proposal at a time, with recommendation + alternatives + evidence
6. **Rewrite the spec in place** — migrate resolved items to the right sections; nothing silently dropped
7. **Self-review** — same checklist `brainstorm-beagle` uses (see `../brainstorm-beagle/references/spec-reviewer.md`)
8. **Ask about committing** — prompt the user whether to commit the edit; don't commit unprompted
```
Locate spec → Extract gaps → Show list ──→ User adds/removes
→ Dispatch research (parallel if possible)
→ Propose answers (one at a time)
→ User decision → next
→ Rewrite spec in place
→ Self-review (fix inline)
→ Ask about committing
```
## Locating the spec
If the user gave a path, use it.
Otherwise, list the 3–5 most recently modified files in `docs/specs/` and ask: "Work on `<most recent>`, or another one?" Don't scan the whole directory tree — specs are top-level per `brainstorm-beagle`'s convention.
If no spec directory exists, ask the user for the path.
## Extracting gaps
Two categories count as gaps:
**Explicit gaps** — every bullet under the spec's `Open Questions` heading is one research task.
**Latent gaps** — issues that slipped past the brainstorm's self-review. Scan the spec for:
| Problem | What it looks like |
|---------|-------------------|
| Placeholder | TBD, TODO, "to be determined", ellipsis used as content |
| Vague requirement | "fast", "simple", "good", "user-friendly", "intuitive" — nothing to verify against |
| Missing rationale | Constraint or Out-of-Scope item with no "why" |
| Contradiction | Requirement conflicts with another requirement, with a constraint, or with Out of Scope |
| Untestable success | No observable way to verify the requirement was met |
| Implementation leakage | A requirement prescribes HOW instead of describing WHAT |
The reason to treat latent gaps as first-class: a spec that says "fast" or "good UX" hasn't been answered just because nothing was explicitly flagged. Planning will trip over those same words. Close them here.
Before dispatching research, show the combined list to the user in one message — "here's what I'm planning to close" — and let them add, remove, or defer items. Don't ask permission one-by-one; that's the proposal step.
## Dispatching research
Each gap gets exactly one research task. Classify each task first — the type determines which tools the research needs:
| Task type | Looks like | Tools |
|-----------|-----------|-------|
| Codebase pattern | "How does the existing `--start-at` pattern work?" "Where is `SKILL_MAP` defined?" | Grep, Glob, Read |
| External / API | "What does the Claude SDK expose for sub-agent spawning?" "Does Codex have hooks?" | WebSearch, WebFetch, Context7 (if available) |
| Design tradeoff | "What should the merged report format be?" "How should deduplication work?" | Reasoning + analogous reference points already in the spec |
| Scope / policy | "Should config/docs files route to a stack or fallback?" | Reasoning tied to the spec's own Core Value and Constraints |
### With subagents (preferred)
When the Task tool (or equivalent subagent tool) is available, dispatch each research task as an independent subagent — **all in the same turn**, so they run in parallel. Each subagent gets its own context window, which matters: gaps often come with large supporting context (the spec, the codebase) that you don't want crammed into a single conversation.
See `references/subagent-prompts.md` for the prompt templates (one per task type).
The research return must be structured: recommended answer, 1–2 alternatives with why rejected, and concrete evidence (file:line citations, URLs, or references to existing spec sections). Cap each return at ~300 words — you want decisions, not transcripts.
### Without subagents
If no subagent tool is available, do the research yourself, one question at a time. Work in cheapest-first order: codebase questions, then external, then tradeoffs, then scope/policy. Produce the same structured proposal for each. This is slower but never leaves gaps unresolved.
## Proposing answers
Present proposals **one at a time**. For each:
- **The gap** — restated in one line
- **Recommended answer** — your best call, with WHY
- **Alternatives** — 1–2 credible options and why they were rejected
- **Evidence** — file:line citations, URLs, or references to decisions already in the spec
The user can accept, revise, or reject. Follow the thread if they want to discuss; move on once decided.
**Order matters.** Resolve in this rough sequence:
1. Codebase-reality gaps first (they may inform design tradeoffs)
2. Policy/scope gaps second (they shape everything downstream)
3. Format/presentation gaps last (they depend on the above)
When a gap can't be resolved without input only the user has (budgets, stakeholder preferences, unstated constraints), don't guess — ask directly. It's cheaper than proposing the wrong answer and unwinding it.
## Rewriting the spec
As decisions land, migrate them to where they belong:
| Gap type | Destination after resolution |
|----------|-----------------------------|
| Architectural or policy decision | New entry under **Key Decisions** (with alternatives considered) |
| Concrete behavior | New entry under **Requirements** (assigned must/should/out-of-scope) |
| Hard limit | New entry under **Constraints** (with rationale) |
| External reference discovered during research | New entry under **Reference Points** |
| Vague requirement replaced | Rewrite the existing requirement inline; do not duplicate |
| Intentionally deferred | Stays in **Open Questions** with a `deferred: <reason>` suffix |
Two rules that matter:
- **Never silently drop an Open Question.** Either resolve it, migrate it with rationale, or explicitly defer it with a reason. Dropping a question without a trail is how specs regress between sessions.
- **Rewrite the whole spec, not just the touched sections.** Changes ripple: a new Key Decision may contradict an old Requirement, a resolved Open Question may invalidate a Constraint. Read the spec end-to-end after edits and reconcile.
## Self-review
Run the checks from `../brainstorm-beagle/references/spec-reviewer.md`:
- No placeholders
- No contradictions
- No implementation leakage
- All requirements testable
- Constraints and Out-of-Scope items have rationale
Fix anything that surfaces inline before handing back to the user. If new gaps appear during the rewrite (they sometimes do), add them to a "new gaps surfaced" list and ask the user whether to resolve them now or leave for a later pass.
## Committing
After the rewrite, summarize and ask:
> "Spec updated. Resolved N explicit questions and M latent gaps. Want me to commit this as `docs: resolve open questions in <topic> spec`?"
If yes, commit. If no, leave the working tree for the user. Do not commit unprompted — the user may want to review the diff first or bundle it with other changes.
## Key Principles
- **Close the loop.** The goal is a spec with nothing unanswered that blocks planning. Half-resolved is worse than clearly deferred.
- **One research task per gap.** Don't blur tasks together — it makes the proposal step harder to review and weakens the evidence trail.
- **Evidence over opinion.** Every proposal cites something — a file, a URL, or an existing spec decision. "I think" is not enough.
- **Still WHAT, not HOW.** Answers land as decisions in the spec, not as implementation designs. If an answer requires implementation thinking, defer it.
- **Defer honestly.** A question too expensive to answer now stays an Open Question with a one-line reason. Better to admit a known gap than paper over it.
- **Parallelize when you can.** Subagents run in isolation — fan out aggressively. Sequential research is the fallback, not the default.
- **Don't interrupt unnecessarily.** Show the gap list once, then only stop the user for decisions that need human judgment.
FILE:references/subagent-prompts.md
# Subagent Prompt Templates
Use these when dispatching research tasks to subagents. One task per subagent. Launch all subagents in a single turn so they run in parallel.
Every template ends with the same **return format** — a structured proposal the orchestrator can drop directly into the spec. Don't let subagents return free-form prose; decisions are the deliverable, not transcripts.
## Return format (shared across all templates)
Every subagent must return exactly this structure, ~300 words max:
```
## Recommended answer
<one-sentence answer, then a short paragraph on WHY>
## Alternatives considered
- <alternative 1> — rejected because <reason>
- <alternative 2> — rejected because <reason>
## Evidence
- <file:line citation, URL, or reference to an existing spec section>
- <additional citations as needed>
## Open sub-questions (if any)
- <anything the subagent couldn't resolve and needs the user or orchestrator to decide>
```
If the subagent cannot produce a recommendation with evidence, it should say so explicitly in "Open sub-questions" rather than guess.
---
## Template: Codebase pattern
Use when the question is about how the existing codebase works — an API, a pattern, a file layout, a convention that the spec references.
```
You are researching one question for an in-progress project spec.
Spec path: <absolute path>
Question: <the exact open question, verbatim>
Your job: read the relevant code and return a structured proposal answering the question. Use Grep, Glob, and Read. Do not modify any files.
Scope hints:
- <file/dir hints from the spec, if any>
- <reference points mentioned in the spec>
Return in this exact format:
## Recommended answer
<one-sentence answer, then a short paragraph on WHY — grounded in what you found>
## Alternatives considered
- <alt 1> — rejected because <reason, citing code>
- <alt 2> — rejected because <reason, citing code>
## Evidence
- <file:line> — <one-line description of what's there>
- <file:line> — <one-line description>
## Open sub-questions
- <anything that needs user input or is blocked>
Cap the return at ~300 words. Cite file paths and line numbers. Decisions, not transcripts.
```
## Template: External / API
Use when the question is about an external system — a framework's capabilities, an SDK's API, a tool's behavior.
```
You are researching one question for an in-progress project spec.
Spec path: <absolute path>
Question: <the exact open question, verbatim>
Your job: use WebSearch, WebFetch, and Context7 (if available) to find authoritative answers — official docs, source repositories, maintainer statements. Avoid blog posts and secondhand summaries.
Focus area: <framework / SDK / tool name>
Why it matters: <one-line context pulled from the spec's Problem Statement or Key Decisions>
Return in this exact format:
## Recommended answer
<one-sentence answer, then a short paragraph on WHY — grounded in what the official source says>
## Alternatives considered
- <alt 1> — rejected because <reason, citing source>
- <alt 2> — rejected because <reason, citing source>
## Evidence
- <URL> — <one-line description of what it says>
- <URL> — <one-line description>
## Open sub-questions
- <anything version-specific, ambiguous, or that needs user confirmation>
Cap the return at ~300 words. Prefer primary sources. Decisions, not transcripts.
```
## Template: Design tradeoff
Use when the question is a product/design judgment call with no single "right answer" in the codebase or external docs — format decisions, deduplication heuristics, UX structure.
```
You are researching one question for an in-progress project spec.
Spec path: <absolute path>
Question: <the exact open question, verbatim>
Your job: reason about the tradeoff and produce a concrete recommendation grounded in the spec's own Core Value, Problem Statement, and Key Decisions. Reference the spec heavily — the right answer usually already lives implicit in the rest of the document.
Do NOT propose implementation details. The answer goes into the spec as a WHAT/WHY decision, not a HOW.
Return in this exact format:
## Recommended answer
<one-sentence answer, then a short paragraph on WHY — explicitly citing which spec section supports the choice>
## Alternatives considered
- <alt 1> — rejected because <reason, ideally citing a spec section it would undermine>
- <alt 2> — rejected because <reason>
## Evidence
- Spec section "<section name>" — <what it says that informs this>
- <additional analogies or reference points if relevant>
## Open sub-questions
- <anything that needs the user's judgment (preferences, budgets, unstated constraints)>
Cap the return at ~300 words. Ground the call in the spec. Decisions, not transcripts.
```
## Template: Scope / policy
Use when the question is about inclusion/exclusion — what routes where, what's covered, what's explicitly out.
```
You are researching one question for an in-progress project spec.
Spec path: <absolute path>
Question: <the exact open question, verbatim>
Your job: recommend the scope/policy answer that best preserves the spec's Core Value and is consistent with its existing Out-of-Scope rationale. Read the spec end-to-end before deciding — scope answers regress fast if you only look at the local question.
Do NOT propose implementation details. The answer goes into the spec as a WHAT/WHY decision.
Return in this exact format:
## Recommended answer
<one-sentence answer, then a short paragraph on WHY — showing it aligns with Core Value and existing Out-of-Scope reasoning>
## Alternatives considered
- <alt 1> — rejected because <reason, ideally showing it conflicts with a stated constraint>
- <alt 2> — rejected because <reason>
## Evidence
- Spec section "Core Value" — <relevant phrase>
- Spec section "Out of Scope" — <relevant rationale>
- <other spec sections as needed>
## Open sub-questions
- <anything that needs user input on future intent>
Cap the return at ~300 words. Consistency with the spec matters more than novelty. Decisions, not transcripts.
```
Reviews Rust macro code for hygiene issues, fragment misuse, compile-time impact, and procedural macro patterns. Use when reviewing macro_rules! definitions,...
---
name: macros-code-review
description: "Reviews Rust macro code for hygiene issues, fragment misuse, compile-time impact, and procedural macro patterns. Use when reviewing macro_rules! definitions, procedural macros, derive macros, or attribute macros."
---
# Macros Code Review
## Review Workflow
1. **Check `Cargo.toml`** -- Note Rust edition (2024 reserves `gen` keyword, affecting macro output), proc-macro crate dependencies (`syn`, `quote`, `proc-macro2`), and feature flags (e.g., `syn` with minimal features)
2. **Check macro type** -- Determine if reviewing declarative (`macro_rules!`), function-like proc macro, attribute macro, or derive macro
3. **Check if a macro is needed** -- If the transformation is type-based, generics are better. Macros are for structural/repetitive code generation that generics cannot express
4. **Scan macro definitions** -- Read full macro bodies including all match arms, not just the invocation site
5. **Check each category** -- Work through the checklist below, loading references as needed
6. **Gates** -- Complete **Gates** below before reporting; do not substitute informal “I verified.”
## Gates (before reporting findings)
Complete in order. **Do not emit findings** until **Gate 4** passes for each issue.
**Gate 1 — Crate context (on disk)**
**PASS when:** You opened the reviewed crate’s `Cargo.toml` (workspace member path if applicable) and recorded Rust `edition`, whether the crate is `proc-macro = true`, and relevant proc-macro dependencies or `syn` / `quote` feature flags.
**Blocks rationalization:** Edition 2024 findings (`gen`, `unsafe extern`, generated `unsafe` bodies) and `syn` “full” vs minimal flags require this — do not flag edition-specific macro output without matching `edition` from the file.
**Gate 2 — Macro definitions read**
**PASS when:** For every macro you critique, you read the full definition (all `macro_rules!` arms, or the proc-macro entry plus helpers you rely on), not only call sites or partial expansions.
**Artifact:** At least one path per macro to the defining `.rs` file(s) you used.
**Gate 3 — Per-finding evidence**
**PASS when:** Each planned issue has `[FILE:LINE]` from the current tree for the macro definition, attribute/derive site, or generated code location you are discussing (not from memory, docs-only, or another branch).
**Gate 4 — Pre-report protocol**
**PASS when:** You loaded and applied `beagle-rust:review-verification-protocol`, including **Macro-Specific Verification** for hygiene, fragment type, and proc-macro performance claims. **Then** add findings.
## Output Format
Report findings as:
```text
[FILE:LINE] ISSUE_TITLE
Severity: Critical | Major | Minor | Informational
Description of the issue and why it matters.
```
## Quick Reference
| Issue Type | Reference |
|------------|-----------|
| Fragment types, repetition, hygiene, declarative patterns | [references/declarative-macros.md](references/declarative-macros.md) |
| Proc macro types, syn/quote, spans, testing | [references/procedural-macros.md](references/procedural-macros.md) |
## Review Checklist
### Declarative Macros (`macro_rules!`)
- [ ] Correct fragment types used (`:expr` vs `:tt` vs `:ident` -- wrong choice causes unexpected parsing)
- [ ] Repetition separators match intended syntax (`,` vs `;` vs none, `*` vs `+`)
- [ ] Trailing comma/semicolon handled (add `$(,)?` or `$(;)?` at end of repetition)
- [ ] Matchers ordered from most specific to least specific (first match wins)
- [ ] No ambiguous expansions -- each metavariable appears in the correct repetition depth in the transcriber
- [ ] Variables defined in the macro use macro-internal names (hygiene protects variables, not types/modules/functions)
- [ ] Exported macros (`#[macro_export]`) use `$crate::` for crate-internal paths, never `crate::` or `self::`
- [ ] Standard library paths use `::core::` and `::alloc::` (not `::std::`) for `no_std` compatibility
- [ ] `compile_error!` used for meaningful error messages on invalid input patterns
- [ ] Macro placement respects textual scoping (defined before use) unless `#[macro_export]`
### Procedural Macros
- [ ] `syn` features minimized (don't enable `full` when `derive` suffices -- reduces compile time)
- [ ] Spans propagated from input tokens to output tokens (errors point to user code, not macro internals)
- [ ] `Span::call_site()` used for identifiers that should be visible to the caller
- [ ] `Span::mixed_site()` used for identifiers private to the macro (matches `macro_rules!` hygiene)
- [ ] Error reporting uses `syn::Error` with proper spans, not `panic!`
- [ ] Multiple errors collected and reported together via `syn::Error::combine`
- [ ] `proc-macro2` used for testing (testable outside of compiler context)
- [ ] Generated code volume is proportionate -- proc macros that emit large amounts of code bloat compile times
### Derive Macros
- [ ] Derivation is obvious -- a developer could guess what it does from the trait name alone
- [ ] Helper attributes (`#[serde(skip)]` style) are documented
- [ ] Trait implementation is correct for all variant shapes (unit, tuple, struct variants)
- [ ] Generated `impl` blocks use fully qualified paths (`::core::`, `$crate::`)
### Attribute Macros
- [ ] Input item is preserved or intentionally transformed (not accidentally dropped)
- [ ] Attribute arguments are validated with clear error messages
- [ ] Test generation patterns (`#[test_case]` style) produce unique test names
- [ ] Framework annotations document what code they generate
### Edition 2024 Awareness
- [ ] Macro output does not use `gen` as an identifier (reserved keyword -- use `r#gen` or rename)
- [ ] Generated `unsafe fn` bodies use explicit `unsafe {}` blocks around unsafe ops
- [ ] Generated `extern` blocks use `unsafe extern`
### Generics vs Macros
Flag a macro when the same result is achievable with generics or trait bounds. Macros are appropriate when:
- The generated code varies structurally (not just by type)
- Repetitive trait impls for many concrete types
- Test batteries with configuration variants
- Compile-time computation that `const fn` cannot express
## Severity Calibration
### Critical (Block Merge)
- Macro generates unsound `unsafe` code
- Hygiene violation in macro that outputs `unsafe` blocks (caller's variables leak into unsafe context)
- Proc macro panics instead of returning `compile_error!` (crashes the compiler)
- Derive macro generates incorrect trait implementation (violates trait contract)
### Major (Should Fix)
- Exported macro uses `crate::` or `self::` instead of `$crate::` (breaks for downstream users)
- Exported macro uses `::std::` instead of `::core::`/`::alloc::` (breaks `no_std` users)
- Wrong fragment type causing unexpected parsing (`:expr` where `:tt` needed, or vice versa)
- Proc macro enables `syn` full features unnecessarily (compile time cost)
- Missing span propagation (errors point to macro definition, not invocation)
- No error handling in proc macro (panics on bad input instead of `compile_error!`)
### Minor (Consider Fixing)
- Missing trailing comma/semicolon tolerance in repetition patterns
- Matcher arms not ordered most-specific-first
- Macro used where generics would be clearer and equally expressive
- Missing `compile_error!` fallback arm for invalid patterns
- Helper attributes undocumented
### Informational (Note Only)
- Suggestions to split complex `macro_rules!` into a proc macro
- Suggestions to reduce generated code volume
- TT munching or push-down accumulation patterns that could be simplified
## Valid Patterns (Do NOT Flag)
- **`macro_rules!` for test batteries** -- Generating repetitive test modules from a list of types/configs
- **`macro_rules!` for trait impls** -- Implementing a trait for many concrete types with identical bodies
- **TT munching** -- Valid advanced pattern for recursive token processing
- **Push-down accumulation** -- Valid pattern for building output incrementally across recursive calls
- **`#[macro_export]` with `$crate`** -- Correct way to make macros usable outside the defining crate
- **`Span::call_site()` for generated functions** -- Intentionally making generated items visible to callers
- **`syn::Error::to_compile_error()`** -- Correct error reporting pattern in proc macros
- **`trybuild` tests for proc macros** -- Standard compile-fail testing approach
- **Attribute macros on test functions** -- Common pattern for test setup/teardown
- **`compile_error!` in impossible match arms** -- Good practice for catching invalid macro input
FILE:references/declarative-macros.md
# Declarative Macros
## Fragment Types
Each fragment type constrains what tokens the matcher will accept. Using the wrong type causes confusing parse errors or overly greedy matching.
| Fragment | Matches | Use When |
|----------|---------|----------|
| `:ident` | Identifier (`foo`, `my_var`) | Naming generated items, variables, or modules |
| `:expr` | Any expression (`x + 1`, `foo()`) | Values to compute or pass to functions |
| `:ty` | Type (`u32`, `Vec<String>`) | Type parameters for generics or trait impls |
| `:tt` | Single token tree (`foo`, `(a + b)`) | Catch-all when other fragments are too restrictive |
| `:path` | Path (`std::io::Error`, `crate::Foo`) | Importing or referencing items |
| `:pat` | Pattern (`Some(x)`, `1..=5`) | Match arms, `let` bindings |
| `:stmt` | Statement (`let x = 1;`) | Injecting statements into generated blocks |
| `:item` | Item (`fn`, `struct`, `impl`) | Generating top-level definitions |
| `:block` | Block (`{ ... }`) | Function bodies, closures |
| `:meta` | Attribute content (`derive(Debug)`) | Forwarding attributes |
| `:literal` | Literal value (`42`, `"hello"`) | Compile-time constants |
| `:vis` | Visibility (`pub`, `pub(crate)`, empty) | Controlling generated item visibility |
| `:lifetime` | Lifetime (`'a`, `'static`) | Lifetime-generic generated code |
### Common Fragment Mistakes
**`:expr` can be broader than intended** -- It matches full expressions, which can make some macro arms too permissive. Prefer narrower fragments like `:tt` when you need stricter syntax boundaries.
**`:ty` cannot be followed by `>`** -- After matching a type, the parser cannot distinguish `>` as closing a generic vs part of an expression. Structure matchers to avoid this ambiguity.
**`:pat` changed in edition 2021+** -- Now matches `|` patterns (e.g., `A | B`). Use `:pat_param` if you need the pre-2021 behavior that stops at `|`.
## Repetition Syntax
```rust
// Zero or more, comma-separated
$($item:expr),*
// One or more, semicolon-separated
$($stmt:stmt);+
// Zero or more with trailing separator tolerance
$($item:expr),* $(,)?
// Nested repetition (for key-value pairs)
$($key:expr => $value:expr),*
```
The separator goes **between** repetitions. To include a terminator **after each** repetition, place it inside the `$()`:
```rust
// Semicolon AFTER each, not between:
$($key:expr => $value:expr;)*
// Expands: key1 => val1; key2 => val2;
```
## Common Patterns
### Test Battery
Generate multiple test modules from a compact specification:
```rust
macro_rules! test_battery {
($($t:ty as $name:ident),* $(,)?) => {
$(
mod $name {
use super::*;
#[test]
fn basic() { run_test::<$t>(Default::default()) }
#[test]
fn edge_case() { run_test::<$t>(edge_value()) }
}
)*
}
}
test_battery!(u8 as u8_tests, u32 as u32_tests, i64 as i64_tests);
```
### Trait Impl for Many Types
```rust
macro_rules! impl_display_for_newtype {
($($t:ty),* $(,)?) => {
$(
impl ::core::fmt::Display for $t {
fn fmt(&self, f: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
::core::fmt::Display::fmt(&self.0, f)
}
}
)*
}
}
```
Note `::core::fmt::Display` -- not `std::fmt::Display`. Exported macros must use fully qualified paths.
### Counting (TT Munching)
Count items at compile time by recursively consuming tokens:
```rust
macro_rules! count {
() => { 0usize };
($head:tt $($tail:tt)*) => { 1usize + count!($($tail)*) };
}
```
### Push-Down Accumulation
Build output incrementally across recursive calls:
```rust
macro_rules! reverse {
([] $($reversed:tt)*) => { ($($reversed)*) };
([$head:tt $($tail:tt)*] $($reversed:tt)*) => {
reverse!([$($tail)*] $head $($reversed)*)
};
}
// reverse!([a b c]) => (c b a)
```
## Hygiene Rules
### What IS Hygienic (Isolated)
Variables declared inside a macro exist in the macro's namespace. They cannot shadow or be shadowed by caller variables with the same name.
```rust
macro_rules! let_x {
($val:expr) => { let x = $val; };
}
let x = 1;
let_x!(2); // This `x` is in the macro's namespace
assert_eq!(x, 1); // Caller's `x` is unchanged
```
### What is NOT Hygienic (Shared)
Types, modules, and functions defined in a macro are visible at the call site. This is by design -- macros commonly generate `impl` blocks, modules, and functions.
```rust
macro_rules! make_greeter {
() => {
fn greet() -> &'static str { "hello" }
};
}
make_greeter!();
assert_eq!(greet(), "hello"); // Function is visible here
```
### Sharing Identifiers with the Caller
Pass identifiers in from the call site to affect caller scope:
```rust
macro_rules! set_var {
($var:ident, $val:expr) => { $var = $val; };
}
let mut x = 1;
set_var!(x, 42); // `x` originated at call site, so it refers to caller's `x`
assert_eq!(x, 42);
```
Any identifier from an `:expr` or `:ident` passed by the caller resolves in the caller's scope.
## Exported Macro Paths
For macros marked `#[macro_export]`, all paths must be absolute and crate-independent:
```rust
// BAD -- breaks for downstream users
macro_rules! bad_macro {
($val:expr) => { crate::MyType::new($val) };
}
// BAD -- breaks in no_std
macro_rules! also_bad {
($val:expr) => { ::std::vec![$val] };
}
// GOOD
#[macro_export]
macro_rules! good_macro {
($val:expr) => { $crate::MyType::new($val) };
}
// GOOD -- no_std compatible
#[macro_export]
macro_rules! good_vec {
($val:expr) => { ::alloc::vec![$val] };
}
```
## Review Checklist
1. Are fragment types appropriate for what they match? (`:expr` not used where `:tt` is safer?)
2. Do repetitions handle trailing separators? (`$(,)?` at end)
3. Are matchers ordered most-specific-first?
4. Do exported macros use `$crate::` and `::core::`/`::alloc::`?
5. Is there a `compile_error!` fallback for invalid patterns?
6. Does the macro output avoid `gen` as an identifier (edition 2024)?
7. Could this macro be replaced with generics?
FILE:references/procedural-macros.md
# Procedural Macros
## Three Types
| Type | Annotation | Behavior |
|------|-----------|----------|
| Function-like | `#[proc_macro]` | `fn(TokenStream) -> TokenStream` -- replaces invocation |
| Attribute | `#[proc_macro_attribute]` | `fn(attr, item) -> TokenStream` -- replaces annotated item |
| Derive | `#[proc_macro_derive(Trait)]` | `fn(TokenStream) -> TokenStream` -- appends after item |
**Attribute macro gotcha:** The return replaces the item entirely. Forgetting to include the original item in the output deletes it silently.
**Derive macro constraint:** Cannot modify the annotated item. Output is appended. Helper attributes (`#[my_helper(skip)]`) are markers consumed by the derive, not independent macros.
## Parsing with `syn`
Minimize features to reduce compile time:
```toml
# BAD -- enables everything, slow to compile
syn = { version = "2", features = ["full"] }
# GOOD -- only what derive macros need
syn = { version = "2", features = ["derive"] }
```
Standard parsing pattern:
```rust
use syn::{parse_macro_input, DeriveInput};
#[proc_macro_derive(MyTrait)]
pub fn derive_my_trait(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as DeriveInput);
let name = &input.ident;
quote! {
impl MyTrait for #name {
fn method(&self) -> &'static str { ::core::stringify!(#name) }
}
}.into()
}
```
## Generating Code with `quote`
Interpolation with `#var` and repetition with `#(#items)*`:
```rust
let field_names: Vec<_> = fields.iter().map(|f| &f.ident).collect();
let tokens = quote! {
impl ::core::fmt::Debug for #name {
fn fmt(&self, f: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
f.debug_struct(::core::stringify!(#name))
#(.field(::core::stringify!(#field_names), &self.#field_names))*
.finish()
}
}
};
```
## Span Handling
Spans tie generated tokens to source locations. Correct spans produce good error messages.
| Span | Resolution | Use For |
|------|------------|---------|
| `Span::call_site()` | At macro invocation | Generated items visible to callers |
| `Span::mixed_site()` | Variables at def site, types at call site | Private variables (matches `macro_rules!` hygiene) |
| `Span::def_site()` | At macro definition | Unstable/nightly only |
Always propagate input spans to related output tokens:
```rust
// BAD -- error points to macro definition
let method = Ident::new("process", Span::call_site());
// GOOD -- error points to the user's field
let method = Ident::new(&format!("process_{}", field_name), field_name.span());
```
## Error Reporting
A proc macro that panics crashes the compiler. Use `syn::Error` with spans instead:
```rust
// BAD -- unhelpful ICE
panic!("MyTrait requires at least one field");
// GOOD -- compiler error pointing to the struct
return syn::Error::new_spanned(&input.ident, "requires at least one field")
.to_compile_error().into();
```
Collect multiple errors with `syn::Error::combine` instead of returning on first failure:
```rust
let mut errors = Vec::new();
for field in &fields {
if !is_valid(field) {
errors.push(syn::Error::new_spanned(field, "invalid field type"));
}
}
if let Some(first) = errors.into_iter().reduce(|mut a, b| { a.combine(b); a }) {
return first.to_compile_error().into();
}
```
## Compile-Time Cost
1. **Dependency weight** -- `syn` with `full` features takes tens of seconds to compile. Minimize features. Compile proc-macro crates in debug mode (execution speed rarely matters).
2. **Generated code volume** -- The macro saves typing, not compiler work. Large `quote!` blocks repeated across many invocations bloat compile times.
Mitigation: minimize `syn` features, use `proc-macro2` for testing, profile with `cargo build --timings`, prefer declarative macros for simpler cases.
## Testing
**`trybuild`** -- Compile-fail tests with expected `.stderr` output:
```rust
#[test]
fn compile_tests() {
let t = trybuild::TestCases::new();
t.pass("tests/pass/*.rs");
t.compile_fail("tests/fail/*.rs");
}
```
**`proc-macro2`** -- Unit tests outside the compiler context. Use `proc_macro2::TokenStream` and compare `to_string()` output.
**`macrotest`** -- Snapshot-based expansion tests. Expands macros and compares against committed snapshots with `macrotest::expand("tests/expand/*.rs")`.
## Common Patterns
### Derive with Helper Attributes
```rust
#[proc_macro_derive(Builder, attributes(builder))]
pub fn derive_builder(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as DeriveInput);
// Look for #[builder(default)] on fields via field.attrs
}
```
### Attribute Macro for Test Generation
```rust
#[proc_macro_attribute]
pub fn test_with_db(_attr: TokenStream, item: TokenStream) -> TokenStream {
let input_fn = parse_macro_input!(item as syn::ItemFn);
let fn_name = &input_fn.sig.ident;
let fn_body = &input_fn.block;
quote! {
#[test]
fn #fn_name() {
let db = setup_test_db();
let result = ::std::panic::catch_unwind(|| #fn_body);
teardown_test_db(db);
if let Err(e) = result { ::std::panic::resume_unwind(e); }
}
}.into()
}
```
## Review Checklist
1. Is `syn` configured with minimal features?
2. Do generated tokens carry spans from input tokens?
3. Does the macro use `syn::Error` (not `panic!`) for invalid input?
4. Are multiple errors collected and reported together?
5. Is the volume of generated code reasonable?
6. Are there `trybuild` or `macrotest` tests?
7. Does generated code use `::core::`/`::alloc::` paths for `no_std` compatibility?
8. Does the attribute macro preserve the input item?
9. Is `Span::call_site()` used only for intentionally public identifiers?
10. Does generated code avoid `gen` as an identifier (edition 2024)?
Reviews Rust FFI code for type safety, memory layout compatibility, string handling, callback patterns, and unsafe boundary correctness. Use when reviewing e...
---
name: ffi-code-review
description: "Reviews Rust FFI code for type safety, memory layout compatibility, string handling, callback patterns, and unsafe boundary correctness. Use when reviewing extern blocks, #[repr(C)] types, bindgen output, or code calling C/C++ libraries."
---
# FFI Code Review
## Review Workflow
1. **Check Cargo.toml** -- Note Rust edition (2024 has breaking changes to extern blocks and unsafe attributes), `build-dependencies` (bindgen, cc, pkg-config), `crate-type` (`cdylib`, `staticlib`), and `links` key
2. **Check build.rs** -- Verify link directives (`cargo:rustc-link-lib`, `cargo:rustc-link-search`), bindgen configuration, and C source compilation
3. **Check extern blocks** -- Verify calling conventions, symbol declarations, and safety annotations
4. **Check type layout** -- Every type crossing FFI must be `#[repr(C)]` or a primitive FFI type
5. **Check string and pointer handling** -- CStr/CString usage, null checks, ownership transfers
6. **Check callbacks** -- `extern "C" fn` pointers, panic safety across FFI boundary
7. **Gates** -- Complete **Gates** below before reporting; do not skip ahead on “internal verification”
## Gates
Complete in order. **Do not emit findings** until **Gate 4** passes for each issue.
**Gate 1 — Crate context (on disk)**
**PASS when:** You opened the reviewed crate’s `Cargo.toml` (workspace member path if applicable) and recorded `edition =`, plus any of `links`, `crate-type`, or `build-dependencies` that matter for this FFI.
**Blocks rationalization:** Edition-specific findings (`unsafe extern "C" {}`, `#[unsafe(no_mangle)]`, etc.) require this — if `edition` is not 2024, do not flag 2024-only requirements.
**Gate 2 — Linkage and binding sources**
**PASS when:** If the crate links native code or uses bindgen/pkg-config, you opened `build.rs` (or the checked-in bindings entry point). If there is no `build.rs`, you stated that bindings are hand-written and reviewed those `extern` / `include!` sites.
**Artifact:** At least one path you opened (e.g. `build.rs`, `src/ffi.rs`, or `OUT_DIR` bindings via `include!`).
**Gate 3 — Code evidence**
**PASS when:** Every planned finding has a target `[FILE:LINE]` from a full function/block you read, not only diff hunks or partial snippets.
**Gate 4 — Pre-report protocol**
**PASS when:** You loaded and applied `beagle-rust:review-verification-protocol`, including **FFI-Specific Verification** for `repr(C)`, safety comments, ownership/callbacks, or bindgen-heavy code.
## Output Format
Report findings as:
```text
[FILE:LINE] ISSUE_TITLE
Severity: Critical | Major | Minor | Informational
Description of the issue and why it matters.
```
## Quick Reference
| Issue Type | Reference |
|------------|-----------|
| C-to-Rust type mapping, repr(C) layout, enums, opaque types | [references/type-mapping.md](references/type-mapping.md) |
| Safe wrappers, ownership transfer, callbacks, build.rs, testing | [references/safety-patterns.md](references/safety-patterns.md) |
## Review Checklist
### extern Blocks and Calling Conventions
- [ ] Foreign function declarations use `extern "C"` (explicit, not bare `extern`)
- [ ] **Edition 2024**: `extern "C" {}` blocks written as `unsafe extern "C" {}`
- [ ] Functions exposed to C use `extern "C" fn` (not default Rust calling convention)
- [ ] Calling convention matches the foreign library (`"C"`, `"system"` for Win32 API)
- [ ] `#[link(name = "...")]` specifies the correct library name
- [ ] `#[link(name = "...", kind = "static")]` used when statically linking
### Symbol Management
- [ ] Exported functions use `#[no_mangle]` to preserve symbol names
- [ ] **Edition 2024**: `#[no_mangle]` written as `#[unsafe(no_mangle)]`
- [ ] **Edition 2024**: `#[export_name = "..."]` written as `#[unsafe(export_name = "...")]`
- [ ] `#[link_name = "..."]` used when Rust name differs from C symbol
- [ ] Exported items are `pub` (only public `#[no_mangle]` symbols appear in library output)
### Type Layout
- [ ] Every struct/union crossing FFI has `#[repr(C)]` -- Rust's default layout is undefined
- [ ] Primitive types use `std::ffi` / `std::os::raw` equivalents (`c_int`, `c_char`, `c_void`)
- [ ] No bare `i32` where C uses `int` -- use `c_int` (width varies by platform)
- [ ] Quirky C types like `__be32` use byte arrays (`[u8; 4]`), not Rust integers
- [ ] Enums crossing FFI use `#[repr(C)]` or `#[repr(u8)]`/`#[repr(u32)]` with explicit discriminants
- [ ] C-style bitflag enums use a newtype around an integer (or `bitflags` crate), not a Rust enum
- [ ] `#[non_exhaustive]` on enums representing C enumerations that may gain new values
### String Handling
- [ ] C strings use `CStr` (borrowed) or `CString` (owned), never `&str` or `String`
- [ ] `CString::new()` result is checked for interior null bytes (returns `Err` on `\0`)
- [ ] `CString` outlives any `*const c_char` pointer derived from it via `.as_ptr()`
- [ ] Incoming `*const c_char` validated with `CStr::from_ptr()` inside `unsafe`
- [ ] No assumption that C strings are valid UTF-8 -- use `to_str()` which returns `Result`
- [ ] OS paths use `OsStr`/`OsString` and `CStr`, not `&str`
### Ownership and Allocation
- [ ] Clear ownership contract: who allocates, who frees
- [ ] Rust-allocated memory freed by Rust (`Box::from_raw`), C-allocated freed by C
- [ ] `Box::into_raw` / `Box::from_raw` paired correctly for heap transfers
- [ ] `Vec::into_raw_parts` used when passing arrays to C (pointer + length + capacity)
- [ ] Destructor functions exposed for every opaque Rust type given to C
- [ ] No `Drop` running on C-allocated memory (and vice versa)
### Callbacks
- [ ] Callback types are `extern "C" fn(...)`, not closures or `fn(...)`
- [ ] Callbacks use `std::panic::catch_unwind` to prevent panics from unwinding across FFI
- [ ] Callback context passed as `*mut c_void` with safe reconstruction at call site
- [ ] `Option<extern "C" fn(...)>` used for nullable function pointers (niche optimization)
### Bindgen and Build Scripts
- [ ] Bindgen output reviewed for correctness (auto-generated types may need adjustment)
- [ ] `-sys` crate pattern used for raw bindings, separate crate for safe wrappers
- [ ] `build.rs` uses `cargo:rustc-link-lib` and `cargo:rustc-link-search` correctly
- [ ] `links` key in `Cargo.toml` prevents duplicate linking of the same native library
- [ ] Platform-specific bindings generated per-build (not checked in for a single platform)
### Safety Documentation
- [ ] Every `unsafe` block has a `// SAFETY:` comment explaining invariants
- [ ] Every public FFI wrapper function documents safety requirements
- [ ] **Edition 2024**: `unsafe fn` bodies use explicit `unsafe {}` blocks around unsafe ops
## Severity Calibration
### Critical (Block Merge)
- Missing `#[repr(C)]` on types crossing FFI boundary (undefined memory layout)
- Wrong string handling: `&str`/`String` where `CStr`/`CString` required
- Ownership confusion: freeing C-allocated memory with Rust's allocator (or vice versa)
- Panic unwinding across FFI boundary without `catch_unwind`
- Using Rust enum for C bitflags (invalid discriminant = undefined behavior)
- Passing closure where `extern "C" fn` pointer required
### Major (Should Fix)
- Missing safety documentation on `unsafe` blocks or public FFI functions
- No null pointer check on incoming `*const T` / `*mut T` before dereferencing
- `CString` dropped before its pointer is used by C (dangling pointer)
- Missing `#[link(name = "...")]` causing link failures on some platforms
- **Edition 2024**: `extern` block not marked `unsafe extern`
- **Edition 2024**: `#[no_mangle]` not wrapped in `#[unsafe(...)]`
### Minor (Consider Fixing)
- Using `i32` instead of `c_int` for C `int` (correct on most platforms but not portable)
- Missing `#[non_exhaustive]` on enums mapping to extensible C enumerations
- Verbose manual bindings where bindgen would be more maintainable
- Checked-in bindings without platform guards
### Informational
- Suggestions to split raw bindings into a `-sys` crate
- Suggestions to add opaque wrapper types for distinct `*mut c_void` pointers
- Suggestions to use `Option<NonNull<T>>` for nullable pointers
## Valid Patterns (Do NOT Flag)
- **`unsafe extern "C" {}` in edition 2024** -- correct form for foreign declarations
- **`#[unsafe(no_mangle)]` in edition 2024** -- correct form for symbol export
- **`Option<extern "C" fn(...)>` for nullable callbacks** -- niche optimization guaranteed
- **`Option<NonNull<T>>` for nullable pointers** -- zero-cost nullable pointer pattern
- **`*mut c_void` for opaque C types** -- standard when internal layout is irrelevant
- **Distinct empty structs wrapping `c_void` for type-safe opaque pointers** -- prevents pointer confusion
- **`CStr::from_bytes_with_nul_unchecked` with compile-time literal** -- safe when literal is known null-terminated
- **`extern "C-unwind"` for controlled unwinding** -- valid per RFC 2945
- **`include!(concat!(env!("OUT_DIR"), "/bindings.rs"))` in bindgen crates** -- standard pattern
- **`Box::into_raw` / `Box::from_raw` pairs for ownership transfer** -- correct pattern when paired
## Before Submitting Findings
Complete **Gates 1-4 in order** before reporting any issue; Gate 4 incorporates `beagle-rust:review-verification-protocol`.
FILE:references/safety-patterns.md
# Safety Patterns
## Wrapping Unsafe FFI in Safe Rust
The goal of FFI bindings is a safe public API built on unsafe internals. The safe wrapper must enforce all invariants that the C library documents.
```rust
// Raw FFI (typically in a -sys crate)
unsafe extern "C" {
fn widget_create() -> *mut Widget;
fn widget_set_name(w: *mut Widget, name: *const c_char) -> c_int;
fn widget_destroy(w: *mut Widget);
}
// Safe wrapper
pub struct Widget {
ptr: NonNull<ffi::Widget>,
}
impl Widget {
pub fn new() -> Result<Self, Error> {
// SAFETY: widget_create returns null on failure, valid pointer otherwise
let ptr = unsafe { ffi::widget_create() };
NonNull::new(ptr).map(|p| Widget { ptr: p }).ok_or(Error::CreateFailed)
}
pub fn set_name(&mut self, name: &str) -> Result<(), Error> {
let c_name = CString::new(name)?;
// SAFETY: self.ptr is valid (maintained by construction),
// c_name is null-terminated and lives through this call
let ret = unsafe { ffi::widget_set_name(self.ptr.as_ptr(), c_name.as_ptr()) };
if ret == 0 { Ok(()) } else { Err(Error::SetNameFailed) }
}
}
impl Drop for Widget {
fn drop(&mut self) {
// SAFETY: self.ptr was allocated by widget_create
// and has not been freed (we own it)
unsafe { ffi::widget_destroy(self.ptr.as_ptr()) }
}
}
```
### Key principles for safe wrappers:
- Capture `&` vs `&mut` accurately -- if C mutates behind a pointer, take `&mut self`
- Use Rust lifetimes to enforce C's lifetime requirements (e.g., `Device<'ctx>` borrows `Context`)
- Do not implement `Send`/`Sync` unless the C library documents thread safety
- Use `PhantomData<*const ()>` to suppress auto-`Send`/`Sync` for thread-unsafe types
## Ownership Transfer Patterns
### Rust-to-C (giving ownership)
```rust
// Give C a heap-allocated Rust object
#[unsafe(no_mangle)]
pub extern "C" fn create_config() -> *mut Config {
Box::into_raw(Box::new(Config::default()))
}
// C must call this to free -- never free with C's free()
#[unsafe(no_mangle)]
pub extern "C" fn destroy_config(ptr: *mut Config) {
if ptr.is_null() { return; }
// SAFETY: ptr was created by create_config via Box::into_raw
// and has not been freed yet (caller contract)
unsafe { drop(Box::from_raw(ptr)) }
}
```
### C-to-Rust (borrowing C memory)
```rust
// C owns the buffer, Rust borrows it
pub fn process_buffer(ptr: *const u8, len: usize) -> Result<(), Error> {
if ptr.is_null() { return Err(Error::NullPointer); }
// SAFETY: caller guarantees ptr is valid for len bytes
// and the memory won't be freed during this call
let slice = unsafe { std::slice::from_raw_parts(ptr, len) };
// ... use slice ...
Ok(())
}
```
### The Golden Rule
Rust-allocated memory must be freed by Rust. C-allocated memory must be freed by C. Never mix allocators.
## CString Lifetime Pitfall
The most common FFI bug: dropping a `CString` while C still holds a pointer to it.
```rust
// BAD -- dangling pointer! CString is dropped at semicolon
let ptr = CString::new("hello").unwrap().as_ptr(); // DANGLING
unsafe { some_c_function(ptr) }; // undefined behavior
// GOOD -- CString lives long enough
let c_str = CString::new("hello").unwrap();
let ptr = c_str.as_ptr();
unsafe { some_c_function(ptr) }; // c_str still alive
// c_str dropped here, after use
```
## Callback Safety
### Preventing Panics Across FFI
A panic unwinding past an `extern "C"` function boundary is undefined behavior. Always catch panics in callbacks:
```rust
extern "C" fn my_callback(data: *mut c_void) -> c_int {
let result = std::panic::catch_unwind(|| {
// SAFETY: data was passed as our context pointer
let ctx = unsafe { &mut *(data as *mut MyContext) };
ctx.handle_event()
});
match result {
Ok(Ok(())) => 0,
Ok(Err(_)) => -1, // application error
Err(_) => -2, // panic caught, turned into error code
}
}
```
### Passing Context Through Callbacks
C callbacks often take a `void*` context parameter. Use `Box::into_raw` to pass Rust state:
```rust
let ctx = Box::new(MyContext::new());
let ctx_ptr = Box::into_raw(ctx) as *mut c_void;
// SAFETY: register_callback stores ctx_ptr and passes it to on_event
unsafe { ffi::register_callback(on_event, ctx_ptr) };
// Later, in cleanup:
// SAFETY: ctx_ptr was created by Box::into_raw above
unsafe { drop(Box::from_raw(ctx_ptr as *mut MyContext)) };
```
## Error Handling Across FFI
Map C error patterns (return codes, errno, out-parameters) to `Result` in safe wrappers:
```rust
pub fn open_file(path: &CStr) -> Result<FileHandle, Error> {
let fd = unsafe { ffi::open(path.as_ptr(), ffi::O_RDONLY) };
if fd < 0 { Err(Error::from_errno(std::io::Error::last_os_error())) }
else { Ok(FileHandle(fd)) }
}
```
## Build.rs Patterns
```rust
// build.rs -- linking, bindgen, and C compilation
fn main() {
// Link directives
println!("cargo:rustc-link-lib=ssl"); // dynamic (default)
println!("cargo:rustc-link-lib=static=mylib"); // static
println!("cargo:rustc-link-search=native=/usr/local/lib");
println!("cargo:rerun-if-changed=wrapper.h");
// Bindgen: generate Rust bindings from C headers
let bindings = bindgen::Builder::default()
.header("wrapper.h")
.parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
.generate().expect("bindgen failed");
let out = PathBuf::from(env::var("OUT_DIR").unwrap());
bindings.write_to_file(out.join("bindings.rs")).unwrap();
// cc crate: compile bundled C source
cc::Build::new().file("src/native/helper.c").compile("helper");
}
```
Review bindgen output for: correct `#[repr(C)]`, pointer mutability matching headers, platform-aware types (`c_long` not `i64`), distinct opaque types, and excluded internal-only C functions.
## Testing FFI Code
Run tests with sanitizers to catch memory bugs invisible to the compiler:
```bash
RUSTFLAGS="-Z sanitizer=address" cargo +nightly test # use-after-free, overflow
RUSTFLAGS="-Z sanitizer=memory" cargo +nightly test # uninitialized reads
valgrind --leak-check=full ./target/debug/my_ffi_tests # leak detection
```
## Common Pitfalls
| Pitfall | Fix |
|---------|-----|
| Dangling `CString` pointer | Bind `CString` to a variable before `.as_ptr()` |
| Double free | One side allocates, same side frees |
| Use-after-free | Ensure wrapper's `Drop` runs at the right time |
| Missing `repr(C)` | Add `#[repr(C)]` to every type crossing FFI |
| Panic across FFI | Wrap callback bodies in `catch_unwind` |
| Thread-unsafe type across threads | Don't impl `Send`/`Sync` without proof |
## Review Questions
1. Is ownership transfer documented and paired (allocate/free)?
2. Do `CString` values outlive their derived pointers?
3. Are callbacks wrapped in `catch_unwind`?
4. Are `Send`/`Sync` deliberately not implemented for thread-unsafe FFI types?
FILE:references/type-mapping.md
# Type Mapping
## C-to-Rust Primitive Type Table
Always use `std::ffi` or `std::os::raw` types for C interop. Never assume `int` is `i32` on all platforms.
| C Type | Rust Type | Notes |
|--------|-----------|-------|
| `int` | `c_int` | Platform-dependent width |
| `unsigned int` | `c_uint` | Platform-dependent width |
| `char` | `c_char` | Signed or unsigned depending on platform |
| `short` | `c_short` | |
| `long` | `c_long` | 32-bit on Windows, 64-bit on LP64 Unix |
| `long long` | `c_longlong` | |
| `float` | `c_float` / `f32` | |
| `double` | `c_double` / `f64` | |
| `size_t` | `usize` | |
| `ssize_t` | `isize` | |
| `void` | `()` (return) / `c_void` (pointer) | |
| `void*` | `*mut c_void` | |
| `const void*` | `*const c_void` | |
| `char*` | `*mut c_char` | |
| `const char*` | `*const c_char` | |
| `bool` / `_Bool` | `bool` | Only with `#[repr(C)]`; C99+ |
| `int8_t` | `i8` | Fixed-width, always safe |
| `uint8_t` | `u8` | Fixed-width, always safe |
| `int32_t` | `i32` | Fixed-width, always safe |
| `uint64_t` | `u64` | Fixed-width, always safe |
| `__be32` | `[u8; 4]` | Big-endian; don't use `i32` |
## String Types
```rust
use std::ffi::{CStr, CString, c_char};
// Borrowing a C string (incoming from C, null-terminated)
// SAFETY: ptr is a valid null-terminated C string
let c_str: &CStr = unsafe { CStr::from_ptr(ptr) };
let rust_str: &str = c_str.to_str()?; // Fails if not UTF-8
// Creating a C string to pass to C
let c_string = CString::new("hello")?; // Err if contains \0
let ptr: *const c_char = c_string.as_ptr();
// c_string MUST outlive ptr -- dropping c_string invalidates ptr
```
For OS-native paths, use `OsStr`/`OsString` which handle platform encoding.
## Struct Layout with repr(C)
`#[repr(C)]` guarantees field ordering, padding, and alignment match the C ABI.
```rust
// C definition:
// struct Point { int32_t x; int32_t y; };
#[repr(C)]
pub struct Point {
pub x: i32,
pub y: i32,
}
// C definition with padding:
// struct Mixed { char tag; int32_t value; };
// Has 3 bytes of padding between tag and value
#[repr(C)]
pub struct Mixed {
pub tag: c_char,
// 3 bytes padding inserted by repr(C) to align value
pub value: i32,
}
```
### Size and Alignment Verification
Always verify layout matches at compile time or in tests:
```rust
// Compile-time assertions
const _: () = assert!(std::mem::size_of::<Point>() == 8);
const _: () = assert!(std::mem::align_of::<Point>() == 4);
// In tests, compare against C sizeof/alignof if available
#[test]
fn layout_matches_c() {
assert_eq!(std::mem::size_of::<Mixed>(), 8); // 1 + 3pad + 4
assert_eq!(std::mem::align_of::<Mixed>(), 4);
}
```
## Enum Representation
### Fieldless Enums (C-style)
```rust
// Maps to: enum Status { OK = 0, ERR = 1, BUSY = 2 };
#[repr(C)]
pub enum Status {
Ok = 0,
Err = 1,
Busy = 2,
}
```
Use `#[repr(u32)]` or `#[repr(i32)]` when C specifies an exact underlying type.
### Bitflag Enums -- Do NOT Use Rust Enums
C enums used as bitflags produce combined values that are invalid Rust enum discriminants. Use a newtype or `bitflags`:
```rust
// BAD -- value 3 (READ | WRITE) is undefined behavior
#[repr(C)]
enum Permission { Read = 1, Write = 2, Execute = 4 }
// GOOD -- newtype with constants
#[repr(transparent)]
pub struct Permission(pub u32);
impl Permission {
pub const READ: Self = Self(1);
pub const WRITE: Self = Self(2);
pub const EXECUTE: Self = Self(4);
}
```
### Data-Carrying Enums (Tagged Unions)
With `#[repr(C)]`, a data-carrying enum becomes a struct with a discriminant field and a union of variant data:
```rust
#[repr(C)]
pub enum Event {
Click(i32, i32), // tag=0, data=(i32, i32)
KeyPress(c_char), // tag=1, data=c_char
}
// Equivalent C: struct { uint32_t tag; union { ... } data; }
```
## Opaque Types
When C exposes a type whose internals Rust should not access, create a distinct empty type:
```rust
// Instead of bare *mut c_void everywhere:
#[non_exhaustive]
#[repr(transparent)]
pub struct DatabaseHandle(c_void);
#[non_exhaustive]
#[repr(transparent)]
pub struct ConnectionHandle(c_void);
unsafe extern "C" {
fn db_open() -> *mut DatabaseHandle;
fn db_connect(db: *mut DatabaseHandle) -> *mut ConnectionHandle;
fn db_close(db: *mut DatabaseHandle);
}
// Now db_connect(conn) is a compile error -- types are distinct
```
The `#[non_exhaustive]` prevents construction outside the defining crate.
## Nullable Pointers and Option
Rust guarantees that `Option<NonNull<T>>`, `Option<&T>`, `Option<&mut T>`, and `Option<extern "C" fn(...)>` have the same layout as a raw pointer (niche optimization). `None` is null.
```rust
use std::ptr::NonNull;
// Nullable pointer from C -- zero overhead
fn from_c(ptr: *mut Foo) -> Option<NonNull<Foo>> {
NonNull::new(ptr)
}
// Nullable callback
type Callback = Option<extern "C" fn(c_int) -> c_int>;
unsafe extern "C" {
// C signature: void register(int (*cb)(int));
// cb can be NULL
fn register(cb: Callback);
}
```
## Function Pointers
Function pointers across FFI must use `extern "C"` calling convention. Rust closures cannot cross FFI.
```rust
type CCallback = extern "C" fn(data: *mut c_void, status: c_int); // Correct
type BadCallback = fn(data: *mut c_void, status: c_int); // Wrong ABI
// Closures (Fn/FnMut/FnOnce) cannot cross FFI -- unknown size and calling convention
```
A function pointer's calling convention is part of its type: `extern "C" fn()` and `fn()` are different types.
create a pull request with standardized description template
--- name: create-pr description: create a pull request with standardized description template disable-model-invocation: true --- # Create Pull Request Create a pull request with a well-structured description based on the branch changes. ## Instructions ### Gates (run in order) Do not draft or run `gh pr create` until each step passes. 1. **Branch gate:** `git branch --show-current` is not the default branch (`main`, `master`, or the repo’s documented default). **Pass:** branch name is printed and satisfies this. 2. **Evidence gate:** You have run the commands in [Gather Context](#1-gather-context) for the same `main..HEAD` (or `origin/main..HEAD` if local `main` is missing) range you will summarize. **Pass:** you can name at least one commit subject and one area of files changed without inventing details. 3. **Template gate:** The final PR title and body contain no unreplaced placeholders (`<...>`, `TODO`, `TBD`). Optional sections with no content are removed, not left as stubs. **Pass:** a quick scan finds no angle-bracket placeholders or filler tokens. 4. **Create gate:** `gh pr create` exits successfully and prints a PR URL (or the PR number/URL from `gh` output). **Pass:** URL (or id) is recorded; if the command fails, do not claim the PR was created. ### 1. Gather Context First, collect information about the changes: ```bash # Get current branch and verify it's not main git branch --show-current # Get commit history for this branch git log --oneline main..HEAD # Get detailed commit messages for context git log --format="### %s%n%n%b" main..HEAD # Get file change statistics git diff --stat main..HEAD # Get the actual diff for understanding changes git diff main..HEAD ``` ### 2. Analyze the Changes Based on the gathered information, determine: - **What changed**: Categorize changes (features, fixes, refactors, docs, tests) - **Why it changed**: Infer motivation from commit messages and code changes - **Impact**: Breaking changes, new dependencies, migrations needed - **Testing**: What tests were added/modified, how to verify manually ### 3. Check for Related Issues Look for issue references: - In commit messages (e.g., "fixes #123", "closes #456") - In branch name (e.g., `fix/issue-123-description`) - In code comments or TODOs addressed ### 4. Generate PR Description Create the PR using this template structure: ```bash gh pr create --title "<type>(<scope>): <description>" --body "$(cat <<'EOF' ## Summary <1-3 sentence overview of what this PR does and why> ## Changes <Categorized bullet list of changes> ### Added - <new features or capabilities> ### Changed - <modifications to existing functionality> ### Fixed - <bug fixes> ### Removed - <deprecated or removed functionality> ## Motivation <Why were these changes needed? What problem does this solve?> ## Testing <How was this tested?> - [ ] Unit tests added/updated - [ ] Integration tests added/updated - [ ] Manual testing performed ### Manual Testing Steps <If applicable, steps to manually verify the changes> ## Breaking Changes <If any, describe what breaks and migration path. Remove section if none.> ## Related Issues <Link to related issues. Remove section if none.> - Closes #<issue_number> - Related to #<issue_number> ## Checklist - [ ] Code follows project style guidelines - [ ] Self-review completed - [ ] Tests pass locally - [ ] Linting passes - [ ] Documentation updated (if needed) --- Generated with [Claude Code](https://claude.com/claude-code) EOF )" ``` ### 5. Title Format Use conventional commit format for the PR title: - `feat(scope): add new feature` - `fix(scope): correct bug behavior` - `refactor(scope): restructure without behavior change` - `docs(scope): update documentation` - `test(scope): add or modify tests` - `chore(scope): maintenance tasks` ### 6. Apply Labels After creating the PR, apply appropriate labels based on the changes. Use `gh pr edit <number> --add-label <label>`. Check the repository's available labels first: ```bash gh label list ``` #### Common Type Labels | Label | When to Use | |-------|-------------| | `enhancement` | New features, capabilities, or improvements | | `bug` | Bug fixes | | `documentation` | Documentation-only changes | | `breaking-change` | **User-facing** breaking changes requiring migration | #### Breaking Change Criteria Only apply `breaking-change` for **user-facing** changes that require users to modify their: - Configuration files - CLI invocations - API integrations Do NOT apply for internal refactors unless they affect external consumers. ### 7. After Creation After creating the PR: 1. Display the PR URL with applied labels 2. Suggest adding reviewers if appropriate 3. Note if any CI checks need to pass ## Guidelines **DO:** - Be specific about what changed and why - Include testing evidence - Link related issues - Note breaking changes prominently - Remove empty optional sections **DON'T:** - Include irrelevant commits (keep PR focused) - Leave placeholder text in the description - Skip the testing section - Create PRs without running local checks first
Execute YAML test plan, stop on first failure, output rich debug prompt
--- description: Execute YAML test plan, stop on first failure, output rich debug prompt name: run-test-plan disable-model-invocation: true --- # Run Test Plan Execute a YAML test plan, run setup commands, health checks, and each test sequentially. Stop on first failure with rich debug output. ## Prerequisites - **agent-browser skill**: Browser tests require the `agent-browser:agent-browser` skill to be available ## Arguments - `--plan <path>`: Path to test plan (default: `docs/testing/test-plan.yaml`) - `--skip-setup`: Skip setup commands and health checks (for re-running after failure) ## Step 1: Parse Test Plan Read and validate the test plan: ```bash # Resolve plan path from --plan (default shown) PLAN_PATH="-docs/testing/test-plan.yaml" # Check file exists ls "$PLAN_PATH" || { echo "Error: Test plan not found: $PLAN_PATH"; exit 1; } # Validate YAML python3 -c "import yaml; yaml.safe_load(open('$PLAN_PATH'))" || { echo "Error: Invalid YAML: $PLAN_PATH"; exit 1; } ``` Extract from the YAML: - `setup.commands`: List of setup commands - `setup.health_checks`: List of URLs to poll - `tests`: Array of test cases ## Step 2: Run Setup (unless --skip-setup) ### 2a. Check Prerequisites If `setup.prerequisites` exists, verify each one: ```bash # For each prerequisite in setup.prerequisites <prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; } ``` ### 2b. Set Environment Variables If `setup.env` exists, export each variable. Variables using `VAR` syntax should be resolved from the current environment: ```bash # For each key/value in setup.env export <key>="<value>" ``` ### 2c. Build If `setup.build` exists, execute build commands sequentially: ```bash # For each command in setup.build <command> || { echo "Build failed: <command>"; exit 1; } ``` ### 2d. Start Services If `setup.services` exists, start long-running processes and wait for health checks: ```bash # For each service in setup.services nohup <service.command> > .beagle/service-<index>.log 2>&1 & echo $! > .beagle/service-<index>.pid ``` For each service with a `health_check`, poll until ready: ```bash timeout=<service.health_check.timeout or 30> url=<service.health_check.url> elapsed=0 while [ $elapsed -lt $timeout ]; do if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then echo "✓ Health check passed: $url" break fi sleep 2 elapsed=$((elapsed + 2)) done if [ $elapsed -ge $timeout ]; then echo "✗ Health check timeout: $url" exit 1 fi ``` ### 2e. Legacy Setup Format If the plan uses the older flat format (`setup.commands` + `setup.health_checks` instead of `prerequisites`/`build`/`services`), fall back to executing `setup.commands` sequentially and polling `setup.health_checks` as before. ## Step 3: Gate — setup ready before tests Do not start **Step 4** until each condition you can check is true: 1. **Plan load:** The file from `--plan` (default `docs/testing/test-plan.yaml`) exists and parses as YAML (same checks as Step 1). 2. **Setup branch:** - If **not** using `--skip-setup`: Every `setup.prerequisites` check that exists exited 0; every `setup.build` command succeeded; every service `health_check` reached HTTP 200, 301, or 302 within its timeout **or** legacy `setup.health_checks` passed after `setup.commands`. - If using `--skip-setup`: Before TC-01, confirm anything the plan still needs is alive—at minimum one successful `curl` (or equivalent) to each URL in `setup.health_checks` or each `setup.services[].health_check.url` that the tests depend on. 3. **Evidence path:** `mkdir -p docs/testing/evidence` succeeds and the directory exists. If any gate fails, stop, fix setup or flags, and do not execute tests. ## Step 4: Execute Tests Sequentially For each test in the plan: ### 4a. Log Test Start ```markdown ## Running: TC-XX - <test.name> Context: <test.context> ``` ### 4b. Execute Steps For each step in `test.steps`, determine the step type and execute accordingly: **Shell commands (`run:` steps):** The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code: ```bash # Execute the command, capture output and exit code <command> 2>&1 echo "EXIT_CODE: $?" ``` Capture all output for evaluation in step 4c. Shell steps cover: - CLI binary invocations (e.g., `./target/debug/myapp status --all`) - Database queries (e.g., `psql "DATABASE_URL" -c "SELECT ..."`) - File inspection (e.g., `ls -la /path/to/expected/output`) - Process lifecycle checks (e.g., `timeout 5 ./myapp 2>&1 || true`) - Any other command a human would type in a terminal **curl actions (`action: curl` steps):** ```bash curl -X <method> \ -H "Content-Type: application/json" \ <additional headers> \ -d '<body>' \ "<url>" \ -o response.json \ -w "%{http_code}" > status_code.txt # Capture response for evaluation cat response.json cat status_code.txt ``` **agent-browser CLI actions:** Steps starting with `agent-browser` are browser automation commands: ```bash # Navigate agent-browser open <url> # Snapshot interactive elements (always do before interacting) agent-browser snapshot -i # Interact using refs from snapshot output (@e1, @e2, etc.) agent-browser fill @<ref> "<value>" agent-browser click @<ref> # Wait for conditions agent-browser wait --url "<pattern>" agent-browser wait --text "<text>" agent-browser wait --load networkidle # Capture evidence agent-browser screenshot docs/testing/evidence/<test.id>.png ``` **Important:** Always run `agent-browser snapshot -i` before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes. Save screenshots to `docs/testing/evidence/<test.id>.png` ### 4c. Evaluate Result **Gate — artifacts before PASS/FAIL:** - **`run:` steps:** Stdout and stderr captured; exit code recorded (e.g. `EXIT_CODE:` line or equivalent). - **`action: curl` steps:** Response body and HTTP status captured to known paths (e.g. `response.json`, `status_code.txt` or paths the plan specifies). - **agent-browser steps:** After any navigation or DOM change, a fresh `agent-browser snapshot -i` exists before asserting; if the test records evidence, the screenshot file path is created or failure is explicit. Then, using agent reasoning, compare actual outcome against `test.expected`: - Read the expected behavior description - Compare with actual response/screenshot - Determine PASS or FAIL ### 4d. On PASS ```markdown ✓ TC-XX PASSED: <test.name> ``` Continue to next test. ### 4e. On FAIL Stop immediately. Go to Step 6. ## Step 5: On All Tests Pass ```markdown ## Test Results: ALL PASSED | ID | Name | Result | |----|------|--------| | TC-01 | <name> | ✓ PASS | | TC-02 | <name> | ✓ PASS | | ... | ... | ... | **Total:** N/N tests passed ### Evidence Screenshots saved to `docs/testing/evidence/` ### Cleanup Stopping background services... ``` Clean up: ```bash # Kill background services for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do if [ -f "$pidfile" ]; then kill $(cat "$pidfile") 2>/dev/null rm "$pidfile" fi done ``` ## Step 6: On Failure - Generate Debug Prompt When a test fails, generate rich debug output: ### 6a. Gather Context ```bash # Get changed files relevant to the failure git diff --name-only $(git merge-base HEAD origin/main)..HEAD # Get recent changes in files mentioned in test.context git diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files> ``` ### 6b. Output Debug Report ```markdown ## Test Failure: TC-XX - <test.name> ### What Failed **Test:** <test.name> **Expected:** <test.expected> **Actual:** <Describe what actually happened - response code, error message, screenshot description> ### Relevant Changes in This PR <For each file mentioned in test.context or related to the failure:> - `<file>` (lines X-Y) - <brief description of changes> ### Evidence <If screenshot exists:> - Screenshot: `docs/testing/evidence/<test.id>.png` <If API response:> - Status code: <code> - Response body: ```json <response> ``` ### Error Details <If error message in response or logs:> ``` <error message> ``` ### Suggested Investigation <Based on the error, suggest 2-3 specific things to check:> 1. <First thing to check based on error type> 2. <Second thing related to changed files> 3. <Third thing about environment/setup> ### Debug Session Prompt Copy this to start a new Claude session: --- I'm debugging a test failure in branch `<branch>`. **Test:** <test.name> **Error:** <brief error description> <Summarize what the test was checking and what went wrong> Relevant files: <List changed files related to this test> Help me investigate why <specific failure reason>. --- ``` ### 6c. Preserve Evidence ```bash # Ensure evidence directory exists mkdir -p docs/testing/evidence # Save failure context cat > docs/testing/evidence/<test.id>-failure.md << 'EOF' # Failure Report: <test.id> <Full debug report content> EOF ``` ### 6d. Cleanup and Exit ```bash # Kill background services for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do if [ -f "$pidfile" ]; then kill $(cat "$pidfile") 2>/dev/null rm "$pidfile" fi done ``` ## Test Results Summary Table Always output a summary table showing progress: ```markdown ## Test Results | ID | Name | Result | |----|------|--------| | TC-01 | <name> | ✓ PASS | | TC-02 | <name> | ✗ FAIL | | TC-03 | <name> | - SKIP | **Passed:** 1/3 **Failed:** TC-02 ``` Tests after a failure are marked as SKIP (not executed). ## Verification Before completing: ```bash # Verify evidence directory exists ls -la docs/testing/evidence/ # List captured evidence ls docs/testing/evidence/*.png docs/testing/evidence/*.md 2>/dev/null ``` **Pass conditions (all must be true to call the run complete):** - **`ls -la docs/testing/evidence/`** exits 0 (directory exists). - Every test that ran has an explicit PASS or FAIL (not only “felt right”). - If any test failed: a failure artifact exists (`docs/testing/evidence/<test.id>-failure.md` or equivalent) **or** the debug report was emitted with expected vs actual. - Cleanup ran: no stale `.beagle/service-*.pid` pointing at live processes you started, unless the plan says to leave them up. **Verification Checklist:** - [ ] Setup commands executed successfully (or `--skip-setup` with live dependencies confirmed) - [ ] Health checks passed before test execution (Step 3 gate) - [ ] Each executed test has recorded result - [ ] Evidence captured in `docs/testing/evidence/` - [ ] On failure: debug prompt includes expected vs actual - [ ] On failure: relevant PR changes listed - [ ] Background processes cleaned up ## Gates (ordered) 1. **Plan valid:** YAML parses; required keys for your branch (`setup` / `tests`) are present. 2. **Setup healthy:** Step 3 pass conditions met before TC-01. 3. **Per-test artifacts:** Step 4c gate satisfied before marking PASS. 4. **Stop on fail:** First failure → Step 6; later tests = SKIP in the summary table. 5. **Cleanup:** Step 5 or Step 6d executed so background PIDs from this run are released. ## Rules - Stop on first test failure (do not continue to other tests) - Always capture evidence (screenshots, responses) - Include file:line references in debug prompts when possible - Use `--skip-setup` flag to re-run after fixing issues - Never hardcode secrets - use environment variables - Clean up background processes even on failure - Preserve failure evidence for debugging - Make debug prompts copy-paste ready for new sessions
Rewrite AI-generated developer text to sound human — fix inflated language, filler, tautological docs, and robotic tone. Use after review-ai-writing identifi...
---
name: humanize-beagle
description: Rewrite AI-generated developer text to sound human — fix inflated language, filler, tautological docs, and robotic tone. Use after review-ai-writing identifies issues.
disable-model-invocation: true
user-invocable: true
dependencies:
- docs-style
- review-ai-writing
---
# Humanize
Apply fixes from a previous `review-ai-writing` run with automatic safe/risky classification.
## Usage
```text
/beagle-docs:humanize-beagle [--dry-run] [--all] [--category <name>]
```
**Flags:**
- `--dry-run` - Show what would be fixed without changing files
- `--all` - Fix entire codebase (runs review with --all first)
- `--category <name>` - Only fix specific category: `content|vocabulary|formatting|communication|filler|code_docs`
## Instructions
### Hard gates
Advance past destructive or evidence-bound steps only when each **PASS** is true (commands and artifacts—not “I checked mentally”):
1. **G1 — Safe to edit files** — **PASS:** `git status --porcelain` is empty, **or** `git stash push -u -m "beagle-docs: pre-humanize backup"` exits 0.
2. **G2 — Review input is real JSON with expected shape** — **PASS:** `.beagle/ai-writing-review.json` exists **and** the file parses as JSON with a `git_head` key and a `findings` value that is an array (possibly empty). Use the `jq -e` command in step 3, or the same checks with `json.load` in Python. If this fails, stop with a parse/validation error—do not apply fixes.
3. **G3 — References before rewrites** — **PASS:** For each finding you will edit, the `references/*.md` files required by step 4 for that category/type are read in this session before you change text.
4. **G4 — Per-file validation** — **PASS:** Every modified file passes the step 8 check for its type; otherwise run `git checkout -- "$file"` for that file and do not list it as OK in the summary.
5. **G5 — Delete review file only on full success** — **PASS:** Run `rm .beagle/ai-writing-review.json` only when G4 holds for all files you are keeping unchanged from validation failures (aligns with step 10).
### 1. Parse Arguments
Extract flags from `$ARGUMENTS`:
- `--dry-run` - Preview mode only
- `--all` - Full codebase scan
- `--category <name>` - Filter to specific category
### 2. Pre-flight Safety Checks
```bash
# Check for uncommitted changes
git status --porcelain
```
If working directory is dirty, warn:
```text
Warning: You have uncommitted changes. Creating a git stash before proceeding.
Run `git stash pop` to restore if needed.
```
Create stash if dirty:
```bash
git stash push -u -m "beagle-docs: pre-humanize backup"
```
**G1 PASS:** Either the working tree was already clean, or the stash command exited 0.
### 3. Load Review Results
Check for existing review file:
```bash
cat .beagle/ai-writing-review.json 2>/dev/null
```
**If file missing:**
- If `--all` flag: Run `/beagle-docs:review-ai-writing --all` first
- Otherwise: Fail with: "No review results found. Run `/beagle-docs:review-ai-writing` first."
**If file exists, validate JSON and freshness (G2):**
```bash
# Required shape: parseable JSON with git_head and findings array (may be empty)
jq -e 'has("git_head") and ((.findings // []) | type == "array")' .beagle/ai-writing-review.json >/dev/null 2>&1 \
|| { echo "Invalid or incompatible ai-writing-review.json"; exit 1; }
# Get stored git HEAD from JSON
stored_head=$(jq -r '.git_head' .beagle/ai-writing-review.json)
current_head=$(git rev-parse HEAD)
if [ "$stored_head" != "$current_head" ]; then
echo "Warning: Review was run at commit $stored_head, but HEAD is now $current_head"
fi
```
If stale, prompt: "Review results are stale. Re-run review? (y/n)"
### 4. Load Reference Material
Read the appropriate reference files based on the findings being fixed:
- Read `references/vocabulary-swaps.md` when applying `ai_vocabulary_high` or `ai_vocabulary_low` fixes
- Read `references/fix-strategies.md` for strategy details and before/after examples for any category
- Read `references/developer-voice.md` for tone/register guidance when rewriting prose
Only load what you need — if fixing only vocabulary, skip the voice guide.
### 5. Filter Findings
If `--category` is set, filter findings to that category only.
Partition remaining findings by `fix_safety`:
**Safe Fixes** (auto-apply):
- `chat_leak` - Delete conversational artifacts
- `cutoff_disclaimer` - Delete knowledge cutoff references
- `filler_phrase` - Delete filler phrases
- `heading_restatement` - Delete restating first sentence
- `emoji_decoration` - Remove emoji from technical text
- `boldface_overuse` - Remove excessive bold formatting
- `ai_vocabulary_high` - Swap high-signal AI words
- `narrating_obvious` - Delete obvious code comments
- `synthetic_opener` - Delete "In today's..." openers
- `sycophantic_tone` - Delete or neutralize praise
- `vague_authority` - Delete unattributed claims
- `excessive_hedging` - Remove qualifiers
- `generic_conclusion` - Delete summary padding
- `copula_avoidance` - Use "is/are" naturally
- `rhetorical_device` - Delete rhetorical questions
- `em_dash_overuse` - Replace formulaic em dashes with commas, parentheses, or colons
- `thematic_break` - Remove horizontal rules before headings
- `title_case_heading` - Convert AI title-case headings to sentence case
- `curly_quotes` - Normalize curly quotes/apostrophes to straight
- `negative_parallelism` - Delete "Not just X, but also Y" filler constructions
- `challenges_and_prospects` - Delete "Despite its... faces challenges..." formulaic wrappers
**Needs Review Fixes** (require confirmation):
- `promotional_language` - Rewrite with specifics
- `formulaic_structure` - Restructure sections
- `synonym_cycling` - Pick consistent term
- `commit_inflation` - Rewrite commit scope
- `tautological_docstring` - Rewrite or delete docstring
- `exhaustive_enumeration` - Trim parameter docs
- `this_noun_verbs` - Rewrite docstring voice
- `ai_vocabulary_low` - Reduce cluster density
- `apologetic_error` - Rewrite error message
- `rule_of_three` - Simplify three-item lists used as filler comprehensiveness
- `inline_header_list` - Restructure boldfaced inline-header vertical lists
- `unnecessary_table` - Convert small tables to prose
- `regression_to_mean` - Restore specific facts replaced by vague praise
### 6. Apply Safe Fixes
If `--dry-run`:
```markdown
## Safe Fixes (would apply automatically)
| # | File | Line | Type | Action |
|---|------|------|------|--------|
| 1 | README.md | 3 | synthetic_opener | Delete "In today's rapidly evolving..." |
| 2 | src/auth.py | 15 | narrating_obvious | Delete "# Check if user exists" |
| 3 | README.md | 42 | ai_vocabulary_high | Replace "utilize" with "use" |
...
```
Otherwise, apply fixes grouped by file to minimize file I/O:
1. Sort findings by file, then by line number (descending, to avoid offset drift)
2. For each file, apply all safe fixes in reverse line order
3. For git artifacts (`git:commit:*`, `git:pr:*`), skip — these can't be auto-fixed. Report them for manual attention.
### 7. Handle Needs Review Fixes
If `--dry-run`, list them:
```markdown
## Needs Review Fixes (would prompt interactively)
| # | File | Line | Type | Original | Suggested |
|---|------|------|------|----------|-----------|
| 4 | README.md | 8 | promotional_language | "powerful, enterprise-grade solution" | "authentication library" |
...
```
Otherwise, for each fix, prompt interactively:
```text
[README.md:8] Promotional language: "powerful, enterprise-grade solution"
Suggested: "authentication library"
(y)es / (n)o / (e)dit / (s)kip all:
```
Track user choices:
- `y` - Apply this fix as suggested
- `n` - Skip this fix
- `e` - User provides custom replacement
- `s` - Skip all remaining interactive fixes
### 8. Validate Results
For each modified markdown file, verify basic validity:
```bash
# Check for broken markdown (unclosed code blocks, broken links)
# Simple check: matching ``` pairs
grep -c '```' "$file" | awk '{print ($1 % 2 == 0) ? "OK" : "WARNING: odd number of code fences"}'
```
For modified source files, check syntax is still valid:
**Python:**
```bash
python3 -c "import ast; ast.parse(open('$file').read())"
```
**TypeScript/JavaScript:**
```bash
npx -y acorn --ecma2020 "$file" > /dev/null 2>&1
```
If validation fails for any file, revert that file:
```bash
git checkout -- "$file"
echo "Reverted $file due to validation failure"
```
### 9. Report Results
```markdown
## Humanize Summary
### Applied Fixes
- [x] README.md:3 - Deleted synthetic opener
- [x] README.md:42 - Replaced "utilize" with "use"
- [x] src/auth.py:15 - Deleted obvious comment
### Interactive Fixes
- [x] README.md:8 - Rewrote promotional language (user approved)
- [ ] docs/guide.md:22 - Skipped by user
### Skipped (Git Artifacts)
- [ ] git:commit:abc1234 - Chat leak in commit message (amend manually)
### Validation
- README.md: OK
- src/auth.py: OK
### Diff Summary
```
```bash
git diff --stat
```
### 10. Cleanup
On successful completion (all validations pass):
```bash
rm .beagle/ai-writing-review.json
```
If any validation fails, keep the file and report:
```text
Review file preserved at .beagle/ai-writing-review.json
Fix issues and re-run, or restore with: git stash pop
```
## Core Principles
1. **Delete first, rewrite second.** Most AI patterns are padding. Removing them improves the text.
2. **Use simple words.** Replace "utilize" with "use", "facilitate" with "help", "implement" with "add".
3. **Keep sentences short.** Break compound sentences. One idea per sentence.
4. **Preserve meaning.** Never change what the text says, only how it says it.
5. **Match the register.** Commit messages are terse. READMEs are conversational. API docs are precise. Read `references/developer-voice.md` for the full register guide.
6. **Don't overcorrect.** A slightly formal sentence is fine. Only fix patterns that read as obviously AI-generated.
7. **Understand regression to the mean.** LLMs produce the most statistically likely output. Specific, unusual facts get replaced with generic, positive descriptions. When humanizing, restore specificity — replace vague praise with concrete details.
8. **Score density, not individual words.** AI vocabulary words co-occur. One or two may be coincidental; a cluster of 3+ is a strong AI tell.
## Example
```bash
# Preview all fixes without applying
/beagle-docs:humanize-beagle --dry-run
# Fix only vocabulary issues
/beagle-docs:humanize-beagle --category vocabulary
# Full codebase scan and fix
/beagle-docs:humanize-beagle --all
# Preview filler fixes only
/beagle-docs:humanize-beagle --category filler --dry-run
```
## Rules
- Always load reference material before applying fixes (step 4); satisfy **G3** per finding
- Never modify files without a clean working tree or a successful stash (**G1**)
- Apply safe fixes in reverse line order to avoid offset drift
- Never auto-fix git artifacts (commits, PRs) — report them for manual action
- Validate every modified file before considering it done (**G4**)
- Revert files that fail validation
- Do not present the step 9 summary as “complete” until step 8 validation has passed for every file you are keeping
- Remove `.beagle/ai-writing-review.json` only after full success (**G5**); if validation failed partway, keep the file and follow step 10
FILE:references/developer-voice.md
# Developer Voice Guidelines
Good developer writing is:
- **Conversational but precise.** Write like you'd explain it to a colleague, but get the details right.
- **Direct.** State opinions. "Use X" not "You might consider using X".
- **Terse where appropriate.** Commit messages and code comments should be short. Don't pad them.
- **Specific.** Replace vague claims with concrete details, numbers, or examples.
- **Consistent.** Pick one term and stick with it. Don't cycle synonyms.
## Register Guide
Match tone and length to the artifact type.
| Artifact | Tone | Length | Example |
|----------|------|--------|---------|
| Commit message | Terse, imperative | 50-72 chars | `fix: prevent nil panic in auth middleware` |
| Code comment | Brief, explains why | 1-2 lines | `// retry once — transient DNS failures are common in k8s` |
| Docstring | Precise, adds value | What the name doesn't tell you | `"""Raises ConnectionError after 3 retries."""` |
| PR description | Structured, factual | Context + what changed + how to test | Bullet points, not paragraphs |
| README | Conversational, scannable | As short as possible | Start with what it does, then how to use it |
| Error message | Actionable, specific | What happened + what to do | `Config file not found at ~/.app/config.yml. Run 'app init' to create one.` |
FILE:references/fix-strategies.md
# Fix Strategies by Category
Strategies for each finding type, organized by category. Each entry includes the fix approach, risk level, and before/after examples.
## Table of Contents
- [Content Patterns](#content-patterns)
- [Vocabulary Patterns](#vocabulary-patterns)
- [Formatting Patterns](#formatting-patterns)
- [Communication Patterns](#communication-patterns)
- [Filler Patterns](#filler-patterns)
- [Code Docs Patterns](#code-docs-patterns)
---
## Content Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Promotional language | Replace superlatives with specifics | Needs review |
| Vague authority | Delete the claim or add a citation | Safe |
| Formulaic structure | Remove the intro/conclusion wrapper | Needs review |
| Synthetic openers | Delete the opener, start with the point | Safe |
| Negative parallelism | Delete the filler construction, keep the point | Safe |
| Rule of three | Simplify to what matters, drop padding items | Needs review |
| Challenges-and-prospects | Delete the formulaic "Despite..." wrapper | Safe |
### Regression to the Mean
LLMs produce the most statistically likely output. Specific, unusual facts get replaced with generic, positive descriptions. When humanizing, restore specificity — replace vague praise with concrete details.
**Before:**
```markdown
In today's rapidly evolving software landscape, authentication is a crucial
component that plays a pivotal role in securing modern applications.
```
**After:**
```markdown
This guide covers authentication setup for the API.
```
### Negative Parallelism
LLMs produce "Not just X, but also Y" and "Not X, but Y" constructions that add words without adding meaning. Delete the construction, keep the point.
**Before:**
```markdown
This is not just a logging library, but also a comprehensive observability
framework that empowers developers to gain valuable insights.
```
**After:**
```markdown
This is a logging library with structured output and trace correlation.
```
### Rule of Three
LLMs overuse three-item lists ("adjective, adjective, adjective") to make superficial analyses appear comprehensive. Keep only items that carry specific meaning.
**Before:**
```markdown
The event features keynote sessions, panel discussions, and networking opportunities.
```
**After (keep only what's specific):**
```markdown
The event includes keynote sessions and breakout workshops.
```
### Challenges-and-Prospects Formula
Rigid formula: "Despite its [positive words], [subject] faces challenges..." ending with vague positive assessment. Delete the wrapper, state the actual limitation with specifics.
**Before:**
```markdown
Despite its robust architecture, the system faces challenges typical of
distributed environments. Despite these challenges, with its strategic
design and ongoing improvements, the platform continues to thrive.
```
**After:**
```markdown
Known limitations: network partitions can cause stale reads for up to 30s.
See the consistency model docs for details.
```
---
## Vocabulary Patterns
| Type | Strategy | Risk |
|------|----------|------|
| High-signal AI words | Direct word swap | Safe |
| Low-signal clusters | Reduce density, keep 1-2 | Needs review |
| Copula avoidance | Use "is/are" naturally | Safe |
| Rhetorical devices | Delete the question, state the fact | Safe |
| Synonym cycling | Pick one term, use it consistently | Needs review |
| Commit inflation | Rewrite to match actual change scope | Needs review |
### Copula Avoidance
LLMs substitute simple "is/are" with elaborate alternatives like "serves as", "stands as", "boasts", "features", "offers". Use the simple form.
**Before:**
```text
feat: Leverage robust caching paradigm to facilitate seamless data retrieval
```
**After:**
```text
feat: add response caching for faster reads
```
See `references/vocabulary-swaps.md` for the complete word swap table.
---
## Formatting Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Boldface overuse | Remove bold from non-key terms | Safe |
| Emoji decoration | Remove emoji from technical content | Safe |
| Heading restatement | Delete the restating sentence | Safe |
| Title case headings | Convert to sentence case | Safe |
| Em dash overuse | Replace with commas, parentheses, or colons | Safe |
| Thematic breaks | Remove horizontal rules before headings | Safe |
| Curly quotes | Normalize to straight quotes/apostrophes | Safe |
| Inline-header lists | Restructure or convert to prose | Needs review |
| Unnecessary tables | Convert small tables to prose | Needs review |
**Boldface overuse — Before:**
```markdown
## Error Handling
**Error handling** is a **critical** aspect of building **reliable** applications.
The `handleError` function **catches** and **processes** all **runtime errors**.
```
**After:**
```markdown
## Error Handling
The `handleError` function catches runtime errors and logs them with context.
```
**Em dash overuse — Before:**
```markdown
The parser — which handles all input formats — validates each field — including nested objects — before returning.
```
**After:**
```markdown
The parser validates each field (including nested objects) before returning. It handles all input formats.
```
**Title case — Before:**
```markdown
## Strategic Negotiations And Global Partnerships
```
**After:**
```markdown
## Strategic negotiations and global partnerships
```
---
## Communication Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Chat leaks | Delete entirely | Safe |
| Cutoff disclaimers | Delete entirely | Safe |
| Sycophantic tone | Delete or neutralize | Safe |
| Apologetic errors | Rewrite as direct error message | Needs review |
**Before:**
```python
# Great implementation! This elegantly handles the edge case.
# As of my last update, this API endpoint supports JSON.
```
**After:**
```python
# Handles the re-entrant edge case from issue #42.
# This endpoint accepts JSON.
```
---
## Filler Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Filler phrases | Delete the phrase | Safe |
| Excessive hedging | Remove qualifiers, state directly | Safe |
| Generic conclusions | Delete the conclusion paragraph | Safe |
**Before:**
```markdown
It's worth noting that the configuration file might potentially need to be
updated. Going forward, this could possibly affect performance.
```
**After:**
```markdown
Update the configuration file. This affects performance.
```
---
## Code Docs Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Tautological docstrings | Delete or add real information | Needs review |
| Narrating obvious code | Delete the comment | Safe |
| "This noun verbs" | Rewrite in active/direct voice | Safe |
| Exhaustive enumeration | Keep only non-obvious params | Needs review |
**Before:**
```python
def get_user(user_id: int) -> User:
"""Get a user.
This method retrieves a user from the database by their ID.
Args:
user_id: The ID of the user to get.
Returns:
User: The user object.
Raises:
ValueError: If the user ID is invalid.
"""
return db.query(User).get(user_id)
```
**After:**
```python
def get_user(user_id: int) -> User:
"""Raises UserNotFound if ID doesn't exist in the database."""
return db.query(User).get(user_id)
```
FILE:references/vocabulary-swaps.md
# Vocabulary Swap Reference
Direct word replacements for high-signal and medium-signal AI vocabulary. Score density, not individual words — a cluster of 3+ AI words in proximity is one of the strongest AI tells.
## High-Signal Words
These words spiked in frequency after 2022 and co-occur in AI-generated text.
| AI Word | Replacement |
|---------|-------------|
| utilize | use |
| leverage (as "use") | use |
| delve | look at, explore, examine |
| facilitate | help, enable, let |
| endeavor | try, work, effort |
| harnessing | using |
| paradigm | approach, model, pattern |
| whilst | while |
| furthermore | also, and |
| moreover | also, and |
| robust (non-technical) | reliable, solid, strong |
| seamless | smooth, easy |
| cutting-edge | modern, latest, new |
| pivotal | important, key |
| elevate | improve |
| empower | let, enable |
| revolutionize | change, improve |
| unleash | release, enable |
| synergy | (delete — rarely means anything) |
| embark | start, begin |
| meticulous/meticulously | careful, thorough |
| intricate/intricacies | complex, details |
| tapestry | (delete or rewrite — never means anything useful) |
| testament | proof, sign, evidence |
| garner | get, earn, attract |
| interplay | interaction, relationship |
| landscape | (delete or use specific noun) |
## Medium-Signal Words
Less distinctive individually, but meaningful in clusters.
| AI Word | Replacement |
|---------|-------------|
| bolstered | supported, strengthened |
| fostering | building, encouraging |
| showcasing | showing |
| underscore | show, highlight |
| enhance | improve |
| crucial | important |
| vibrant | (delete or use specific adjective) |
| nestled | located, in |
| groundbreaking | new, first |
| renowned | well-known, popular |
## Era Context
AI vocabulary shifts across model generations. Words co-occur: where one appears, others cluster nearby.
- **2023-mid 2024 (GPT-4 era):** delve, tapestry, meticulous, intricate, garner, interplay, testament, vibrant
- **Mid 2024-mid 2025 (GPT-4o era):** bolstered, fostering, showcasing, align with, underscore, enhance
- **Mid 2025+ (GPT-5 era):** showcasing, highlighting, emphasizing, enhance (plus notability/attribution words)
## What NOT to Flag
Do NOT treat these as AI indicators (high false-positive rate):
- Perfect grammar alone (many humans write well)
- Formal or academic prose (correlation is with specific words, not formality)
- Transition words alone (only a few specific transitions are AI-overused)
- Mixed casual/formal registers (common in technical fields)
Use when the user has a fuzzy idea and wants to shape it into a concrete project spec before planning or building. Triggers on: "brainstorm this", "I have an...
---
name: brainstorm-beagle
description: "Use when the user has a fuzzy idea and wants to shape it into a concrete project spec before planning or building. Triggers on: \"brainstorm this\", \"I have an idea for...\", \"help me think through this project\", \"what should I build\", \"spec this out\". Also catches vague feature descriptions needing structured questioning to clarify scope. Does NOT write code, plan implementation, review strategy docs, or run strategy interviews \u2014 produces a WHAT/WHY spec through dialogue, not a HOW plan."
---
# Brainstorm: Ideas Into Specs
Turn a fuzzy idea into a comprehensive, implementation-free project spec through collaborative dialogue.
The output is a standalone spec document — structured enough for any agentic system to consume, clear enough for a human to act on. It captures WHAT and WHY, never HOW.
<hard_gate>
Do NOT write any code, create implementation plans, scaffold projects, or take any implementation action. This skill produces a SPEC DOCUMENT only. Every project goes through this process regardless of perceived simplicity — "simple" projects are where unexamined assumptions waste the most work.
</hard_gate>
## Workflow
Complete these steps in order:
1. **Check for a concept brief** — if `.beagle/concepts/<slug>/brief.md` exists for this idea, ingest it and skip most of steps 2-4 (see *Concept brief ingestion* below)
2. **Explore context** — read project files, docs, git history, existing specs (lighter pass if a brief is present)
3. **Assess scope** — is this one spec or does it need decomposition?
4. **Ask clarifying questions** — one at a time, follow the thread (few to none if a brief is present)
5. **Propose 2-3 directions** — high-level product approaches with tradeoffs
6. **Draft spec** — write the structured spec document
7. **Self-review** — check for completeness, contradictions, implementation leakage (see `references/spec-reviewer.md`)
8. **User review** — present for approval, iterate if needed
9. **Write to disk** — save to `.beagle/concepts/<slug>/spec.md`
```
Brief present? ──→ Yes → Ingest brief (skip most discovery) ──┐
──→ No → Explore context → Assess scope │
├─ Too large? → Decompose → Brainstorm first sub-project
└─ Right size? → Clarifying questions ─┘
│
Both paths converge → Propose directions → Draft spec → Self-review (fix inline) → User review
├─ Changes? → Revise
└─ Approved? → Write to concept folder
```
**The terminal state is a written spec.** This skill does not transition to implementation, planning, or any other skill. The user decides what to do with the spec.
## Concept brief ingestion
If the user invokes brainstorm-beagle on a concept that already has `.beagle/concepts/<slug>/brief.md` (produced by `prfaq-beagle` on pass), ingest the brief at step 1 and skip most discovery:
1. **Read the brief.** Customer, problem, solution concept, stakes, forged decisions, and research pointers are already codified. Do not re-interview the user on these.
2. **Skim the PRFAQ reference.** Open `.beagle/concepts/<slug>/prfaq.md` for the Reasoning blocks — they explain what was challenged and why earlier decisions were made. This is context, not content to re-litigate.
3. **Open questions become your starting point.** The brief's *Open Questions* section lists what PRFAQ surfaced but did not close. These are what you ask the user about — not customer, problem, or motivation, which are already decided.
4. **Proceed to Exploring Directions.** Skip Clarifying Questions and Scope Assessment unless the brief is ambiguous about scope itself.
The brief is a context handoff, not a gate. Run your own Self-Review on the spec you produce — brainstorm-beagle remains responsible for implementation-leakage detection, requirement testability, and scope discipline regardless of how much discovery was pre-done upstream.
**When there is no brief:** proceed through steps 2-9 normally. Not every idea comes from PRFAQ.
## Questioning
You are a thinking partner, not an interviewer. The user has a fuzzy idea — your job is to help them sharpen it.
**How to question:**
- **Start open.** Let them dump their mental model. Don't interrupt with structure.
- **Follow energy.** Whatever they emphasized, dig into that. What excited them? What problem sparked this?
- **Challenge vagueness.** Never accept fuzzy answers. "Good" means what? "Users" means who? "Simple" means how?
- **Make the abstract concrete.** "Walk me through using this." "What does that actually look like?"
- **Clarify ambiguity.** "When you say Z, do you mean A or B?"
- **Know when to stop.** When you understand what, why, who, and what done looks like — offer to proceed.
**Question mechanics:**
- One question per message. If a topic needs more, break it into multiple messages.
- Prefer multiple choice when possible — easier to react to concrete options than open-ended prompts.
- When the user selects "other" or wants to explain freely, switch to plain text. Don't force them back into structured choices.
- 2-4 options is ideal. Never use generic categories ("Technical", "Business", "Other").
**What to ask about:**
| Ask about | Examples |
|-----------|----------|
| Motivation | "What prompted this?" "What are you doing today that this replaces?" |
| Concreteness | "Walk me through using this" "Give me an example" |
| Clarification | "When you say X, do you mean A or B?" |
| Success | "How will you know this is working?" "What does done look like?" |
| Boundaries | "What is this explicitly NOT?" |
**What NOT to ask about:**
- Technical implementation details (that's for planning)
- Architecture patterns (that's for planning)
- User's technical skill level (irrelevant — the system builds)
- Success metrics (inferred from the work)
- Canned questions regardless of context ("What's your core value?", "Who are your stakeholders?")
**Background checklist** (check mentally, not out loud):
- [ ] What they're building (concrete enough to explain to a stranger)
- [ ] Why it needs to exist (the problem or desire driving it)
- [ ] Who it's for (even if just themselves)
- [ ] What "done" looks like (observable outcomes)
When all four are clear, offer to proceed. If the user wants to keep exploring, keep going.
## Scope Assessment
Before diving into questions, assess whether the idea is one project or several.
**Signs it needs decomposition:**
- Multiple independent subsystems ("build a platform with chat, file storage, billing, and analytics")
- No clear ordering dependency between parts
- Would take multiple months of work
**When decomposition is needed:**
1. Help the user identify the independent pieces and their relationships
2. Establish what order they should be built
3. Brainstorm the first sub-project through the normal flow
4. Each sub-project gets its own spec
**For right-sized projects**, proceed directly to clarifying questions.
## Exploring Directions
After understanding the idea, propose 2-3 high-level directions. These are product directions, not technical architectures.
**Good directions:**
- "A CLI tool that operates on single files vs. a daemon that watches directories"
- "A focused MVP with just the core loop vs. a broader first version with supporting features"
- "Optimized for speed of use (power users) vs. optimized for discoverability (new users)"
**Bad directions (implementation leaking in):**
- "React with a REST API vs. HTMX with server-side rendering"
- "PostgreSQL vs. SQLite for storage"
- "Monorepo vs. polyrepo"
Lead with your recommendation and explain why. Present tradeoffs conversationally.
## Scope Discipline
Brainstorming naturally generates ideas beyond the current scope. Handle this gracefully:
**When the user expands scope mid-brainstorm:**
> "That's a great idea but it's its own project/phase. I'll capture it in Future Considerations so it's not lost. For now, let's focus on [current scope]."
**The heuristic:** Does this clarify what we're building, or does it add a new capability that could stand on its own?
Capture deferred ideas in the spec's "Future Considerations" section. Don't lose them, don't act on them.
## Implementation Leakage
The spec must never prescribe implementation. This is the hardest discipline.
| Allowed (WHAT) | Not allowed (HOW) |
|-----------------|-------------------|
| "Users can filter results by date and category" | "Add a /api/filter endpoint that accepts query params" |
| "Must support 10k concurrent users" | "Use Redis for session caching" |
| "Data must persist across sessions" | "Store in PostgreSQL with a users table" |
| "Must work offline" | "Use a service worker with IndexedDB" |
| "Search must feel instant" | "Use Elasticsearch with debounced queries" |
**Exception — constraints:** When the user has genuine constraints ("must use PostgreSQL because that's what our infra runs"), those go in the Constraints section with rationale. A constraint is a boundary condition, not a design choice made during brainstorming.
## Spec Format
Use the template in `references/spec-template.md`. The spec has these sections:
1. **Core Value** — ONE sentence, the most important thing
2. **Problem Statement** — what problem, who has it, why now
3. **Requirements** — must have, should have, out of scope (with reasons)
4. **Constraints** — hard limits with rationale
5. **Key Decisions** — decisions made during brainstorming with alternatives considered
6. **Reference Points** — "I want it like X" moments, external docs, inspiration
7. **Open Questions** — unresolved items needing future research
8. **Future Considerations** — ideas that emerged but belong in later phases
Requirements must be concrete and testable:
| Good requirement | Bad requirement |
|-----------------|-----------------|
| "User can undo the last 10 actions" | "Good undo support" |
| "Page loads in under 2 seconds on 3G" | "Fast performance" |
| "Works with screen readers" | "Accessible" |
| "Export to CSV and JSON" | "Multiple export formats" |
## Self-Review
After drafting the spec, review it for:
1. **Placeholders** — any TBD, TODO, vague requirements? Fix them.
2. **Contradictions** — do any sections conflict? Resolve them.
3. **Implementation leakage** — does any requirement prescribe HOW? Rewrite as WHAT.
4. **Untestable requirements** — could someone verify this was met? Make it concrete.
5. **Missing rationale** — do constraints and out-of-scope items explain WHY? Add reasons.
6. **Scope** — is this focused enough for a single planning cycle?
Fix issues inline. Then present to the user for review.
See `references/spec-reviewer.md` for the detailed review checklist.
**Pass before presenting the draft (user review step):** Advance only when every item is honestly **yes** — not “feels fine.”
1. **Template:** The draft follows the section structure in `references/spec-template.md` (or you note deliberate omissions and why).
2. **No honor-system completeness:** Steps 1–6 above are satisfied; unresolved placeholders/TODOs are confined to *Open Questions* (not smuggled into must-haves).
3. **Leakage check:** Every must-have / should-have passes the two-approach test under **Implementation Leakage** in `references/spec-reviewer.md`, except items explicitly listed under *Constraints* with rationale.
4. **Artifact:** The draft text exists in the conversation (or a single attached buffer) so the user is reviewing concrete prose, not a summary.
## Writing the Spec
**Pass before creating or overwriting `spec.md`:** Do not write until both are true.
1. **User gate:** The user explicitly approved the draft **or** directed you to save/write the file (vague enthusiasm alone is not approval — confirm if unclear).
2. **Path gate:** Target path is finalized — default `.beagle/concepts/<slug>/spec.md`, slug resolved (from brief frontmatter or agreed headline).
- **Default path:** `.beagle/concepts/<slug>/spec.md`
- **Slug source:** inherit from `brief.md` frontmatter if a brief was ingested; otherwise derive a kebab-case slug from the concept headline (≤40 chars, no dates). User preferences override the default path.
- **Companion outputs in-session:** if brainstorm-beagle invokes `web-research` or `artifact-analysis` mid-session, pass `output_dir: /abs/path/.beagle/concepts/<slug>/research/` or `/abs/path/.beagle/concepts/<slug>/analysis/` so findings share the concept folder with anything PRFAQ produced upstream. This keeps the whole concept-forging audit trail in one place.
- Commit to git with message: `docs: add <slug> project spec`
- After writing, tell the user:
> "Spec written to `<path>`. Review it and let me know if you want changes."
- Wait for approval before considering the brainstorm complete.
## Key Principles
- **One question at a time** — don't overwhelm
- **Follow the thread** — don't walk a checklist
- **YAGNI ruthlessly** — remove anything that isn't clearly needed
- **Concrete decisions only** — "card-based layout" not "modern and clean"
- **No implementation** — WHAT and WHY, never HOW
- **Capture everything** — ideas outside scope go to Future Considerations, never lost
- **Incremental validation** — confirm understanding before moving on
- **The spec stands alone** — anyone should be able to read it and understand the project
FILE:references/spec-reviewer.md
# Spec Self-Review Checklist
Run this review after drafting the spec. Fix issues inline — don't flag them and move on.
## Review Dimensions
### 1. Completeness
| Check | What to look for |
|-------|-----------------|
| No placeholders | TBD, TODO, "to be determined", empty sections, ellipsis as content |
| Core Value exists | One sentence that resolves prioritization conflicts |
| Problem is concrete | Specific problem, specific people, specific pain — not abstract |
| Requirements are testable | Every requirement can be verified by observation |
| Out of Scope has reasons | Every exclusion explains WHY, not just WHAT |
| Constraints have rationale | Every constraint explains WHY it's a hard limit |
### 2. Consistency
| Check | What to look for |
|-------|-----------------|
| No contradictions | Requirements don't conflict with each other or with constraints |
| Scope alignment | Must-have requirements match the problem statement |
| Decision coherence | Key decisions don't undermine each other |
| Out of Scope respected | Nothing in requirements contradicts an explicit exclusion |
### 3. Implementation Leakage
**This is the most common failure mode.** Scan every requirement for:
| Leaked | Clean |
|--------|-------|
| "Create a REST API endpoint" | "Service exposes data to third-party integrations" |
| "Use WebSocket for real-time" | "Updates appear within 1 second without page refresh" |
| "Store in a relational database" | "Data persists across sessions and survives restarts" |
| "Build a React component" | "User sees a filterable list of results" |
| "Add a cron job" | "Report is generated daily and available by 9am" |
**Exception:** Constraints section may contain implementation-specific limits ("Must use PostgreSQL") when these are genuine external constraints with rationale.
**The test:** Could this requirement be satisfied by two completely different technical approaches? If yes, it's clean. If it implies exactly one approach, it's leaked.
### 4. Testability
Every requirement should pass the "how would you verify this?" test:
- **Good:** "User can undo the last action" — verify: perform action, press undo, confirm reversal
- **Bad:** "Intuitive undo support" — verify: ???
Rewrite any requirement where "verify" isn't obvious.
### 5. Atomicity
Each requirement should be one thing:
- **Bad:** "Users can search, filter, and sort results"
- **Good:** Three separate requirements — search, filter, sort
Compound requirements hide scope and make prioritization impossible.
### 6. Scope
Is this focused enough for a single planning cycle?
- More than 15 must-have requirements? Probably needs decomposition.
- Requirements spanning multiple independent subsystems? Decompose.
- Could you explain the core loop in 30 seconds? If not, it's too broad.
## Calibration
**Only fix issues that would cause real problems downstream.**
A downstream planning system acting on this spec should be able to:
- Understand what to build without asking the user again
- Decompose requirements into tasks
- Know what's in scope and what's not
- Understand the constraints they're working within
Minor wording preferences, stylistic consistency, and "sections that could be more detailed" are not issues. Ambiguity that could lead someone to build the wrong thing IS an issue.
**Approve the spec unless there are serious gaps.** Then present to the user for review.
FILE:references/spec-template.md
# Spec Document Template
Use this template when writing the final spec document. Save to `docs/specs/YYYY-MM-DD-<topic>.md`.
## Template
```markdown
# [Project Name]
**Created:** [YYYY-MM-DD]
**Status:** Ready for planning
## Core Value
[ONE sentence — the single most important thing this project delivers. If everything else fails, this must work. Drives prioritization when tradeoffs arise.]
## Problem Statement
[What problem exists, who has it, and why it matters now. 2-4 sentences. Include what people do today without this (the status quo) and why that's insufficient.]
## Requirements
### Must Have
[Concrete, testable, user-centric. These define v1 — the project isn't done without them.]
- [Requirement — observable, verifiable outcome]
- [Requirement — observable, verifiable outcome]
### Should Have
[Important but not blocking v1. Build these if time allows, defer if not.]
- [Requirement — observable, verifiable outcome]
### Out of Scope
[Explicit exclusions with reasoning. Prevents scope creep and re-litigation.]
- [Exclusion] — [why not]
- [Exclusion] — [why not]
## Constraints
[Hard limits on the solution space. Each must include WHY — constraints without rationale get questioned.]
- **[Type]:** [What] — [Why]
- **[Type]:** [What] — [Why]
Common types: Tech stack, Timeline, Compatibility, Performance, Security, Regulatory, Team
## Key Decisions
[Significant choices made during brainstorming. Captures the reasoning so downstream work doesn't relitigate.]
### [Decision Area]
- **Decision:** [What was decided]
- **Alternatives considered:** [What else was on the table]
- **Rationale:** [Why this choice]
### [Decision Area]
- **Decision:** [What was decided]
- **Alternatives considered:** [What else was on the table]
- **Rationale:** [Why this choice]
## Reference Points
[Inspiration, external docs, "I want it like X" moments, specific behaviors or patterns the user referenced. Not implementation instructions — direction and taste.]
[If none: "No specific references — open to standard approaches."]
## Open Questions
[Unresolved items that need research or future discussion before or during implementation. Flag what's unknown so downstream systems can investigate.]
- [Question — what needs to be figured out and why it matters]
[If none: "No open questions — spec is self-contained."]
## Future Considerations
[Ideas that emerged during brainstorming but belong in later phases or separate projects. Captured so they're not lost, explicitly deferred.]
[If none: "None — brainstorming stayed within scope."]
```
## Guidelines
**Core Value:**
- Rarely more than one sentence
- Not a tagline — a prioritization tool
- Should resolve ambiguity when two requirements compete
**Requirements — quality checklist:**
- Can someone verify this was met? (testable)
- Does it describe what the user experiences? (user-centric)
- Is it one thing, not a bundle? (atomic)
- Could it be interpreted two ways? If so, pick one.
**Requirements — what NOT to write:**
- Implementation tasks disguised as requirements ("Create a REST API")
- Vague qualities ("Good performance", "Modern UI")
- Compound requirements ("Users can search AND filter AND sort")
- Technical jargon the user didn't use
**Constraints vs Key Decisions:**
- Constraint: imposed externally, limits the solution space ("Must run on iOS 16+")
- Key Decision: chosen during brainstorming, shapes the direction ("Mobile-first, desktop secondary")
**Key Decisions — when to record:**
- When 2+ viable options existed and one was chosen
- When the user expressed a strong preference
- When a direction was explicitly rejected (alternatives section)
- When the user said "you decide" — record it as "Claude's discretion" with the area
**Reference Points — good content:**
- "I want the search to feel like Spotlight — instant, forgiving of typos"
- "The onboarding should be like Linear's — minimal, progressive, no tutorial walls"
- External docs the user wants to be followed: `path/to/spec.md` with what it covers
**Future Considerations — the redirect:**
When an idea is captured here during brainstorming, it was explicitly deferred. Note what it is and why it's deferred (too complex for v1, depends on other work, nice-to-have, etc.).
## Examples
### Example 1: Developer Tool
```markdown
# Logwatch — Spec
**Created:** 2026-03-26
**Status:** Ready for planning
## Core Value
Developers see exactly what happened in production without leaving their terminal.
## Problem Statement
When a production incident occurs, developers switch between Grafana, CloudWatch, and Slack to piece together what happened. This context-switching wastes 10-15 minutes per incident and often means important log lines are missed. Logwatch brings structured log tailing and search into the terminal where developers already work.
## Requirements
### Must Have
- User can tail logs from multiple services simultaneously in a split-pane view
- User can search logs by time range, service name, and log level
- User can bookmark a log line and return to it later in the same session
- Log output is syntax-highlighted by log level (error=red, warn=yellow, info=default)
- User can filter to a single service from the multi-service view without losing scroll position
- Connection drops are detected within 5 seconds and auto-reconnected
### Should Have
- User can save a search query and replay it in a future session
- User can export a time range of logs to a file
- User can share a permalink to a specific log line with teammates
### Out of Scope
- Alerting or notification — this is a viewing tool, not a monitoring tool
- Log aggregation or storage — reads from existing infrastructure
- Dashboard creation — Grafana already handles this well
- Mobile support — terminal-first, desktop only
## Constraints
- **Compatibility:** Must work with CloudWatch Logs and Datadog — these are our two log backends
- **Performance:** Must handle 10k log lines/second without dropping frames
- **Distribution:** Single binary, no runtime dependencies — devs install via brew or curl
- **Auth:** Must support SSO via existing Okta setup — no separate credentials
## Key Decisions
### Primary interface
- **Decision:** TUI with vim-style keybindings
- **Alternatives considered:** Web UI, VS Code extension, plain CLI with flags
- **Rationale:** Developers are already in the terminal during incidents. TUI keeps them in flow. Vim bindings match muscle memory for the team.
### Multi-service display
- **Decision:** Vertical split panes, one per service, synchronized by timestamp
- **Alternatives considered:** Interleaved single stream with service prefixes, tabbed view per service
- **Rationale:** Side-by-side makes cross-service correlation visual and immediate. Interleaved gets noisy above 3 services. Tabs hide context.
## Reference Points
- "I want it to feel like `lazygit` — fast, keyboard-driven, just works"
- "The search should be like ripgrep — regex by default, fast enough to feel instant"
- Okta SSO integration spec: `docs/infra/sso-integration.md` (sections 2-3 cover token flow)
## Open Questions
- Can we get sub-second latency from CloudWatch's API, or do we need a local cache layer? Needs benchmarking.
- Datadog's log query API has rate limits — need to verify they're sufficient for real-time tailing.
## Future Considerations
- Alerting integration (pipe to PagerDuty) — separate project, depends on core being stable
- Log annotation ("this line caused the outage") — great idea, v2 feature
- Team sharing of saved queries — requires backend, save for later
```
### Example 2: Minimal Project
```markdown
# Markdown Link Checker — Spec
**Created:** 2026-03-26
**Status:** Ready for planning
## Core Value
Broken links in docs are caught before they reach readers.
## Problem Statement
Our documentation has 200+ external links. We discover broken links when users report them, sometimes weeks after the target page moved. An automated checker in CI catches these before merge.
## Requirements
### Must Have
- Scans all `.md` files in a directory recursively
- Checks HTTP status of external links (follows redirects up to 3 hops)
- Reports broken links with file path, line number, URL, and HTTP status
- Exit code 1 if any links are broken (for CI gating)
- Completes scan of 200 files with 500 links in under 60 seconds
### Should Have
- User can exclude URLs by pattern (regex allowlist)
- User can set a custom timeout per link
### Out of Scope
- Checking anchor fragments (#section-name) — parsing target HTML is too complex for v1
- Fixing broken links — detection only
- Checking internal cross-file links — separate concern
## Constraints
- **CI:** Must run in GitHub Actions — no external service dependencies
- **Language:** Team prefers Go for CLI tools — aligns with existing tooling
## Key Decisions
### Concurrency model
- **Decision:** Concurrent link checking with configurable parallelism
- **Alternatives considered:** Sequential checking (simple but slow), unbounded parallelism (risks rate limiting)
- **Rationale:** Need to finish in 60s but also avoid hammering external servers. Default 10 concurrent, configurable via flag.
## Reference Points
- "Like `markdown-link-check` npm package but faster and no Node dependency"
## Open Questions
- Should rate-limited responses (429) be treated as broken or retried? Need to decide retry policy.
## Future Considerations
- Anchor fragment checking — worth a follow-up if teams request it
- Cache layer for recently-checked URLs — would speed up repeated CI runs
```
Analyze repo, detect stack, trace changes to user-facing entry points, generate E2E YAML test plan
---
description: Analyze repo, detect stack, trace changes to user-facing entry points, generate E2E YAML test plan
name: gen-test-plan
disable-model-invocation: true
---
# Generate Test Plan
Analyze the repository's tech stack, branch changes vs default, and generate an executable YAML test plan focused on user-facing impact.
**This is an E2E test plan — not an automated test wrapper.** The generated plan will be executed by an autonomous agent acting exactly as a human QA tester would: launching real binaries, hitting real endpoints, interacting with real databases, and verifying real observable behavior.
## Critical Rule: No Automated Test Duplication
**NEVER generate test steps that re-run the project's existing automated test suite.** This means:
- No `cargo test`, `pytest`, `npm test`, `go test`, `mix test`, or equivalent commands as test steps
- No wrapping unit/integration test modules in a test case
- No "run the tests and check they pass" — that's CI's job, not QA's
If you find yourself writing a test step that invokes the project's test runner, **stop and rethink**. Ask: "What would a human tester do to verify this feature works?" The answer is never "run the unit tests."
**What E2E test steps look like:**
- Build the binary and run it with real arguments, check stdout/stderr/exit code
- Start a server and hit it with curl
- Run a CLI command that writes to a real database, then query the database to verify
- Launch the TUI and verify it renders (via screenshot or process lifecycle)
- Chain multiple commands that exercise a full user workflow end-to-end
## Hard gates
Complete these **in order**. Do not advance to the next gate until its **Pass** condition is met (each pass should leave retrievable evidence: pasted command output, a written list, or the generated file on disk). **Scheduling:** Gate **1** before Step **2**; Gate **2** before Step **5**; Gate **3** before Step **7**; Gates **4–5** during **Step 8** (after the Step 7 summary).
1. **Diff and base pinned (after Step 1)** — Resolve the base branch from `--base` when provided, otherwise use the repo default (`main` or `master` per Step 1). Compare `HEAD` to `$(git merge-base HEAD origin/<base_branch>)` (or equivalent if the remote ref differs). **Pass:** You record `current_branch`, `base_branch`, the merge-base SHA or range used, and `changed_files` from `git diff --name-only <merge-base>..HEAD` (empty list allowed if you paste or quote that output and state “no file changes vs base”).
2. **Trace complete (after Step 4)** — **Pass:** Every affected entry point you will test has a **Core functionality** vs **Configuration/admin** classification, and the Step 4 requirement holds: at least one test targets a core entry point **or** you document why that is impossible and flag manual review.
3. **Plan file valid (after Step 6, before Step 7)** — **Pass:** `docs/testing/test-plan.yaml` exists and the following command exits 0 (parses the YAML **and** asserts all four top-level keys are present — a single `grep -E` with alternations would pass on any one match, so do not substitute it):
```bash
python3 -c "import sys, yaml; d = yaml.safe_load(open('docs/testing/test-plan.yaml')) or {}; missing = [k for k in ('version', 'metadata', 'setup', 'tests') if k not in d]; sys.exit('Missing keys: ' + ', '.join(missing) if missing else 0)"
```
4. **No automated-test duplication (Step 8)** — **Pass:** Every `run:` step and every `services:` `command:` is scanned for project test runners (`cargo test`, `pytest`, `npm test`, `go test`, `mix test`, `jest`, `vitest`, `mocha`, etc.); **zero** invocations. If any appear, remove or replace them with real E2E actions and re-run Gate 3.
5. **Behavioral coverage (Step 8)** — **Pass:** Re-read `metadata.changes_summary` and recent commit messages; at least one test’s `context`/`steps` exercises the primary user-visible behavior they describe. If they describe a capability (e.g., a new provider) but no step invokes it, add that test or fail verification.
## Arguments
- `--base <branch>`: Base branch to diff against (default: `main`)
- Path: Target directory (default: current working directory)
## Step 1: Gather Repository Context
```bash
# Get current branch
git rev-parse --abbrev-ref HEAD
# Resolve base branch: use --base if supplied, otherwise default (main → master)
BASE_BRANCH="-$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo main || echo master)"
MERGE_BASE="$(git merge-base HEAD "origin/BASE_BRANCH")"
# Get changed files vs base
git diff --name-only "MERGE_BASE"..HEAD
# Get commit messages for context
git log --oneline "MERGE_BASE"..HEAD
```
**Capture:**
- `current_branch`: Branch name
- `base_branch`: Default branch to compare against
- `changed_files`: List of modified files
- `commit_messages`: What the PR is about
## Step 2: Detect Tech Stack
See [references/stack-discovery.md](references/stack-discovery.md) for stack detection commands, entrypoint discovery, port discovery, and trace rules.
## Step 3: Discover User-Facing Entry Points
A "user-facing entry point" is anything a human interacts with: CLI subcommands, HTTP endpoints, UI routes, TUI screens, gRPC services, database migrations, or configuration files that affect runtime behavior.
### CLI Applications (Rust/clap, Python/argparse/click, Go/cobra)
```bash
# Rust (clap) — look for Subcommand derives and command enums
grep -rn "Subcommand\|#\[command\]" --include="*.rs" | head -20
# Python (click/typer/argparse)
grep -rn "@click.command\|@app.command\|add_parser\|add_subparser" --include="*.py" | head -20
# Go (cobra)
grep -rn "cobra.Command\|AddCommand" --include="*.go" | head -20
```
Build a map of:
- CLI subcommands: command name + description + file:line
- Required arguments and flags per subcommand
- Environment variables the binary reads (grep for `env`, `std::env::var`, `os.Getenv`, `os.environ`)
### HTTP/API Services
**Python (FastAPI/Flask):**
```bash
grep -rn "@app\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
grep -rn "@router\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
```
**Node.js (Express/Fastify):**
```bash
grep -rn "app\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
grep -rn "router\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
```
**Rust (axum/actix/rocket):**
```bash
grep -rn "Router::new\|\.route(\|#\[get\]\|#\[post\]\|HttpServer" --include="*.rs" | head -20
```
**Go (net/http, gin, chi):**
```bash
grep -rn "http.HandleFunc\|r.GET\|r.POST\|router.Get\|router.Post" --include="*.go" | head -20
```
**Elixir (Phoenix):**
```bash
grep -rn "get \"/\|post \"/\|pipe_through\|live \"/\|scope \"/\"" --include="*.ex" | head -20
```
### Browser UI Routes
```bash
grep -rn "createBrowserRouter\|<Route\|path=" --include="*.tsx" --include="*.jsx" | head -20
```
### Database and Migrations
```bash
# SQL migrations
ls migrations/ db/migrate/ priv/repo/migrations/ 2>/dev/null
# Schema files
ls schema.sql schema.prisma 2>/dev/null
```
### Build a consolidated map of:
- CLI subcommands: name + args + file:line
- API endpoints: method + path + file:line
- UI routes: path + component + file:line
- Database migrations: filename + what they create/alter
- Configuration: env vars and config files that affect behavior
## Step 4: Trace Changes to Entry Points
For each changed file, determine if it affects user-facing functionality:
1. **Direct entry point change** — File contains route definitions
2. **Import chain analysis** — Find what imports the changed file and trace up to entry points
3. **Architecture-aware tracing** — Read the project's CLAUDE.md, README, or architecture docs to understand data flow and module relationships, rather than relying solely on grep
4. **Document the trace path** in test context
### Import Chain Analysis by Ecosystem
```bash
# Rust — use/mod/crate references and workspace deps
grep -rn "use.*<crate>\|mod <module>" --include="*.rs"
grep -rn "<crate-name>" --include="Cargo.toml"
# Python — from/import
grep -rn "from.*<module>\|import.*<module>" --include="*.py"
# TypeScript/JavaScript — import/require
grep -rn "from.*<module>\|require.*<module>" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx"
# Elixir — alias/import/use
grep -rn "alias.*<Module>\|import.*<Module>\|use.*<Module>" --include="*.ex" --include="*.exs"
# Go — package references
grep -rn "<package>\." --include="*.go"
```
If the ecosystem is not covered above, or grep results are inconclusive, read the project's CLAUDE.md, README, or architecture docs to understand the module graph and trace the data flow from changed files to user-facing entry points.
### Classify Affected Entry Points
After identifying all affected entry points, classify each one:
| Category | Description | Examples | Priority |
|----------|-------------|----------|----------|
| **Core functionality** | Entry points where the feature does its actual work for the end user | Chat endpoint, API action, data processing pipeline, generation flow | **High — test first** |
| **Configuration/admin** | Entry points where the feature is set up, toggled, or configured | Settings page, admin dashboard, preference toggles, dropdown selections | Lower — test after core |
**Classification rules:**
- Ask: "If a user wanted to *use* this feature (not configure it), which entry point would they interact with?" — that's core functionality
- A settings page that adds a new dropdown option is configuration; the endpoint that actually *uses* that option is core functionality
- The same changed file (e.g., a new provider module) may affect both a settings page and a functional endpoint — both must be traced
**Requirement:** At least one test must target a core functionality entry point before generating configuration/admin tests. If no core functionality entry point can be identified, explicitly document why and flag this for manual review.
**Output:**
For each affected entry point, document:
- Which changed files affect it
- The import/dependency chain
- **Classification:** Core functionality or Configuration/admin
- Why this entry point needs testing
## Step 5: Generate Test Cases
See [references/test-case-generation.md](references/test-case-generation.md) for the detailed API/browser templates, prioritization rules, and test-case guidelines.
## Step 6: Write YAML Test Plan
Create the test plan file:
```bash
mkdir -p docs/testing
```
Write to `docs/testing/test-plan.yaml`:
```yaml
version: 1
metadata:
branch: <current_branch>
base: <base_branch>
generated: <ISO timestamp>
changes_summary: |
<Summary of what this PR changes based on commit messages and diff>
setup:
stack:
- type: <rust|node|python|go|elixir|docker>
package_manager: <cargo|pnpm|npm|yarn|uv|poetry|mix|none>
prerequisites:
# Services or infrastructure the tests need running
- name: <e.g., PostgreSQL>
check: <command to verify it's available, e.g., "pg_isready -h localhost">
build:
# Commands to build the project artifacts (binaries, assets, etc.)
- <build command, e.g., "cargo build --workspace">
services:
# Long-running processes to start before tests (servers, watchers, etc.)
# Omit if the project is a CLI tool or library with no server component
- command: <start command>
health_check:
url: http://localhost:<port>/health
timeout: 30
env:
# Environment variables needed by tests (use VAR for secrets)
DATABASE_URL: "DATABASE_URL"
tests:
# CLI test example — run the built binary with real arguments:
- id: TC-01
name: <CLI test name>
context: |
<Why this test exists, which changes affect it>
steps:
- run: <command that a human would type in their terminal>
- run: <follow-up command to verify the effect>
expected: |
<Expected behavior: exit code, stdout content, side effects>
# API test example:
- id: TC-02
name: <API test name>
context: |
<Why this test exists, which changes affect it>
steps:
- action: curl
method: GET
url: http://localhost:<port>/<path>
expected: |
<Expected behavior in natural language>
# Database verification example:
- id: TC-03
name: <Database test name>
context: |
<Why this test exists, which changes affect it>
steps:
- run: <command that writes to the database>
- run: psql "DATABASE_URL" -c "SELECT ... FROM ... WHERE ..."
expected: |
<Expected rows, schema state, or migration effect>
# Browser test example (always use agent-browser CLI commands):
- id: TC-04
name: <UI test name>
context: |
<Why this test exists, which changes affect it>
steps:
- run: agent-browser open http://localhost:<port>/<path>
- run: agent-browser snapshot -i
- run: agent-browser click @<ref>
- run: agent-browser snapshot -i
- run: agent-browser screenshot evidence/tc-04.png
expected: |
<Expected behavior in natural language>
evidence:
screenshot: evidence/tc-04.png
```
## Step 7: Report Summary
After generating the test plan:
```markdown
## Test Plan Generated
**File:** `docs/testing/test-plan.yaml`
**Branch:** <current_branch> → <base_branch>
### Detected Stack
| Component | Type | Port |
|-----------|------|------|
| <component> | <type> | <port> |
### Tests Generated
| ID | Name | Type | Affected By |
|----|------|------|-------------|
| TC-01 | <name> | curl/browser | <files> |
### Entry Point Coverage
- **Covered:** <N> entry points with tests
- **Unchanged:** <M> entry points not affected by this PR
### Next Steps
1. Review the generated test plan at `docs/testing/test-plan.yaml`
2. Adjust test values and expectations as needed
3. Run tests with:
```
/beagle-testing:run-test-plan
```
```
## Step 8: Verification
Confirm **Hard gates** 1–5 are satisfied with evidence (see **Hard gates** above) before treating the plan as complete. Then run:
```bash
# Verify file was created
ls -la docs/testing/test-plan.yaml
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" && echo "Valid YAML"
# Check required fields
grep -E "^version:|^metadata:|^setup:|^tests:" docs/testing/test-plan.yaml
```
**Verification Checklist:**
- [ ] Test plan file created at `docs/testing/test-plan.yaml`
- [ ] YAML is syntactically valid
- [ ] At least one test case generated
- [ ] Setup commands match detected stack
- [ ] Each test has id, name, steps, and expected fields
- [ ] **No automated test duplication:** Grep every `run:` and `command:` step in the plan for test runner invocations (`cargo test`, `pytest`, `npm test`, `go test`, `mix test`, `jest`, `vitest`, `mocha`, etc.). If ANY step invokes the project's test runner, the plan **fails verification**. Remove those steps and replace them with real E2E actions.
- [ ] **Behavioral coverage:** At least one test exercises the primary behavioral change described in `changes_summary`. Re-read the `changes_summary` and commit messages — if they describe a capability (e.g., "adds Claude Code as a new LLM provider") but no test invokes that capability (e.g., sends a message through the provider), the plan fails verification. Add the missing core functionality test before completing.
- [ ] **No config-only plans:** If all tests target configuration/admin entry points and zero tests target core functionality entry points, the plan is incomplete. Go back to Step 4, identify the core functionality entry points, and add tests for them.
## Rules
- **E2E only** — every test step must exercise the real built artifact (binary, server, UI) as a human would. Never wrap automated test suites.
- Always create `docs/testing/` directory if it doesn't exist
- Generate at least one test per affected entry point
- Include context explaining why each test matters (trace from changes)
- Use natural language for `expected` field (agent will interpret)
- **CLI projects:** Test steps should invoke the actual binary with real arguments and verify stdout, stderr, exit codes, and side effects (files created, database rows written, processes spawned)
- **Server projects:** Start the server in setup, test via curl/agent-browser
- **Library-only projects with no binary or server:** If the change is purely internal library code with no user-facing entry point (no CLI, no server, no UI), state this explicitly and generate tests that exercise the library through its public API via a small driver script — not by running the test suite
- Default to conservative port detection (8000 for API, 5173/3000 for frontend)
- **Browser automation steps MUST use `agent-browser` CLI commands** (e.g., `agent-browser open`, `agent-browser snapshot -i`, `agent-browser click @ref`) — never use abstract action syntax
- Always `agent-browser snapshot -i` before interacting with elements and after navigation/DOM changes
- Use `agent-browser screenshot <path>` to capture evidence for browser tests
- Use `ENV_VAR` syntax for secrets, never hardcode credentials
- If no user-facing changes detected, explain why and suggest manual verification
FILE:references/stack-discovery.md
# Tech Stack and Entry Point Discovery
This reference keeps the detailed stack-detection and entry-point tracing logic that would otherwise make `SKILL.md` too long.
## Step 2: Detect Tech Stack
Scan for project configuration files to determine the stack:
```bash
# Rust detection
ls Cargo.toml Cargo.lock 2>/dev/null
# Elixir detection
ls mix.exs mix.lock 2>/dev/null
# Node.js detection
ls package.json pnpm-lock.yaml package-lock.json yarn.lock 2>/dev/null
# Python detection
ls pyproject.toml requirements.txt setup.py 2>/dev/null
ls uv.lock poetry.lock 2>/dev/null
# Go detection
ls go.mod 2>/dev/null
# Docker detection
ls docker-compose.yml docker-compose.yaml Dockerfile 2>/dev/null
# Makefile detection
ls Makefile 2>/dev/null && grep -q "dev:" Makefile && echo "has-dev-target"
# Database detection
ls migrations/ db/migrate/ priv/repo/migrations/ 2>/dev/null
grep -rl "DATABASE_URL\|postgres\|PgPool\|sqlx\|Ecto.Repo" --include="*.rs" --include="*.ex" --include="*.py" --include="*.ts" --include="*.go" 2>/dev/null | head -5
```
### Stack Detection Rules
| Files Found | Stack | Build Commands | Default Port |
|-------------|-------|----------------|--------------|
| `Cargo.toml` | Rust (cargo) | `cargo build --release` | N/A (CLI) or 8080 |
| `mix.exs` | Elixir (mix) | `mix deps.get && mix compile` | 4000 |
| `package.json` + `pnpm-lock.yaml` | Node.js (pnpm) | `pnpm install && pnpm run build` | 5173, 3000 |
| `package.json` + `package-lock.json` | Node.js (npm) | `npm install && npm run build` | 5173, 3000 |
| `package.json` + `yarn.lock` | Node.js (yarn) | `yarn install && yarn build` | 5173, 3000 |
| `pyproject.toml` + `uv.lock` | Python (uv) | `uv sync` | 8000 |
| `pyproject.toml` + `poetry.lock` | Python (poetry) | `poetry install` | 8000 |
| `go.mod` | Go | `go build ./...` | 8080 |
| `docker-compose.yml` | Docker | `docker-compose up -d` | Parse from compose |
| `Makefile` with `dev:` target | Make-based | `make dev` | Infer from Makefile |
### Determine Project Type
After detecting the stack, classify the project:
| Type | How to detect | E2E test approach |
|------|--------------|-------------------|
| **CLI tool** | Has `fn main` / `if __name__` / binary targets, no HTTP listener | Build binary, invoke subcommands with real args, check stdout/stderr/exit code/side effects |
| **HTTP server** | Has route definitions, listens on a port | Start server, hit endpoints with curl, verify responses and database state |
| **Web app (frontend)** | Has React/Vue/Svelte routes, serves HTML | Start dev server, use agent-browser for UI interactions |
| **Full-stack** | Has both server and frontend | Start both, test API + UI |
| **Library only** | No binary, no server, no main — only `lib.rs`/`__init__.py`/package exports | Write a small driver script that exercises the public API, or test through a downstream consumer |
### Entrypoint Discovery
**Rust (clap CLI):**
```bash
# Find CLI subcommands
grep -rn "Subcommand\|#\[command\]" --include="*.rs" | head -20
# Find binary targets
grep -rn "\[\[bin\]\]\|fn main" --include="*.rs" --include="*.toml" | head -20
# Find HTTP routes (axum/actix/rocket)
grep -rn "Router::new\|\.route(\|#\[get\]\|#\[post\]\|HttpServer" --include="*.rs" | head -20
```
**Elixir (Phoenix):**
```bash
grep -rn "get \"/\|post \"/\|pipe_through\|live \"/\|scope \"/\"" --include="*.ex" | head -20
grep -rn "def handle_event\|def mount" --include="*.ex" | head -20
```
**Python:**
```bash
grep -rn "@app\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
grep -rn "@router\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
grep -rn "@click.command\|@app.command\|add_parser" --include="*.py" | head -20
```
**Node.js (Express/Fastify):**
```bash
grep -rn "app\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
grep -rn "router\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
```
**React Router:**
```bash
grep -rn "createBrowserRouter\|<Route\|path=" --include="*.tsx" --include="*.jsx" | head -20
```
**Go (net/http, gin, chi):**
```bash
grep -rn "http.HandleFunc\|r.GET\|r.POST\|router.Get\|router.Post" --include="*.go" | head -20
grep -rn "cobra.Command\|AddCommand" --include="*.go" | head -20
```
Build a map of:
- CLI subcommands: name + args + description + file:line
- API endpoints: method + path + file:line
- UI routes: path + component + file:line
- Database migrations: filename + tables affected
### Port Discovery
```bash
grep -E "^PORT=" .env .env.example .env.local 2>/dev/null
grep -A2 "ports:" docker-compose.yml 2>/dev/null
grep -E "port:" vite.config.ts vite.config.js 2>/dev/null
```
## Step 4: Trace Changes to Entry Points
For each changed file, determine if it affects user-facing functionality:
1. Direct entry point change - file contains route definitions
2. Import chain analysis - find what imports the changed file and trace up to entry points
3. Architecture-aware tracing - read CLAUDE.md, README, or architecture docs for module relationships
4. Document the trace path in test context
### Import Chain Analysis by Ecosystem
```bash
# Rust — use/mod/crate references
grep -rn "use.*<crate>\|mod <module>" --include="*.rs"
# Also check Cargo.toml dependencies between workspace crates
grep -rn "<crate-name>" --include="Cargo.toml"
# Python
grep -rn "from.*<module>\|import.*<module>" --include="*.py"
# TypeScript/JavaScript
grep -rn "from.*<module>\|require.*<module>" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx"
# Elixir
grep -rn "alias.*<Module>\|import.*<Module>\|use.*<Module>" --include="*.ex" --include="*.exs"
# Go
grep -rn "<package>\." --include="*.go"
```
If the ecosystem is not covered above, or grep results are inconclusive, read the project's CLAUDE.md, README, or architecture docs to understand the module graph and trace the data flow from changed files to user-facing entry points.
### Classify Affected Entry Points
| Category | Description | Examples | Priority |
|----------|-------------|----------|----------|
| Core functionality | Entry points where the feature does its actual work for the end user | Chat endpoint, API action, data processing pipeline, generation flow | High - test first |
| Configuration/admin | Entry points where the feature is set up, toggled, or configured | Settings page, admin dashboard, preference toggles, dropdown selections | Lower - test after core |
Requirement: At least one test must target a core functionality entry point before generating configuration/admin tests.
FILE:references/test-case-generation.md
# Test Case Generation
This reference keeps the long test-template examples and prioritization guidance out of `SKILL.md`.
## Critical: E2E Only
Every test case must exercise the **real built artifact** — the actual binary, server, or UI — exactly as a human QA tester would. A test case that invokes the project's automated test runner (`cargo test`, `pytest`, `npm test`, `go test`, `mix test`, etc.) is **never valid**. Those tests already run in CI. The purpose of this plan is to verify behavior that automated tests cannot: real end-to-end workflows through the actual user interface (CLI, HTTP, browser).
## Step 5: Generate Test Cases
Before generating test cases, answer: "What does this change do for the end user?"
Then ask: "How would a human tester — who has never seen the code — verify this works?"
Generate tests in this order:
1. Core functionality tests first - exercise the primary behavioral change through the actual user-facing interface.
2. Configuration/admin tests second - support the feature but do not replace the core test.
### CLI Applications (shell tests)
```yaml
- id: TC-XX
name: <Describe what user action this represents>
context: |
<Which files changed and why this subcommand is affected>
steps:
# Build the binary first (or reference it from setup.build)
- run: <invoke the CLI binary with real arguments>
- run: <verify the effect — check stdout, query database, inspect files>
expected: |
<Exit code, stdout content, side effects (files created, DB rows, etc.)>
```
**CLI test examples by scenario:**
```yaml
# Subcommand that writes to a database
- id: TC-01
name: "volant plan creates a workflow in PostgreSQL"
steps:
- run: ./target/debug/volant plan "Fix login bug" --description "Users can't log in" --sandbox local
- run: psql "DATABASE_URL" -c "SELECT id, state FROM workflows ORDER BY created_at DESC LIMIT 1"
expected: |
Exit code 0. A new workflow row exists with state 'pending' or further.
# Subcommand that outputs to stdout
- id: TC-02
name: "volant status lists workflows"
steps:
- run: ./target/debug/volant status --all
expected: |
Exit code 0. Outputs a table or list of workflows. Does not crash on empty database.
# Subcommand with --dry-run
- id: TC-03
name: "volant run --dry-run validates without executing"
steps:
- run: ./target/debug/volant run example.toml --dry-run
expected: |
Exit code 0. Prints the resolved workflow structure. Does not execute any nodes.
# Error handling — missing config
- id: TC-04
name: "volant plan errors gracefully without API key"
steps:
- run: env -u ANTHROPIC_API_KEY ./target/debug/volant plan "test" 2>&1 || true
expected: |
Non-zero exit code. Stderr contains a meaningful error about missing configuration,
not a panic or stack trace.
```
### API Endpoints (curl tests)
```yaml
- id: TC-XX
name: <Describe what user action this represents>
context: |
<Which files changed and why this endpoint is affected>
steps:
- action: curl
method: <GET|POST|PUT|DELETE>
url: http://localhost:<port>/<path>
headers:
Content-Type: application/json
body: <JSON body if needed>
expected: |
<HTTP status code, response body shape, side effects>
```
### Database Verification
```yaml
- id: TC-XX
name: <Describe what data change this verifies>
context: |
<Which migration or data-writing code changed>
steps:
- run: <command that triggers the data write>
- run: psql "DATABASE_URL" -c "<SQL query to verify>"
expected: |
<Expected rows, column values, or schema state>
```
**Database test examples:**
```yaml
# Migration applies cleanly
- id: TC-05
name: "Session tables migration creates expected schema"
steps:
- run: psql "DATABASE_URL" -c "\dt sessions"
- run: psql "DATABASE_URL" -c "\d sessions"
expected: |
The 'sessions' table exists with columns matching the migration definition.
# Data roundtrip through the application
- id: TC-06
name: "Checkpoint save and resume preserves workflow state"
steps:
- run: ./target/debug/volant plan "test checkpoint" --sandbox local
- run: psql "DATABASE_URL" -c "SELECT workflow_id, state FROM checkpoints ORDER BY created_at DESC LIMIT 1"
expected: |
A checkpoint row exists for the workflow with the expected state payload.
```
### UI Routes (agent-browser CLI tests)
```yaml
- id: TC-XX
name: <Describe the user journey>
context: |
<Which files changed and why this route is affected>
steps:
- run: agent-browser open http://localhost:<port>/<path>
- run: agent-browser snapshot -i
note: Capture interactive elements with refs
- run: agent-browser fill @<ref> "<test value>"
- run: agent-browser click @<ref>
- run: agent-browser wait --url "**/<expected-path>"
- run: agent-browser snapshot -i
note: Verify final state
- run: agent-browser screenshot docs/testing/evidence/tc-XX.png
expected: |
<Natural language description of expected behavior>
evidence:
screenshot: docs/testing/evidence/tc-XX.png
```
### Process Lifecycle (TUI / long-running)
```yaml
- id: TC-XX
name: <Describe the process behavior>
steps:
# Start the process in background, give it time to initialize, then verify
- run: timeout 5 ./target/debug/volant 2>&1 || true
- run: <verify it started correctly — check stderr output, temp files, etc.>
expected: |
Process starts without crash. Produces expected initial output.
Exits cleanly on timeout/interrupt.
```
### Test Case Guidelines
- **Never invoke the project's test runner** — every step must be a real user action
- At least one test per affected user-facing entry point
- CLI tests for command-line tools — invoke the binary directly
- curl tests for HTTP servers
- Browser tests for web UIs — always use real `agent-browser` CLI commands
- Database tests when migrations or data-writing code changed
- Include authentication/config steps if commands require credentials
- Test error paths — missing config, bad input, unreachable services
- Always snapshot before interacting and re-snapshot after navigation or DOM changes
## Step 6: Write YAML Test Plan
Create `docs/testing/test-plan.yaml` with metadata, setup, prerequisites, and the generated tests.
## Step 7: Report Summary
Report the generated file, detected stack, tests generated, entry-point coverage, and next steps.
## Step 8: Verification
Verify the YAML file exists, parses successfully, and includes the required top-level keys.
**Additional verification:** Grep every `run:` step for test runner commands (`cargo test`, `pytest`, `npm test`, `go test`, `mix test`, `jest`, `vitest`). If any are found, the plan fails — replace with real E2E actions.
Guidance for scaffolding new Rust projects. Use when: (1) starting a new Rust project or workspace, (2) configuring Cargo.toml best practices, (3) setting up...
---
name: rust-project-setup
description: >
Guidance for scaffolding new Rust projects. Use when:
(1) starting a new Rust project or workspace,
(2) configuring Cargo.toml best practices,
(3) setting up CI pipelines for Rust,
(4) organizing a multi-crate workspace,
(5) configuring clippy, rustfmt, and linting.
---
# Rust Project Setup
Step-by-step guidance for setting up new Rust projects with proper configuration, linting, and CI.
## Quick Reference
| Topic | Reference |
|-------|-----------|
| Cargo.toml configuration, profiles, dependencies | [references/cargo-config.md](references/cargo-config.md) |
| Workspace organization, member layout, shared deps | [references/workspace-layout.md](references/workspace-layout.md) |
| GitHub Actions CI, caching, MSRV checks | [references/ci-setup.md](references/ci-setup.md) |
| Feature flags, conditional compilation, build scripts | [references/features-conditional.md](references/features-conditional.md) |
| no_std development, embedded targets, cross-compilation | [references/no-std.md](references/no-std.md) |
## New Project Checklist
### 1. Create the Project
```shell
# Binary
cargo init my-app
# Library
cargo init --lib my-lib
# Workspace (create Cargo.toml manually)
mkdir my-workspace && cd my-workspace
```
### 2. Configure Cargo.toml
Set edition, rust-version (MSRV), and metadata:
```toml
[package]
name = "my-app"
version = "0.1.0"
edition = "2024"
rust-version = "1.85"
```
### 3. Set Up Linting
Add clippy and rustfmt configuration:
```toml
# Cargo.toml
[lints.clippy]
all = { level = "deny", priority = 10 }
pedantic = { level = "warn", priority = 3 }
[lints.rust]
future-incompatible = "warn"
nonstandard_style = "deny"
# unsafe_op_in_unsafe_fn is deny-by-default in edition 2024 — no need to set it
```
> **Edition 2024 lint defaults**: `unsafe_op_in_unsafe_fn` is deny by default. Unsafe operations inside `unsafe fn` require explicit `unsafe {}` blocks. The `gen` keyword is reserved — use `r#gen` if needed as an identifier.
```toml
# rustfmt.toml
edition = "2024"
reorder_imports = true
imports_granularity = "Crate"
group_imports = "StdExternalCrate"
```
### 4. Configure Profiles
```toml
[profile.release]
lto = true
codegen-units = 1
strip = true
```
### 5. Set Up CI
Add GitHub Actions workflow for check, clippy, test, and fmt. See [references/ci-setup.md](references/ci-setup.md).
### 6. Cargo.lock Policy
- **Binaries**: Commit `Cargo.lock` (reproducible builds)
- **Libraries**: Do NOT commit `Cargo.lock` (consumers resolve their own versions)
- Add to `.gitignore` for libraries: `Cargo.lock`
### 7. Documentation Setup
For library crates, enable doc lints:
```rust
// src/lib.rs
#![deny(missing_docs)]
```
Prefer `#[expect(lint)]` over `#[allow(lint)]` for temporary suppressions — it warns when the suppression becomes unnecessary:
```rust
#[expect(dead_code, reason = "used in next PR")]
fn upcoming_feature() {}
```
## Workspace vs Single Crate
| Use | When |
|-----|------|
| Single crate | Small project, CLI tool, simple library |
| Workspace | Multiple related crates, shared dependencies, separate compile targets |
Workspaces reduce compile times by sharing dependencies and build artifacts across members.
## Project Structure
### Binary
```text
my-app/
Cargo.toml
rustfmt.toml
src/
main.rs
lib.rs # separate logic from entry point
tests/
integration_test.rs
```
### Library
```text
my-lib/
Cargo.toml
rustfmt.toml
src/
lib.rs
module_a.rs
module_b/
mod.rs
types.rs
tests/
api_test.rs
examples/
basic_usage.rs
```
### Workspace
```text
my-workspace/
Cargo.toml # [workspace] definition
rustfmt.toml # shared formatting
crates/
core/ # shared types and logic
api/ # HTTP server
cli/ # command-line interface
```
## Dependency Best Practices
- Pin exact versions for binaries: `serde = "=1.0.210"`
- Use version ranges for libraries: `serde = "1"`
- Group features explicitly: `tokio = { version = "1", features = ["rt-multi-thread", "macros"] }`
- Use `[dev-dependencies]` for test-only crates
- Review `cargo tree` for duplicate versions
- Run `cargo audit` for security vulnerabilities
- Replace `once_cell`/`lazy_static` with `std::sync::LazyLock` (stable since Rust 1.80)
## Edition 2024 Migration Notes
When migrating existing projects to edition 2024:
- `unsafe fn` bodies now require explicit `unsafe {}` blocks around unsafe operations
- `extern "C" {}` blocks must be written as `unsafe extern "C" {}`
- `#[no_mangle]` and `#[export_name]` require `#[unsafe(no_mangle)]` and `#[unsafe(export_name)]`
- `gen` is a reserved keyword — rename any `gen` identifiers to `r#gen` or choose a different name
- `-> impl Trait` captures all in-scope lifetimes by default; use `+ use<'a>` for precise control
- `!` (never type) falls back to `!` instead of `()` — review match arms and diverging expressions
- Temporaries in `if let` and tail expressions drop earlier — review code holding locks or guards in these positions
Run `cargo fix --edition` to auto-fix most mechanical changes.
## Setup completion gates
Use these as **objective pass conditions** after the checklist—not informal “looks done.”
1. **Manifest loads** — From the project or workspace root, run `cargo metadata --format-version 1`. **Pass:** exit code 0 and the output lists your crate(s) (package `name` matches what you expect).
2. **Lint and format** — Run `cargo clippy --all-targets` (add `-- -D warnings` if warnings must fail) and `cargo fmt --check`. **Pass:** both exit 0 before you treat CI as authoritative.
3. **CI present** — You committed the workflow you intend to run (see [references/ci-setup.md](references/ci-setup.md)). **Pass:** at least one pipeline run finishes green for check, clippy, tests, and fmt (or the subset you defined).
4. **Lockfile policy** — **Binary crate:** `Cargo.lock` is committed (`git ls-files Cargo.lock` prints `Cargo.lock`). **Library crate:** `Cargo.lock` is not tracked (empty `git ls-files Cargo.lock`, or file gitignored and never added). **Pass:** the index matches that policy with no surprise `Cargo.lock` changes.
## Related Skills
- `beagle-rust:rust-best-practices` — idiomatic patterns and edition 2024 coding guidance
- `beagle-rust:rust-code-review` — code review covering ownership, unsafe, and trait design
FILE:references/cargo-config.md
# Cargo.toml Configuration
## Package Metadata
```toml
[package]
name = "my-crate"
version = "0.1.0"
edition = "2024" # latest stable edition
rust-version = "1.85" # minimum supported Rust version (MSRV)
description = "What this crate does"
license = "MIT OR Apache-2.0"
repository = "https://github.com/org/repo"
```
### Edition Selection
| Edition | When to Use |
|---------|-------------|
| 2024 | New projects (latest, best defaults) |
| 2021 | Projects supporting older Rust versions |
| 2018 | Legacy compatibility only |
Each edition enables new language features and changes some defaults. Editions are opt-in and backward compatible.
#### Edition 2024 Key Behavioral Changes
- **`unsafe_op_in_unsafe_fn` = deny**: Unsafe operations inside `unsafe fn` require explicit `unsafe {}` blocks
- **`unsafe extern` blocks**: `extern "C" {}` must be `unsafe extern "C" {}`
- **`unsafe` attributes**: `#[no_mangle]` becomes `#[unsafe(no_mangle)]`, same for `#[export_name]`
- **`gen` keyword reserved**: Use `r#gen` if you have identifiers named `gen`
- **RPIT lifetime capture**: `-> impl Trait` captures all in-scope lifetimes; use `+ use<'a, T>` for precise control
- **`never_type_fallback`**: `!` falls back to `!` instead of `()`
- **Temporary drop scopes**: Temporaries in `if let` conditions and tail expressions drop earlier
- **`IntoIterator` for `Box<[T]>`**: Now available without explicit conversion
Run `cargo fix --edition` to auto-migrate most mechanical changes when upgrading.
### MSRV (Minimum Supported Rust Version)
Set `rust-version` to declare the oldest Rust version your crate supports. CI should test against this version.
```toml
rust-version = "1.85"
```
## Dependencies
### Version Specification
```toml
[dependencies]
# Libraries: semver range (compatible updates)
serde = "1"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
# Binaries: exact pinning (reproducible builds)
reqwest = "=0.12.5"
# Git dependencies (development only, never publish with these)
my-fork = { git = "https://github.com/user/fork", branch = "fix" }
# Path dependencies (workspace members)
shared-types = { path = "../shared-types" }
[dev-dependencies]
insta = { version = "1", features = ["yaml"] }
rstest = "0.23"
pretty_assertions = "1"
tokio = { version = "1", features = ["test-util"] }
[build-dependencies]
# Only for build.rs scripts
```
### Version Specifier Patterns
| Specifier | Meaning | Use When |
|-----------|---------|----------|
| `"1"` | `>=1.0.0, <2.0.0` | Library deps (wide compatibility) |
| `"1.4"` | `>=1.4.0, <2.0.0` | You need features added in 1.4 |
| `"1.4.3"` (or `"^1.4.3"`) | `>=1.4.3, <2.0.0` | Default caret behavior |
| `"~1.4.3"` | `>=1.4.3, <1.5.0` | Lock to a specific minor version |
| `"=1.4.3"` | Exactly `1.4.3` | Binary pinning, reproducibility |
| `">=1.4, <1.7"` | Range | Avoid known-broken versions |
Set the **minimum version that actually works**, not the latest. Use `cargo +nightly -Zminimal-versions check` to verify your lower bounds are correct. If your code needs something added in 1.6, don't specify `"1"` when `"1.6"` is the honest minimum.
### Patching Dependencies
Override any dependency source temporarily for testing fixes or unreleased changes:
```toml
[patch.crates-io]
regex = { path = "/home/dev/regex" }
serde = { git = "https://github.com/serde-rs/serde.git", branch = "fix" }
```
Patches apply globally across the dependency graph but are not carried into published crates. Use for development only.
### Feature Flags
Define optional features to reduce compile time and binary size:
```toml
[features]
default = ["json"]
json = ["dep:serde_json"]
full = ["json", "yaml", "toml-support"]
yaml = ["dep:serde_yaml"]
toml-support = ["dep:toml"]
```
Use `dep:` prefix (Rust 1.60+) to avoid implicit feature names from optional dependencies.
### Feature Composability Rules
Features must be **additive**. Enabling a feature should never remove types, modules, or function signatures. Cargo takes the **union** of all requested features when multiple crates depend on the same dependency with different features — mutually exclusive features break downstream builds.
Key gotchas:
- **Conditional public items**: If a public struct field or enum variant is gated by a feature, mark the type `#[non_exhaustive]`. Otherwise, dependents without the feature may stop compiling when another crate enables it.
- **Feature-gated trait impls**: Adding a trait impl behind a feature is safe. Removing one is breaking.
- **Test all combinations**: Use `cargo hack check --feature-powerset --no-dev-deps` to verify every combination compiles.
### Workspace Dependency Inheritance
Define dependency versions once at the workspace root, reference in members:
```toml
# Root Cargo.toml
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
tracing = "0.1"
```
```toml
# Member Cargo.toml
[dependencies]
serde.workspace = true
tokio = { workspace = true, features = ["test-util"] } # extend features per-member
```
Members can add features on top of the workspace baseline. Version and base features stay consistent across the workspace.
### Deprecated Dependency Replacements
With Rust 1.80+ (required for edition 2024), several popular crates have stdlib replacements:
| Crate | Replacement | Since |
|-------|-------------|-------|
| `once_cell` | `std::sync::LazyLock`, `std::cell::LazyCell` | 1.80 |
| `lazy_static` | `std::sync::LazyLock` | 1.80 |
```rust
// BAD: external dependency for edition 2024 projects
use once_cell::sync::Lazy;
static CONFIG: Lazy<Config> = Lazy::new(|| Config::load());
// GOOD: stdlib LazyLock (stable since 1.80)
use std::sync::LazyLock;
static CONFIG: LazyLock<Config> = LazyLock::new(|| Config::load());
```
## Profiles
### Release Profile
```toml
[profile.release]
lto = true # link-time optimization (slower build, faster binary)
codegen-units = 1 # single codegen unit (slower build, better optimization)
strip = true # strip debug symbols (smaller binary)
panic = "abort" # smaller binary, no unwinding (not for libraries)
```
### Development Profile
```toml
[profile.dev]
opt-level = 0 # fast compilation (default)
[profile.dev.package."*"]
opt-level = 2 # optimize dependencies but not your code
```
### Test Profile
```toml
[profile.test]
opt-level = 1 # slightly faster test execution
```
### Custom Profiles
Define profiles beyond dev/release for specialized builds:
```toml
[profile.profiling]
inherits = "release"
debug = true # debug symbols for perf/flamegraph
strip = false
[profile.embedded]
inherits = "release"
opt-level = "s" # optimize for binary size
lto = true
codegen-units = 1
panic = "abort"
```
Use with `cargo build --profile profiling`. Each profile gets its own `target/<profile-name>/` output directory.
### Per-Dependency Profile Overrides
Optimize specific dependencies differently from your own code:
```toml
[profile.dev.package.serde]
opt-level = 3 # full optimization for serde even in debug
[profile.dev.package."*"]
opt-level = 2 # moderate optimization for all other deps
```
Useful when a dependency is prohibitively slow in debug mode (compression, video encoding, crypto). Note: generic code monomorphized in your crate uses your crate's profile settings, not the dependency override.
## Supply Chain Auditing with cargo-deny
Configure `cargo-deny` for automated dependency auditing in CI:
```shell
cargo install cargo-deny
cargo deny init # creates deny.toml
cargo deny check # run all checks
```
```toml
# deny.toml
[licenses]
allow = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"]
[bans]
multiple-versions = "warn"
wildcards = "deny"
[advisories]
vulnerability = "deny"
unmaintained = "warn"
[sources]
allow-git = []
```
Also run `cargo audit` for security vulnerability checks. Both tools complement each other: `cargo-deny` covers licenses and duplicates, `cargo-audit` focuses on CVEs.
## Clippy and Lint Configuration
### Package-Level Lints
```toml
[lints.clippy]
all = { level = "deny", priority = 10 }
redundant_clone = { level = "deny", priority = 9 }
pedantic = { level = "warn", priority = 3 }
[lints.rust]
future-incompatible = "warn"
nonstandard_style = "deny"
unsafe_code = "deny" # for crates that should never use unsafe
# unsafe_op_in_unsafe_fn is deny-by-default in edition 2024 — no explicit entry needed
```
#### Edition 2024 Lint Defaults
These lints are deny-by-default in edition 2024 and do not need explicit configuration:
| Lint | Effect |
|------|--------|
| `unsafe_op_in_unsafe_fn` | Unsafe ops in `unsafe fn` require explicit `unsafe {}` blocks |
| `never_type_fallback_flowing_into_unsafe` | Prevents `!` fallback into unsafe contexts |
Use `#[expect(lint)]` instead of `#[allow(lint)]` for temporary suppressions — it warns when the suppression becomes unnecessary:
```rust
#[expect(clippy::needless_pass_by_value, reason = "required by framework trait")]
fn handler(req: Request) -> Response { /* ... */ }
```
### Workspace-Level Lints
Define once, inherit everywhere:
```toml
# Root Cargo.toml
[workspace.lints.clippy]
all = { level = "deny", priority = 10 }
pedantic = { level = "warn", priority = 3 }
```
```toml
# Member Cargo.toml
[lints]
workspace = true
```
## rustfmt.toml
```toml
edition = "2024"
max_width = 100
reorder_imports = true
imports_granularity = "Crate"
group_imports = "StdExternalCrate"
use_field_init_shorthand = true
```
Place in the repository root. Runs automatically with `cargo fmt`.
## Cargo.lock Policy
| Project Type | Commit Cargo.lock? | Reason |
|-------------|-------------------|--------|
| Binary / Application | Yes | Reproducible builds |
| Library | No | Consumers resolve their own versions |
| Workspace with binaries | Yes | Binary members need reproducible builds |
For libraries, add to `.gitignore`:
```gitignore
Cargo.lock
```
## Useful Commands
```shell
cargo check # fast type checking without building
cargo build --release # optimized build
cargo test # run all tests
cargo doc --open # generate and view documentation
cargo tree # show dependency tree
cargo audit # check for security vulnerabilities
cargo update # update dependencies within semver constraints
cargo clippy --fix # auto-fix clippy suggestions
```
FILE:references/ci-setup.md
# CI Setup
## GitHub Actions for Rust
### Complete Workflow
```yaml
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
CARGO_TERM_COLOR: always
RUSTFLAGS: -Dwarnings
jobs:
check:
name: Check
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- run: cargo check --workspace --all-targets
clippy:
name: Clippy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
with:
components: clippy
- uses: Swatinem/rust-cache@v2
- run: cargo clippy --workspace --all-targets --all-features -- -D warnings
# Edition 2024: unsafe_op_in_unsafe_fn is warn-by-default (not deny).
# With `-D warnings` above, clippy will fail on these. For projects
# mixing editions, add explicit flags:
# -- -D warnings -W unsafe_op_in_unsafe_fn
test:
name: Test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- run: cargo test --workspace
fmt:
name: Format
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
with:
components: rustfmt
- run: cargo fmt --all --check
msrv:
name: MSRV
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@master
with:
toolchain: "1.85" # match rust-version in Cargo.toml
- uses: Swatinem/rust-cache@v2
- run: cargo check --workspace
```
## Key Actions
### dtolnay/rust-toolchain
Installs a specific Rust toolchain. More reliable than `actions-rs`:
```yaml
- uses: dtolnay/rust-toolchain@stable
with:
components: clippy, rustfmt
```
For MSRV testing:
```yaml
- uses: dtolnay/rust-toolchain@master
with:
toolchain: "1.85"
```
### Swatinem/rust-cache
Caches Cargo registry, build artifacts, and target directory:
```yaml
- uses: Swatinem/rust-cache@v2
```
Automatic cache key based on Cargo.lock and toolchain. Typical speedup: 2-5x on subsequent runs.
Options:
```yaml
- uses: Swatinem/rust-cache@v2
with:
cache-on-failure: true # cache even if build fails
shared-key: "shared" # share cache across jobs
```
## MSRV Testing
Test against the minimum supported Rust version declared in `Cargo.toml`:
```toml
# Cargo.toml
rust-version = "1.85"
```
The MSRV job uses that exact version. If it breaks, either:
- Fix the code to work on the MSRV
- Bump `rust-version` in Cargo.toml
### Edition 2024 MSRV
Edition 2024 requires Rust 1.85 or later. For projects using edition 2024, the MSRV cannot be lower than `1.85`. If your workspace mixes editions (e.g., a member still on edition 2021), the MSRV job should test against the highest edition's minimum:
```yaml
msrv:
name: MSRV
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@master
with:
toolchain: "1.85" # edition 2024 minimum
- uses: Swatinem/rust-cache@v2
- run: cargo check --workspace
```
Consider a matrix strategy if you support multiple toolchain versions:
```yaml
msrv:
name: MSRV
strategy:
matrix:
toolchain: ["1.85", "stable"]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@master
with:
toolchain: { matrix.toolchain}
- uses: Swatinem/rust-cache@v2
with:
shared-key: "msrv-{ matrix.toolchain}"
- run: cargo check --workspace
```
## Doc Tests
`cargo nextest` does not run doc tests. Run them separately:
```yaml
doc-test:
name: Doc Tests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- run: cargo test --doc --workspace
```
## Security Audit
Check dependencies for known vulnerabilities:
```yaml
audit:
name: Security Audit
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: rustsec/audit-check@v2
with:
token: { secrets.GITHUB_TOKEN}
```
## Release Builds
For release pipelines, optimize build settings:
```yaml
release:
name: Release Build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- run: cargo build --release
- uses: actions/upload-artifact@v4
with:
name: binary
path: target/release/my-app
```
## Cross-Compilation
Build for multiple targets:
```yaml
cross:
name: Cross-compile
strategy:
matrix:
target:
- x86_64-unknown-linux-gnu
- aarch64-unknown-linux-gnu
- x86_64-apple-darwin
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
with:
targets: { matrix.target}
- run: cargo check --target { matrix.target}
```
For full cross-compilation builds, consider the `cross` tool:
```shell
cargo install cross
cross build --target aarch64-unknown-linux-gnu --release
```
## Caching Strategy
| What | How | Impact |
|------|-----|--------|
| Cargo registry | `Swatinem/rust-cache` (automatic) | Avoids re-downloading crates |
| Build artifacts | `Swatinem/rust-cache` (automatic) | Avoids recompiling unchanged deps |
| sccache | Manual setup with `RUSTC_WRAPPER=sccache` | Shares cache across branches |
For large workspaces, consider `sccache` for cross-branch cache sharing.
## Workflow Tips
- Run `cargo check` before `cargo test` -- it is faster and catches most issues
- Use `RUSTFLAGS: -Dwarnings` in env to fail on warnings across all jobs
- Keep MSRV job separate -- it runs less often and has different cache needs
- Use `cargo nextest` for faster test execution (parallel, better output)
FILE:references/features-conditional.md
# Features and Conditional Compilation
## Feature Flag Design
Features must be **additive and composable**. Enabling a feature should never remove functionality or break compilation. If crate A compiles with some set of features on crate C, it must also compile with all features enabled on crate C.
Cargo takes the **union** of all requested features when multiple dependents enable different features on the same crate. Mutually exclusive features break downstream builds.
### Default Features
Curate defaults for the common case. Let users opt out of heavy dependencies:
```toml
[features]
default = ["json", "logging"]
json = ["dep:serde_json"]
logging = ["dep:tracing"]
full = ["json", "logging", "yaml", "compression"]
yaml = ["dep:serde_yaml"]
compression = ["dep:flate2"]
```
Use `dep:` prefix (Rust 1.60+) to avoid implicit feature names from optional dependencies.
### `std` Feature Pattern for no_std Crates
Use an additive `std` feature, not a subtractive `no-std` feature:
```toml
[features]
default = ["std"]
std = []
alloc = []
```
```rust
#![cfg_attr(not(feature = "std"), no_std)]
#[cfg(feature = "alloc")]
extern crate alloc;
#[cfg(feature = "std")]
pub fn read_file(path: &str) -> std::io::Result<Vec<u8>> {
std::fs::read(path)
}
```
This way, any crate in the dependency graph enabling `std` adds functionality rather than removing it.
### Feature Documentation
Document what each feature enables. Users should not have to read source to understand features:
```toml
[package.metadata.docs.rs]
all-features = true # build docs with all features enabled
```
### Testing Feature Combinations
Use `cargo-hack` to verify all feature combinations compile:
```shell
cargo install cargo-hack
cargo hack check --feature-powerset --no-dev-deps
```
Configure CI to run this check. Any combination of features must compile.
## Conditional Compilation
### `#[cfg(...)]` Attribute
Place on items (functions, types, impl blocks, modules, use statements, struct fields):
```rust
#[cfg(feature = "metrics")]
mod metrics;
#[cfg(target_os = "linux")]
fn platform_init() { /* linux-specific */ }
#[cfg(all(feature = "std", target_arch = "x86_64"))]
fn optimized_path() { /* ... */ }
```
### `cfg_attr` for Conditional Attributes
Apply attributes only when a condition holds:
```rust
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Config {
pub name: String,
}
#[cfg_attr(miri, ignore)]
#[test]
fn expensive_test() { /* skipped under Miri */ }
```
### Common cfg Options
| Option | Example | Use |
|--------|---------|-----|
| `feature = "name"` | `cfg(feature = "json")` | Feature-gated code |
| `target_os` | `cfg(target_os = "macos")` | OS-specific code |
| `unix` / `windows` | `cfg(unix)` | OS family shorthand |
| `target_arch` | `cfg(target_arch = "aarch64")` | Architecture-specific |
| `test` | `cfg(test)` | Test-only code (current crate only) |
| `debug_assertions` | `cfg(debug_assertions)` | Debug mode only |
Combine with `all()`, `any()`, `not()`:
```rust
#[cfg(any(target_os = "linux", target_os = "macos"))]
fn unix_like_setup() { /* ... */ }
```
### Conditional Dependencies
Gate platform-specific dependencies in Cargo.toml:
```toml
[target.'cfg(windows)'.dependencies]
winapi = { version = "0.3", features = ["winuser"] }
[target.'cfg(unix)'.dependencies]
nix = "0.29"
```
Note: only target-based cfg options work here. Feature and context options are not available at dependency resolution time.
## Build Scripts (build.rs)
Use build scripts for compile-time code generation and native library compilation:
```rust
// build.rs
fn main() {
// Link a native library
println!("cargo:rustc-link-lib=static=mylib");
println!("cargo:rustc-link-search=native=/usr/local/lib");
// Set a custom cfg option
println!("cargo:rustc-cfg=has_feature_x");
// Rerun only when this file changes
println!("cargo:rerun-if-changed=wrapper.h");
}
```
Declare build script dependencies separately:
```toml
[build-dependencies]
cc = "1" # compile C/C++ code
bindgen = "0.72" # generate FFI bindings
```
## Project Directory Organization
### Examples Directory
Place runnable examples in `examples/`:
```text
examples/
basic.rs # cargo run --example basic
advanced/
main.rs # cargo run --example advanced
helper.rs
```
### Benchmarks Directory
Place benchmarks in `benches/`:
```text
benches/
throughput.rs # cargo bench --bench throughput
```
Use `criterion` for stable benchmarks (the built-in `#[bench]` is nightly-only):
```toml
[[bench]]
name = "throughput"
harness = false
[dev-dependencies]
criterion = { version = "0.8", features = ["html_reports"] }
```
## Dependency Auditing
See [cargo-config.md](cargo-config.md) for `cargo-deny` setup covering license compliance, duplicate detection, and vulnerability scanning.
FILE:references/no-std.md
# no_std Development
## Opting Out of the Standard Library
`#![no_std]` switches the crate prelude from `std::prelude` to `core::prelude`, preventing accidental dependence on OS-provided functionality:
```rust
#![no_std]
// core types (Option, Result, Iterator) are available through the prelude
// std types (File, HashMap, println!) are not
```
The attribute only changes the prelude. You can still explicitly `use std::` if needed, which enables the common pattern of offering both no_std and std APIs through feature flags.
## Three Library Tiers
| Tier | Provides | Requires |
|------|----------|----------|
| `core` | Fundamental types (Option, Result), iterators, sorting, atomics, marker types | Nothing beyond the hardware |
| `alloc` | Vec, String, Box, Arc, Rc, BTreeMap, format! | A memory allocator |
| `std` | File, net, HashMap, println!, time, threads | An operating system |
`std` re-exports everything from `core` and `alloc`. Most types you access through `std::` actually live in `core::` or `alloc::`.
### What Each Tier Excludes
- **core only**: no heap allocation, no collections, no String, no I/O
- **core + alloc**: no HashMap (requires OS randomness), no filesystem, no networking, no threads
- **std**: full functionality, requires OS support
## Using alloc in no_std
Opt into heap-allocated types without pulling in the full standard library:
```rust
#![no_std]
extern crate alloc;
use alloc::vec::Vec;
use alloc::string::String;
use alloc::boxed::Box;
use alloc::sync::Arc;
use alloc::collections::BTreeMap;
```
Replace `use std::` with `use alloc::` for heap types. Note: `HashMap` is not in `alloc` because it requires OS-provided randomness for key hashing.
### Custom Allocator
Define a global allocator when the platform has no default:
```rust
use core::alloc::{GlobalAlloc, Layout};
struct MyAllocator;
unsafe impl GlobalAlloc for MyAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
// platform-specific allocation
# unimplemented!()
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
// platform-specific deallocation
# unimplemented!()
}
}
#[global_allocator]
static ALLOCATOR: MyAllocator = MyAllocator;
```
## Embedded Binary Setup
For targets without an OS, opt out of both the standard library and the Rust runtime:
```rust
#![no_std]
#![no_main]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
loop {} // halt on panic; alternatives: abort, reset device
}
#[unsafe(no_mangle)] // edition 2024 syntax
pub extern "C" fn main() -> ! {
// entry point — never returns
loop {}
}
```
- `#![no_main]` removes the Rust runtime (`lang_start`), so no command-line arg setup, no signal handlers, no stdout flushing
- `#[panic_handler]` is required: defines what happens on panic (must diverge with `-> !`)
- The entry point signature must match the target platform's expectations
## Volatile Memory Access
Use volatile reads and writes for memory-mapped hardware registers. The compiler cannot elide or reorder volatile operations:
```rust
use core::ptr;
const GPIO_REG: *mut u32 = 0x4000_0000 as *mut u32;
fn set_pin_high(pin: u8) {
unsafe {
let current = ptr::read_volatile(GPIO_REG);
ptr::write_volatile(GPIO_REG, current | (1 << pin));
}
}
```
Use volatile operations when:
- Hardware registers have side effects on read
- Interrupt handlers access shared memory
- Memory-mapped device state must be read/written in exact order
## Type-Safe Hardware State Machines
Use `PhantomData` and zero-sized types to enforce valid hardware states at compile time:
```rust
use core::marker::PhantomData;
pub struct On;
pub struct Off;
pub struct Led<State>(PhantomData<State>);
impl Led<Off> {
pub fn turn_on(self) -> Led<On> {
// write to hardware register
Led(PhantomData)
}
}
impl Led<On> {
pub fn turn_off(self) -> Led<Off> {
// write to hardware register
Led(PhantomData)
}
}
```
Methods consume `self` and return the new state type, making invalid transitions unrepresentable. The `PhantomData` carries no runtime cost.
## Fixed-Size Stack Collections
When heap allocation is unavailable, use stack-allocated collections with const generics. The `arrayvec` crate provides production-ready `ArrayVec<T, N>` and `ArrayString<N>` types that store elements inline with a compile-time capacity limit and fail gracefully when full.
## Cross-Compilation
Targets follow the format `machine-vendor-os` (e.g., `thumbv7m-none-eabi`, `x86_64-unknown-linux-musl`):
```shell
rustup target add thumbv7m-none-eabi
cargo build --target thumbv7m-none-eabi
```
### Verifying no_std Compatibility
Build against a bare-metal target to catch accidental `std` usage in your code and dependencies:
```shell
cargo check --target thumbv7m-none-eabi
```
Add this to CI for no_std crates. For custom targets without a prebuilt standard library:
```shell
rustup component add rust-src
cargo build -Z build-std=core,alloc --target my-custom-target.json
```
## Static Memory Preferences
In embedded contexts, prefer static and stack allocations:
| Strategy | When to Use |
|----------|-------------|
| `const` / `static` | Global configuration, lookup tables, singleton hardware handles |
| Stack arrays (`[T; N]`) | Fixed-size buffers with known bounds |
| `ArrayVec<T, N>` | Variable-length data with a compile-time maximum |
| `alloc` types | Only when dynamic sizing is essential and an allocator is available |
For fallible allocation in `alloc`-using code, prefer `try_` variants (`Vec::try_reserve`, `Box::try_new`) over panicking methods when targeting environments where out-of-memory must be handled gracefully.
FILE:references/workspace-layout.md
# Workspace Layout
## When to Use Workspaces
Use a workspace when you have multiple related crates that:
- Share dependencies (reduces compile time and disk usage)
- Need coordinated versioning
- Have separate build targets (binary + library, multiple binaries)
- Benefit from a shared CI pipeline
Don't use a workspace for a single crate. The overhead isn't worth it.
## Basic Structure
```text
my-workspace/
Cargo.toml # workspace root
rustfmt.toml # shared formatting
.github/
workflows/ci.yml # shared CI
crates/
core/ # shared types and logic
Cargo.toml
src/lib.rs
api/ # HTTP server binary
Cargo.toml
src/main.rs
cli/ # CLI binary
Cargo.toml
src/main.rs
tests/ # workspace-level integration tests (optional)
```
## Workspace Cargo.toml
```toml
[workspace]
resolver = "3" # default for edition 2024; explicit for clarity
members = [
"crates/core",
"crates/api",
"crates/cli",
]
[workspace.package]
edition = "2024"
rust-version = "1.85"
license = "MIT"
repository = "https://github.com/org/repo"
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
thiserror = "2"
tracing = "0.1"
[workspace.lints.clippy]
all = { level = "deny", priority = 10 }
pedantic = { level = "warn", priority = 3 }
[workspace.lints.rust]
future-incompatible = "warn"
```
## Member Cargo.toml
Members inherit from workspace:
```toml
[package]
name = "my-api"
version = "0.1.0"
edition.workspace = true
rust-version.workspace = true
license.workspace = true
[dependencies]
my-core = { path = "../core" } # path dependency to workspace member
serde.workspace = true # inherit version and features
tokio.workspace = true
axum = "0.8" # member-specific dependency
[lints]
workspace = true # inherit lint config
```
## Edition Inheritance in Workspaces
Edition 2024 introduces important workspace-level behaviors:
- **Edition inherits from workspace**: Members using `edition.workspace = true` inherit the workspace edition. All members get edition 2024 semantics (unsafe block requirements, lifetime capture rules, etc.)
- **Mixed editions**: Members can override with a local `edition = "2021"` if needed, but this creates inconsistent behavior across crates — avoid when possible
- **Resolver**: Edition 2024 defaults to resolver `"3"` (MSRV-aware); edition 2021 defaults to `"2"`. Setting it explicitly in the workspace root is good practice for clarity
- **Lint inheritance**: `[workspace.lints.rust]` applies uniformly, but edition 2024 deny-by-default lints (like `unsafe_op_in_unsafe_fn`) activate per-member based on that member's edition
```toml
# Root Cargo.toml — all members inherit edition 2024
[workspace.package]
edition = "2024"
rust-version = "1.85"
# Member Cargo.toml — inherits edition 2024 and MSRV
[package]
name = "my-crate"
version = "0.1.0"
edition.workspace = true
rust-version.workspace = true
```
## Shared Dependencies
Define versions once in `[workspace.dependencies]`, reference with `.workspace = true` in members:
```toml
# Root Cargo.toml
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }
# Member Cargo.toml
[dependencies]
serde.workspace = true
```
To add features for a specific member:
```toml
[dependencies]
tokio = { workspace = true, features = ["test-util"] }
```
## Path Dependencies
Members reference each other with path dependencies:
```toml
[dependencies]
core = { path = "../core" }
```
These are resolved at build time. Cargo ensures all workspace members use compatible versions.
## Feature Flags Across Workspace
Define features in individual crates and propagate through path dependencies:
```toml
# crates/core/Cargo.toml
[features]
default = []
postgres = ["dep:sqlx"]
metrics = ["dep:prometheus"]
# crates/api/Cargo.toml
[dependencies]
core = { path = "../core", features = ["postgres", "metrics"] }
```
## Running Commands
```shell
# Run across all members
cargo check --workspace
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
# Run for a specific member
cargo test -p my-api
cargo run -p my-cli
# Build specific binary
cargo build --release -p my-api
```
## Common Patterns
### Shared Types Crate
A `core` or `types` crate containing shared types, error definitions, and traits:
```text
crates/core/src/
lib.rs # re-exports
error.rs # shared error types
types.rs # domain types
traits.rs # shared trait definitions
```
### Binary + Library Split
Separate the binary entry point from logic for testability:
```text
crates/api/src/
main.rs # entry point, minimal
lib.rs # all logic, imported by main.rs and tests
```
### Internal Crates
Mark crates as internal (not published) by omitting `version` or adding `publish = false`:
```toml
[package]
name = "internal-utils"
publish = false
```
Development guidance for writing idiomatic Rust. Use when: (1) writing new Rust functions or modules, (2) choosing between borrowing, cloning, or ownership p...
---
name: rust-best-practices
description: >
Development guidance for writing idiomatic Rust. Use when:
(1) writing new Rust functions or modules,
(2) choosing between borrowing, cloning, or ownership patterns,
(3) implementing error handling with Result types,
(4) optimizing Rust code for performance,
(5) configuring clippy and linting for a project,
(6) deciding between static and dynamic dispatch,
(7) writing documentation or doc tests.
---
# Rust Best Practices
Guidance for writing idiomatic, performant, and safe Rust code. This is a development skill, not a review skill -- use it when building, not reviewing.
## Quick Reference
| Topic | Key Rule | Reference |
|-------|----------|-----------|
| Ownership | Borrow by default, clone only when you need a separate owned copy | [references/coding-idioms.md](references/coding-idioms.md) |
| Clippy | Run `cargo clippy -- -D warnings` on every commit; configure workspace lints | [references/clippy-config.md](references/clippy-config.md) |
| Performance | Don't guess, measure. Profile with `--release` first | [references/performance.md](references/performance.md) |
| Generics | Static dispatch by default, dynamic dispatch when you need mixed types | [references/generics-dispatch.md](references/generics-dispatch.md) |
| Type State | Encode state in the type system when invalid operations should be compile errors | [references/type-state-pattern.md](references/type-state-pattern.md) |
| Documentation | `//` for why, `///` for what and how, `//!` for module/crate purpose | [references/documentation.md](references/documentation.md) |
| Pointers | Choose pointer types based on ownership needs and threading model | [references/pointer-types.md](references/pointer-types.md) |
| API Design | Unsurprising, flexible, obvious, constrained -- encode invariants in types | [references/api-design.md](references/api-design.md) |
| Ecosystem | Evaluate crates, pick error handling strategy, stay current | [references/ecosystem-patterns.md](references/ecosystem-patterns.md) |
## Gates
Short **sequences with pass conditions** before claiming outcomes that need evidence (not an internal “I checked”).
### Clippy clean
1. From the workspace root (or with `-p <crate>`), run: `cargo clippy --all-targets --all-features -- -D warnings`.
2. **Pass:** exit status is `0` and the invocation finishes without Clippy-deny failures.
### Performance claim
1. Build with `cargo build --release` (or your benchmark harness) under the same profile you ship or measure.
2. Capture a **before** and **after** number from the same tool and metric (name both), e.g. Criterion `ns/iter`, `heaptrack` allocations, or a flamegraph path on disk.
3. **Pass:** you can cite both measurements, **or** you explicitly state that only correctness or readability changed and you are **not** claiming a performance delta.
### Docs for symbols you changed
1. Run `cargo doc --no-deps` for the crate you edited (add `-p <crate>` in workspaces).
2. **Pass:** the doc build succeeds; if `#![deny(missing_docs)]` (or crate policy) applies, there are no new missing-doc errors for those symbols.
## Coding Idioms
Prefer `&T` over `.clone()`, use `&str`/`&[T]` in parameters, and chain iterators instead of index-based loops. For Option/Result, use `let Ok(x) = expr else { return }` for early returns and `?` for propagation. See [references/coding-idioms.md](references/coding-idioms.md) for ownership, iterator, and import patterns.
## Error Handling
Return `Result<T, E>` for fallible operations. Use `thiserror` for library error types, `anyhow` for binaries. Propagate with `?`, never `unwrap()` outside tests. See [references/coding-idioms.md](references/coding-idioms.md) for Option/Result patterns.
## Clippy Discipline
Run `cargo clippy --all-targets --all-features -- -D warnings` on every commit. Configure workspace lints in `Cargo.toml` and use `#[expect(clippy::lint)]` (not `#[allow]`) as the standard for lint suppression -- it warns when the suppression becomes stale. See [references/clippy-config.md](references/clippy-config.md) for lint configuration and key lints.
## Performance Mindset
Always benchmark with `--release`, profile before optimizing, and avoid cloning in loops or premature `.collect()` calls. Keep small types on the stack and heap-allocate only recursive structures and large buffers. See [references/performance.md](references/performance.md) for profiling tools and allocation guidance.
## Generics and Dispatch
Use static dispatch (`impl Trait` / `<T: Trait>`) by default for zero-cost monomorphization. Switch to `dyn Trait` only for heterogeneous collections or plugin architectures, preferring `&dyn Trait` over `Box<dyn Trait>` when ownership isn't needed. In edition 2024, `-> impl Trait` captures all in-scope lifetimes by default -- use `+ use<'a, T>` for precise capture control. Prefer native `async fn` in traits over the `async-trait` crate for static dispatch. See [references/generics-dispatch.md](references/generics-dispatch.md) for dispatch trade-offs, RPIT capture rules, and async trait guidance.
## Type State Pattern
Encode valid states in the type system so invalid operations become compile errors. Use for builders with required fields, protocol state machines, and workflow pipelines. See [references/type-state-pattern.md](references/type-state-pattern.md) for implementation patterns and when to avoid.
## Documentation
Use `//` for why, `///` for what/how on public APIs, and `//!` for module purpose. Every `TODO` needs a linked issue and library crates should enable `#![deny(missing_docs)]`. Use `#[diagnostic::on_unimplemented]` to provide custom compiler errors for your public traits. See [references/documentation.md](references/documentation.md) for doc test patterns, comment conventions, and diagnostic attributes.
## API Design
Follow four principles: unsurprising (reuse standard names and traits), flexible (use generics and `impl Trait` to avoid unnecessary restrictions), obvious (encode invariants in the type system so misuse is a compile error), and constrained (expose only what you can commit to long-term). Use `#[non_exhaustive]` for types that may grow, seal traits you need to extend without breaking changes, and wrap foreign types in newtypes to control your SemVer surface. See [references/api-design.md](references/api-design.md) for builder patterns, sealed traits, and SemVer implications.
## Ecosystem Patterns
Evaluate crates by recent download trends, maintenance activity, documentation quality, and transitive dependency weight. Use `thiserror` for library error types, `anyhow` for binaries, and `eyre` when you need custom error reporters. Prefer vendoring or writing code yourself when a crate pulls heavy dependencies for a small feature. Run `cargo-deny` for license and vulnerability auditing and `cargo-udeps` to trim unused dependencies. See [references/ecosystem-patterns.md](references/ecosystem-patterns.md) for crate evaluation criteria, edition migration, and essential tooling.
## Pointer Types
Choose pointer types based on ownership and threading: `Box<T>` for single-owner heap allocation, `Rc<T>`/`Arc<T>` for shared ownership, `Cell`/`RefCell`/`Mutex`/`RwLock` for interior mutability. Use `LazyLock`/`LazyCell` (stable since 1.80) instead of `lazy_static` or `once_cell`. See [references/pointer-types.md](references/pointer-types.md) for the full single-thread vs multi-thread decision table and migration guidance.
FILE:references/api-design.md
# API Design
## Four Principles
Every Rust interface should be **unsurprising** (follow naming conventions and standard trait expectations), **flexible** (avoid unnecessary restrictions on callers), **obvious** (use types and docs to prevent misuse), and **constrained** (expose only what you intend to support long-term).
## Naming Conventions
Follow the Rust API Guidelines. Reuse well-known names so users can rely on intuition:
- `iter` takes `&self` and returns an iterator
- `into_inner` takes `self` and returns a wrapped value
- `SomethingError` implements `std::error::Error`
Avoid using familiar names for unfamiliar behavior -- if `iter` takes `self`, users will write bugs.
## Standard Trait Implementations
Eagerly implement standard traits even if you don't need them yet. Users cannot implement foreign traits on your types due to coherence rules.
**Priority order:**
1. `Debug` -- nearly every type should have it
2. `Send`, `Sync` -- document if intentionally missing
3. `Clone`, `Default` -- expected for most types
4. `PartialEq`, `Hash` -- needed for collections and assertions
5. `Serialize`/`Deserialize` -- behind a `serde` feature flag
Avoid deriving `Copy` by default. It changes move semantics, and removing it later is a breaking change.
## Making Invalid States Unrepresentable
Use the type system to prevent misuse at compile time rather than panicking at runtime.
```rust
// BAD -- runtime check, caller can pass wrong combination
fn launch(rocket: &mut Rocket, is_fueled: bool, is_on_ground: bool) {
assert!(is_fueled && is_on_ground);
}
// GOOD -- invalid calls are compile errors
struct Grounded;
struct Launched;
struct Rocket<Stage = Grounded> {
stage: std::marker::PhantomData<Stage>,
}
impl Rocket<Grounded> {
fn launch(self) -> Rocket<Launched> { /* ... */ }
}
```
Combine related booleans into enums. If a pointer is only valid when a flag is true, use an `Option` or a single enum with the data inside the relevant variant.
## Builder Pattern
Use when constructing types with many optional fields or when required fields must be validated before use.
```rust
pub struct ServerConfig { /* private fields */ }
pub struct ServerConfigBuilder {
host: String,
port: Option<u16>,
tls: Option<TlsConfig>,
}
impl ServerConfigBuilder {
pub fn new(host: impl Into<String>) -> Self {
Self { host: host.into(), port: None, tls: None }
}
pub fn port(mut self, port: u16) -> Self {
self.port = Some(port);
self
}
pub fn build(self) -> Result<ServerConfig, ConfigError> {
// validate and construct
}
}
```
For builders with required stages, combine with type-state pattern (see [type-state-pattern.md](type-state-pattern.md)).
## `impl Trait` in Argument vs Return Position
**Argument position** (`fn foo(x: impl Read)`) -- syntactic sugar for generics. Caller chooses the concrete type. Monomorphized, so each type gets its own copy.
**Return position** (`fn foo() -> impl Read`) -- caller does not choose the type. Useful for hiding complex return types (iterators, closures, futures). In edition 2024, captures all in-scope lifetimes by default.
```rust
// Argument position: caller picks the type
fn process(input: impl AsRef<str>) { /* ... */ }
// Return position: hides the concrete iterator type
fn even_squares(nums: &[i32]) -> impl Iterator<Item = i32> + '_ {
nums.iter().filter(|n| *n % 2 == 0).map(|n| n * n)
}
```
Prefer generics over `impl Trait` in arguments when the same type parameter is referenced multiple times or when the caller needs turbofish syntax.
## Sealed Traits
Prevent external crates from implementing your trait while still allowing them to use it. Useful for derived/blanket traits and restricting type parameters.
```rust
pub trait Stage: sealed::Sealed { /* ... */ }
mod sealed {
pub trait Sealed {}
impl Sealed for super::Grounded {}
impl Sealed for super::Launched {}
}
```
Document that the trait is sealed so users don't waste time trying to implement it.
## Newtype Pattern
Wrap a type to give it distinct semantics or work around the orphan rule.
```rust
// Semantic distinctness
struct Meters(f64);
struct Seconds(f64);
// Orphan rule workaround -- implement foreign trait for foreign type
struct PrettyVec<T>(Vec<T>);
impl<T: Display> Display for PrettyVec<T> { /* ... */ }
```
Use `Deref` to forward method calls to the inner type when the wrapper is transparent.
## Non-Exhaustive for Forward Compatibility
Prevent downstream code from constructing your types directly or matching exhaustively, so you can add fields/variants without a breaking change.
```rust
#[non_exhaustive]
pub enum Error {
NotFound,
PermissionDenied,
}
#[non_exhaustive]
pub struct Config {
pub timeout_ms: u64,
}
```
Use when the type is likely to gain new variants or fields. Avoid on stable types where exhaustive matching is valuable to callers.
## Blanket Implementations
Provide blanket impls for references when your trait has only `&self` methods:
```rust
impl<T: MyTrait> MyTrait for &T { /* forward calls */ }
impl<T: MyTrait> MyTrait for Box<T> { /* forward calls */ }
```
This lets `fn foo(x: impl MyTrait)` accept both owned values and references without surprises. Adding a blanket impl later is a breaking change due to coherence, so plan early.
## SemVer Implications
**Breaking changes** (require major version bump):
- Removing or renaming public items
- Adding fields to a non-`#[non_exhaustive]` struct
- Removing a trait implementation
- Adding a blanket trait implementation
- Making a trait no longer object-safe
- Changing auto-trait implementations (Send, Sync)
- Bumping a major version of a re-exported dependency
**Non-breaking changes:**
- Adding new public items
- Adding a trait method with a default implementation
- Implementing a trait for a new type
- Relaxing trait bounds on existing functions
Use `#[non_exhaustive]` on types you expect to evolve, seal traits you need to extend, and wrap re-exported foreign types in newtypes.
FILE:references/clippy-config.md
# Clippy Configuration
## Daily Workflow Command
Run on every commit or PR:
```shell
cargo clippy --all-targets --all-features --locked -- -D warnings
```
- `--all-targets`: checks library, tests, benches, examples
- `--all-features`: enables all feature flags to catch conditional code
- `--locked`: requires Cargo.lock to be up-to-date (omit for library crates that don't commit `Cargo.lock`)
- `-D warnings`: treats warnings as errors
Add to your Makefile, Justfile, xtask, or CI pipeline.
## Workspace Lint Configuration
Configure in `Cargo.toml` for consistent enforcement:
```toml
[workspace.lints.rust]
future-incompatible = "warn"
nonstandard_style = "deny"
[workspace.lints.clippy]
all = { level = "deny", priority = 10 }
redundant_clone = { level = "deny", priority = 9 }
manual_while_let_some = { level = "deny", priority = 4 }
pedantic = { level = "warn", priority = 3 }
```
Individual crates inherit workspace lints:
```toml
[lints]
workspace = true
```
Higher priority numbers win when lints conflict.
## Key Lints to Enforce
| Lint | Why | Category |
|------|-----|----------|
| `redundant_clone` | Detects unnecessary clones with performance impact | perf |
| `large_enum_variant` | Warns about oversized variants -- consider Boxing | perf |
| `needless_collect` | Prevents unnecessary intermediate collection allocation | nursery |
| `needless_borrow` | Removes redundant `&` borrowing | style |
| `clone_on_copy` | Catches `.clone()` on Copy types like `u32` | complexity |
| `unnecessary_wraps` | Function always returns Some/Ok -- drop the wrapper | pedantic |
| `manual_ok_or` | Suggests `.ok_or_else()` over match | style |
Run the perf lint group specifically:
```shell
cargo clippy -- -D clippy::perf
```
## Lint Suppression
### Use `#[expect]` as the Default
`#[expect]` (stable since Rust 1.81) is the standard for lint suppression. It warns when the lint no longer triggers, preventing stale suppressions from accumulating. Always use `#[expect]` instead of `#[allow]`:
```rust
// BAD - stale allow stays forever unnoticed
#[allow(clippy::large_enum_variant)]
enum Message { /* ... */ }
// GOOD - compiler warns when lint is no longer needed
#[expect(clippy::large_enum_variant)]
enum Message { /* ... */ }
```
When migrating an existing codebase, replace all `#[allow(lint)]` with `#[expect(lint)]`. The compiler will immediately flag any suppressions that are no longer needed, letting you clean up dead lint suppression.
For crate-level suppression where `#![expect(...)]` is not practical (e.g., generated code), `#![allow(...)]` remains acceptable with a comment explaining why.
### Always Add Justification
```rust
// Intentionally large for cache-line alignment
#[expect(clippy::large_enum_variant, reason = "cache-line alignment")]
enum Packet {
Header(u8),
Payload([u8; 1024]),
}
```
The `reason` parameter (stable since 1.81) documents intent inline and shows up in compiler output when the expect becomes unfulfilled.
### Handling False Positives
1. Try refactoring to satisfy the lint
2. If the code is genuinely correct, suppress **locally** with `#[expect]` and a reason
3. Avoid global suppression unless it is a known framework issue (e.g., Bevy engine patterns)
## CI Integration
Add clippy to your CI pipeline:
```yaml
# GitHub Actions (application/binary crates that commit Cargo.lock)
- name: Clippy
run: cargo clippy --all-targets --all-features --locked -- -D warnings
# Library crates (Cargo.lock not committed — omit --locked)
- name: Clippy
run: cargo clippy --all-targets --all-features -- -D warnings
```
Consider adding pedantic and nursery for stricter checks:
```shell
cargo clippy -- -W clippy::pedantic -W clippy::nursery
```
## Optional Stricter Lints
| Group | When to Use |
|-------|-------------|
| `pedantic` | Strict style checks, occasional false positives |
| `nursery` | New lints under development, may be noisy |
| `perf` | Performance-focused checks (always recommended) |
| `restriction` | Very strict, use selectively (e.g., `clippy::unwrap_used`) |
FILE:references/coding-idioms.md
# Coding Idioms
## Borrowing Over Cloning
Default to borrowing (`&T`). Clone only when you genuinely need a separate owned copy.
### When Clone is Appropriate
- Shared ownership via `Arc::clone(&arc)` (cheap atomic increment)
- Immutable snapshots where the original must be preserved
- When the API requires owned data and the caller still needs the original
- Caching results returned multiple times
### Clone Traps to Avoid
```rust
// BAD - cloning to avoid lifetime annotations
fn process(thing: &Thing) {
let owned = thing.clone(); // if you need ownership, take it in the signature
consume(owned);
}
// GOOD - take ownership explicitly
fn process(thing: Thing) {
consume(thing);
}
```
```rust
// BAD - cloning inside iterator
items.iter().map(|x| x.clone()).collect::<Vec<_>>();
// GOOD - use .cloned() or .copied()
items.iter().cloned().collect::<Vec<_>>();
items.iter().copied().collect::<Vec<_>>(); // for Copy types
```
### Prefer Borrowed Parameters
```rust
// DO - borrow when you only read
fn greet(name: &str) {
println!("Hello, {name}");
}
// DO - use slices over owned collections
fn sum(values: &[i32]) -> i32 {
values.iter().sum()
}
// DON'T - force callers to allocate
fn greet(name: String) {
println!("Hello, {name}");
}
```
For maximum flexibility in public APIs, use `impl AsRef<str>` or `impl Into<String>`.
## Copy Trait
### When to Derive Copy
- All fields are `Copy` themselves
- Struct is small (<=24 bytes / 2-3 machine words)
- Type is plain data without heap allocations
```rust
// GOOD - small plain data
#[derive(Debug, Copy, Clone)]
struct Point { x: f32, y: f32, z: f32 }
// GOOD - tag-like enum
#[derive(Debug, Copy, Clone)]
enum Direction { North, South, East, West }
// CAN'T - String is not Copy
struct User { age: i32, name: String }
```
Enum size equals the largest variant. Keep variants small or Box large payloads.
## Option and Result Handling
### Pattern Selection
| Pattern | Use When |
|---------|----------|
| `let Ok(x) = expr else { return ... }` | Early return, divergent code doesn't need error value |
| `match` | Pattern matching inner variants, transforming Result/Option shapes |
| `if let ... else` | Else branch needs computation with the value |
| `?` operator | Propagating errors to the caller |
```rust
// Early return with let-else
let Some(config) = load_config() else {
return Err(AppError::MissingConfig);
};
// Pattern matching inner variants
match result {
Ok(Direction::North) => handle_north(),
Ok(other) => handle_other(other),
Err(e) => handle_error(e),
}
// Propagation with ?
fn process(req: &Request) -> Result<Response, Error> {
let body = validate(req)?;
let user = authorize(&body)?;
Ok(handle(user)?)
}
```
### Avoid These
```rust
// BAD - unwrap in production
let port = config.port.unwrap();
// BAD - match that should be .ok() or .ok_or()
match result {
Ok(v) => Some(v),
Err(_) => None,
}
// GOOD
result.ok()
```
## Prevent Early Allocation
Use `_or_else` variants when the fallback involves allocation:
```rust
// BAD - format! runs even when x is Some
x.ok_or(ParseError::Detail(format!("missing {name}")));
// GOOD - only allocates on the error path
x.ok_or_else(|| ParseError::Detail(format!("missing {name}")));
// BAD - Vec::new() always allocates
values.unwrap_or(Vec::new());
// GOOD - uses Default trait
values.unwrap_or_default();
```
## Iterator Patterns
### Prefer Iterator Chains For
- Collection transforms: `.filter().map().collect()`
- Combining: `.enumerate()`, `.chain()`, `.zip()`
- Windowing: `.windows()`, `.chunks()`
### Prefer For Loops For
- Early exits: `break`, `continue`, `return`
- Side effects: logging, I/O
- When readability matters more than chaining
### Anti-Patterns
```rust
// BAD - premature collect
let doubled: Vec<_> = items.iter().map(|x| x * 2).collect();
process(doubled.into_iter());
// GOOD - pass iterator directly
process(items.iter().map(|x| x * 2));
// BAD - .fold() for summing
items.iter().fold(0, |acc, x| acc + x);
// GOOD - .sum() is optimized
let total: i32 = items.iter().sum();
```
Iterators are lazy. Nothing happens until you consume them with `.collect()`, `.sum()`, `.for_each()`, etc.
### Error Mapping
Use `inspect_err` for logging and `map_err` for transforming:
```rust
result
.inspect_err(|e| tracing::error!("operation failed: {e}"))
.map_err(|e| AppError::from(e))?;
```
## Edition 2024 Awareness
### Reserved Keyword: `gen`
Rust 2024 reserves `gen` as a keyword (for future generator support). Code using `gen` as an identifier must use the raw identifier escape:
```rust
// BAD (edition 2024) -- gen is a reserved keyword
let gen = 42;
fn gen() {}
// GOOD -- use raw identifier
let r#gen = 42;
fn r#gen() {}
// BETTER -- just rename to avoid confusion
let generation = 42;
fn generate() {}
```
This affects variable names, function names, module names, and any other identifiers. Prefer renaming over `r#gen` for readability.
### Never Type Fallback
In edition 2024, the `!` (never) type falls back to `!` instead of `()`. This affects diverging expressions in type inference:
```rust
// This compiles in edition 2021 (! falls back to ())
// May behave differently in edition 2024
let value = if condition {
42
} else {
panic!("unreachable")
// In 2021: inferred as () fallback
// In 2024: inferred as ! (never type)
};
```
In practice, most well-typed code is unaffected. Watch for cases where diverging branches interact with trait resolution or match exhaustiveness.
### `if let` Temporary Scope
Temporaries in `if let` conditions are now dropped at the end of the `if let` expression (not the enclosing block). This can affect code holding locks or references through temporaries:
```rust
// Edition 2024: temporary from get_mutex().lock() dropped earlier
if let Some(val) = get_mutex().lock().unwrap().get("key") {
// In edition 2024, the MutexGuard may already be dropped here
// if the temporary is not bound to a variable
}
// GOOD -- bind the guard explicitly
let guard = get_mutex().lock().unwrap();
if let Some(val) = guard.get("key") {
// guard lives until end of scope
}
```
## Import Ordering
Standard order: `std` -> external crates -> workspace crates -> `super::`/`crate::`
```rust
// std
use std::sync::Arc;
// external crates
use chrono::Utc;
use serde::{Deserialize, Serialize};
// workspace crates
use shared_types::Config;
// crate/super
use super::schema::Context;
use crate::models::Event;
```
Configure `rustfmt.toml` for automatic enforcement:
```toml
reorder_imports = true
imports_granularity = "Crate"
group_imports = "StdExternalCrate"
```
FILE:references/documentation.md
# Documentation
## Comments vs Doc Comments
| Purpose | `// comment` | `/// doc` / `//! crate doc` |
|---------|-------------|---------------------------|
| Describe why | Yes -- explains reasoning | No |
| Describe API | No | Yes -- public interfaces, usage |
| Maintainable | Gets stale, not compiled | Tied to code, appears in `cargo doc` |
| Testable | No | Yes -- doc examples run with `cargo test` |
| Visibility | Local to source | Exported to users and tools |
## When to Use `//` Comments
Use when something can't be expressed clearly in code:
```rust
// SAFETY: ptr is guaranteed non-null and aligned by caller contract
unsafe { std::ptr::copy_nonoverlapping(src, dst, len); }
// PERF: Caching root cert store avoids repeated OS calls on macOS
// See ADR-12: TLS startup latency
let tls_store = cached_root_store();
```
### Good Comments
- Safety invariants: `// SAFETY: ...`
- Performance reasoning: `// PERF: ...`
- Design context: `// CONTEXT: See ADR-42 for rationale`
- Workarounds: `// WORKAROUND: upstream bug #123`
### Bad Comments
```rust
// BAD - restates the obvious
counter += 1; // increment counter by 1
// BAD - wall of text that should be a doc comment or ADR
// This function was originally written in 2023 for the legacy API...
// [20 more lines of history]
```
## Replace Comments with Code
If you're writing a long comment explaining "what" or "how", refactor instead:
```rust
// DON'T
fn process_request(req: Request) -> Result<(), Error> {
// validate headers, then decode body, then authorize, then dispatch
// ...100 lines...
}
// DO
fn process_request(req: Request) -> Result<(), Error> {
validate_headers(&req)?;
let body = decode_body(&req)?;
authorize(&body)?;
dispatch(body)
}
```
Structure and naming replace commentary. Tests serve as living documentation.
## TODO Discipline
Don't leave orphan TODOs. Link to a tracked issue:
```rust
// TODO(#42): Remove workaround after upstream fix lands
```
## When to Use `///` Doc Comments
Document all public items: functions, structs, traits, enums, constants.
```rust
/// Loads a user profile from disk.
///
/// # Errors
///
/// Returns [`AppError::NotFound`] if the file is missing.
/// Returns [`AppError::InvalidFormat`] if the content is not valid JSON.
///
/// # Examples
///
/// ```rust
/// # use my_crate::load_user;
/// let user = load_user("profiles/alice.json")?;
/// assert_eq!(user.name, "Alice");
/// # Ok::<(), my_crate::AppError>(())
/// ```
pub fn load_user(path: &str) -> Result<User, AppError> { /* ... */ }
```
### Required Sections
- **Purpose** -- what the item does (first line)
- **`# Examples`** -- runnable code showing usage
- **`# Errors`** -- when it returns `Err` (for Result-returning functions)
- **`# Panics`** -- when it can panic (if applicable)
- **`# Safety`** -- invariants for `unsafe` functions
### Doc Test Tips
- Hide setup lines with `#` prefix
- Examples run with `cargo test` (but NOT `cargo nextest` -- run `cargo test --doc` separately)
- Use `compile_fail` attribute for wrong-usage examples
- Use `no_run` for side-effect examples (network, file I/O)
## Module-Level Docs with `//!`
Place at the top of `lib.rs` or `mod.rs`:
```rust
//! HTTP client with retry and circuit-breaker support.
//!
//! This module provides a resilient HTTP client that wraps `reqwest`
//! with automatic retries and circuit-breaker patterns.
//!
//! # Examples
//!
//! ```rust
//! let client = http::Client::builder().retries(3).build();
//! let response = client.get("https://api.example.com").await?;
//! ```
```
## Custom Error Messages with `#[diagnostic::on_unimplemented]`
Since Rust 1.78, you can provide custom compiler error messages when a trait is not implemented. This dramatically improves developer experience for library traits:
```rust
#[diagnostic::on_unimplemented(
message = "`{Self}` is not a valid handler function",
label = "this type does not implement `Handler`",
note = "Handler functions must accept a `Request` and return `impl IntoResponse`"
)]
trait Handler {
fn call(&self, req: Request) -> Response;
}
```
When someone tries to use a type that doesn't implement `Handler`, they see your custom message instead of the generic "trait bound not satisfied" error.
Use this for:
- Public library traits where users frequently hit confusing errors
- Trait bounds with non-obvious requirements (e.g., axum handlers, tower services)
- Domain-specific traits where the fix is not obvious from the trait name
## Documentation Lints
Enable for library crates:
```rust
// In lib.rs
#![deny(missing_docs)]
```
| Lint | Purpose |
|------|---------|
| `missing_docs` | Warn on undocumented public items |
| `broken_intra_doc_links` | Detect broken `[links]` in doc comments |
| `missing_panics_doc` | Require `# Panics` section when function can panic |
| `missing_errors_doc` | Require `# Errors` for Result-returning functions |
| `missing_safety_doc` | Require `# Safety` for unsafe public functions |
| `empty_docs` (clippy) | Prevent empty doc comments that bypass `missing_docs` |
Run `cargo doc --open` to preview your documentation output.
## Checklist
- [ ] All public items have `///` doc comments
- [ ] `//!` at top of lib.rs/mod.rs explaining purpose
- [ ] `# Examples` with runnable code on key functions
- [ ] `# Errors` and `# Panics` sections where applicable
- [ ] `// SAFETY:` comments on all `unsafe` blocks
- [ ] No stale comments that describe old behavior
- [ ] Every `TODO` links to a tracked issue
- [ ] `#![deny(missing_docs)]` enabled for library crates
FILE:references/ecosystem-patterns.md
# Ecosystem Patterns
## Evaluating Crates
Before adding a dependency, assess it across these dimensions:
| Signal | What to Check |
|--------|---------------|
| Downloads | crates.io recent download trend, not just total |
| Maintenance | Commit recency, open issue response time, release cadence |
| Documentation | Docs.rs quality, examples, module-level guides |
| Dependencies | `cargo tree -i <crate>` -- how much does it pull in? |
| Compile time | Test with `cargo build --timings` before committing |
| Soundness | Check for `unsafe` usage, look for past CVEs |
Prefer crates that are narrowly scoped, have few transitive dependencies, and gate optional features behind Cargo features.
## Error Handling: anyhow vs thiserror vs eyre
| Crate | Use When | Returns |
|-------|----------|---------|
| `thiserror` | Library code with structured error types | `enum MyError { ... }` |
| `anyhow` | Application/binary code, rapid prototyping | `anyhow::Result<T>` |
| `eyre` | Application code needing custom reporters | `eyre::Result<T>` |
**Decision framework:**
- Writing a library others depend on? Use `thiserror` so callers can match on variants.
- Writing a binary or CLI? Use `anyhow` for ergonomic context chaining.
- Need custom error formatting (color, structured logs)? Use `eyre` with a custom handler.
- Never use `anyhow` in library public APIs -- it erases error types that downstream callers need.
```rust
// Library: structured errors with thiserror
#[derive(Debug, thiserror::Error)]
pub enum StorageError {
#[error("key not found: {key}")]
NotFound { key: String },
#[error("connection failed")]
Connection(#[from] std::io::Error),
}
// Binary: contextual errors with anyhow
use anyhow::{Context, Result};
fn main() -> Result<()> {
let config = load_config()
.context("failed to load configuration")?;
run(config)
}
```
## Common Design Patterns
### Newtype
Wrap a primitive to add meaning and prevent mixing up arguments of the same underlying type.
```rust
struct UserId(u64);
struct OrderId(u64);
// Can't accidentally pass UserId where OrderId is expected
```
Also used to work around the orphan rule when implementing foreign traits on foreign types.
### Type State
Encode workflow stages in generic parameters so invalid transitions are compile errors. See [type-state-pattern.md](type-state-pattern.md).
### Sealed Traits
Prevent external implementations while keeping the trait usable. See [api-design.md](api-design.md) for the pattern.
### Builder
Construct complex types step-by-step with validation at build time. Use for types with more than 3-4 optional fields. See [api-design.md](api-design.md).
## Orphan Rule Strategies
You cannot implement a foreign trait for a foreign type. Workarounds:
1. **Newtype wrapper** -- wrap the foreign type and implement the trait on the wrapper
2. **Extension trait** -- define a new trait with the methods you need, blanket-implement it
3. **Upstream PR** -- contribute the implementation to the crate that owns the type or trait
```rust
// Extension trait pattern
trait IteratorExt: Iterator {
fn sum_by<F, S>(self, f: F) -> S
where
F: FnMut(Self::Item) -> S,
S: std::iter::Sum;
}
impl<I: Iterator> IteratorExt for I {
fn sum_by<F, S>(self, f: F) -> S
where
F: FnMut(Self::Item) -> S,
S: std::iter::Sum,
{
self.map(f).sum()
}
}
```
## When to Vendor vs Depend
**Prefer a dependency when:**
- The crate is well-maintained and narrowly scoped
- You'd have to replicate significant correctness-sensitive logic
- Security-sensitive code (crypto, parsing) -- defer to audited crates
**Prefer vendoring or writing it yourself when:**
- The crate pulls in heavy transitive dependencies for a small feature
- You need only a tiny fraction of the crate's functionality
- The crate is unmaintained or has no recent releases
- Build times are a critical constraint
Use `cargo-udeps` to detect unused dependencies and `cargo-deny` to audit licenses and vulnerabilities.
## Edition Migration
Rust editions (2015, 2018, 2021, 2024) are opt-in per crate via `Cargo.toml`. Different crates in a dependency tree can use different editions and interoperate.
**Migration process:**
1. Run `cargo fix --edition` to apply automated fixes
2. Update `edition = "2024"` in `Cargo.toml`
3. Run `cargo clippy` and fix new warnings
4. Review edition-specific changes (see [coding-idioms.md](coding-idioms.md) for 2024 specifics)
Avoid skipping editions -- migrate incrementally. The automated tooling handles most changes, but review the edition guide for semantic differences.
## Essential Tools
| Tool | Purpose |
|------|---------|
| `cargo-deny` | Lint dependency graph for licenses, vulnerabilities, duplicates |
| `cargo-udeps` | Find unused dependencies |
| `cargo-outdated` | Detect available updates including major version bumps |
| `cargo-expand` | Inspect macro expansion output |
| `cargo-hack` | Test all feature combinations |
| `cargo-llvm-lines` | Find monomorphization-heavy code increasing compile time |
## Staying Current
- **This Week in Rust** (this-week-in-rust.org) -- weekly ecosystem digest
- **Rust Blog** (blog.rust-lang.org) -- release announcements and feature highlights
- **Rust RFCs** (github.com/rust-lang/rfcs) -- upcoming language changes
- **Clippy** -- enable it always; it surfaces new language features and idioms
- **Edition Guide** (doc.rust-lang.org/edition-guide) -- what changed per edition
- **caniuse.rs** -- look up when a specific feature landed on stable
FILE:references/generics-dispatch.md
# Generics and Dispatch
## Static Dispatch: `impl Trait` / `<T: Trait>`
Generics are monomorphized at compile time -- the compiler generates specialized code for each concrete type. Zero runtime cost.
### Use When
- Performance-critical code (tight loops, hot paths)
- Types are known at compile time
- You want inlining and optimization
```rust
// Generic function -- compiler generates specialized versions
fn process<T: Display>(item: T) {
println!("{item}");
}
// Equivalent modern syntax
fn process(item: impl Display) {
println!("{item}");
}
```
## Dynamic Dispatch: `dyn Trait`
Uses a vtable (virtual function table) for runtime polymorphism. Incurs indirection cost per call.
### Use When
- Heterogeneous collections (different types in one `Vec`)
- Plugin architectures with runtime-loaded components
- Abstracting internals behind a stable public interface
- Binary size matters more than call performance
```rust
// Different types in one collection
fn greet_all(animals: &[Box<dyn Animal>]) {
for animal in animals {
println!("{}", animal.greet());
}
}
```
## Trade-Off Table
| Aspect | Static (`impl Trait`) | Dynamic (`dyn Trait`) |
|--------|----------------------|----------------------|
| Performance | Faster, inlined | Slower, vtable indirection |
| Compile time | Slower (monomorphization) | Faster (shared code) |
| Binary size | Larger (per-type codegen) | Smaller |
| Flexibility | One type at a time | Mix types in collections |
| Error messages | Clearer | Type erasure obscures errors |
## Decision Guide
1. **Start with generics.** They are the default in Rust.
2. **Switch to `dyn Trait`** when you need runtime polymorphism or heterogeneous collections.
3. If unsure, use generics with trait bounds -- refactor to `dyn` only when flexibility outweighs speed.
## Best Practices for Dynamic Dispatch
### Pointer Choice
```rust
// Prefer &dyn Trait when you don't need ownership
fn log(writer: &dyn Write) { /* ... */ }
// Use Box<dyn Trait> when you need to own the value
fn create_handler() -> Box<dyn Handler> { /* ... */ }
// Use Arc<dyn Trait> for shared access across threads
fn register(service: Arc<dyn Service>) { /* ... */ }
```
### Don't Box Too Early
```rust
// DO - use generics when possible
struct Renderer<B: Backend> {
backend: B,
}
// DON'T - premature boxing reduces performance and flexibility
struct Renderer {
backend: Box<dyn Backend>,
}
```
Box at API boundaries (public return types), not inside structs.
### Object Safety Rules
You can only create `dyn Trait` from object-safe traits:
- No generic methods
- No `Self: Sized` bound
- All methods use `&self`, `&mut self`, or `self`
```rust
// Object-safe -- can use as dyn
trait Runnable {
fn run(&self);
}
// NOT object-safe -- generic method
trait Factory {
fn create<T>(&self) -> T;
}
```
## RPIT Lifetime Capture (Edition 2024)
In Rust 2024, `-> impl Trait` return types capture ALL in-scope generic parameters and lifetimes by default. Previously (edition 2021), hidden types only captured the generic parameters explicitly mentioned in the opaque type.
### What Changed
```rust
// Edition 2021: 'a is NOT captured, return type outlives 'a
fn foo<'a>(x: &'a str) -> impl Display {
x.len() // returns usize, no lifetime dependency
}
// Edition 2024: 'a IS captured by default, return type borrows 'a
fn foo<'a>(x: &'a str) -> impl Display {
x.len() // still returns usize, but type signature now captures 'a
}
```
### Precise Capturing with `+ use<>`
When you need to opt out of capturing a lifetime or generic, use the `+ use<>` syntax:
```rust
// Only capture T, not the lifetime 'a
fn extract<'a, T: Display>(data: &'a [T]) -> impl Display + use<T> {
data.len()
}
// Capture nothing -- equivalent to edition 2021 behavior
fn compute<'a>(x: &'a str) -> impl Display + use<> {
x.len()
}
```
Use `+ use<>` when the return type genuinely does not depend on a lifetime, and callers need the returned value to outlive the borrow.
## Async Functions in Traits
Since Rust 1.75, `async fn` works directly in traits without the `async-trait` crate for many use cases:
```rust
// GOOD (Rust 1.75+) -- native async fn in trait
trait DataStore {
async fn get(&self, key: &str) -> Option<String>;
async fn put(&self, key: &str, value: &str) -> Result<(), StoreError>;
}
// BAD -- unnecessary async-trait dependency
#[async_trait::async_trait]
trait DataStore {
async fn get(&self, key: &str) -> Option<String>;
async fn put(&self, key: &str, value: &str) -> Result<(), StoreError>;
}
```
### When You Still Need `async-trait`
- **`dyn Trait` dispatch** -- native async fn in traits is not yet object-safe. If you need `Box<dyn DataStore>`, you still need `async-trait` or manual boxing.
- **Older MSRV** -- if your minimum supported Rust version is below 1.75.
For most application code with static dispatch, drop `async-trait` and use native syntax.
## Patterns
### Accept Generic, Return Concrete
Public APIs often accept generics for flexibility but return concrete types:
```rust
pub fn parse(input: impl AsRef<str>) -> Result<Config, ParseError> {
let s = input.as_ref();
// ...
}
```
### Trait Bounds on Impl Blocks
Constrain methods to specific trait implementations:
```rust
struct Wrapper<T>(T);
impl<T: Display> Wrapper<T> {
fn show(&self) {
println!("{}", self.0);
}
}
impl<T: Display + Debug> Wrapper<T> {
fn debug_show(&self) {
println!("{:?}", self.0);
}
}
```
### Where Clauses for Readability
```rust
// Hard to read
fn process<T: Clone + Debug + Send + Sync + 'static>(item: T) { /* ... */ }
// Clearer
fn process<T>(item: T)
where
T: Clone + Debug + Send + Sync + 'static,
{
// ...
}
```
## Variance Rules for Generics
Variance determines how subtyping relationships carry through generic types.
| Variance | Rule | Example |
|----------|------|---------|
| Covariant | If `'a: 'b`, then `T<'a>: T<'b>` | `&'a T`, `Vec<T>`, `Box<T>` |
| Contravariant | If `'a: 'b`, then `T<'b>: T<'a>` | `fn(T)` (in argument position) |
| Invariant | No subtyping relationship | `&'a mut T`, `Cell<T>`, `UnsafeCell<T>` |
**Practical impact:** `&'a mut T` is invariant over `T`, meaning you cannot coerce `&mut Vec<&'static str>` to `&mut Vec<&'a str>`. This prevents unsoundness where a shorter-lived reference could be inserted through the mutable alias.
Prefer `&T` (covariant) over `&mut T` (invariant) in generic contexts when mutation is not needed -- it gives callers more flexibility with lifetimes.
## Trait Object Cost Analysis
Dynamic dispatch via `dyn Trait` introduces two costs:
1. **Vtable indirection** -- each method call goes through a pointer lookup instead of a direct call. Prevents inlining and limits compiler optimizations.
2. **Loss of monomorphization** -- the compiler cannot specialize code per concrete type, eliminating opportunities for constant folding and layout optimization.
The overhead per call is typically 1-3 ns (vtable pointer chase). Avoid `dyn Trait` in tight loops or hot paths. Use it freely at architectural boundaries where call frequency is low.
## Object Safety Rules
A trait is object-safe (usable as `dyn Trait`) only when all methods satisfy:
- **No generic type parameters** on methods (generic params on the trait itself are fine)
- **No use of `Self` as a concrete type** in arguments or return position
- **No `Self: Sized` bound** on the trait itself (individual methods can have it)
- **Receivers must be dispatchable:** `&self`, `&mut self`, `self`, `Box<Self>`, `Arc<Self>`, `Pin<&Self>`, etc.
```rust
// NOT object-safe -- generic method prevents vtable construction
trait Parser {
fn parse<T: FromStr>(&self, input: &str) -> T;
}
// Fix: exempt the generic method, keep the trait object-safe
trait Searchable {
fn search(&self, query: &str) -> Vec<String>;
fn search_typed<T>(&self, query: &str) -> Vec<T> where Self: Sized;
// ^^ only callable on concrete types; dyn Searchable still works for search()
}
```
Prefer keeping traits object-safe. Add `where Self: Sized` to convenience methods that break object safety rather than sacrificing the whole trait.
## Complete Dispatch Decision Tree
```text
Do you need different concrete types in the same collection or behind one pointer?
YES -> dyn Trait (dynamic dispatch)
-> Need ownership? Box<dyn Trait>
-> Borrowed only? &dyn Trait
-> Shared across threads? Arc<dyn Trait>
NO -> Do callers need to name the concrete return type?
YES -> Generic <T: Trait> (full monomorphization)
NO -> impl Trait (opaque type, still static dispatch)
```
**`impl Trait` vs `dyn Trait` vs `T: Trait` summary:**
| Feature | `T: Trait` | `impl Trait` | `dyn Trait` |
|---------|-----------|-------------|-------------|
| Dispatch | Static | Static | Dynamic |
| Caller picks type | Yes | Arg: yes, Return: no | No (type-erased) |
| Turbofish (`::<>`) | Yes | No | N/A |
| Multiple types in collection | No | No | Yes |
| Binary size impact | Larger (codegen per type) | Larger | Smaller |
| Use in trait definitions | Yes | Limited | Yes |
## Blanket Implementation Patterns
Blanket impls provide automatic trait implementations for broad categories of types. Use them to reduce boilerplate, but be aware of their downstream impact.
```rust
// Common: forward trait through references
impl<T: MyTrait> MyTrait for &T {
fn method(&self) { (**self).method() }
}
// Common: forward through smart pointers
impl<T: MyTrait + ?Sized> MyTrait for Box<T> {
fn method(&self) { (**self).method() }
}
// Powerful but constraining: implement for all types meeting a bound
impl<T: Display> Loggable for T {
fn log(&self) { println!("{self}"); }
}
```
**Impact on downstream users:** a blanket impl prevents anyone from writing a more specific impl for the same trait. Once `impl<T: Display> Loggable for T` exists, no one can write `impl Loggable for MyType` even if `MyType: Display`. Plan blanket impls carefully -- adding one later is a breaking change due to coherence rules.
FILE:references/performance.md
# Performance
## Golden Rule
> Don't guess, measure.
Rust code is fast by default. Optimize only after finding bottlenecks with real profiling data.
## First Steps
1. **Build with `--release`** -- debug builds lack optimizations. Most "Rust is slow" complaints come from debug builds.
2. **Run `cargo clippy -- -D clippy::perf`** -- catches common performance anti-patterns.
3. **Benchmark before and after** -- use `cargo bench` to verify improvements (>5% = worthwhile).
4. **Profile with flamegraphs** -- `cargo flamegraph` or `samply` (macOS) to find real hotspots.
## Profiling Tools
### cargo bench
Built-in micro-benchmarking. Write scenarios and compare:
```shell
cargo bench
```
### cargo flamegraph
Visualize CPU time per function:
```shell
cargo install flamegraph
cargo flamegraph --release # always profile with --release
```
Reading flamegraphs:
- **Width** = time spent (wider = more CPU time)
- **Y-axis** = stack depth (main at bottom, called functions stacked up)
- **Color** = random (not meaningful)
- **Thick stacks** = heavy CPU usage, investigate these
### samply (macOS alternative)
Better developer experience on macOS:
```shell
cargo install samply
samply record cargo run --release
```
## Avoid Redundant Cloning
Clone at the last possible moment, if at all:
```rust
// BAD - clone in loop
for item in &items {
process(item.clone()); // clone per iteration
}
// GOOD - borrow
for item in &items {
process(item); // just borrow
}
```
### When to Pass Ownership
- API requires owned data
- Sending data to another thread (`Arc::clone` is cheap)
- Operator overloads that consume `self`
- Modeling business logic transitions (`Validate::try_from(raw_input)`)
### When NOT to Pass Ownership
- Function only reads the data: use `&T` or `&[T]`
- Iteration: use `&some_vec` or `.iter()`
- Mutation: use `&mut T`
### Cow for Maybe-Owned Data
```rust
use std::borrow::Cow;
fn normalize(input: Cow<'_, str>) -> Cow<'_, str> {
if input.contains('\t') {
Cow::Owned(input.replace('\t', " "))
} else {
input // no allocation needed
}
}
```
## Stack vs Heap
### Keep on the Stack
- Small types: primitives, `Copy` types, `usize`, `bool`
- Types returned by value that are cheap to copy
### Move to the Heap
- Recursive data structures: `Box<Node>`, `Box<[Node; 8]>`
- Large buffers (>512 bytes)
- Types behind trait objects: `Box<dyn Trait>`
```rust
// BAD - allocates 64KB on stack then moves to heap
let buffer: Box<[u8; 65536]> = Box::new([0u8; 65536]);
// GOOD - allocates directly on heap
let buffer: Box<[u8]> = vec![0u8; 65536].into_boxed_slice();
```
### Be Cautious With
- `#[inline]` -- only use when benchmarks prove benefit. Rust already inlines well.
- Large stack arrays -- consider `smallvec` for arrays that might grow.
- Large stack-allocated arrays (`let buf: [u8; 65536]`) -- they live on the stack and can overflow it. Use `Box<[T; N]>` or `Vec<T>` for large data.
## Iterator Optimization
Iterators compile to tight loops (zero-cost abstractions):
```rust
// GOOD - compiler optimizes this into a single loop
let total: i32 = items.iter()
.filter(|x| x.is_valid())
.map(|x| x.value())
.sum();
```
### `IntoIterator` for `Box<[T]>` (Edition 2024)
Rust 2024 adds `IntoIterator` for `Box<[T]>`, so boxed slices can be iterated directly:
```rust
// Previously required converting to Vec first
let boxed: Box<[i32]> = vec![1, 2, 3].into_boxed_slice();
// BAD (pre-2024) -- convert to Vec to iterate by value
let items: Vec<i32> = boxed.into_vec();
for item in items { /* ... */ }
// GOOD (edition 2024) -- iterate directly
let boxed: Box<[i32]> = vec![1, 2, 3].into_boxed_slice();
for item in boxed { /* ... */ }
```
### Avoid Intermediate Collections
```rust
// BAD - allocates a Vec just to iterate again
let valid: Vec<_> = items.iter().filter(|x| x.is_valid()).collect();
process(valid.into_iter());
// GOOD - pass the iterator (fn process(iter: impl Iterator<Item = &T>))
process(items.iter().filter(|x| x.is_valid()));
```
### Prefer .sum() Over .fold()
`.sum()` is specialized and the compiler can optimize it better:
```rust
// DO
let total: i32 = values.iter().sum();
// DON'T (unless you need a different initial value or accumulator)
let total = values.iter().fold(0, |acc, x| acc + x);
```
### Use Capacity Hints
```rust
// DO - pre-allocate when size is known
let mut results = Vec::with_capacity(items.len());
// DON'T - grow incrementally
let mut results = Vec::new();
```
## String Performance
```rust
// BAD in hot path - format! allocates every call
for item in items {
log(&format!("processing {}", item.id));
}
// GOOD - reuse buffer
let mut buf = String::with_capacity(64);
for item in items {
buf.clear();
use std::fmt::Write;
write!(&mut buf, "processing {}", item.id).unwrap();
log(&buf);
}
```
FILE:references/pointer-types.md
# Pointer Types
## Thread Safety: Send and Sync
Rust tracks pointer safety through two marker traits:
- **`Send`** -- data can move across threads
- **`Sync`** -- data can be referenced from multiple threads simultaneously
A pointer is thread-safe only if the data behind it is.
## Quick Reference
| Type | Description | Send + Sync | Use When |
|------|-------------|-------------|----------|
| `&T` | Shared reference | Yes (if `T: Sync`) | Multiple readers, no mutation |
| `&mut T` | Exclusive mutable reference | Send (if `T: Send`), Sync (if `T: Sync`) | Single writer |
| `Box<T>` | Heap-allocated, single owner | Send (if `T: Send`), Sync (if `T: Sync`) | Recursive types, large data, trait objects |
| `Rc<T>` | Reference counted, single thread | Neither | Multiple owners, same thread |
| `Arc<T>` | Atomic reference counted | Yes (if `T: Send + Sync`) | Multiple owners, across threads |
| `Cell<T>` | Interior mutability, Copy types | Not Sync | Shared mutable state, single thread |
| `RefCell<T>` | Interior mutability, runtime checks | Not Sync | Shared mutable state, single thread |
| `Mutex<T>` | Thread-safe exclusive access | Yes (if `T: Send`) | Shared mutable state, across threads |
| `RwLock<T>` | Thread-safe read-many/write-one | Send (if `T: Send`), Sync (if `T: Send + Sync`) | Read-heavy shared state, across threads |
| `OnceCell<T>` | One-time init, single thread | Not Sync | Lazy initialization |
| `OnceLock<T>` | One-time init, thread-safe | Yes | Static lazy values (replaces `lazy_static!`) |
| `LazyCell<T>` | Deferred init with closure | Not Sync | Complex lazy init, single thread |
| `LazyLock<T>` | Deferred init with closure, thread-safe | Yes | Complex static initialization |
| `*const T` / `*mut T` | Raw pointers | Neither (manual) | FFI, raw memory |
## When to Use Each
### `&T` -- Shared Borrow
The most common type. Safe, no mutation, multiple readers:
```rust
fn print_len(s: &str) {
println!("{}", s.len());
}
```
### `&mut T` -- Exclusive Borrow
Single writer, enforced at compile time:
```rust
fn append_suffix(s: &mut String) {
s.push_str("_updated");
}
```
### `Box<T>` -- Heap Allocation
Single-owner heap data. Required for recursive types:
```rust
enum Tree<T> {
Leaf(T),
Branch(Box<Tree<T>>, Box<Tree<T>>),
}
```
### `Rc<T>` / `Arc<T>` -- Shared Ownership
`Rc` for single-threaded, `Arc` for multi-threaded:
```rust
// Multi-thread: Arc + Mutex for shared mutable state
let shared = Arc::new(Mutex::new(Vec::new()));
let clone = Arc::clone(&shared);
```
Common mistake: using `Arc<Mutex<T>>` when data is single-threaded. Use `Rc<RefCell<T>>` instead.
### `Cell<T>` / `RefCell<T>` -- Interior Mutability
Mutate data behind a shared reference. `Cell` for Copy types, `RefCell` for others:
```rust
use std::cell::Cell;
struct Counter {
count: Cell<u32>,
}
impl Counter {
fn increment(&self) { // note: &self, not &mut self
self.count.set(self.count.get() + 1);
}
}
```
`RefCell` panics on double borrow at runtime. Prefer compile-time borrowing when possible.
### `Mutex<T>` / `RwLock<T>` -- Thread-Safe Mutability
`Mutex` for exclusive access, `RwLock` when reads outnumber writes:
```rust
use std::sync::{Arc, Mutex};
let data = Arc::new(Mutex::new(HashMap::new()));
// In a thread:
let mut map = data.lock().unwrap();
map.insert("key", "value");
// Lock released when `map` drops
```
### `OnceLock<T>` / `LazyLock<T>` -- One-Time Initialization
Replace `lazy_static!` and `once_cell` with standard library types. `LazyLock` (stable since 1.80) and `OnceLock` make third-party lazy initialization crates unnecessary:
```rust
use std::sync::OnceLock;
static CONFIG: OnceLock<Config> = OnceLock::new();
fn get_config() -> &'static Config {
CONFIG.get_or_init(|| load_config_from_disk())
}
```
```rust
use std::sync::LazyLock;
static REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap()
});
```
### Migrating from `lazy_static` / `once_cell`
```rust
// BAD -- third-party dependency no longer needed
lazy_static::lazy_static! {
static ref CONFIG: Config = load_config();
}
// BAD -- once_cell also superseded
static CONFIG: once_cell::sync::Lazy<Config> = once_cell::sync::Lazy::new(|| load_config());
// GOOD -- standard library LazyLock
static CONFIG: LazyLock<Config> = LazyLock::new(|| load_config());
```
For single-threaded contexts, use `LazyCell` / `OnceCell` (from `std::cell`) instead of their `sync` counterparts:
```rust
use std::cell::LazyCell;
// Thread-local lazy value -- no atomic overhead
let lazy_value = LazyCell::new(|| expensive_computation());
```
Remove `lazy_static` and `once_cell` from `Cargo.toml` once all usages are migrated.
## Decision Guide
```text
Need heap allocation?
-> Single owner: Box<T>
-> Shared ownership, single thread: Rc<T>
-> Shared ownership, multi-thread: Arc<T>
Need interior mutability?
-> Copy type, single thread: Cell<T>
-> Non-Copy, single thread: RefCell<T>
-> Multi-thread, exclusive: Mutex<T>
-> Multi-thread, read-heavy: RwLock<T>
Need lazy initialization?
-> Single thread: OnceCell<T> / LazyCell<T>
-> Multi-thread / static: OnceLock<T> / LazyLock<T>
```
FILE:references/type-state-pattern.md
# Type State Pattern
## What It Is
Type State Pattern encodes different states of a system as types, not runtime flags. The compiler enforces valid state transitions -- invalid operations become compile errors instead of runtime bugs.
`PhantomData<State>` is removed after compilation, so there is no runtime overhead.
## Simple Example: Connection State
```rust
use std::marker::PhantomData;
struct Disconnected;
struct Connected;
struct Client<State> {
addr: String,
_state: PhantomData<State>,
}
impl Client<Disconnected> {
fn new(addr: &str) -> Self {
Client { addr: addr.to_string(), _state: PhantomData }
}
fn connect(self) -> Result<Client<Connected>, std::io::Error> {
// ... establish connection ...
Ok(Client { addr: self.addr, _state: PhantomData })
}
}
impl Client<Connected> {
fn send(&self, msg: &[u8]) -> Result<(), std::io::Error> {
// Only available when connected
Ok(())
}
}
```
```rust
let client = Client::new("localhost:8080");
// client.send(b"hello"); // Won't compile -- not connected yet
let connected = client.connect()?;
connected.send(b"hello")?; // Works
```
## Builder with Required Fields
Force callers to set required fields before `.build()`:
```rust
use std::marker::PhantomData;
struct Missing;
struct Set;
struct Builder<NameState, PortState> {
name: Option<String>,
port: Option<u16>,
_name: PhantomData<NameState>,
_port: PhantomData<PortState>,
}
impl Builder<Missing, Missing> {
fn new() -> Self {
Builder {
name: None, port: None,
_name: PhantomData, _port: PhantomData,
}
}
}
impl<P> Builder<Missing, P> {
fn name(self, name: impl Into<String>) -> Builder<Set, P> {
Builder {
name: Some(name.into()), port: self.port,
_name: PhantomData, _port: PhantomData,
}
}
}
impl<N> Builder<N, Missing> {
fn port(self, port: u16) -> Builder<N, Set> {
Builder {
name: self.name, port: Some(port),
_name: PhantomData, _port: PhantomData,
}
}
}
impl Builder<Set, Set> {
fn build(self) -> Server {
Server {
name: self.name.unwrap(),
port: self.port.unwrap(),
}
}
}
```
```rust
// Valid -- both required fields set
let server = Builder::new().name("api").port(8080).build();
let server = Builder::new().port(8080).name("api").build(); // order doesn't matter
// Won't compile -- missing required field
// let server = Builder::new().name("api").build(); // Error: port not set
```
## When to Use
- **Compile-time state safety** -- prevent invalid operations entirely
- **API constraints** -- builders with required fields, protocol state machines
- **Replace runtime booleans** -- `is_connected`, `is_authenticated` become type-level guarantees
- **Workflow pipelines** -- validate -> authorize -> process
## When to Avoid
- **Trivial state** -- simple enums are clearer for 2-3 states without complex transitions
- **Runtime flexibility** -- state determined by user input at runtime
- **Complex generics** -- if the type signature becomes harder to understand than the bug it prevents
- **Not worth the verbosity** -- pattern requires duplicating struct fields across state transitions
## Downsides
- More verbose than runtime checks
- Complex type signatures with multiple state parameters
- PhantomData is not intuitive for Rust beginners
- Struct fields must be moved between states (some duplication)
> Use when it saves bugs, increases safety, or simplifies logic -- not for cleverness.
Comprehensive Rust code review with optional parallel agents
---
description: Comprehensive Rust code review with optional parallel agents
name: review-rust
disable-model-invocation: true
---
# Rust Code Review
## Arguments
- `--parallel`: Spawn specialized subagents per technology area
- Path: Target directory (default: current working directory)
## Hard gates
Complete in order before writing **Issues** in the output (empty scope is allowed; fabricated findings are not).
1. **Scope gate:** You have an explicit list of `.rs` paths under review (from Step 1 or the user-provided path). **Pass:** List printed or "No Rust files in scope" — then stop with no Issues.
2. **Compiler/linter gate:** Step 3 commands were run from the crate or workspace root (`Cargo.toml` present); if they cannot run, one line states why (e.g. missing toolchain, no `Cargo.toml`, sandbox). **Pass:** You do not report a problem already shown as an error/warning in Step 3 output, and you do not duplicate compiler or clippy diagnostics the author must fix first.
3. **Protocol gate:** `beagle-rust:review-verification-protocol` is loaded before Step 7. **Pass:** Every Critical/Major finding satisfies Step 8 (and the protocol); if there are zero findings, say "Protocol applied; no issues" in the Review Summary.
4. **Evidence gate (Critical/Major):** For each Critical or Major item, you re-read the file at `FILE:LINE` with full surrounding context (not only the diff hunk). **Pass:** The Issue description matches observable code at that location.
## Step 1: Identify Changed Files
```bash
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '\.rs$'
```
## Step 2: Check Rust Edition and MSRV
```bash
# Check Cargo.toml for edition and rust-version
grep -E 'edition|rust-version' Cargo.toml
# Check workspace members if workspace
grep -A 20 '\[workspace\]' Cargo.toml
```
**Edition 2024 awareness** (requires MSRV 1.85+):
If `edition = "2024"` is detected, the following behavioral changes apply throughout the review:
- `unsafe_op_in_unsafe_fn` is deny by default — unsafe operations inside `unsafe fn` MUST use explicit `unsafe {}` blocks
- `extern "C" {}` blocks must be `unsafe extern "C" {}`
- `#[no_mangle]` and `#[export_name]` must be `#[unsafe(no_mangle)]` and `#[unsafe(export_name)]`
- `-> impl Trait` captures ALL in-scope lifetimes by default (RPIT lifetime capture change); use `+ use<'a>` for precise capture
- `gen` is a reserved keyword — code using it as an identifier must use `r#gen`
- `!` (never type) falls back to `!` instead of `()` — may change behavior of inferred types
- Temporaries in `if let` conditions and tail expressions are dropped earlier than in edition 2021
- `Box<[T]>` now implements `IntoIterator`
Record the detected edition — it affects severity calibration in Steps 3, 8, and the verification protocol.
## Step 3: Verify Linter Status
CRITICAL: Run clippy and check BEFORE flagging style or correctness issues. Do NOT flag issues that clippy or the compiler already catches.
```bash
cargo clippy --all-targets --all-features -- -D warnings 2>&1 | head -50
cargo clippy -- -D clippy::perf 2>&1 | head -20
cargo check --all-targets 2>&1 | head -50
```
**Edition 2024 note:** Edition 2024 promotes several previously-warn lints to deny (notably `unsafe_op_in_unsafe_fn`). If clippy or `cargo check` already reports edition-related errors, do not duplicate those as review findings — instead note that the author must fix compiler errors first.
## Step 4: Detect Technologies
```bash
# Detect tokio async runtime
grep -r "tokio" --include="Cargo.toml" -l | head -3
# Detect axum web framework
grep -r "axum" --include="Cargo.toml" -l | head -3
# Detect sqlx database
grep -r "sqlx" --include="Cargo.toml" -l | head -3
# Detect serde serialization
grep -r "serde" --include="Cargo.toml" -l | head -3
# Detect thiserror / anyhow
grep -r "thiserror\|anyhow" --include="Cargo.toml" -l | head -3
# Detect tracing
grep -r "tracing" --include="Cargo.toml" -l | head -3
# Check for test files in diff
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '((^|/)(test|tests)/.*\.rs$)|(_test\.rs$)'
# Check for unsafe code in diff
git diff $(git merge-base HEAD main)..HEAD -- '*.rs' | grep -c 'unsafe'
# Detect async fn in traits (no async-trait crate needed since Rust 1.75)
grep -r "async-trait" --include="Cargo.toml" -l | head -3
# Detect LazyLock/LazyCell usage (replaces once_cell/lazy_static since 1.80)
grep -r "once_cell\|lazy_static" --include="Cargo.toml" -l | head -3
# Detect #[expect] lint attribute usage (stable since 1.81)
git diff $(git merge-base HEAD main)..HEAD -- '*.rs' | grep -c '#\[expect('
# Detect macro definitions in diff
git diff $(git merge-base HEAD main)..HEAD -- '*.rs' | grep -cE 'macro_rules!|#\[proc_macro|#\[derive\('
# Detect FFI code in diff
git diff $(git merge-base HEAD main)..HEAD -- '*.rs' | grep -cE 'extern "C"|#\[no_mangle\]|#\[repr\(C\)\]|bindgen|#\[unsafe\(no_mangle\)\]'
```
**Modern Rust detection notes:**
- If `async-trait` is a dependency but the project uses edition 2024 or MSRV >= 1.75, flag as Informational — native `async fn` in traits is available and `async-trait` can likely be removed.
- If `once_cell` or `lazy_static` is a dependency but MSRV >= 1.80, flag as Informational — `std::sync::LazyLock` and `std::cell::LazyCell` are stable replacements.
- If `#[allow(...)]` is used where `#[expect(...)]` would be better (MSRV >= 1.81), note as Minor — `#[expect]` warns when the suppressed lint no longer fires, keeping suppressions clean.
## Step 5: Load Verification Protocol
Load `beagle-rust:review-verification-protocol` skill and keep its checklist in mind throughout the review.
## Step 6: Load Skills
Use the `Skill` tool to load each applicable skill (e.g., `Skill(skill: "beagle-rust:rust-code-review")`).
**Always load:**
- `beagle-rust:rust-code-review`
**Conditionally load based on detection:**
| Condition | Skill |
|-----------|-------|
| Tokio detected | `beagle-rust:tokio-async-code-review` |
| Axum detected | `beagle-rust:axum-code-review` |
| sqlx detected | `beagle-rust:sqlx-code-review` |
| Serde detected | `beagle-rust:serde-code-review` |
| Test files changed | `beagle-rust:rust-testing-code-review` |
| Macro definitions in diff | `beagle-rust:macros-code-review` |
| FFI code detected (extern, repr(C), bindgen) | `beagle-rust:ffi-code-review` |
## Step 7: Review
**Sequential (default):**
1. Load applicable skills
2. Review core Rust quality (ownership, error handling, unsafe, traits)
3. Review detected technology areas
4. Consolidate findings
**Parallel (--parallel flag):**
1. Detect all technologies upfront
2. Spawn one subagent per technology area with `Task` tool
3. Each agent loads its skill and reviews its domain
4. Wait for all agents
5. Consolidate findings
## Step 8: Verify Findings
Before reporting any issue:
1. Re-read the actual code (not just diff context)
2. For "unused" claims - did you search all references across the workspace?
3. For "missing" claims - did you check trait definitions, derive macros, and `#[cfg]` gated code?
4. For "unnecessary clone" - did you verify the borrow checker allows a reference?
5. For "unsafe" issues - did you check the safety comments and surrounding invariants?
6. Remove any findings that are style preferences, not actual issues
**Edition 2024 verification rules:**
7. Do NOT flag `unsafe {}` blocks inside `unsafe fn` as unnecessary — they are REQUIRED in edition 2024
8. Do NOT flag `unsafe extern "C"` as unusual syntax — it is REQUIRED in edition 2024
9. Do NOT flag `#[unsafe(no_mangle)]` or `#[unsafe(export_name)]` as unusual — they are REQUIRED in edition 2024
10. For `-> impl Trait` returns, verify whether implicit lifetime capture is intentional — in edition 2024 all in-scope lifetimes are captured by default; suggest `+ use<'a>` only when narrower capture is needed
11. For code using `Box<[T]>` in iterator contexts, remember `IntoIterator` is now available in edition 2024 — do not flag `.iter()` on boxed slices as the only approach
12. If temporaries in `if let` or tail expressions cause borrow issues, consider whether edition 2024's earlier drop semantics are the root cause
## Step 9: Review Convergence
### Single-Pass Completeness
You MUST report ALL issues across ALL categories (ownership, error handling, async, types, tests, security, performance) in a single review pass. Do not hold back issues for later rounds.
Before submitting findings, ask yourself:
- "If all my recommended fixes are applied, will I find NEW issues in the fixed code?"
- "Am I requesting new code (tests, types, modules) that will itself need review?"
If yes to either: include those anticipated downstream issues NOW, in this review, so the author can address everything at once.
### Scope Rules
- Review ONLY the code in the diff and directly related existing code
- Do NOT request new features, test infrastructure, or architectural changes that didn't exist before the diff
- If test coverage is missing, flag it as ONE Minor issue ("Missing test coverage for X, Y, Z") — do NOT specify implementation details
- Doc comments, naming issues are Minor unless they affect public API contracts
- Do NOT request adding new dependencies (e.g., proptest, mockall, criterion)
### Fix Complexity Budget
Fixes to existing code should be flagged at their real severity regardless of size.
However, requests for **net-new code that didn't exist before the diff** must be classified as Informational:
- Adding a new dependency
- Creating entirely new modules, files, or test suites
- Extracting new traits or abstractions
- Adding benchmark suites
These are improvement suggestions for the author to consider in future work, not review blockers.
### Iteration Policy
If this is a re-review after fixes were applied:
- ONLY verify that previously flagged issues were addressed correctly
- Do NOT introduce new findings unrelated to the previous review's issues
- Accept Minor/Nice-to-Have issues that weren't fixed — do not re-flag them
- The goal of re-review is VERIFICATION, not discovery
## Output Format
```markdown
## Review Summary
[1-2 sentence overview of findings]
## Issues
### Critical (Blocking)
1. [FILE:LINE] ISSUE_TITLE
- Issue: Description of what's wrong
- Why: Why this matters (unsound unsafe, data race, panic, security)
- Fix: Specific recommended fix
### Major (Should Fix)
2. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Minor (Nice to Have)
N. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Informational (For Awareness)
N. [FILE:LINE] SUGGESTION_TITLE
- Suggestion: ...
- Rationale: ...
## Good Patterns
- [FILE:LINE] Pattern description (preserve this)
## Verdict
Ready: Yes | No | With fixes 1-N (Critical/Major only; Minor items are acceptable)
Rationale: [1-2 sentences]
```
## Rules
- Complete **Hard gates** before writing Issues
- Load skills BEFORE reviewing (not after)
- Number every issue sequentially (1, 2, 3...)
- Include FILE:LINE for each issue
- Separate Issue/Why/Fix clearly
- Categorize by actual severity
- Run clippy before flagging style issues
- Run verification after fixes
- Report ALL issues in a single pass — do not hold back findings for later iterations
- Re-reviews verify previous fixes ONLY — no new discovery
- Requests for net-new code (new modules, dependencies, test suites) are Informational, not blocking
- The Verdict ignores Minor and Informational items — only Critical and Major block approval
## Post-Fix Verification
After fixes are applied, run:
```bash
cargo check --all-targets
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets
```
All checks must pass before approval.
Comprehensive React/TypeScript frontend code review with optional parallel agents
---
description: Comprehensive React/TypeScript frontend code review with optional parallel agents
name: review-frontend
disable-model-invocation: true
---
# Frontend Code Review
## Arguments
- `--parallel`: Spawn specialized subagents per technology area
- Path: Target directory (default: current working directory)
## Step 1: Identify Changed Files
```bash
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '\.(tsx?|css)$'
```
## Step 2: Detect Technologies
```bash
# Detect React Flow
grep -r "@xyflow/react\|ReactFlow\|useNodesState" --include="*.tsx" -l | head -3
# Detect Zustand
grep -r "from 'zustand'\|create\(\(" --include="*.ts" --include="*.tsx" -l | head -3
# Detect Tailwind v4
grep -r "@theme\|@layer theme" --include="*.css" -l | head -3
# Check for test files
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '\.test\.tsx?$'
```
## Step 3: Load Verification Protocol
Load `beagle-react:review-verification-protocol` before any substantive judgment on code.
**Pass before Step 5:** The skill is loaded (or its checklist is open in context). Do not classify severity or write findings until this gate clears.
## Step 4: Load Skills
Use the `Skill` tool to load each applicable skill (e.g., `Skill(skill: "beagle-react:react-router-code-review")`).
**Always load:**
- `beagle-react:react-router-code-review`
- `beagle-react:shadcn-code-review`
**Conditionally load based on detection:**
| Condition | Skill |
|-----------|-------|
| @xyflow/react detected | `beagle-react:react-flow-code-review` |
| Zustand detected | `beagle-react:zustand-state` |
| Tailwind v4 detected | `beagle-react:tailwind-v4` |
| Test files changed | `beagle-react:vitest-testing` |
## Step 5: Review
**Sequential (default):**
1. Load applicable skills
2. Review React Router patterns first
3. Review shadcn/ui patterns
4. Review detected technology areas
5. Consolidate findings
**Parallel (--parallel flag):**
1. Detect all technologies upfront
2. Spawn one subagent per technology area with `Task` tool
3. Each agent loads its skill and reviews its domain
4. Wait for all agents
5. Consolidate findings
## Step 6: Verify Findings
Before reporting any issue:
1. Re-read the actual code (not just diff context)
2. For "unused" claims - did you search all references?
3. For "missing" claims - did you check framework/parent handling?
4. For syntax issues - did you verify against current version docs?
5. Remove any findings that are style preferences, not actual issues
**Pass before promoting to Critical/Major:** For that item, (2)–(4) are satisfied with a concrete artifact when applicable — e.g. opened file at `FILE:LINE`, grep/search output for references, or cited parent/framework code — not only diff context.
## Step 7: Review Convergence
### Single-Pass Completeness
You MUST report ALL issues across ALL categories (style, logic, types, tests, security, performance) in a single review pass. Do not hold back issues for later rounds.
Before submitting findings, ask yourself:
- "If all my recommended fixes are applied, will I find NEW issues in the fixed code?"
- "Am I requesting new code (tests, types, modules) that will itself need review?"
If yes to either: include those anticipated downstream issues NOW, in this review, so the author can address everything at once.
### Scope Rules
- Review ONLY the code in the diff and directly related existing code
- Do NOT request new features, test infrastructure, or architectural changes that didn't exist before the diff
- If test coverage is missing, flag it as ONE Minor issue ("Missing test coverage for X, Y, Z") — do NOT specify implementation details like mock libraries, behaviour extraction, or dependency injection patterns that would introduce substantial new code
- Typespecs, documentation, and naming issues are Minor unless they affect public API contracts
- Do NOT request adding new dependencies (e.g. Mox, testing libraries, linter plugins)
### Fix Complexity Budget
Fixes to existing code should be flagged at their real severity regardless of size.
However, requests for **net-new code that didn't exist before the diff** must be classified as Informational:
- Adding a new dependency (e.g. Mox, a linter plugin)
- Creating entirely new modules, files, or test suites
- Extracting new behaviours, protocols, or abstractions
These are improvement suggestions for the author to consider in future work, not review blockers.
### Iteration Policy
If this is a re-review after fixes were applied:
- ONLY verify that previously flagged issues were addressed correctly
- Do NOT introduce new findings unrelated to the previous review's issues
- Accept Minor/Nice-to-Have issues that weren't fixed — do not re-flag them
- The goal of re-review is VERIFICATION, not discovery
## Output Format
```markdown
## Review Summary
[1-2 sentence overview of findings]
## Issues
### Critical (Blocking)
1. [FILE:LINE] ISSUE_TITLE
- Issue: Description of what's wrong
- Why: Why this matters (bug, a11y, perf, security)
- Fix: Specific recommended fix
### Major (Should Fix)
2. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Minor (Nice to Have)
N. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Informational (For Awareness)
N. [FILE:LINE] SUGGESTION_TITLE
- Suggestion: ...
- Rationale: ...
## Good Patterns
- [FILE:LINE] Pattern description (preserve this)
## Verdict
Ready: Yes | No | With fixes 1-N (Critical/Major only; Minor items are acceptable)
Rationale: [1-2 sentences]
```
## Post-Fix Verification
After fixes are applied, run:
```bash
npm run lint
npm run typecheck
npm run test
```
All checks must pass before approval.
## Gates
Advance in order; do not skip a **pass condition** by restating it informally.
1. **Scope recorded** — **Pass when:** You have the output of the Step 1 command (or an explicit substitute path list) naming what is in scope for this review.
2. **Protocol + always skills loaded** — **Pass when:** `beagle-react:review-verification-protocol`, `beagle-react:react-router-code-review`, and `beagle-react:shadcn-code-review` are loaded before first severity judgment.
3. **Conditional skills** — **Pass when:** For each Step 2 detection row, you either loaded the listed skill or recorded that detection was negative (which command returned no matches).
4. **Critical/Major evidence** — **Pass when:** Each such finding cites `FILE:LINE` that exists in the tree and meets the Step 6 pass rule for that finding type.
5. **Single output** — **Pass when:** The Issues section uses one continuous numbering sequence and this deliverable satisfies Step 7 single-pass completeness (no withheld issue types or rounds).
## Rules
- Load skills BEFORE reviewing (not after)
- Number every issue sequentially (1, 2, 3...)
- Include FILE:LINE for each issue
- Separate Issue/Why/Fix clearly
- Categorize by actual severity
- Don't assume Next.js patterns (no "use client")
- Run verification after fixes
- Report ALL issues in a single pass — do not hold back findings for later iterations
- Re-reviews verify previous fixes ONLY — no new discovery
- Requests for net-new code (new modules, dependencies, test suites) are Informational, not blocking
- The Verdict ignores Minor and Informational items — only Critical and Major block approval
Comprehensive Python/FastAPI backend code review with optional parallel agents
---
description: Comprehensive Python/FastAPI backend code review with optional parallel agents
name: review-python
disable-model-invocation: true
---
# Backend Code Review
## Hard gates (sequence)
Advance only when each **pass condition** is objectively satisfied (prevents linter-owned false positives and ungrounded findings):
| Gate | Pass condition |
|------|----------------|
| **G1 — Diff scope** | Step 1 command has been run; the changed `.py` paths are enumerated in writing (list may be empty — if empty, state that explicitly and do not invent Python findings). |
| **G2 — Linters before manual style/type** | For `ruff` and `mypy`: either no project config exists for that tool, **or** it was run on the changed files and you captured pass/fail (exit code or clear tool output). **Do not** add manual style or type findings for rules those tools already enforce when configured. |
| **G3 — Protocol and base skills** | `beagle-python:review-verification-protocol`, `beagle-python:python-code-review`, and `beagle-python:fastapi-code-review` are loaded before Step 6 substantive review. |
| **G4 — Evidence per issue** | Step 7 checks are satisfied for each reported issue before it appears in the final list (re-read source, search references for “unused”, confirm framework handling for “missing”, verify syntax against current docs). |
| **G5 — Output contract** | Findings use sequential numbering, every issue has `FILE:LINE`, and the **Verdict** follows Step 8 (Critical/Major only block; Minor/Informational do not). |
## Arguments
- `--parallel`: Spawn specialized subagents per technology area
- Path: Target directory (default: current working directory)
## Step 1: Identify Changed Files
**Pass (G1):** Capture the command output (or equivalent) as your authoritative changed-`.py` set before Steps 2–3.
```bash
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '\.py$'
```
## Step 2: Verify Linter Status
**CRITICAL**: Run project linters BEFORE flagging any style or type issues. **Pass (G2):** You may only proceed to Step 3 after each configured linter has been run on the changed files or you have recorded why it was skipped (missing config).
```bash
# Check if ruff config exists and run it
if [ -f "pyproject.toml" ] || [ -f "ruff.toml" ]; then
ruff check <changed_files>
fi
# Check if mypy config exists and run it
if [ -f "pyproject.toml" ] || [ -f "mypy.ini" ]; then
mypy <changed_files>
fi
```
**Rules:**
- If a linter passes for a specific rule (e.g., line length), DO NOT flag that issue manually
- Linter configuration is authoritative for style rules
- Only flag issues that linters cannot detect (semantic issues, architectural problems)
**Why:** Analysis of 24 review outcomes showed 4 false positives (17%) where reviewers flagged line-length violations that `ruff check` confirmed don't exist. The linter's configuration reflects intentional project decisions.
## Step 3: Detect Technologies
```bash
# Detect Pydantic-AI
grep -r "pydantic_ai\|@agent\.tool\|RunContext" --include="*.py" -l | head -3
# Detect SQLAlchemy
grep -r "from sqlalchemy\|Session\|relationship" --include="*.py" -l | head -3
# Detect Postgres-specific
grep -r "psycopg\|asyncpg\|JSONB\|GIN" --include="*.py" -l | head -3
# Check for test files
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E 'test.*\.py$'
```
## Step 4: Load Verification Protocol
Load `beagle-python:review-verification-protocol` skill and keep its checklist in mind throughout the review.
## Step 5: Load Skills
Use the `Skill` tool to load each applicable skill (e.g., `Skill(skill: "beagle-python:python-code-review")`).
**Always load:**
- `beagle-python:python-code-review`
- `beagle-python:fastapi-code-review`
**Conditionally load based on detection:**
| Condition | Skill |
|-----------|-------|
| Test files changed | `beagle-python:pytest-code-review` |
| Pydantic-AI detected | `beagle-ai:pydantic-ai-common-pitfalls` |
| SQLAlchemy detected | `beagle-python:sqlalchemy-code-review` |
| Postgres detected | `beagle-python:postgres-code-review` |
## Step 6: Review
**Sequential (default):**
1. Load applicable skills
2. Review Python quality issues first
3. Review FastAPI patterns
4. Review detected technology areas
5. Consolidate findings
**Parallel (--parallel flag):**
1. Detect all technologies upfront
2. Spawn one subagent per technology area with `Task` tool
3. Each agent loads its skill and reviews its domain
4. Wait for all agents
5. Consolidate findings
### Before Flagging Optimization or Pattern Issues
1. **Check CLAUDE.md** for documented intentional patterns
2. **Check code comments** around the flagged area for "intentional", "optimization", or "NOTE:"
3. **Trace the code path** before claiming missing coverage or inconsistent handling
4. **Consider framework idioms** - what looks wrong generically may be correct for the framework
**Why:** Analysis showed rejections where reviewers flagged "inconsistent error handling" that was intentional optimization, and "missing test coverage" for code paths that don't exist.
## Step 7: Verify Findings
**Pass (G4):** No issue ships until all bullets below are true for that issue.
Before reporting any issue:
1. Re-read the actual code (not just diff context)
2. For "unused" claims - did you search all references?
3. For "missing" claims - did you check framework/parent handling?
4. For syntax issues - did you verify against current version docs?
5. Remove any findings that are style preferences, not actual issues
## Step 8: Review Convergence
**Pass (G5):** Final markdown matches the Output Format template; verdict line reflects only Critical/Major blockers per scope rules below.
### Single-Pass Completeness
You MUST report ALL issues across ALL categories (style, logic, types, tests, security, performance) in a single review pass. Do not hold back issues for later rounds.
Before submitting findings, ask yourself:
- "If all my recommended fixes are applied, will I find NEW issues in the fixed code?"
- "Am I requesting new code (tests, types, modules) that will itself need review?"
If yes to either: include those anticipated downstream issues NOW, in this review, so the author can address everything at once.
### Scope Rules
- Review ONLY the code in the diff and directly related existing code
- Do NOT request new features, test infrastructure, or architectural changes that didn't exist before the diff
- If test coverage is missing, flag it as ONE Minor issue ("Missing test coverage for X, Y, Z") — do NOT specify implementation details like mock libraries, behaviour extraction, or dependency injection patterns that would introduce substantial new code
- Typespecs, documentation, and naming issues are Minor unless they affect public API contracts
- Do NOT request adding new dependencies (e.g. Mox, testing libraries, linter plugins)
### Fix Complexity Budget
Fixes to existing code should be flagged at their real severity regardless of size.
However, requests for **net-new code that didn't exist before the diff** must be classified as Informational:
- Adding a new dependency (e.g. Mox, a linter plugin)
- Creating entirely new modules, files, or test suites
- Extracting new behaviours, protocols, or abstractions
These are improvement suggestions for the author to consider in future work, not review blockers.
### Iteration Policy
If this is a re-review after fixes were applied:
- ONLY verify that previously flagged issues were addressed correctly
- Do NOT introduce new findings unrelated to the previous review's issues
- Accept Minor/Nice-to-Have issues that weren't fixed — do not re-flag them
- The goal of re-review is VERIFICATION, not discovery
## Output Format
```markdown
## Review Summary
[1-2 sentence overview of findings]
## Issues
### Critical (Blocking)
1. [FILE:LINE] ISSUE_TITLE
- Issue: Description of what's wrong
- Why: Why this matters (bug, type safety, security)
- Fix: Specific recommended fix
### Major (Should Fix)
2. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Minor (Nice to Have)
N. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Informational (For Awareness)
N. [FILE:LINE] SUGGESTION_TITLE
- Suggestion: ...
- Rationale: ...
## Good Patterns
- [FILE:LINE] Pattern description (preserve this)
## Verdict
Ready: Yes | No | With fixes 1-N (Critical/Major only; Minor items are acceptable)
Rationale: [1-2 sentences]
```
## Post-Fix Verification
After fixes are applied, run:
```bash
ruff check .
mypy .
pytest
```
All checks must pass before approval.
## Rules
- Load skills BEFORE reviewing (not after)
- Number every issue sequentially (1, 2, 3...)
- Include FILE:LINE for each issue
- Separate Issue/Why/Fix clearly
- Categorize by actual severity
- Run verification after fixes
- Report ALL issues in a single pass — do not hold back findings for later iterations
- Re-reviews verify previous fixes ONLY — no new discovery
- Requests for net-new code (new modules, dependencies, test suites) are Informational, not blocking
- The Verdict ignores Minor and Informational items — only Critical and Major block approval
Comprehensive iOS/SwiftUI code review with optional parallel agents
---
description: Comprehensive iOS/SwiftUI code review with optional parallel agents
name: review-ios
disable-model-invocation: true
---
# iOS Code Review
## Arguments
- `--parallel`: Spawn specialized subagents per technology area
- Path: Target directory (default: current working directory)
## Hard gates
Complete in order before writing **Issues** in the output (empty scope is allowed; fabricated findings are not).
1. **Scope gate:** You have an explicit list of `.swift` paths under review (from Step 1 or a user-provided path). **Pass:** List captured in working notes **or** one line: `No Swift files in scope` — then stop with no Issues.
2. **Linter gate (style):** Step 2 commands ran for this tree; if no `.swiftlint.yml` / `.swiftlint.yaml`, note that in one line. **Pass:** You do not report a style issue that SwiftLint would already enforce for that line when config exists and `swiftlint` succeeds.
3. **Protocol gate:** `beagle-ios:review-verification-protocol` is loaded before Step 6. **Pass:** If you report any Issues, at least one finding was checked against that checklist (name the item in Review Summary or on that Issue); if you report zero Issues, state `Protocol applied; no issues` in Review Summary.
4. **Evidence gate (Critical/Major):** For each Critical or Major item, you re-read the file at `FILE:LINE` (full surrounding context, not only the diff hunk). **Pass:** The Issue text matches observable code at that location.
Do not begin Step 6 until **Gates 1–3** are satisfied (skills load order stays Steps 4–5).
## Step 1: Identify Changed Files
```bash
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '\.swift$'
```
## Step 2: Verify Linter Status
**CRITICAL**: Run SwiftLint BEFORE flagging any style issues.
```bash
# Check if SwiftLint config exists and run it
if [ -f ".swiftlint.yml" ] || [ -f ".swiftlint.yaml" ]; then
swiftlint lint --quiet <changed_files>
fi
```
**Rules:**
- If SwiftLint passes for a specific rule, DO NOT flag that issue manually
- SwiftLint configuration is authoritative for style rules
- Only flag issues that linters cannot detect (semantic issues, architectural problems)
## Step 3: Detect Technologies
```bash
# SwiftUI (always with swift files that import it)
grep -r "import SwiftUI" --include="*.swift" -l | head -3
# SwiftData
grep -r "import SwiftData\|@Model\|@Query" --include="*.swift" -l | head -3
# Swift Testing
grep -r "import Testing\|@Test\|#expect" --include="*.swift" -l | head -3
# Combine
grep -r "import Combine\|AnyPublisher\|@Published" --include="*.swift" -l | head -3
# URLSession (explicit async patterns)
grep -r "URLSession\.shared\|\.data(from:\|\.download(from:" --include="*.swift" -l | head -3
# CloudKit
grep -r "import CloudKit\|CKContainer\|CKRecord" --include="*.swift" -l | head -3
# WidgetKit
grep -r "import WidgetKit\|TimelineProvider\|WidgetFamily" --include="*.swift" -l | head -3
# App Intents
grep -r "import AppIntents\|@AppIntent\|AppEntity" --include="*.swift" -l | head -3
# HealthKit
grep -r "import HealthKit\|HKHealthStore\|HKQuery" --include="*.swift" -l | head -3
# WatchKit
grep -r "import WatchKit\|WKExtension\|WKInterfaceController" --include="*.swift" -l | head -3
# Animations (beyond basic withAnimation)
grep -r "PhaseAnimator\|KeyframeAnimator\|matchedGeometryEffect\|navigationTransition\|scrollTransition\|CABasicAnimation\|CASpringAnimation\|CAKeyframeAnimation\|UIViewPropertyAnimator\|UIDynamicAnimator\|\.symbolEffect\|\.contentTransition\|CustomAnimation\|MeshGradient" --include="*.swift" -l | head -3
```
## Step 4: Load Verification Protocol
Load `beagle-ios:review-verification-protocol` skill and keep its checklist in mind throughout the review.
## Step 5: Load Skills
Use the `Skill` tool to load each applicable skill (e.g., `Skill(skill: "beagle-ios:swift-code-review")`).
**Always load:**
- `beagle-ios:swift-code-review`
- `beagle-ios:swiftui-code-review`
**Conditionally load based on detection:**
| Condition | Skill |
|-----------|-------|
| SwiftData detected | `beagle-ios:swiftdata-code-review` |
| Swift Testing detected | `beagle-ios:swift-testing-code-review` |
| Combine detected | `beagle-ios:combine-code-review` |
| URLSession detected | `beagle-ios:urlsession-code-review` |
| CloudKit detected | `beagle-ios:cloudkit-code-review` |
| WidgetKit detected | `beagle-ios:widgetkit-code-review` |
| App Intents detected | `beagle-ios:app-intents-code-review` |
| HealthKit detected | `beagle-ios:healthkit-code-review` |
| WatchKit detected | `beagle-ios:watchos-code-review` |
| Animation code detected | `beagle-ios:ios-animation-code-review` |
## Step 6: Review
**Sequential (default):**
1. Load applicable skills
2. Review Swift quality issues first (concurrency, memory, error handling)
3. Review SwiftUI patterns (view composition, state management, accessibility)
4. Review detected technology areas
5. Consolidate findings
**Parallel (--parallel flag):**
1. Detect all technologies upfront
2. Spawn one subagent per technology area with `Task` tool
3. Each agent loads its skill and reviews its domain
4. Wait for all agents
5. Consolidate findings
### Before Flagging Issues
1. **Check SwiftLint output** - don't duplicate linter findings
2. **Check code comments** for intentional patterns (// MARK:, // NOTE:, etc.)
3. **Consider Apple framework idioms** - what looks wrong generically may be correct for the framework
4. **Trace async code paths** before claiming missing error handling or race conditions
## Step 7: Verify Findings
Before reporting any issue:
1. Re-read the actual code (not just diff context)
2. For "unused" claims - did you search all references?
3. For "missing" claims - did you check framework/parent handling?
4. For syntax issues - did you verify against current version docs?
5. Remove any findings that are style preferences, not actual issues
## Step 8: Review Convergence
### Single-Pass Completeness
You MUST report ALL issues across ALL categories (style, logic, types, tests, security, performance) in a single review pass. Do not hold back issues for later rounds.
Before submitting findings, ask yourself:
- "If all my recommended fixes are applied, will I find NEW issues in the fixed code?"
- "Am I requesting new code (tests, types, modules) that will itself need review?"
If yes to either: include those anticipated downstream issues NOW, in this review, so the author can address everything at once.
### Scope Rules
- Review ONLY the code in the diff and directly related existing code
- Do NOT request new features, test infrastructure, or architectural changes that didn't exist before the diff
- If test coverage is missing, flag it as ONE Minor issue ("Missing test coverage for X, Y, Z") — do NOT specify implementation details like mock libraries, behaviour extraction, or dependency injection patterns that would introduce substantial new code
- Typespecs, documentation, and naming issues are Minor unless they affect public API contracts
- Do NOT request adding new dependencies (e.g. Mox, testing libraries, linter plugins)
### Fix Complexity Budget
Fixes to existing code should be flagged at their real severity regardless of size.
However, requests for **net-new code that didn't exist before the diff** must be classified as Informational:
- Adding a new dependency (e.g. Mox, a linter plugin)
- Creating entirely new modules, files, or test suites
- Extracting new behaviours, protocols, or abstractions
These are improvement suggestions for the author to consider in future work, not review blockers.
### Iteration Policy
If this is a re-review after fixes were applied:
- ONLY verify that previously flagged issues were addressed correctly
- Do NOT introduce new findings unrelated to the previous review's issues
- Accept Minor/Nice-to-Have issues that weren't fixed — do not re-flag them
- The goal of re-review is VERIFICATION, not discovery
## Output Format
```markdown
## Review Summary
[1-2 sentence overview of findings]
## Issues
### Critical (Blocking)
1. [FILE:LINE] ISSUE_TITLE
- Issue: Description of what's wrong
- Why: Why this matters (crash, data loss, security, race condition)
- Fix: Specific recommended fix
### Major (Should Fix)
2. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Minor (Nice to Have)
N. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Informational (For Awareness)
N. [FILE:LINE] SUGGESTION_TITLE
- Suggestion: ...
- Rationale: ...
## Good Patterns
- [FILE:LINE] Pattern description (preserve this)
## Verdict
Ready: Yes | No | With fixes 1-N (Critical/Major only; Minor items are acceptable)
Rationale: [1-2 sentences]
```
## Post-Fix Verification
After fixes are applied, run:
```bash
# Swift build and lint
swift build
swiftlint lint --quiet
# Run tests if present
swift test
```
All checks must pass before approval.
## Rules
- Load skills BEFORE reviewing (not after)
- Number every issue sequentially (1, 2, 3...)
- Include FILE:LINE for each issue
- Separate Issue/Why/Fix clearly
- Categorize by actual severity
- Check for Swift 6 strict concurrency issues
- Run verification after fixes
- Report ALL issues in a single pass — do not hold back findings for later iterations
- Re-reviews verify previous fixes ONLY — no new discovery
- Requests for net-new code (new modules, dependencies, test suites) are Informational, not blocking
- The Verdict ignores Minor and Informational items — only Critical and Major block approval
Comprehensive BubbleTea TUI code review for terminal applications
---
description: Comprehensive BubbleTea TUI code review for terminal applications
name: review-tui
disable-model-invocation: true
---
# TUI Code Review
## Arguments
- `--parallel`: Spawn specialized subagents per technology area
- Path: Target directory (default: current working directory)
## Gates (sequence)
Advance only when each **pass condition** is true (reduces scope drift and unsubstantiated blocking claims):
| Gate | Pass condition |
|------|----------------|
| **G1 — Scope** | Step 1 produced a concrete list of target `.go` paths (from the git command or an explicit user path). If the list is empty, you **stopped** for scope clarification **or** recorded an agreed non-git scope (e.g. single file/dir) before reviewing. |
| **G2 — Skills before review** | `beagle-go:review-verification-protocol`, `beagle-go:go-code-review`, and `beagle-go:bubbletea-code-review` are loaded; Step 4 conditionals (tests → `go-testing-code-review`, Wish → `wish-ssh-code-review`) are loaded **before** Step 5. |
| **G3 — Evidence for Critical/Major** | Each Critical/Major finding cites **file path + line** (or a short quoted snippet) from the **opened** source—not from diff hunks alone. |
| **G4 — Pre-output hygiene** | Each retained finding was checked against Step 7 **and** the loaded verification protocol **before** writing the Issues section. |
Do not start Step 5 until **G2** passes. Do not publish Critical/Major until **G3** and **G4** pass.
## Step 1: Identify Changed Files
```bash
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '\.go$'
```
## Step 2: Detect Technologies
```bash
# Detect BubbleTea (required for TUI review)
grep -r "charmbracelet/bubbletea" --include="*.go" -l | head -3
# Detect Lipgloss styling
grep -r "charmbracelet/lipgloss\|lipgloss\.Style" --include="*.go" -l | head -3
# Detect Bubbles components
grep -r "charmbracelet/bubbles\|list\.Model\|textinput\.Model\|viewport\.Model" --include="*.go" -l | head -3
# Detect Wish SSH server
grep -r "charmbracelet/wish\|ssh\.Session" --include="*.go" -l | head -3
# Check for test files
git diff --name-only $(git merge-base HEAD main)..HEAD | grep -E '_test\.go$'
```
## Step 3: Load Verification Protocol
Load `beagle-go:review-verification-protocol` skill and keep its checklist in mind throughout the review.
## Step 4: Load Skills
Use the `Skill` tool to load each applicable skill (e.g., `Skill(skill: "beagle-go:go-code-review")`).
**Always load:**
- `beagle-go:go-code-review`
- `beagle-go:bubbletea-code-review`
**Conditionally load based on detection:**
| Condition | Skill |
|-----------|-------|
| Test files changed | `beagle-go:go-testing-code-review` |
| Wish SSH detected | `beagle-go:wish-ssh-code-review` |
## Step 5: Review Focus Areas
### Model/Update/View (Elm Architecture)
- [ ] Model is immutable (Update returns new model)
- [ ] Init returns proper initial command
- [ ] Update handles all message types
- [ ] View is pure function (no side effects)
- [ ] tea.Quit used correctly for exit
### Lipgloss Styling
- [ ] Styles defined once at package level
- [ ] Styles not created in View function
- [ ] Colors use AdaptiveColor for light/dark themes
- [ ] Layout responds to WindowSizeMsg
### Component Composition
- [ ] Sub-component updates propagated
- [ ] WindowSizeMsg passed to resizable components
- [ ] Focus management for multiple components
- [ ] Clear state machine for view transitions
### SSH Server (if applicable)
- [ ] Host keys persisted
- [ ] Graceful shutdown implemented
- [ ] PTY window size passed to TUI
- [ ] Per-session Lipgloss renderer
## Step 6: Review
**Sequential (default):**
1. Load applicable skills
2. Review Go code quality
3. Review BubbleTea patterns (Model/Update/View)
4. Review Lipgloss styling
5. Review component composition
6. Review SSH server (if applicable)
7. Consolidate findings
**Parallel (--parallel flag):**
1. Detect all technologies upfront
2. Spawn subagents for: Go quality, BubbleTea, SSH
3. Wait for all agents
4. Consolidate findings
## Step 7: Verify Findings
Before reporting any issue:
1. Re-read the actual code (not just diff context)
2. For "unused" claims - did you search all references?
3. For "missing" claims - did you check framework/parent handling?
4. For syntax issues - did you verify against current version docs?
5. Remove any findings that are style preferences, not actual issues
## Step 8: Review Convergence
### Single-Pass Completeness
You MUST report ALL issues across ALL categories (style, logic, types, tests, security, performance) in a single review pass. Do not hold back issues for later rounds.
Before submitting findings, ask yourself:
- "If all my recommended fixes are applied, will I find NEW issues in the fixed code?"
- "Am I requesting new code (tests, types, modules) that will itself need review?"
If yes to either: include those anticipated downstream issues NOW, in this review, so the author can address everything at once.
### Scope Rules
- Review ONLY the code in the diff and directly related existing code
- Do NOT request new features, test infrastructure, or architectural changes that didn't exist before the diff
- If test coverage is missing, flag it as ONE Minor issue ("Missing test coverage for X, Y, Z") — do NOT specify implementation details like mock libraries, behaviour extraction, or dependency injection patterns that would introduce substantial new code
- Typespecs, documentation, and naming issues are Minor unless they affect public API contracts
- Do NOT request adding new dependencies (e.g. Mox, testing libraries, linter plugins)
### Fix Complexity Budget
Fixes to existing code should be flagged at their real severity regardless of size.
However, requests for **net-new code that didn't exist before the diff** must be classified as Informational:
- Adding a new dependency (e.g. Mox, a linter plugin)
- Creating entirely new modules, files, or test suites
- Extracting new behaviours, protocols, or abstractions
These are improvement suggestions for the author to consider in future work, not review blockers.
### Iteration Policy
If this is a re-review after fixes were applied:
- ONLY verify that previously flagged issues were addressed correctly
- Do NOT introduce new findings unrelated to the previous review's issues
- Accept Minor/Nice-to-Have issues that weren't fixed — do not re-flag them
- The goal of re-review is VERIFICATION, not discovery
## Output Format
```markdown
## Review Summary
[1-2 sentence overview of findings]
## Issues
### Critical (Blocking)
1. [FILE:LINE] ISSUE_TITLE
- Issue: Description of what's wrong
- Why: Why this matters (UI freeze, crash, resource leak)
- Fix: Specific recommended fix
### Major (Should Fix)
2. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Minor (Nice to Have)
N. [FILE:LINE] ISSUE_TITLE
- Issue: ...
- Why: ...
- Fix: ...
### Informational (For Awareness)
N. [FILE:LINE] SUGGESTION_TITLE
- Suggestion: ...
- Rationale: ...
## Good Patterns
- [FILE:LINE] Pattern description (preserve this)
## Verdict
Ready: Yes | No | With fixes 1-N (Critical/Major only; Minor items are acceptable)
Rationale: [1-2 sentences]
```
## Post-Fix Verification
After fixes are applied, run:
```bash
go build ./...
go vet ./...
golangci-lint run
go test -v -race ./...
```
All checks must pass before approval.
## Rules
- Load skills BEFORE reviewing (not after)
- Number every issue sequentially (1, 2, 3...)
- Include FILE:LINE for each issue
- Separate Issue/Why/Fix clearly
- Categorize by actual severity
- Pay special attention to:
- Blocking operations in Update (freezes UI)
- Style creation in View (performance)
- Missing WindowSizeMsg handling (broken resize)
- Run verification after fixes
- Report ALL issues in a single pass — do not hold back findings for later iterations
- Re-reviews verify previous fixes ONLY — no new discovery
- Requests for net-new code (new modules, dependencies, test suites) are Informational, not blocking
- The Verdict ignores Minor and Informational items — only Critical and Major block approval
Detect AI-generated writing patterns in developer text — docs, docstrings, commit messages, PR descriptions, and code comments. Use when reviewing any text a...
---
name: review-ai-writing
description: Detect AI-generated writing patterns in developer text — docs, docstrings, commit messages, PR descriptions, and code comments. Use when reviewing any text artifact for authenticity and clarity.
disable-model-invocation: true
autoContext:
whenUserAsks:
- ai writing
- ai-generated
- sounds like ai
- writing quality
- humanize-beagle
- robotic writing
- chatgpt
dependencies:
- docs-style
---
# Review AI Writing
Detect AI-generated writing patterns across developer text artifacts using parallel subagents.
## Usage
```text
/beagle-docs:review-ai-writing [--all] [--category <name>] [path]
```
**Flags:**
- `--all` - Scan entire codebase (default: changed files from main)
- `--category <name>` - Only check specific category: `content|vocabulary|formatting|communication|filler|code_docs`
- Path: Target directory (default: current working directory)
## Instructions
### 1. Parse Arguments
Extract flags from `$ARGUMENTS`:
- `--all` - Full codebase scan
- `--category <name>` - Filter to specific category
- Path - Target directory
### 2. Load Skills
Load required skills:
```text
Skill(skill: "beagle-docs:review-ai-writing")
Skill(skill: "beagle-core:review-verification-protocol")
```
### 3. Determine Scope
```bash
# Default: changed files from main
git diff --name-only $(git merge-base HEAD main)..HEAD
# If --all flag: scan all text artifacts
find . -type f \( -name "*.md" -o -name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" -o -name "*.go" -o -name "*.rs" -o -name "*.java" -o -name "*.rb" -o -name "*.swift" -o -name "*.kt" -o -name "*.ex" -o -name "*.exs" \) ! -path "*/node_modules/*" ! -path "*/.git/*" ! -path "*/vendor/*" ! -path "*/__pycache__/*" ! -path "*/dist/*" ! -path "*/build/*"
```
If no files found, exit with: "No files to scan. Check your branch has changes or use --all."
### 4. Check for Existing LLM Artifacts Review
```bash
# Check if llm-artifacts review exists to avoid double-flagging
if [ -f .beagle/llm-artifacts-review.json ]; then
echo "Found existing llm-artifacts review — will skip overlapping findings"
fi
```
Parse existing findings from `.beagle/llm-artifacts-review.json` if present. When consolidating, skip any finding where both the file:line and pattern type match an existing llm-artifacts finding (specifically `verbose_comment` and `over_documentation` types).
### 5. Classify Files by Type
Partition files into three groups:
| Group | File Types | Patterns to Check |
|-------|-----------|-------------------|
| **Prose** | `*.md` | All 6 categories |
| **Code Docs** | `*.py`, `*.ts`, `*.tsx`, `*.js`, `*.jsx`, `*.go`, `*.rs`, `*.java`, `*.rb`, `*.swift`, `*.kt`, `*.ex`, `*.exs` | vocabulary, communication, filler, code_docs |
| **Git** | Commit messages, PR descriptions | content, vocabulary, communication, filler |
For Git artifacts, collect recent commits:
```bash
# Commits on current branch not in main
git log --format="%H %s" $(git merge-base HEAD main)..HEAD
```
### 6. Spawn Parallel Subagents
If total items >= 4, spawn up to 3 subagents via `Task` tool. If `--category` is set, spawn a single agent for that category only.
#### Subagent 1: Prose Agent
**Scope:** Markdown files only
**Check:** All 6 pattern categories
**Instructions:**
1. Load `beagle-docs:review-ai-writing` skill
2. Read each markdown file
3. Scan for all pattern categories
4. Apply false positive checks from the skill
5. Return findings in the structured format
#### Subagent 2: Code Docs Agent
**Scope:** Source code files
**Check:** vocabulary, communication, filler, code_docs categories
**Instructions:**
1. Load `beagle-docs:review-ai-writing` skill
2. Extract docstrings and comments from each file
3. Scan for applicable pattern categories
4. Skip code itself — only check text in comments and docstrings
5. Return findings in the structured format
#### Subagent 3: Git Agent
**Scope:** Commit messages and PR descriptions
**Check:** content, vocabulary, communication, filler categories
**Instructions:**
1. Load `beagle-docs:review-ai-writing` skill
2. Read commit messages from the branch
3. If on a PR branch, read the PR description via `gh pr view --json body`
4. Scan for applicable pattern categories
5. Use synthetic paths: `git:commit:<sha>` with line 0, `git:pr:<number>` with line 0
6. Return findings in the structured format
### 7. Consolidate Findings
Wait for all subagents to complete, then:
1. Merge all findings into a single list
2. Remove duplicates (same file:line and type)
3. Remove findings that overlap with `.beagle/llm-artifacts-review.json`
4. Assign unique IDs (1, 2, 3...)
5. Group by category for display
### 8. Write JSON Report
Create `.beagle` directory if it doesn't exist:
```bash
mkdir -p .beagle
```
Write findings to `.beagle/ai-writing-review.json`:
```json
{
"version": "1.0.0",
"created_at": "2025-01-15T10:30:00Z",
"git_head": "abc1234",
"scope": "changed",
"files_scanned": 12,
"commits_scanned": 5,
"findings": [
{
"id": 1,
"category": "vocabulary",
"type": "ai_vocabulary_high",
"file": "README.md",
"line": 15,
"original_text": "This library leverages cutting-edge algorithms to facilitate seamless data processing.",
"description": "High-signal AI vocabulary: leverage, cutting-edge, facilitate, seamless",
"suggestion": "This library uses streaming algorithms for fast data processing.",
"risk": "Low",
"fix_safety": "Safe",
"fix_action": "rewrite"
},
{
"id": 2,
"category": "code_docs",
"type": "tautological_docstring",
"file": "src/auth.py",
"line": 42,
"original_text": "\"\"\"Get the user by ID.\"\"\"",
"description": "Docstring restates function name get_user_by_id without adding value",
"suggestion": "\"\"\"Raises UserNotFound if ID doesn't exist.\"\"\"",
"risk": "Medium",
"fix_safety": "Needs review",
"fix_action": "rewrite"
},
{
"id": 3,
"category": "communication",
"type": "chat_leak",
"file": "git:commit:abc1234",
"line": 0,
"original_text": "Certainly! Here's the updated authentication flow",
"description": "Chat leak in commit message: starts with 'Certainly! Here's'",
"suggestion": "Update authentication flow",
"risk": "Low",
"fix_safety": "Safe",
"fix_action": "rewrite"
}
],
"summary": {
"total": 3,
"by_category": {
"vocabulary": 1,
"code_docs": 1,
"communication": 1
},
"by_risk": {
"Low": 2,
"Medium": 1
},
"by_fix_safety": {
"Safe": 2,
"Needs review": 1
}
}
}
```
### 9. Display Summary
```markdown
## AI Writing Review
**Scope:** Changed files from main
**Files scanned:** 12 | **Commits scanned:** 5
### Findings by Category
#### Vocabulary (1 issue)
1. [README.md:15] **AI vocabulary** (Low, Safe)
- High-signal AI vocabulary: leverage, cutting-edge, facilitate, seamless
- Suggestion: Rewrite with simple words
#### Code Docs (1 issue)
2. [src/auth.py:42] **Tautological docstring** (Medium, Needs review)
- Docstring restates function name without adding value
- Suggestion: Add meaningful information or delete
#### Communication (1 issue)
3. [git:commit:abc1234:0] **Chat leak** (Low, Safe)
- Commit message starts with "Certainly! Here's"
- Suggestion: Rewrite as imperative commit message
### Summary Table
| Category | Safe | Needs Review | Total |
|----------|------|--------------|-------|
| Vocabulary | 1 | 0 | 1 |
| Code Docs | 0 | 1 | 1 |
| Communication | 1 | 0 | 1 |
| **Total** | **2** | **1** | **3** |
### Next Steps
- Run `/beagle-docs:humanize-beagle` to apply fixes
- Run `/beagle-docs:humanize-beagle --dry-run` to preview changes first
- Review the JSON report at `.beagle/ai-writing-review.json`
```
### 10. Verification
Before completing, all of the following must **pass** (objective checks):
1. **JSON file exists and parses:** `.beagle/ai-writing-review.json` is present **or** you exited at Gate 1 with no scan (then no JSON is required).
2. **JSON validity:** If the file exists, `python3 -c "import json; json.load(open('.beagle/ai-writing-review.json'))"` exits 0.
3. **Subagent success:** If you used `Task` subagents, each returned without tool/runtime failure (failed spawn = do not write final JSON as if complete).
4. **Git HEAD captured:** When JSON exists, `git_head` matches `git rev-parse HEAD` (non-empty string).
5. **No double-flagging:** If `.beagle/llm-artifacts-review.json` exists, no finding duplicates its file:line + overlapping type for the skip rules in §4.
```bash
# Verify JSON is valid (when file exists)
python3 -c "import json; json.load(open('.beagle/ai-writing-review.json'))" 2>/dev/null && echo "Valid JSON" || echo "Invalid JSON"
```
If any check fails, report the error and do not proceed.
## Output Format for Each Finding
```text
[FILE:LINE] ISSUE_TITLE
- Category: content | vocabulary | formatting | communication | filler | code_docs
- Type: specific_pattern_name
- Original: "the problematic text"
- Suggestion: "the improved text" or "delete"
- Risk: Low | Medium
- Fix Safety: Safe | Needs review
```
## Rules
- Always load `beagle-docs:review-ai-writing` and `beagle-core:review-verification-protocol` first
- Use `Task` tool for parallel subagents when >= 4 items to scan
- Every finding MUST have file:line reference (use synthetic paths for git artifacts)
- Do not flag false positives listed in the skill
- Do not duplicate findings from `.beagle/llm-artifacts-review.json`
- Create `.beagle` directory if needed
- Write JSON report before displaying summary
## Gates (sequenced pass conditions)
Advance only when each **pass condition** is satisfied using artifacts (paths, exit codes, parseable output)—not an internal “I checked” claim.
1. **Arguments → scope**
- **Pass:** You can list the concrete paths (or `git:commit:<sha>` / `git:pr:<n>`) you will scan. If that set is empty, emit the “No files to scan…” message and **do not** create `.beagle/ai-writing-review.json`.
2. **Scope → execution**
- **Pass:** Each of Prose, Code docs, and Git (when in scope) has either completed subagent output **or** equivalent inline work with the same structured fields per finding.
3. **Consolidation → write**
- **Pass:** Duplicates (same file:line and type) removed; when `.beagle/llm-artifacts-review.json` exists, overlaps with it skipped per §4; `git_head` equals the output of `git rev-parse HEAD` (non-empty).
4. **JSON → summary**
- **Pass:** `python3 -c "import json; json.load(open('.beagle/ai-writing-review.json'))"` exits 0.
5. **Finding → verification protocol**
- **Pass:** For each reported issue, you can cite the surrounding paragraph or function you used so the flag is evidence-backed (see `beagle-core:review-verification-protocol`).
## Reference Material
### AI Writing Detection for Developer Text
Detect patterns characteristic of AI-generated text in developer artifacts. These patterns reduce trust, add noise, and obscure meaning.
## Pattern Categories
| Category | Reference | Key Signals |
|----------|-----------|-------------|
| Content | `references/content-patterns.md` | Promotional language, vague authority, formulaic structure, synthetic openers |
| Vocabulary | `references/vocabulary-patterns.md` | AI word tiers, copula avoidance, rhetorical devices, synonym cycling, commit inflation |
| Formatting | `references/formatting-patterns.md` | Boldface overuse, emoji decoration, heading restatement |
| Communication | `references/communication-patterns.md` | Chat leaks, cutoff disclaimers, sycophantic tone, apologetic errors |
| Filler | `references/filler-patterns.md` | Filler phrases, excessive hedging, generic conclusions |
| Code Docs | `references/code-docs-patterns.md` | Tautological docstrings, narrating obvious code, "This noun verbs", exhaustive enumeration |
## Scope
Scan these artifact types:
| Artifact | File Patterns | Notes |
|----------|--------------|-------|
| Markdown docs | `*.md` | READMEs, guides, changelogs |
| Docstrings | `*.py`, `*.ts`, `*.js`, `*.go`, `*.swift`, `*.rs`, `*.java`, `*.kt`, `*.rb`, `*.ex` | Language-specific docstring formats |
| Code comments | Same as docstrings | Inline and block comments |
| Commit messages | `git log` output | Use synthetic path `git:commit:<sha>` |
| PR descriptions | GitHub PR body | Use synthetic path `git:pr:<number>` |
### What NOT to Scan
- Generated code (lock files, compiled output, vendor directories)
- Third-party content (copied license text, vendored docs)
- Code itself (variable names, string literals used programmatically)
- Test fixtures and mock data
## Detection Rules
### High-Confidence Signals (Always Flag)
These patterns are strong indicators of AI-generated text:
1. **Chat leaks** — "Certainly!", "I'd be happy to", "Great question!", "Here's" as sentence opener
2. **Cutoff disclaimers** — "As of my last update", "I cannot guarantee"
3. **High-signal AI vocabulary** — delve, utilize (as "use"), whilst, harnessing, paradigm, synergy
4. **"This noun verbs" in docstrings** — "This function calculates", "This method returns"
5. **Synthetic openers** — "In today's fast-paced", "In the world of"
6. **Sycophantic code comments** — "Excellent approach!", "Great implementation!"
### Medium-Confidence Signals (Flag in Context)
Flag when 2+ appear together or pattern is repeated:
1. **Low-signal AI vocabulary clusters** — 3+ words from the low-signal list in one section
2. **Formulaic structure** — Rigid intro-body-conclusion in a README section
3. **Heading restatement** — First sentence after heading restates the heading
4. **Excessive hedging** — "might potentially", "could possibly", "it seems like it may"
5. **Synonym cycling** — Same concept called different names within one section
6. **Boldface overuse** — More than 30% of sentences contain bold text
### Low-Confidence Signals (Note Only)
Mention but don't flag as issues:
1. **Emoji in technical docs** — May be intentional project style
2. **Filler phrases** — Some are common in human writing too
3. **Generic conclusions** — May be appropriate for summary sections
4. **Commit inflation** — Some teams prefer descriptive commits
## False Positive Warnings
Do NOT flag these as AI-generated:
| Pattern | Why It's Valid |
|---------|----------------|
| "Ensure" in security docs | Standard term for security requirements |
| "Comprehensive" in test coverage discussion | Accurate technical descriptor |
| Formal tone in API reference docs | Expected register for reference material |
| "Leverage" in financial/business domain code | Domain-specific meaning, not AI filler |
| Bold formatting in CLI help text | Standard convention |
| Structured intro paragraphs in RFCs/ADRs | Expected format for these document types |
| "This module provides" in Python `__init__.py` | Idiomatic Python module docstring |
| Rhetorical questions in blog posts | Appropriate for informal content |
## Integration
### With `beagle-core:review-verification-protocol`
Before reporting any finding:
1. Read the surrounding context (full paragraph or function)
2. Confirm the pattern is AI-characteristic, not just formal writing
3. Check if the project has established conventions that match the pattern
4. Verify the suggestion improves clarity without changing meaning
### With `beagle-core:llm-artifacts-detection`
Code-level patterns (tautological docstrings, obvious comments) overlap with `llm-artifacts-detection`'s style criteria. When both skills are loaded:
- `review-ai-writing` focuses on **writing style** (how it reads)
- `llm-artifacts-detection` focuses on **code artifacts** (whether it should exist at all)
- If `.beagle/llm-artifacts-review.json` exists, skip findings already captured there
## Output Format
Report each finding as:
```text
[FILE:LINE] ISSUE_TITLE
- Category: content | vocabulary | formatting | communication | filler | code_docs
- Type: specific_pattern_name
- Original: "the problematic text"
- Suggestion: "the improved text" or "delete"
- Risk: Low | Medium
- Fix Safety: Safe | Needs review
```
### Risk Levels
- **Low** — Filler phrases, obvious comments, emoji. Removing improves clarity with no meaning change.
- **Medium** — Vocabulary swaps, structural changes, docstring rewrites. Meaning could shift if done carelessly.
### Fix Safety
- **Safe** — Mechanical replacement or deletion. No judgment needed.
- **Needs review** — Rewrite requires understanding context. Human should verify the replacement preserves intent.
FILE:references/code-docs-patterns.md
# Code Documentation Patterns
Detecting AI-generated writing in developer documentation: docstrings, code comments, commit messages, and PR descriptions.
> **Overlap note:** Tautological docstrings and obvious comments also appear in `beagle-core:llm-artifacts-detection` (style-criteria.md). This file focuses on the AI writing style aspect; the artifacts skill focuses on unnecessary code artifacts.
---
## 1. Tautological Docstrings
### What to Look For
Docstrings that restate the function name, parameters, or signature without adding any information a developer couldn't already read from the code. The docstring is a mirror of the declaration, not a supplement to it.
Common tells:
- Docstring is the function name split into words
- Describes the return value using only the return type
- Begins with a verb form of the function name ("Gets", "Sets", "Returns", "Creates")
### Examples
**Python**
```python
# BAD - restates the function name
def get_user(user_id: int) -> User:
"""Gets the user."""
return db.query(User).filter_by(id=user_id).one()
def __init__(self, config: Config):
"""Initializes the class."""
self.config = config
def get_count(self) -> int:
"""Returns the count."""
return self._count
# GOOD - adds information not in the signature
def get_user(user_id: int) -> User:
"""Raises NoResultFound if the user does not exist."""
return db.query(User).filter_by(id=user_id).one()
def __init__(self, config: Config):
"""Starts background sync if config.auto_sync is enabled."""
self.config = config
if config.auto_sync:
self._start_sync()
# GOOD - trivial function, no docstring needed
def get_count(self) -> int:
return self._count
```
**TypeScript**
```typescript
// BAD - restates the function name
/** Gets the user by ID. */
function getUserById(id: string): User {
return users.get(id);
}
/** Creates a new order. */
function createOrder(items: CartItem[]): Order {
return orderService.create(items);
}
// GOOD - explains non-obvious behavior
/** Falls back to guest user if ID is not found. */
function getUserById(id: string): User {
return users.get(id) ?? GUEST_USER;
}
// GOOD - no doc needed for clear functions
function createOrder(items: CartItem[]): Order {
return orderService.create(items);
}
```
**fix_safety:** Safe. Removing a tautological docstring or replacing it with useful content does not change behavior.
---
## 2. Narrating Obvious Code
### What to Look For
Comments that describe what the code does line-by-line rather than explaining why. These read like a narration of the syntax rather than a note from one developer to another. They add vertical noise without aiding comprehension.
Common tells:
- Comment restates the operation: `# Loop through items`, `# Check if valid`
- Comment names the construct: `# Return the result`, `# Set the variable`
- Every block has a comment even when the code is self-documenting
### Examples
**Python**
```python
# BAD - narrates each line
def process_orders(orders: list[Order]) -> list[Receipt]:
# Initialize results list
results = []
# Loop through orders
for order in orders:
# Check if order is valid
if order.is_valid():
# Process the order
receipt = order.process()
# Add receipt to results
results.append(receipt)
# Return the results
return results
# GOOD - comment only where non-obvious
def process_orders(orders: list[Order]) -> list[Receipt]:
results = []
for order in orders:
if order.is_valid():
# process() is idempotent; safe to retry on partial failures
receipt = order.process()
results.append(receipt)
return results
```
**JavaScript**
```javascript
// BAD - narrates obvious DOM operations
function renderList(items) {
// Get the container element
const container = document.getElementById("list");
// Clear existing content
container.innerHTML = "";
// Iterate over items
items.forEach((item) => {
// Create a new list item
const li = document.createElement("li");
// Set the text content
li.textContent = item.name;
// Append to container
container.appendChild(li);
});
}
// GOOD - no comments needed; code is clear
function renderList(items) {
const container = document.getElementById("list");
container.innerHTML = "";
items.forEach((item) => {
const li = document.createElement("li");
li.textContent = item.name;
container.appendChild(li);
});
}
```
**fix_safety:** Safe. Removing narration comments does not change behavior. Keep any comment that explains a non-obvious decision or constraint.
---
## 3. "This noun verbs" Pattern
### What to Look For
Docstrings and descriptions that start with "This function...", "This method...", "This class...", "This module...". This is a signature AI sentence structure rarely used by human developers. Humans typically write in imperative mood ("Calculate totals") or start with the subject of the action ("Totals are calculated..."), not with a self-referential "This".
Common tells:
- "This function calculates..."
- "This method returns..."
- "This class represents..."
- "This module provides..."
- "This component renders..."
- "This hook manages..."
### Examples
**Python**
```python
# BAD - "This function/class/method" pattern
class OrderProcessor:
"""This class represents an order processor that handles
the processing of customer orders."""
def calculate_total(self, items: list[Item]) -> Decimal:
"""This method calculates the total price for the given items."""
return sum(item.price * item.quantity for item in items)
def apply_discount(self, total: Decimal, code: str) -> Decimal:
"""This function applies a discount code to the total."""
discount = self.discounts.get(code, Decimal("0"))
return total - discount
# GOOD - imperative mood, no self-reference
class OrderProcessor:
"""Process customer orders with discount and tax support."""
def calculate_total(self, items: list[Item]) -> Decimal:
"""Sum of price * quantity across all items."""
return sum(item.price * item.quantity for item in items)
def apply_discount(self, total: Decimal, code: str) -> Decimal:
"""Subtract the discount for `code`. Unknown codes are ignored."""
discount = self.discounts.get(code, Decimal("0"))
return total - discount
```
**TypeScript**
```typescript
// BAD - "This component/hook" pattern
/**
* This component renders a user profile card with the user's
* avatar, name, and bio information.
*/
function ProfileCard({ user }: { user: User }) {
return <Card>...</Card>;
}
/**
* This hook manages the authentication state for the application.
*/
function useAuth(): AuthState {
// ...
}
// GOOD - direct description
/** User profile card showing avatar, name, and bio. */
function ProfileCard({ user }: { user: User }) {
return <Card>...</Card>;
}
/** Authentication state with login, logout, and token refresh. */
function useAuth(): AuthState {
// ...
}
```
**fix_safety:** Safe. Rewriting a docstring from "This function does X" to imperative or direct style does not change behavior.
---
## 4. Exhaustive Enumeration
### What to Look For
Documentation that exhaustively lists every parameter, return value, and exception even when most are obvious from the type signature. AI models tend to produce complete Args/Returns/Raises blocks as boilerplate regardless of whether the information is useful. The result is a docstring longer than the function body that a developer will skip entirely.
Common tells:
- Args section where every description is "The [param name]" or "The [param name] to use"
- Returns section that restates the return type hint
- Raises section listing only the obvious (`ValueError` for bad input)
- Docstring is longer than the function body
### Examples
**Python**
```python
# BAD - exhaustive enumeration of obvious params
def send_notification(
user_id: int,
message: str,
channel: Channel = Channel.EMAIL,
) -> bool:
"""Send a notification to a user.
Args:
user_id: The ID of the user to notify.
message: The notification message to send.
channel: The channel to send the notification through.
Defaults to Channel.EMAIL.
Returns:
bool: True if the notification was sent successfully,
False otherwise.
Raises:
ValueError: If user_id is invalid.
ConnectionError: If the notification service is unavailable.
"""
...
# GOOD - document only non-obvious aspects
def send_notification(
user_id: int,
message: str,
channel: Channel = Channel.EMAIL,
) -> bool:
"""Send a notification. Returns False if delivery is deferred
(e.g., user has DND enabled) rather than raising."""
...
```
**TypeScript**
```typescript
// BAD - JSDoc repeats what the types already say
/**
* Fetches paginated results from the API.
*
* @param endpoint - The API endpoint to fetch from.
* @param page - The page number to fetch.
* @param pageSize - The number of items per page.
* @returns A promise that resolves to an array of results.
* @throws {ApiError} If the request fails.
*/
async function fetchPage<T>(
endpoint: string,
page: number,
pageSize: number
): Promise<T[]> {
// ...
}
// GOOD - only document what types don't convey
/**
* Fetch a page of results. Uses cursor-based pagination under
* the hood; `page` is translated to a cursor internally.
*/
async function fetchPage<T>(
endpoint: string,
page: number,
pageSize: number
): Promise<T[]> {
// ...
}
```
**fix_safety:** Needs review. Removing parameter documentation is safe for obvious params, but verify that no non-obvious constraints or side effects are documented in the verbose block before trimming. Keep docs for params with subtle semantics (units, boundaries, format expectations).
---
## Review Questions
When evaluating code documentation for these patterns, ask:
1. Does this docstring tell me something the signature does not?
2. Does this comment explain why, or just restate what?
3. Would a human developer actually write "This function..." in a docstring?
4. Is this Args/Returns/Raises block earning its vertical space?
5. If I deleted this documentation, would any developer be worse off?
FILE:references/communication-patterns.md
# Communication Patterns
Detection criteria for conversational AI artifacts that leak into developer-facing text: documentation, commit messages, PR descriptions, and code comments.
## 1. Chat Leaks
### What to Look For
Conversational phrases from AI chat sessions copy-pasted into committed text. These read as one side of a dialogue rather than authored documentation.
### Detection Patterns
```markdown
# BAD - Chat openers leaked into docs/PRs/comments
Certainly! Here's how to configure the database connection.
Great question! The middleware stack processes requests in order.
I'd be happy to help explain the authentication flow.
# GOOD
Configure the database connection with the following environment variables.
The middleware stack processes requests in order.
```
```python
# BAD - Chat phrasing in code comments or commit messages
# Here's the implementation for the retry logic
# Let me explain what this does
# As requested, this handles the edge case
# Sure thing! This validates the token format
# GOOD
# Retry with exponential backoff, max 3 attempts
# Handles empty input by returning early
```
### fix_safety: Safe
Removing chat preambles does not change technical meaning.
---
## 2. Cutoff Disclaimers
### What to Look For
Training cutoff disclaimers or knowledge-limitation hedges in docs or comments. These expose AI-generated origin and erode reader confidence.
### Detection Patterns
```markdown
# BAD
As of my last update, the API supports cursor-based pagination.
Based on current knowledge, PostgreSQL 16 introduced merge joins.
I cannot guarantee this behavior will remain consistent.
# GOOD
The API supports cursor-based pagination (v2.3+).
PostgreSQL 16 introduced merge joins (see release notes).
This behavior may change; pin your dependency version.
```
### fix_safety: Safe
Replace disclaimers with versioned references or remove entirely.
---
## 3. Sycophantic Tone
### What to Look For
Excessive praise or enthusiasm in code review comments, PR descriptions, and documentation that reads as flattery rather than technical assessment.
### Detection Patterns
```markdown
# BAD
Excellent approach! The factory pattern here is wonderful.
Great implementation! This is a really elegant solution.
This is a brilliant way to handle the race condition!
# GOOD
The factory pattern decouples instantiation from the handler logic.
The mutex prevents the race condition identified in #247.
Consider a read-write lock since reads outnumber writes 10:1.
```
```python
# BAD
# This is a really nice pattern for dependency injection
# GOOD
# Dependency injection via constructor allows test doubles
```
### fix_safety: Safe
Replace praise with neutral technical descriptions. No semantic meaning is lost.
---
## 4. Apologetic Errors
### What to Look For
Over-apologizing in error messages, log output, or documentation. Error handling should be informative and actionable, not socially polite.
### Detection Patterns
```python
# BAD
raise ValueError("I apologize for any inconvenience, but the input is invalid.")
logger.error("Sorry for the confusion, but the config could not be loaded.")
print("Unfortunately, we regret to inform you that the connection failed.")
# GOOD
raise ValueError(f"Invalid input: expected ISO 8601 date, got {value!r}")
logger.error("Failed to load config from %s: %s", config_path, err)
print(f"Connection failed: {host}:{port} (timeout after {timeout}s)")
```
```markdown
# BAD
We're sorry, but this feature is not available in the free tier.
# GOOD
This feature requires a paid plan. See pricing for details.
```
### fix_safety: Needs review
Error message rewording must preserve all diagnostic information (codes, paths, values). Verify the replacement includes the same actionable details.
---
## Review Questions
1. Does the text read like one side of a conversation or like authored documentation?
2. Are there hedges or disclaimers that reference AI knowledge limitations?
3. Does review feedback contain praise without specific technical substance?
4. Do error messages apologize instead of providing actionable remediation steps?
5. Would the text make sense if the reader had no context of an AI interaction?
FILE:references/content-patterns.md
# Content Patterns
Detection patterns for AI-generated writing in developer docs, commit messages, PRs, and code comments.
## 1. Promotional Language
### What to Look For
Inflated claims about code quality or tool capabilities. AI text oversells what code does, using marketing language instead of precise technical description.
### Detection Patterns
**Commit messages**:
Before:
```text
feat: add robust and elegant caching layer for seamless data retrieval
```
After:
```text
feat: add Redis cache for user profile queries
```
**Module docs**:
Before:
```markdown
This powerful and highly flexible authentication module provides a comprehensive,
enterprise-grade solution for all your security needs.
```
After:
```markdown
Authentication module supporting OAuth 2.0 and SAML. Wraps `authlib` with
project-specific defaults and middleware hooks.
```
**Inline comments**:
Before: `// This elegant algorithm efficiently handles all edge cases`
After: `// Handles duplicate keys by keeping the last value`
### Trigger Words
`robust`, `elegant`, `seamless`, `powerful`, `comprehensive`, `cutting-edge`, `best-in-class`, `enterprise-grade`, `highly flexible`, `state-of-the-art`, `effortlessly`
### fix_safety: Safe
Replace promotional adjectives with factual descriptions of what the code actually does.
---
## 2. Vague Authority
### What to Look For
Unattributed claims borrowing authority from unnamed sources. AI text inserts "research shows" or "experts agree" without citations.
### Detection Patterns
**Docs**:
Before:
```markdown
Research shows that structured error handling significantly improves reliability.
Experts agree that the Result pattern is the preferred approach.
```
After:
```markdown
Uses the Result pattern (`Ok`/`Err`) instead of exceptions.
See [ADR-0012](../decisions/0012-result-pattern.md) for the tradeoff discussion.
```
**PR descriptions**:
Before:
```text
Studies have shown that smaller functions lead to better maintainability.
This refactor follows industry-standard practices.
```
After:
```text
Split `process_order` (180 lines) into `validate_order`, `apply_discounts`,
and `submit_order`. Each function is independently testable.
```
**Code comments**:
Before: `// Best practices dictate that we should validate input here`
After: `// Validate before insert; see #422 for the injection that motivated this`
### Trigger Phrases
`research shows`, `studies have shown`, `experts agree`, `it is widely recognized`, `best practices dictate`, `industry-standard`, `as recommended by leading engineers`
### fix_safety: Safe
Replace with specific references (links, ADR numbers, issue numbers) or drop the claim entirely.
---
## 3. Formulaic Structure
### What to Look For
Rigid intro-body-conclusion scaffolding where it adds no value. AI text forces three-act structure onto commit messages and short docs.
### Detection Patterns
**Over-structured commit messages**:
Before:
```text
feat: implement user notification system
Introduction:
This commit introduces a notification system for users.
Changes Made:
- Added NotificationService class
- Added email template
Conclusion:
Users will now receive email notifications when orders are processed.
```
After:
```text
feat: add email notifications on order completion
- Add NotificationService with SendGrid transport
- Add order_completed email template
- Add notifications table migration
```
**Unnecessary preamble in docs**:
Before:
```markdown
## Configuration
Configuration plays a vital role in any application. In this section,
we will explore the various configuration options available.
### Database URL
```
After:
```markdown
## Configuration
### Database URL
```
**Restated conclusions**:
Before:
```markdown
## Conclusion
In summary, this library provides CSV parsing for your data pipeline.
As discussed above, it supports custom delimiters. We hope you find it useful.
```
After: Remove the section, or replace with a "See also" list linking to related docs.
### fix_safety: Safe
Removing empty intros, restated conclusions, and "in this section we will" preambles does not alter technical content.
---
## 4. Synthetic Openers
### What to Look For
Canned opening phrases that add no information and delay the reader. Common in generated docs, PR descriptions, and READMEs.
### Detection Patterns
**Docs**:
Before:
```markdown
In today's fast-paced world of API development, rate limiting has become
an essential component of any production-ready system.
```
After:
```markdown
Token bucket rate limiter for the public API. Defaults to 100 req/min
per API key. Configure via `RATE_LIMIT` in environment.
```
**PR descriptions**:
Before:
```text
In the world of distributed systems, message queues play a crucial role
in decoupling services. This PR adds RabbitMQ support to our pipeline.
```
After:
```text
Add RabbitMQ consumer for the ingestion pipeline. Replaces the polling
loop in `ingest_worker.py` (see #287 for latency benchmarks).
```
**Code comments**:
Before: `// As we all know, caching is important for performance`
After: `// Cache parsed configs; parsing takes ~200ms per file`
### Trigger Phrases
`In today's fast-paced`, `In the world of`, `In the ever-evolving landscape`, `As we all know`, `It's worth noting that`, `It goes without saying`, `When it comes to`, `In the realm of`
### fix_safety: Needs review
Most synthetic openers can be deleted outright, but some may be the only sentence introducing a topic. After removing the opener, verify the paragraph still has a clear lead sentence. If the opener was the entire introduction, write a direct replacement stating scope or purpose.
---
## Review Questions
1. Does this description say what the code does, or how great it is?
2. Is this claim attributed to a specific source, or does it lean on vague authority?
3. Would this doc lose any technical content if the intro and conclusion were deleted?
4. Does the opening sentence deliver information, or is it a generic warm-up?
5. Could a reader skip the first paragraph entirely and miss nothing?
FILE:references/filler-patterns.md
# Filler Patterns
Patterns for detecting AI-generated filler in developer documentation, commit messages, PRs, and code comments.
## Filler Phrases
Dev-specific filler phrases that add no information. These weaken technical writing by burying the actual point.
> Cross-reference: See `beagle-docs:docs-style` for the core phrase simplification table.
The following are dev-specific additions commonly found in AI-generated output:
| Phrase | Fix |
|--------|-----|
| "It's worth noting that" | Delete, or state the fact directly |
| "It should be noted that" | Delete |
| "As mentioned earlier/above" | Link directly to the section, or delete |
| "This allows us to" | State what happens |
| "In this section, we will" | Delete; just start the section |
| "Let's take a look at" | Delete |
| "As we can see" | Delete |
| "Going forward" | Delete, or specify a timeframe |
| "At the end of the day" | Delete |
**What to look for**: Sentences that begin with these phrases and contribute nothing after removal. The surrounding sentence remains grammatically correct and retains its meaning.
### Before / After
```markdown
<!-- Before -->
It's worth noting that the connection pool defaults to 10.
<!-- After -->
The connection pool defaults to 10.
```
```python
# Before
# This allows us to gracefully handle timeout errors
# After
# Handles timeout errors with exponential backoff
```
```markdown
<!-- Before (commit message) -->
Going forward, all API responses will include pagination metadata.
<!-- After -->
All API responses now include pagination metadata.
```
**fix_safety**: Safe -- these phrases can be removed mechanically without changing meaning.
---
## Excessive Hedging
Overuse of qualifiers that weaken technical statements. In technical documentation, either something is true or it is not. Stacking hedges signals that the author (or model) is uncertain about claims that should be definitive.
**What to look for**: Multiple hedging words combined in a single clause, or hedges applied to verifiable facts.
Common patterns:
- "might potentially" -- pick one or neither
- "could possibly" -- pick one or neither
- "it seems like it may" -- state what it does, or document the ambiguity explicitly
- "arguably" -- either make the argument or remove the claim
- "one could say" -- say it or remove it
- "it is generally considered" -- by whom? cite or state directly
- "this should theoretically" -- test it and state the result
### Before / After
```markdown
<!-- Before -->
This approach might potentially reduce latency in some cases.
<!-- After -->
This approach reduces p95 latency by ~40ms in benchmarks (see #214).
```
```markdown
<!-- Before (PR description) -->
The refactor could possibly improve readability and arguably makes
the module easier to test.
<!-- After -->
The refactor separates I/O from parsing, making the module unit-testable
without mocks.
```
```python
# Before
# This should theoretically handle all edge cases
# After
# Handles empty input, None, and negative values (see test_edge_cases)
```
**fix_safety**: Needs review -- removing hedges changes the strength of the claim. Verify the resulting statement is accurate before committing.
---
## Generic Conclusions
Empty summarizing paragraphs that restate what the reader just read. These appear at the end of docs, PRs, and commit descriptions. They add no information and signal AI generation because LLMs are trained on content with formulaic conclusions.
**What to look for**: Final paragraphs that begin with summarizing phrases and contain no new information, action items, or links.
Common patterns:
- "In conclusion, we have seen that..."
- "To summarize, this document covered..."
- "By following these steps, you will be able to..."
- "Overall, this implementation provides..."
- "In summary, the changes above..."
- "With these changes in place, we now have..."
### Before / After
```markdown
<!-- Before (end of PR description) -->
In summary, the changes above refactor the authentication module to use
JWT tokens instead of session cookies, improving security and reducing
server-side state. By following this approach, we ensure that the system
is more maintainable and scalable going forward.
<!-- After -->
## Migration
Existing sessions expire after deploy. Users will need to re-authenticate.
See the migration runbook: docs/runbooks/auth-jwt-migration.md
```
```markdown
<!-- Before (end of a doc page) -->
By following these steps, you will be able to deploy your application
to production successfully. We have covered all the necessary
configuration and setup required for a smooth deployment.
<!-- After -->
## Next steps
- [Set up monitoring](/guides/monitoring) for your production deployment
- [Configure alerting](/guides/alerts) for error rate thresholds
```
```python
# Before (end of module docstring)
# In conclusion, this module provides a comprehensive set of utilities
# for handling date parsing across multiple formats.
# After
# Supported formats: ISO 8601, RFC 2822, Unix timestamps.
# See parse_date() for the full format list.
```
**fix_safety**: Safe -- generic conclusions can be deleted outright. If the section needs a closing, replace it with actionable next steps or concrete references.
FILE:references/formatting-patterns.md
# Formatting Patterns
Detection criteria for AI-generated formatting habits in developer docs, commit messages, PR descriptions, and code comments.
## 1. Boldface Overuse
### What to Look For
AI tends to bold every key term, creating visual noise that dilutes emphasis. In developer documentation, bold should be reserved for UI element labels, key terms on first introduction, and warnings or critical notes.
### Detection Patterns
**Bolding common terms throughout a paragraph**:
```markdown
<!-- BAD - Every term is bold, nothing stands out -->
The **server** reads the **configuration file** on **startup** and
initializes the **database connection pool**. If the **connection**
fails, the **retry logic** kicks in with **exponential backoff**.
<!-- GOOD - Bold only on first introduction of a key concept -->
The server reads the configuration file on startup and initializes
the database connection pool. If the connection fails, the retry
logic kicks in with **exponential backoff** (see Retry Strategies).
```
**Bolding obvious terms in lists**:
```markdown
<!-- BAD - Bold on every list label adds nothing -->
- **Port**: 8080
- **Host**: localhost
- **Protocol**: HTTP
<!-- GOOD - Plain text when the structure already provides emphasis -->
- Port: 8080
- Host: localhost
- Protocol: HTTP
```
**Bolding in commit messages or PR descriptions**:
```markdown
<!-- BAD -->
Fix **race condition** in **connection pool** when **timeout** expires
<!-- GOOD -->
Fix race condition in connection pool when timeout expires
```
### fix_safety
Safe. Removing unnecessary bold formatting does not change meaning.
---
## 2. Emoji Decoration
### What to Look For
Gratuitous emoji in technical writing where plain text is clearer. Common in AI-generated changelogs, PR descriptions, commit messages, and documentation headings. Emoji should only appear when they serve a functional purpose (e.g., status indicators in a table).
### Detection Patterns
**Emoji in changelog or release notes**:
```markdown
<!-- BAD - Emoji adds no information -->
## What's New
- :rocket: Added streaming support for large responses
- :bug: Fixed null pointer in auth middleware
- :sparkles: New CLI flag for verbose output
- :wastebasket: Removed deprecated v1 endpoints
<!-- GOOD - Let the content speak -->
## What's New
- Added streaming support for large responses
- Fixed null pointer in auth middleware
- New CLI flag for verbose output
- Removed deprecated v1 endpoints
```
**Emoji in headings or section titles**:
```markdown
<!-- BAD -->
## :wrench: Configuration
## :book: API Reference
## :warning: Known Issues
<!-- GOOD -->
## Configuration
## API Reference
## Known Issues
```
**Emoji as bullet markers or checkmarks**:
```markdown
<!-- BAD - Emoji replacing standard list markers -->
:white_check_mark: Unit tests passing
:white_check_mark: Integration tests passing
:white_check_mark: Linting clean
<!-- GOOD - Use standard markdown -->
- [x] Unit tests passing
- [x] Integration tests passing
- [x] Linting clean
```
### fix_safety
Safe. Removing decorative emoji does not change technical meaning.
---
## 3. Heading Restatement
### What to Look For
The first sentence after a heading restates the heading in slightly different words. This is filler that delays the reader from reaching useful content. The heading already names the topic; the body should immediately provide substance.
### Detection Patterns
**Restating the heading as a definition**:
```markdown
<!-- BAD - First sentence is the heading in sentence form -->
## Error Handling
Error handling is an important aspect of building robust applications.
When an error occurs...
<!-- GOOD - Jump straight into substance -->
## Error Handling
All functions in this module return `(result, error)` tuples. Check
the error value before using the result.
```
**Restating with "This section describes..."**:
```markdown
<!-- BAD - Meta-commentary about the section itself -->
## Authentication
This section describes how authentication works in the system.
The authentication flow begins when...
<!-- GOOD - Start with what the reader needs -->
## Authentication
Clients authenticate by sending a Bearer token in the `Authorization`
header. Tokens are issued by the `/auth/token` endpoint and expire
after 24 hours.
```
**Restating in code comments**:
```python
# BAD - Comment restates the function name
def calculate_retry_delay(attempt: int) -> float:
"""Calculate the retry delay.
Calculates the delay before retrying a failed request.
"""
return min(2 ** attempt, MAX_DELAY)
# GOOD - Comment adds information the name doesn't convey
def calculate_retry_delay(attempt: int) -> float:
"""Exponential backoff capped at MAX_DELAY seconds.
Uses jitter to prevent thundering herd on service recovery.
"""
return min(2 ** attempt, MAX_DELAY)
```
### fix_safety
Needs review. The restated sentence sometimes contains useful qualifiers or scope limitations mixed in with the filler. Verify that no meaningful context is lost before removing.
---
## Review Questions
1. Does every bold term in this paragraph genuinely need emphasis, or would one or two suffice?
2. Would this changelog entry lose any meaning without the emoji?
3. Does the first sentence after this heading tell the reader something the heading did not?
FILE:references/vocabulary-patterns.md
# Vocabulary Patterns
Detection patterns for AI-generated writing in developer docs, commit messages, PRs, and code comments.
## 1. AI Vocabulary
### What to Look For
Words that appear disproportionately in AI-generated text, organized by signal strength.
### High-Signal Words (Always Flag)
These rarely appear in natural developer writing. Flag every occurrence.
`delve`, `utilize`, `leverage` (meaning "use"), `whilst`, `furthermore`, `moreover`, `harnessing`, `revolutionize`, `paradigm`, `synergy`, `facilitate`, `empower`, `elevate`, `unleash`, `robust` (non-technical), `seamless`, `cutting-edge`, `endeavor`, `pivotal`, `embark`
```markdown
<!-- BAD -->
Let's delve into how this module leverages the cache to facilitate
seamless data retrieval, empowering developers to unleash the full
potential of the framework.
<!-- GOOD -->
This module uses the cache for faster data retrieval.
```
**fix_safety**: Safe. Direct word substitution.
### Low-Signal Words (Flag in Clusters of 3+)
Normal in isolation, but AI signals when clustered. Flag when 3+ appear in one paragraph.
`ensure`, `enhance`, `comprehensive`, `streamline`, `optimize`, `implement`, `innovative`, `significant`, `fundamental`, `essential`
```markdown
<!-- BAD - 4 low-signal words clustered -->
This comprehensive update ensures that the streamlined pipeline
handles all essential edge cases.
<!-- GOOD -->
This update fixes edge case handling in the pipeline.
```
**fix_safety**: Safe. Rewrite with plain language.
---
## 2. Copula Avoidance
### What to Look For
AI avoids simple copula verbs ("is", "are", "was") and substitutes complex verb phrases. The result is stiff and indirect.
### Detection Patterns
Watch for these phrases followed by articles ("a", "an", "the"):
- "stands as" / "serves as" / "acts as" / "functions as" / "remains as" / "exists as"
```markdown
<!-- BAD -->
This module stands as the primary entry point for authentication.
The `Config` struct serves as the central configuration object.
<!-- GOOD -->
This module is the primary entry point for authentication.
The `Config` struct is the central configuration object.
```
```python
# BAD
# This class serves as a wrapper around the database connection
class DB:
...
# GOOD
# Wraps the database connection
class DB:
...
```
Note: "acts as" is valid when describing adapter/proxy design patterns.
**fix_safety**: Safe. Replace with "is" or rewrite as a direct statement.
---
## 3. Rhetorical Devices
### What to Look For
AI overuses rhetorical questions and dramatic framing in documentation. Developers write direct statements; AI writes engagement hooks.
### Detection Patterns
**Rhetorical questions as section openers**:
```markdown
<!-- BAD -->
Ever wondered how to secure your API endpoints? What if you could
add authentication in just a few lines of code?
<!-- GOOD -->
Add authentication to API endpoints using middleware.
```
**"Imagine" framing**:
```markdown
<!-- BAD -->
Imagine a world where your deployments never fail.
<!-- GOOD -->
The CI pipeline catches failures before deployment.
```
**Dramatic introductions**:
```markdown
<!-- BAD -->
In today's fast-paced development landscape, managing state has
become one of the most challenging aspects of building modern
applications. This library was born from the need to...
<!-- GOOD -->
A state management library for React applications.
```
**In PR descriptions**:
```markdown
<!-- BAD -->
Have you ever struggled with flaky tests? This PR tackles that
age-old problem by introducing deterministic test ordering.
<!-- GOOD -->
Fix flaky tests by making test ordering deterministic.
```
**fix_safety**: Safe. Replace with direct statements.
---
## 4. Synonym Cycling
### What to Look For
AI cycles through synonyms for the same concept to avoid repetition. In technical writing, consistency matters more than variety. Call a "function" a "function" every time.
### Detection Patterns
**Cycling technical terms**:
```markdown
<!-- BAD - same thing called 4 different names -->
The `processOrder` function validates the input. The method then
checks inventory. This procedure calculates the total. Finally,
the routine saves the order to the database.
<!-- GOOD - consistent terminology -->
The `processOrder` function validates the input, checks inventory,
calculates the total, and saves the order to the database.
```
**Cycling component names**:
```markdown
<!-- BAD -->
The `UserCard` component renders the avatar. This widget also
shows the username. The element handles click events.
<!-- GOOD -->
The `UserCard` component renders the avatar, shows the username,
and handles click events.
```
**In commit messages**:
```text
# BAD
Refactor auth module, restructure login flow, reorganize session
handling, and rearchitect token management
# GOOD
Refactor auth module: simplify login, session, and token handling
```
Common cycling sets to watch for:
- function / method / procedure / routine
- component / widget / element
- refactor / restructure / reorganize / rearchitect
**fix_safety**: Needs review. Determine which term is most accurate, then use it consistently.
---
## 5. Commit Message Inflation
### What to Look For
AI turns simple changes into grand narratives. Good commits are terse and specific.
### Detection Patterns
**Grandiose verbs for small changes**:
```text
# BAD # GOOD
Revolutionize the authentication flow Fix auth token refresh
Elevate the user experience Improve error messages
Empower the CI pipeline Add unit tests for auth
```
**Marketing language in commit bodies**:
```text
# BAD
This commit introduces a paradigm shift in how we handle database
connections, leveraging connection pooling to deliver a seamless
and robust experience for all downstream consumers.
# GOOD
Switch to connection pooling. Fixes timeout errors under load.
See #234.
```
**Overexplaining obvious changes**:
```text
# BAD
feat: Implement the crucial and fundamental addition of a
comprehensive user validation layer that ensures data integrity
# GOOD
feat: add input validation to user signup
```
### Quick Reference
| Inflated | Plain |
|----------|-------|
| "Revolutionize X" | "Fix X" / "Refactor X" |
| "Comprehensive overhaul" | "Refactor" / "Rewrite" |
| "Elevate the experience" | "Improve" |
| "Introduce a paradigm shift" | "Change" / "Switch to" |
| "Ensure robust handling" | "Fix" / "Handle" |
| "Empower users with" | "Add" |
| "Streamline the workflow" | "Simplify" |
| "Leverage cutting-edge" | "Use" |
**fix_safety**: Safe. Rewrite using Conventional Commits with plain verbs (add, fix, update, remove, refactor).
---
## Review Questions
1. Does the text contain any high-signal AI vocabulary words?
2. Are there 3+ low-signal words clustered in a single paragraph?
3. Does the writing avoid simple "is/are" in favor of complex verb phrases?
4. Are there rhetorical questions or "imagine" framing in technical docs?
5. Is the same concept referred to by multiple different terms within a section?
6. Do commit messages use dramatic verbs for routine changes?
Reference documentation patterns for API and symbol documentation. Use when writing reference docs, API docs, parameter tables, or technical specifications....
---
name: reference-docs
description: Reference documentation patterns for API and symbol documentation. Use when writing reference docs, API docs, parameter tables, or technical specifications. Triggers on reference docs, API reference, function reference, parameters table, symbol documentation.
user-invocable: false
---
# Reference Documentation Patterns
Reference documentation is information-oriented - helping experienced users find precise technical details quickly. This skill provides patterns for writing clear, scannable reference pages.
**Dependency:** Always use this skill in conjunction with `docs-style` for core writing principles.
## Purpose and Audience
- **Who:** Experienced users seeking specific information
- **Goal:** Quick lookup of technical details
- **Mode:** Not for learning, for looking up
- **Expectation:** Brevity, consistency, completeness
## Document Structure Template
Use this template when creating reference documentation:
```markdown
---
title: "[Symbol/API Name]"
description: "One-line description of what it does"
---
# [Name]
Brief description (1-2 sentences). State what it is and its primary purpose.
## Parameters
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `param1` | `string` | Yes | What this parameter controls |
| `param2` | `number` | No | Optional behavior modification. Default: `10` |
## Returns
| Type | Description |
|------|-------------|
| `ReturnType` | What the function returns and when |
## Example
```language
import { symbolName } from 'package';
// Complete, runnable example showing common use case
const result = symbolName({
param1: 'realistic-value',
param2: 42
});
console.log(result);
// Expected output: { ... }
```
## Related
- [RelatedSymbol](/reference/related-symbol) - Brief description
- [AnotherSymbol](/reference/another-symbol) - Brief description
```
## Writing Principles
### Brevity Over Explanation
- State facts, not rationale
- Avoid "why" - save that for Explanation docs
- Cut unnecessary words
**Do:**
```markdown
Returns the user's display name.
```
**Avoid:**
```markdown
This function is useful when you need to get the user's display name
because it handles all the edge cases for you automatically.
```
### Scannable Tables, Not Prose
**Do:**
```markdown
| Name | Type | Description |
|------|------|-------------|
| `userId` | `string` | Unique user identifier |
| `options` | `Options` | Configuration object |
```
**Avoid:**
```markdown
The first parameter is `userId`, which should be a string containing
the unique user identifier. The second parameter is `options`, which
is an Options object containing the configuration.
```
### Consistent Format Across Entries
All reference pages for similar items should follow identical structure:
- Same heading order
- Same table columns
- Same code example format
- Same related links section
### Every Example Must Be Runnable
- Include all imports
- Show complete, working code
- Use realistic values (not "foo", "bar", "test123")
- Include expected output when helpful
## Code Example Patterns
### Show Common Use Case First
```markdown
## Example
### Basic Usage
```typescript
const user = await getUser('user-123');
console.log(user.name);
```
### With Options
```typescript
const user = await getUser('user-123', {
includeMetadata: true,
fields: ['name', 'email', 'role']
});
```
```
### Include Setup and Context
```markdown
```typescript
import { Client } from '@example/sdk';
// Initialize client (required once per application)
const client = new Client({ apiKey: process.env.API_KEY });
// Now use the function
const result = await client.users.list();
```
```
### Use Realistic Values
**Do:** `userId: 'usr_a1b2c3d4'`
**Avoid:** `userId: 'foo'`
**Do:** `email: '[email protected]'`
**Avoid:** `email: '[email protected]'`
## Parameter Documentation Patterns
### Required vs Optional
Clearly indicate which parameters are required:
```markdown
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `apiKey` | `string` | Yes | - | Your API key |
| `timeout` | `number` | No | `30000` | Request timeout in ms |
| `retries` | `number` | No | `3` | Number of retry attempts |
```
### Complex Types
For object parameters, document the shape:
```markdown
## Parameters
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `options` | `UserOptions` | No | Configuration options |
### UserOptions
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `includeDeleted` | `boolean` | No | Include soft-deleted users |
| `fields` | `string[]` | No | Fields to return |
| `limit` | `number` | No | Maximum results (default: 100) |
```
### Enum Values
Document allowed values clearly:
```markdown
| Name | Type | Values | Description |
|------|------|--------|-------------|
| `status` | `string` | `active`, `pending`, `suspended` | User account status |
```
## Return Value Documentation
### Simple Returns
```markdown
## Returns
`User` - The requested user object, or `null` if not found.
```
### Complex Returns
```markdown
## Returns
| Property | Type | Description |
|----------|------|-------------|
| `data` | `User[]` | Array of user objects |
| `pagination` | `Pagination` | Pagination metadata |
| `total` | `number` | Total matching records |
```
### Error Conditions
```markdown
## Errors
| Error | Condition |
|-------|-----------|
| `NotFoundError` | User does not exist |
| `UnauthorizedError` | Invalid or expired API key |
| `RateLimitError` | Too many requests |
```
## API Reference Specifics
### HTTP Endpoints
```markdown
## Endpoint
```http
GET /api/v1/users/{userId}
```
## Path Parameters
| Name | Type | Description |
|------|------|-------------|
| `userId` | `string` | The user's unique identifier |
## Query Parameters
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `fields` | `string` | No | Comma-separated list of fields |
## Headers
| Name | Required | Description |
|------|----------|-------------|
| `Authorization` | Yes | Bearer token |
| `X-Request-ID` | No | Request tracking ID |
## Response
```json
{
"id": "usr_a1b2c3d4",
"name": "Jane Smith",
"email": "[email protected]"
}
```
```
## Component/Props Reference
For UI components:
```markdown
## Props
| Prop | Type | Default | Description |
|------|------|---------|-------------|
| `variant` | `'primary' \| 'secondary'` | `'primary'` | Visual style |
| `size` | `'sm' \| 'md' \| 'lg'` | `'md'` | Button size |
| `disabled` | `boolean` | `false` | Disable interactions |
| `onClick` | `() => void` | - | Click handler |
## Slots
| Name | Description |
|------|-------------|
| `default` | Button content |
| `icon` | Icon to display before text |
```
## Related Links Section
Always include links to related content:
```markdown
## Related
- [createUser](/reference/create-user) - Create a new user
- [updateUser](/reference/update-user) - Modify user properties
- [deleteUser](/reference/delete-user) - Remove a user
- [User Authentication Guide](/guides/authentication) - How authentication works
```
## Gates (completion order)
Use this **sequenced** workflow before treating a reference page as complete. Finish step *n* before *n+1*; each step has a **Pass** you can check on the written page alone (no “I verified internally”).
1. **Structure** — Sections match your project template (typically Parameters, Returns, Example, Related; HTTP docs add Endpoint, Path/Query, Headers, Response). **Pass:** every required section exists, or a one-line omission note appears under **Related** (e.g. “No query parameters”).
2. **Tables** — Parameters/returns/errors use tables with consistent columns per **Consistent Format Across Entries** and **Required vs Optional**. **Pass:** no blank Description cells; no `TBD` / `???` for shipped APIs.
3. **Runnable example** — At least one example meets **Every Example Must Be Runnable** and **Use Realistic Values**. **Pass:** imports included; user-visible strings are realistic (not generic `foo`/`bar` unless the API is illustrative-only).
4. **Related** — **Pass:** `## Related` contains ≥1 Markdown link to another reference or guide, or one explicit sentence that there are no related symbols.
## Checklist for Reference Pages
After the **Gates (completion order)** above, confirm:
- [ ] Title matches the symbol/API name exactly
- [ ] Description is one clear sentence
- [ ] All parameters documented with types
- [ ] Required vs optional clearly marked
- [ ] Default values specified for optional parameters
- [ ] Return type and structure documented
- [ ] At least one complete, runnable example
- [ ] Example uses realistic values
- [ ] Related pages linked
- [ ] Format matches other reference pages in the docs
Analyze and improve existing documentation using Diataxis principles
---
name: improve-doc
description: Analyze and improve existing documentation using Diataxis principles
disable-model-invocation: true
---
# Improve Doc
Analyze an existing markdown document, classify sections by Diataxis type, identify issues, and interactively refine each section.
## Arguments
- **Path:** Path to the markdown document to improve (required)
## Workflow Overview
```
/beagle-docs:improve-doc docs/guides/getting-started.md
```
The command runs in two phases:
1. **Analysis Phase:** Parse document, classify sections, identify issues
2. **Refinement Phase:** Interactive loop to improve each section
## Gates
Hard sequencing — advance only when the **pass condition** is met (artifact or explicit user input, not assumed).
**Before Phase 2 (refinement):**
1. **Read** — Full contents of the file at **Path** are loaded.
- **Pass:** Enumerated sections (from `#` / `##` / `###` headings) cover every heading in the file; titles match the source.
2. **Core skill** — `beagle-docs:docs-style` is loaded (or its path read) before classification.
- **Pass:** Analysis output reflects at least one concrete principle from that skill (name it or quote briefly).
3. **Handoff** — User saw an analysis summary (template in Step 5 or equivalent) and entered **`start`** to begin refinement, or **`abort`** to exit.
- **Pass:** If `abort`, no writes to **Path**. If `start`, proceed to Phase 2.
**Before overwriting the file (Phase 2, Step 4):**
1. **Choices** — Every section with open issues has a terminal outcome: applied **`yes`**, unchanged **`skip`**, or **`modify`** loop finished with **`yes`** or **`skip`**.
- **Pass:** No section left in a pending `modify` state unless the user aborted the whole session (then do not write).
2. **Skips** — Content for every **`skip`** matches the original section text (copy preserved, not paraphrased).
- **Pass:** Full block equality check against the initial read (line-for-line, including whitespace).
3. **Write** — Only after the above.
- **Pass:** Single save to **Path**; completion report notes backup or major restructure if applicable (Rules).
**Ambiguous Diataxis type** — If classification is uncertain, do not edit that section until the user answers the clarifying fork (Step 2b) or explicitly confirms your stated default.
## Phase 1: Analysis
### Step 1: Read Document
Read the target markdown file and parse into sections based on headings:
- Each `#`, `##`, `###` heading starts a new section
- Capture heading level, title, and content
- Preserve hierarchy for context
### Step 2: Load Core Skill
Load `beagle-docs:docs-style` for core writing principles that apply to all documentation types.
### Step 3: Classify Each Section
For each section, determine the Diataxis type using these indicators:
| Type | Indicators |
|------|------------|
| **Tutorial** | "Let's", "we will", step-by-step learning, builds toward a project, minimal explanation of why |
| **How-To** | "How to" title, task-focused steps, assumes prior knowledge, goal-oriented |
| **Reference** | Parameter tables, type signatures, API specs, factual descriptions, no narrative |
| **Explanation** | "Why", "because", history, trade-offs, alternatives, conceptual discussion |
**Classification rules:**
1. Check title first - "How to X" is always How-To, "Why X" is always Explanation
2. Look for structural patterns - tables with parameters/types suggest Reference
3. Analyze language - learning-oriented ("Let's learn") vs task-oriented ("To accomplish X")
4. Consider context - what comes before/after this section
5. Mark as "Mixed" if section blends types (this is an issue to fix)
### Step 4: Identify Issues
For each section, check for issues based on its detected type:
**Tutorial issues:**
- Explains "why" instead of just guiding the learner
- Skips steps assuming prior knowledge
- No clear learning outcome
- Missing "you will build/learn" framing
**How-To issues:**
- Includes explanatory tangents
- Missing prerequisites
- Steps not atomic (multiple actions per step)
- No verification that goal was achieved
**Reference issues:**
- Missing parameter types or return values
- Narrative text instead of factual description
- Incomplete coverage of options/parameters
- No code examples
**Explanation issues:**
- Includes procedural steps
- Missing context for "why"
- No trade-offs or alternatives discussed
- Reads like reference material
**Cross-type issues (any section):**
- Mixed Diataxis types in single section
- Unclear who the audience is
- Missing or vague heading
- Wall of text without structure
### Step 5: Present Analysis
Display analysis summary to user:
```markdown
## Document Analysis
**File:** `docs/guides/getting-started.md`
**Sections found:** 8
**Estimated time:** ~15 minutes to refine
### Type Breakdown
| Type | Sections | Health |
|------|----------|--------|
| Tutorial | 2 | 1 issue |
| How-To | 3 | 4 issues |
| Reference | 1 | Clean |
| Explanation | 1 | 2 issues |
| Mixed | 1 | Needs split |
### Top Issues
1. **Section "Setting Up"** (How-To): Contains explanatory tangent about architecture
2. **Section "Configuration Options"** (Mixed): Blends reference table with tutorial steps
3. **Section "Authentication"** (How-To): Missing prerequisites, steps not atomic
4. **Section "Why We Built This"** (Explanation): Includes procedural steps
### Ready to Refine?
I'll go through each section with issues. For each one, you can:
- **yes** - Accept the proposed improvement
- **skip** - Keep original, move to next section
- **modify** - Tell me what to change about the proposal
Type "start" to begin refinement, or "abort" to exit without changes.
```
## Phase 2: Interactive Refinement
### Step 1: Load Type-Specific Skills
As you encounter each section type, load the relevant skill if not already loaded:
- Tutorial sections: `beagle-docs:tutorial-docs`
- How-To sections: `beagle-docs:howto-docs`
- Reference sections: `beagle-docs:reference-docs`
- Explanation sections: `beagle-docs:explanation-docs`
### Step 2: Refinement Loop
For each section with issues, in document order:
#### 2a: Show Current State
```markdown
---
## Section 3 of 5: "Setting Up" (How-To)
### Current Content
> ## Setting Up
>
> Before we begin, it's important to understand why the architecture
> works this way. The system uses a microservices pattern because...
> [explanatory content]
>
> To set up the project:
> 1. Clone the repo and install dependencies
> 2. Configure the environment variables
> 3. Start the server
### Issues Found
1. **Explanatory tangent** (lines 1-3): How-To should assume reader knows why; move explanation to dedicated Explanation section
2. **Non-atomic steps** (step 1): "Clone and install" is two actions; split into separate steps
3. **Missing verification**: No way to confirm setup succeeded
```
#### 2b: Ask Clarifying Question (if needed)
If the type classification is uncertain:
```markdown
### Quick Question
This section has characteristics of both How-To (task steps) and Explanation (why content). How would you like to handle it?
1. **Split** - Create separate How-To and Explanation sections
2. **How-To** - Remove explanation, keep as pure How-To
3. **Explanation** - Remove steps, keep as pure Explanation
```
#### 2c: Propose Improvement
```markdown
### Proposed Improvement
> ## Setting Up
>
> **Prerequisites:** Familiarity with microservices architecture
>
> ### Steps
>
> 1. Clone the repository
> ```bash
> git clone https://github.com/example/project.git
> ```
>
> 2. Install dependencies
> ```bash
> cd project && npm install
> ```
>
> 3. Configure environment variables
> ```bash
> cp .env.example .env
> ```
>
> 4. Start the server
> ```bash
> npm start
> ```
>
> ### Verify
>
> Open http://localhost:3000 - you should see the welcome page.
**Changes made:**
- Removed explanatory content (suggest creating "Architecture Overview" section)
- Split "clone and install" into separate steps
- Added verification step
- Added prerequisites reference
---
**Your choice:** [yes / skip / modify]
```
#### 2d: Handle User Choice
**yes** - Apply the proposed changes to the section, continue to next
**skip** - Keep original content unchanged, continue to next
**modify** - User provides feedback:
```markdown
> modify: Keep the explanation but move it to a collapsible "Why?" block
### Revised Proposal
> ## Setting Up
>
> <details>
> <summary>Why this architecture?</summary>
>
> The system uses a microservices pattern because...
>
> </details>
>
> ### Steps
> [same as before]
**Your choice:** [yes / skip / modify]
```
### Step 3: Handle Mixed Sections
For sections classified as "Mixed":
```markdown
---
## Section 5 of 5: "Configuration Options" (Mixed)
### Current Content
> ## Configuration Options
>
> Let's walk through configuring your application. First, you'll need
> to understand the available options:
>
> | Option | Type | Default | Description |
> |--------|------|---------|-------------|
> | port | number | 3000 | Server port |
> | debug | boolean | false | Enable debug mode |
>
> Now let's configure each one step by step...
### Issues Found
1. **Mixed types**: Tutorial framing ("Let's walk through") with Reference content (options table)
### Recommendation
Split into two sections:
1. **Reference section** - "Configuration Reference" with the options table
2. **Tutorial section** - "Configuring Your First App" with learning-oriented walkthrough
Would you like me to:
1. **Split** - Create both sections
2. **Reference only** - Keep just the table, remove tutorial framing
3. **Tutorial only** - Expand into full tutorial, move table to appendix
```
### Step 4: Write Updated Document
After all sections processed:
1. **Build updated content** from accepted changes
2. **Preserve unchanged sections** exactly as they were
3. **Overwrite original file** with updated content
### Step 5: Report Results
```markdown
## Refinement Complete
**File:** `docs/guides/getting-started.md`
### Changes Summary
| Section | Action | Type |
|---------|--------|------|
| Setting Up | Improved | How-To |
| Configuration Options | Split | Reference + Tutorial |
| Authentication | Improved | How-To |
| Why We Built This | Skipped | Explanation |
### Sections Modified
- **Setting Up**: Removed tangent, split steps, added verification
- **Configuration Options**: Split into "Configuration Reference" and "Configuring Your App"
- **Authentication**: Added prerequisites, made steps atomic
### New Sections Created
- **Configuration Reference** (Reference): Options table from split
- **Configuring Your App** (Tutorial): Learning walkthrough from split
### Recommendations
Consider creating these additional documents:
- `docs/explanation/architecture-overview.md` - For content removed from "Setting Up"
The original file has been updated.
```
## Rules
- Follow **Gates** for phase transitions and before saving to **Path**
- Always load `docs-style` skill before analysis
- Load type-specific skills lazily as sections are encountered
- Never modify the file until refinement phase completes (see Gates: overwrite)
- Preserve sections marked "skip" exactly as-is
- When splitting sections, maintain logical reading order
- Ask clarifying questions when type classification is ambiguous (confidence < 70%)
- For "Mixed" sections, always offer split as the first option
- Include specific line references when identifying issues
- Show diff-style changes in proposals when helpful
- Respect user's "modify" feedback - iterate until they say "yes" or "skip"
- Create backup note in output if major restructuring occurred
Rewrite AI-generated developer text to sound human — fix inflated language, filler, tautological docs, and robotic tone. Use after review-ai-writing identifi...
---
name: humanize-ai-writing
description: Rewrite AI-generated developer text to sound human — fix inflated language, filler, tautological docs, and robotic tone. Use after review-ai-writing identifies issues.
disable-model-invocation: true
user-invocable: true
dependencies:
- docs-style
- review-ai-writing
---
# Humanize
Apply fixes from a previous `review-ai-writing` run with automatic safe/risky classification.
## Usage
```text
/beagle-docs:humanize-ai-writing [--dry-run] [--all] [--category <name>]
```
**Flags:**
- `--dry-run` - Show what would be fixed without changing files
- `--all` - Fix entire codebase (runs review with --all first)
- `--category <name>` - Only fix specific category: `content|vocabulary|formatting|communication|filler|code_docs`
## Instructions
### 1. Parse Arguments
Extract flags from `$ARGUMENTS`:
- `--dry-run` - Preview mode only
- `--all` - Full codebase scan
- `--category <name>` - Filter to specific category
### 2. Pre-flight Safety Checks
```bash
# Check for uncommitted changes
git status --porcelain
```
If working directory is dirty, warn:
```text
Warning: You have uncommitted changes. Creating a git stash before proceeding.
Run `git stash pop` to restore if needed.
```
Create stash if dirty:
```bash
git stash push -u -m "beagle-docs: pre-humanize backup"
```
### 3. Load Review Results
Check for existing review file:
```bash
cat .beagle/ai-writing-review.json 2>/dev/null
```
**If file missing:**
- If `--all` flag: Run `/beagle-docs:review-ai-writing --all` first
- Otherwise: Fail with: "No review results found. Run `/beagle-docs:review-ai-writing` first."
**If file exists, validate freshness:**
```bash
# Get stored git HEAD from JSON
stored_head=$(jq -r '.git_head' .beagle/ai-writing-review.json)
current_head=$(git rev-parse HEAD)
if [ "$stored_head" != "$current_head" ]; then
echo "Warning: Review was run at commit $stored_head, but HEAD is now $current_head"
fi
```
If stale, prompt: "Review results are stale. Re-run review? (y/n)"
### 4. Load Reference Material
Read the appropriate reference files based on the findings being fixed:
- Read `references/vocabulary-swaps.md` when applying `ai_vocabulary_high` or `ai_vocabulary_low` fixes
- Read `references/fix-strategies.md` for strategy details and before/after examples for any category
- Read `references/developer-voice.md` for tone/register guidance when rewriting prose
Only load what you need — if fixing only vocabulary, skip the voice guide.
### 5. Filter Findings
If `--category` is set, filter findings to that category only.
Partition remaining findings by `fix_safety`:
**Safe Fixes** (auto-apply):
- `chat_leak` - Delete conversational artifacts
- `cutoff_disclaimer` - Delete knowledge cutoff references
- `filler_phrase` - Delete filler phrases
- `heading_restatement` - Delete restating first sentence
- `emoji_decoration` - Remove emoji from technical text
- `boldface_overuse` - Remove excessive bold formatting
- `ai_vocabulary_high` - Swap high-signal AI words
- `narrating_obvious` - Delete obvious code comments
- `synthetic_opener` - Delete "In today's..." openers
- `sycophantic_tone` - Delete or neutralize praise
- `vague_authority` - Delete unattributed claims
- `excessive_hedging` - Remove qualifiers
- `generic_conclusion` - Delete summary padding
- `copula_avoidance` - Use "is/are" naturally
- `rhetorical_device` - Delete rhetorical questions
- `em_dash_overuse` - Replace formulaic em dashes with commas, parentheses, or colons
- `thematic_break` - Remove horizontal rules before headings
- `title_case_heading` - Convert AI title-case headings to sentence case
- `curly_quotes` - Normalize curly quotes/apostrophes to straight
- `negative_parallelism` - Delete "Not just X, but also Y" filler constructions
- `challenges_and_prospects` - Delete "Despite its... faces challenges..." formulaic wrappers
**Needs Review Fixes** (require confirmation):
- `promotional_language` - Rewrite with specifics
- `formulaic_structure` - Restructure sections
- `synonym_cycling` - Pick consistent term
- `commit_inflation` - Rewrite commit scope
- `tautological_docstring` - Rewrite or delete docstring
- `exhaustive_enumeration` - Trim parameter docs
- `this_noun_verbs` - Rewrite docstring voice
- `ai_vocabulary_low` - Reduce cluster density
- `apologetic_error` - Rewrite error message
- `rule_of_three` - Simplify three-item lists used as filler comprehensiveness
- `inline_header_list` - Restructure boldfaced inline-header vertical lists
- `unnecessary_table` - Convert small tables to prose
- `regression_to_mean` - Restore specific facts replaced by vague praise
### 6. Apply Safe Fixes
If `--dry-run`:
```markdown
## Safe Fixes (would apply automatically)
| # | File | Line | Type | Action |
|---|------|------|------|--------|
| 1 | README.md | 3 | synthetic_opener | Delete "In today's rapidly evolving..." |
| 2 | src/auth.py | 15 | narrating_obvious | Delete "# Check if user exists" |
| 3 | README.md | 42 | ai_vocabulary_high | Replace "utilize" with "use" |
...
```
Otherwise, apply fixes grouped by file to minimize file I/O:
1. Sort findings by file, then by line number (descending, to avoid offset drift)
2. For each file, apply all safe fixes in reverse line order
3. For git artifacts (`git:commit:*`, `git:pr:*`), skip — these can't be auto-fixed. Report them for manual attention.
### 7. Handle Needs Review Fixes
If `--dry-run`, list them:
```markdown
## Needs Review Fixes (would prompt interactively)
| # | File | Line | Type | Original | Suggested |
|---|------|------|------|----------|-----------|
| 4 | README.md | 8 | promotional_language | "powerful, enterprise-grade solution" | "authentication library" |
...
```
Otherwise, for each fix, prompt interactively:
```text
[README.md:8] Promotional language: "powerful, enterprise-grade solution"
Suggested: "authentication library"
(y)es / (n)o / (e)dit / (s)kip all:
```
Track user choices:
- `y` - Apply this fix as suggested
- `n` - Skip this fix
- `e` - User provides custom replacement
- `s` - Skip all remaining interactive fixes
### 8. Validate Results
For each modified markdown file, verify basic validity:
```bash
# Check for broken markdown (unclosed code blocks, broken links)
# Simple check: matching ``` pairs
grep -c '```' "$file" | awk '{print ($1 % 2 == 0) ? "OK" : "WARNING: odd number of code fences"}'
```
For modified source files, check syntax is still valid:
**Python:**
```bash
python3 -c "import ast; ast.parse(open('$file').read())"
```
**TypeScript/JavaScript:**
```bash
npx -y acorn --ecma2020 "$file" > /dev/null 2>&1
```
If validation fails for any file, revert that file:
```bash
git checkout -- "$file"
echo "Reverted $file due to validation failure"
```
### 9. Report Results
```markdown
## Humanize Summary
### Applied Fixes
- [x] README.md:3 - Deleted synthetic opener
- [x] README.md:42 - Replaced "utilize" with "use"
- [x] src/auth.py:15 - Deleted obvious comment
### Interactive Fixes
- [x] README.md:8 - Rewrote promotional language (user approved)
- [ ] docs/guide.md:22 - Skipped by user
### Skipped (Git Artifacts)
- [ ] git:commit:abc1234 - Chat leak in commit message (amend manually)
### Validation
- README.md: OK
- src/auth.py: OK
### Diff Summary
```
```bash
git diff --stat
```
### 10. Cleanup
On successful completion (all validations pass):
```bash
rm .beagle/ai-writing-review.json
```
If any validation fails, keep the file and report:
```text
Review file preserved at .beagle/ai-writing-review.json
Fix issues and re-run, or restore with: git stash pop
```
## Core Principles
1. **Delete first, rewrite second.** Most AI patterns are padding. Removing them improves the text.
2. **Use simple words.** Replace "utilize" with "use", "facilitate" with "help", "implement" with "add".
3. **Keep sentences short.** Break compound sentences. One idea per sentence.
4. **Preserve meaning.** Never change what the text says, only how it says it.
5. **Match the register.** Commit messages are terse. READMEs are conversational. API docs are precise. Read `references/developer-voice.md` for the full register guide.
6. **Don't overcorrect.** A slightly formal sentence is fine. Only fix patterns that read as obviously AI-generated.
7. **Understand regression to the mean.** LLMs produce the most statistically likely output. Specific, unusual facts get replaced with generic, positive descriptions. When humanizing, restore specificity — replace vague praise with concrete details.
8. **Score density, not individual words.** AI vocabulary words co-occur. One or two may be coincidental; a cluster of 3+ is a strong AI tell.
## Example
```bash
# Preview all fixes without applying
/beagle-docs:humanize-ai-writing --dry-run
# Fix only vocabulary issues
/beagle-docs:humanize-ai-writing --category vocabulary
# Full codebase scan and fix
/beagle-docs:humanize-ai-writing --all
# Preview filler fixes only
/beagle-docs:humanize-ai-writing --category filler --dry-run
```
## Rules
- Always load reference material before applying fixes (step 4)
- Never modify files without a stash or clean working directory
- Apply safe fixes in reverse line order to avoid offset drift
- Never auto-fix git artifacts (commits, PRs) — report them for manual action
- Validate every modified file before considering it done
- Revert files that fail validation
- Write JSON report before displaying summary
- Clean up JSON report only on full success
FILE:references/developer-voice.md
# Developer Voice Guidelines
Good developer writing is:
- **Conversational but precise.** Write like you'd explain it to a colleague, but get the details right.
- **Direct.** State opinions. "Use X" not "You might consider using X".
- **Terse where appropriate.** Commit messages and code comments should be short. Don't pad them.
- **Specific.** Replace vague claims with concrete details, numbers, or examples.
- **Consistent.** Pick one term and stick with it. Don't cycle synonyms.
## Register Guide
Match tone and length to the artifact type.
| Artifact | Tone | Length | Example |
|----------|------|--------|---------|
| Commit message | Terse, imperative | 50-72 chars | `fix: prevent nil panic in auth middleware` |
| Code comment | Brief, explains why | 1-2 lines | `// retry once — transient DNS failures are common in k8s` |
| Docstring | Precise, adds value | What the name doesn't tell you | `"""Raises ConnectionError after 3 retries."""` |
| PR description | Structured, factual | Context + what changed + how to test | Bullet points, not paragraphs |
| README | Conversational, scannable | As short as possible | Start with what it does, then how to use it |
| Error message | Actionable, specific | What happened + what to do | `Config file not found at ~/.app/config.yml. Run 'app init' to create one.` |
FILE:references/fix-strategies.md
# Fix Strategies by Category
Strategies for each finding type, organized by category. Each entry includes the fix approach, risk level, and before/after examples.
## Table of Contents
- [Content Patterns](#content-patterns)
- [Vocabulary Patterns](#vocabulary-patterns)
- [Formatting Patterns](#formatting-patterns)
- [Communication Patterns](#communication-patterns)
- [Filler Patterns](#filler-patterns)
- [Code Docs Patterns](#code-docs-patterns)
---
## Content Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Promotional language | Replace superlatives with specifics | Needs review |
| Vague authority | Delete the claim or add a citation | Safe |
| Formulaic structure | Remove the intro/conclusion wrapper | Needs review |
| Synthetic openers | Delete the opener, start with the point | Safe |
| Negative parallelism | Delete the filler construction, keep the point | Safe |
| Rule of three | Simplify to what matters, drop padding items | Needs review |
| Challenges-and-prospects | Delete the formulaic "Despite..." wrapper | Safe |
### Regression to the Mean
LLMs produce the most statistically likely output. Specific, unusual facts get replaced with generic, positive descriptions. When humanizing, restore specificity — replace vague praise with concrete details.
**Before:**
```markdown
In today's rapidly evolving software landscape, authentication is a crucial
component that plays a pivotal role in securing modern applications.
```
**After:**
```markdown
This guide covers authentication setup for the API.
```
### Negative Parallelism
LLMs produce "Not just X, but also Y" and "Not X, but Y" constructions that add words without adding meaning. Delete the construction, keep the point.
**Before:**
```markdown
This is not just a logging library, but also a comprehensive observability
framework that empowers developers to gain valuable insights.
```
**After:**
```markdown
This is a logging library with structured output and trace correlation.
```
### Rule of Three
LLMs overuse three-item lists ("adjective, adjective, adjective") to make superficial analyses appear comprehensive. Keep only items that carry specific meaning.
**Before:**
```markdown
The event features keynote sessions, panel discussions, and networking opportunities.
```
**After (keep only what's specific):**
```markdown
The event includes keynote sessions and breakout workshops.
```
### Challenges-and-Prospects Formula
Rigid formula: "Despite its [positive words], [subject] faces challenges..." ending with vague positive assessment. Delete the wrapper, state the actual limitation with specifics.
**Before:**
```markdown
Despite its robust architecture, the system faces challenges typical of
distributed environments. Despite these challenges, with its strategic
design and ongoing improvements, the platform continues to thrive.
```
**After:**
```markdown
Known limitations: network partitions can cause stale reads for up to 30s.
See the consistency model docs for details.
```
---
## Vocabulary Patterns
| Type | Strategy | Risk |
|------|----------|------|
| High-signal AI words | Direct word swap | Safe |
| Low-signal clusters | Reduce density, keep 1-2 | Needs review |
| Copula avoidance | Use "is/are" naturally | Safe |
| Rhetorical devices | Delete the question, state the fact | Safe |
| Synonym cycling | Pick one term, use it consistently | Needs review |
| Commit inflation | Rewrite to match actual change scope | Needs review |
### Copula Avoidance
LLMs substitute simple "is/are" with elaborate alternatives like "serves as", "stands as", "boasts", "features", "offers". Use the simple form.
**Before:**
```text
feat: Leverage robust caching paradigm to facilitate seamless data retrieval
```
**After:**
```text
feat: add response caching for faster reads
```
See `references/vocabulary-swaps.md` for the complete word swap table.
---
## Formatting Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Boldface overuse | Remove bold from non-key terms | Safe |
| Emoji decoration | Remove emoji from technical content | Safe |
| Heading restatement | Delete the restating sentence | Safe |
| Title case headings | Convert to sentence case | Safe |
| Em dash overuse | Replace with commas, parentheses, or colons | Safe |
| Thematic breaks | Remove horizontal rules before headings | Safe |
| Curly quotes | Normalize to straight quotes/apostrophes | Safe |
| Inline-header lists | Restructure or convert to prose | Needs review |
| Unnecessary tables | Convert small tables to prose | Needs review |
**Boldface overuse — Before:**
```markdown
## Error Handling
**Error handling** is a **critical** aspect of building **reliable** applications.
The `handleError` function **catches** and **processes** all **runtime errors**.
```
**After:**
```markdown
## Error Handling
The `handleError` function catches runtime errors and logs them with context.
```
**Em dash overuse — Before:**
```markdown
The parser — which handles all input formats — validates each field — including nested objects — before returning.
```
**After:**
```markdown
The parser validates each field (including nested objects) before returning. It handles all input formats.
```
**Title case — Before:**
```markdown
## Strategic Negotiations And Global Partnerships
```
**After:**
```markdown
## Strategic negotiations and global partnerships
```
---
## Communication Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Chat leaks | Delete entirely | Safe |
| Cutoff disclaimers | Delete entirely | Safe |
| Sycophantic tone | Delete or neutralize | Safe |
| Apologetic errors | Rewrite as direct error message | Needs review |
**Before:**
```python
# Great implementation! This elegantly handles the edge case.
# As of my last update, this API endpoint supports JSON.
```
**After:**
```python
# Handles the re-entrant edge case from issue #42.
# This endpoint accepts JSON.
```
---
## Filler Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Filler phrases | Delete the phrase | Safe |
| Excessive hedging | Remove qualifiers, state directly | Safe |
| Generic conclusions | Delete the conclusion paragraph | Safe |
**Before:**
```markdown
It's worth noting that the configuration file might potentially need to be
updated. Going forward, this could possibly affect performance.
```
**After:**
```markdown
Update the configuration file. This affects performance.
```
---
## Code Docs Patterns
| Type | Strategy | Risk |
|------|----------|------|
| Tautological docstrings | Delete or add real information | Needs review |
| Narrating obvious code | Delete the comment | Safe |
| "This noun verbs" | Rewrite in active/direct voice | Safe |
| Exhaustive enumeration | Keep only non-obvious params | Needs review |
**Before:**
```python
def get_user(user_id: int) -> User:
"""Get a user.
This method retrieves a user from the database by their ID.
Args:
user_id: The ID of the user to get.
Returns:
User: The user object.
Raises:
ValueError: If the user ID is invalid.
"""
return db.query(User).get(user_id)
```
**After:**
```python
def get_user(user_id: int) -> User:
"""Raises UserNotFound if ID doesn't exist in the database."""
return db.query(User).get(user_id)
```
FILE:references/vocabulary-swaps.md
# Vocabulary Swap Reference
Direct word replacements for high-signal and medium-signal AI vocabulary. Score density, not individual words — a cluster of 3+ AI words in proximity is one of the strongest AI tells.
## High-Signal Words
These words spiked in frequency after 2022 and co-occur in AI-generated text.
| AI Word | Replacement |
|---------|-------------|
| utilize | use |
| leverage (as "use") | use |
| delve | look at, explore, examine |
| facilitate | help, enable, let |
| endeavor | try, work, effort |
| harnessing | using |
| paradigm | approach, model, pattern |
| whilst | while |
| furthermore | also, and |
| moreover | also, and |
| robust (non-technical) | reliable, solid, strong |
| seamless | smooth, easy |
| cutting-edge | modern, latest, new |
| pivotal | important, key |
| elevate | improve |
| empower | let, enable |
| revolutionize | change, improve |
| unleash | release, enable |
| synergy | (delete — rarely means anything) |
| embark | start, begin |
| meticulous/meticulously | careful, thorough |
| intricate/intricacies | complex, details |
| tapestry | (delete or rewrite — never means anything useful) |
| testament | proof, sign, evidence |
| garner | get, earn, attract |
| interplay | interaction, relationship |
| landscape | (delete or use specific noun) |
## Medium-Signal Words
Less distinctive individually, but meaningful in clusters.
| AI Word | Replacement |
|---------|-------------|
| bolstered | supported, strengthened |
| fostering | building, encouraging |
| showcasing | showing |
| underscore | show, highlight |
| enhance | improve |
| crucial | important |
| vibrant | (delete or use specific adjective) |
| nestled | located, in |
| groundbreaking | new, first |
| renowned | well-known, popular |
## Era Context
AI vocabulary shifts across model generations. Words co-occur: where one appears, others cluster nearby.
- **2023-mid 2024 (GPT-4 era):** delve, tapestry, meticulous, intricate, garner, interplay, testament, vibrant
- **Mid 2024-mid 2025 (GPT-4o era):** bolstered, fostering, showcasing, align with, underscore, enhance
- **Mid 2025+ (GPT-5 era):** showcasing, highlighting, emphasizing, enhance (plus notability/attribution words)
## What NOT to Flag
Do NOT treat these as AI indicators (high false-positive rate):
- Perfect grammar alone (many humans write well)
- Formal or academic prose (correlation is with specific words, not formality)
- Transition words alone (only a few specific transitions are AI-overused)
- Mixed casual/formal registers (common in technical fields)
Use when reviewing, critiquing, or stress-testing an existing strategy document. Evaluates seven dimensions — diagnosis quality, guiding policy strength, act...
---
name: strategy-review
description: "Use when reviewing, critiquing, or stress-testing an existing strategy document. Evaluates seven dimensions \u2014 diagnosis quality, guiding policy strength, action coherence, assumption exposure, falsifiability \u2014 with optional 7S, Five Forces, Balanced Scorecard, and Hoshin Kanri lenses. Triggers on: review my strategy, poke holes in this plan, what's weak here, strategy audit, red team this. Does NOT build strategy (use strategy-interview) or brainstorm project ideas (use brainstorm-beagle)."
user-invocable: true
---
# Strategy Review
Pressure-test strategy documents to find where they'll break before reality does it for you. The primary job isn't to evaluate prose quality or check formatting — it's to find the gaps, hidden failure paths, and under-accounted risks that will kill the strategy in execution. A strategy that survives this review has a meaningfully better chance of surviving contact with the real world.
This skill complements the `beagle-analysis:strategy-interview` skill. Strategy-interview helps *build* a strategy through guided conversation; strategy-review subjects an existing strategy to rigorous adversarial evaluation using the same kernel framework (diagnosis, guiding policy, coherent actions) and bad-strategy filter.
## What makes this different from generic feedback
Most strategy feedback falls into two useless categories: vague praise ("this is really thoughtful") or surface-level nitpicking ("consider adding a timeline"). Neither helps the author see whether their thinking actually holds up under stress.
This review does three things that generic feedback doesn't:
1. **Tests structural integrity** — whether the logical chain from diagnosis through guiding policy to coherent actions actually holds together, or whether the "therefore" between them is secretly an "and also."
2. **Hunts for failure paths** — not just what's wrong with the document, but what goes wrong in the *real world* because of what's missing. The slow drift scenario, the capability gap that stalls the load-bearing action, the competitor response nobody modeled, the political resistance the strategy pretends doesn't exist.
3. **Surfaces invisible assumptions** — every strategy is a bet, but most strategies don't name their bets. This review finds the load-bearing assumptions the author hasn't stated and maps the risk of each one being wrong.
## Before you start
Read `references/review-dimensions.md` — it contains the seven evaluation dimensions and their criteria. This is the backbone of the review. Keep it in working memory throughout.
## Hard gates (evidence-bound)
These steps are easy to rationalize without evidence; each gate has a **pass condition** before you advance.
1. **Ratings (Step 3).** Do not assign Strong, Adequate, or Weak for any dimension until you have at least one of: a quoted or section-referenced passage from the strategy inputs, a `strategy-notes.md` (or equivalent) cross-reference, or an explicit **Missing** note stating that no relevant passage exists. **Pass:** every dimension that is rated has one of those anchors on record (in chat, in `dimension-ratings.md` if using durable state, or inline in the final review).
2. **Critical findings (Step 5).** Do not list an item under Critical findings unless it ties to the same kind of anchor (quote, section ref, notes cross-ref, or explicit absence). **Pass:** no critical finding is only a generic critique with no tie to the document or notes.
3. **Judge artifact mode.** **Order:** finalize prose `strategy-review.md` → derive `strategy-review.json` from that prose → run the validation checklist in `references/judge-artifact-schema.md` → emit or save JSON only after the checklist passes (parses, required fields, score arithmetic). **Pass:** JSON validates and matches the prose labels and evidence.
4. **Durable state (when `.beagle/.../reviews/.../` exists).** Before you call the review complete, ensure `review-state.md` exists and its `current_step` and lens/kernel fields match what you actually did (Steps 1–5 and files on disk). **Pass:** ledger reflects the true step and inputs; if `dimension-ratings.md` or `kernel-extraction.md` were created, the final `strategy-review.md` does not contradict them.
## Complementary review lenses
The kernel evaluation (diagnosis, guiding policy, coherent actions) is always the backbone. Four additional lenses load conditionally when the strategy document warrants them. **Do not force them.** Most reviews use one or two; some use none. A lens loads when the document's content triggers it — not as a mandatory checklist.
Lenses sharpen the existing seven dimensions rather than adding new ones. When a lens reveals a gap, the finding belongs under the relevant dimension. The lens is the tool that found the gap; the dimension is where it lives.
| Lens | When it loads | What it adds | Primary dimensions |
|------|--------------|-------------|-------------------|
| **McKinsey 7S** (Execution Alignment) | Strategy requires organizational change — new structure, processes, cultural shift, or cross-functional coordination | Checks whether the strategy accounts for alignment across structure, systems, shared values, style, staff, and skills. The most common gap: the strategy changes direction but assumes the organization will follow. | Dim 3, Dim 6 |
| **Balanced Scorecard** (Objective Translation) | Success criteria are one-dimensional (usually financial only) or vague | Checks whether success is measurable across financial, customer, process, and learning perspectives. Financial metrics are lagging — the strategy needs leading indicators that signal problems before revenue confirms them. | Dim 7 |
| **Porter's Five Forces** (Competitive Pressure Audit) | Diagnosis addresses competitive dynamics — market position, pricing pressure, new entrants, substitution risk | Checks whether the diagnosis saw the full competitive picture, not just direct rivalry. Many strategies diagnose one force while ignoring the others. | Dim 1 |
| **Hoshin Kanri** (Strategy Deployment) | Strategy involves multiple organizational levels — actions need to cascade from executive intent to team execution | Checks whether actions survive translation from strategy to operations. Strategy fails at the translation layers — each level of the org loses fidelity. | Dim 3 |
**Lens selection happens during Step 2** (reading and extracting the kernel). After identifying the kernel elements, mentally check each lens trigger. Load `references/review-lenses.md` for the relevant lenses and weave their pressure-test questions into the Step 3 dimension evaluation.
The reviewer uses lens *thinking* to find gaps — not lens *vocabulary* to lecture the user. A finding that says "the strategy changes direction but doesn't address whether the current org structure supports the new approach" is a 7S-informed finding. The user doesn't need to know it came from McKinsey's framework.
## Review workflow
### Step 1 — Gather the documents
Ask the user what they want reviewed. Typical inputs:
- **`strategy-draft.md` + `strategy-notes.md`** — the pair produced by strategy-interview. The notes file dramatically enriches the review because it contains assumptions, open questions, and the reasoning journey. Always ask for both if only one is provided.
- **A standalone strategy document** — any doc that claims to be a strategy. Could be a Google Doc paste, a PDF, a deck summary, or a markdown file.
- **A slide deck or brief** — extract the strategic claims and evaluate those. Don't critique slide design.
If the document is ambiguous about what it's trying to be (strategy? plan? vision? goals?), name it: "This reads more like a goals document than a strategy — it describes what you want to achieve but not the theory of how. Want me to review it as-is, or should we first identify the missing strategic elements?"
Check the working directory for existing strategy files before asking the user to provide them — they may have already been produced by a previous strategy-interview session.
### Step 2 — Read and extract the kernel
Read the full document. Identify (or note the absence of) the three kernel elements:
1. **Diagnosis** — What does the document say is actually going on? Is there a clear statement of the challenge?
2. **Guiding policy** — What overall approach has been chosen? Does it make a directional choice?
3. **Coherent actions** — What concrete steps carry out the policy? Do they reinforce each other?
If the document uses different terminology (OKRs, pillars, strategic priorities, initiatives), map their concepts to the kernel. Don't force a vocabulary change — evaluate the thinking underneath the labels. A "strategic priority" that functions as a guiding policy should be evaluated as one.
Also note:
- Whether any complementary interview lenses were applied (landscape mapping, choice cascade, value innovation) — if so, note which ones for the interview lens audit in Step 3
- The stated scope and timeframe
- The intended audience
- Any assumptions listed (or conspicuously absent)
### Step 3 — Evaluate against the seven dimensions
Work through each dimension from `references/review-dimensions.md`. For each:
1. **Assess** — What does the document do well or poorly on this dimension?
2. **Find evidence** — Quote or cite specific passages. Don't make claims you can't point to.
3. **Rate** — Assign a rating (Strong / Adequate / Weak / Missing) only after step 2 satisfies **Hard gates → Ratings** (anchor on record before the label).
4. **Recommend** — If Weak or Missing, what specifically should the author do? Not "improve the diagnosis" — rather, "the diagnosis names 'market shift' as the challenge but doesn't specify *which* shift or *why it matters for this company specifically*. A stronger version would identify the one or two structural changes that create the opening or threat."
The seven dimensions (summarized here, detailed in the reference):
| # | Dimension | Core question |
|---|-----------|--------------|
| 1 | **Diagnosis quality** | Does it name the actual challenge with enough specificity to be wrong? |
| 2 | **Guiding policy strength** | Does it make a real choice that rules things out? |
| 3 | **Action coherence** | Do the actions reinforce each other and carry out the policy? |
| 4 | **Kernel chain** | Does the logical chain from diagnosis -> policy -> actions hold? |
| 5 | **Bad-strategy patterns** | Are any of the four Rumelt hallmarks (plus one additional anti-pattern) present? |
| 6 | **Assumption exposure** | Are load-bearing assumptions identified and testable? |
| 7 | **Specificity and falsifiability** | Could you tell in 12 months whether this strategy worked or failed? |
### Step 3b — Interview lens audit (when interview lenses appear in the document)
If the strategy document or its companion notes show evidence that landscape mapping (Wardley), strategic choice cascade (Playing to Win), or value innovation (Blue Ocean) lenses were applied during the interview, audit whether those lens findings actually made it into the strategy. Load `references/interview-lens-audit.md` for the detailed audit criteria for each lens.
The audit checks two things for each lens:
1. **Fidelity** — Did the lens's specific findings survive into the draft, or were they dropped or softened?
2. **Impact** — Did the lens findings actually sharpen the kernel (diagnosis, guiding policy, coherent actions), or were they acknowledged but left decorative?
Record findings under the relevant dimensions (typically Dim 1 for landscape, Dim 2-3 for cascade, Dim 1-4 for value innovation). Also flag any interview lens findings from the notes that were dropped or softened in the draft — these often represent the sharpest thinking that got smoothed away during document polishing.
**Note:** The four *review* lenses (7S, Scorecard, Five Forces, Hoshin Kanri) and the three *interview* lenses serve different purposes. Review lenses are tools the reviewer applies to find gaps. Interview lenses produced findings during the strategy-building conversation — this audit checks whether those findings survived. Don't substitute one for the other: Five Forces can pressure-test competitive claims independently, but it cannot validate whether a Wardley mapping exercise was faithfully represented in the draft.
### Step 4 — Cross-check with notes (if available)
If `strategy-notes.md` or equivalent reasoning notes are available, cross-reference:
- **Open questions**: Are any of these actually show-stoppers that should block the draft from being finalized?
- **Assumptions**: Does the draft rest on assumptions the notes flagged as unverified? If so, how severe is the exposure?
- **Bad-strategy patterns caught during interview**: Were they actually resolved in the final draft, or did they creep back in softer language?
- **Thinking evolution**: Did the draft preserve the sharpest version of the diagnosis, or did it soften during writing?
- **Lens findings**: If landscape mapping, choice cascade, or value innovation analysis was done, did those findings actually make it into the draft?
### Step 5 — Produce the review
Before writing files, check the user's intent. If they asked for quick feedback, a chat-only take, or to "poke holes in this" conversationally, deliver the findings inline — lead with Critical Findings, then the top failure path, then recommendations. Offer to write the full review file afterward. If the user asked for a formal review or didn't specify, confirm the output path before writing.
Write `strategy-review.md` following `references/review-template.md`. The structure:
1. **Review summary** — 3-4 sentences: what this strategy is trying to do, the overall assessment, and the one thing that most needs attention.
2. **Strengths** — What works well and why. Be specific. Start here so the author knows what to protect.
3. **Critical findings** — The 2-4 things that most undermine the strategy's integrity, ordered by severity. These are the items that should be fixed before the strategy is shared or acted on. Lead with these — they are the highest-value part of the review.
4. **Dimension ratings** — The seven dimensions with ratings, evidence, and recommendations. These provide the supporting detail behind the critical findings.
5. **Failure path analysis** — How this strategy could fail in the real world. See "Thinking about failure paths" below.
6. **Assumption risk map** — Load-bearing assumptions ranked by impact x uncertainty. Which ones, if wrong, would break the strategy entirely?
7. **Lens findings** — If any review lenses were applied, what they revealed that the core dimensions alone didn't catch. Omit this section if no lenses were triggered.
8. **Recommended next steps** — Concrete actions to strengthen the document, ordered by impact.
**Compose from artifacts, not memory.** If durable review state exists (see below), use those files as the primary source for the final review rather than reconstructing from the conversation. Update `review-composition.md` with the overall assessment and structure decisions, then compose `strategy-review.md` from the artifacts. If subagents are available, spawn one for document assembly — a fresh context reading structured files produces more accurate reviews.
After writing the file, give a brief chat summary: overall assessment in one sentence, the single most important finding, and the recommended next action.
## Thinking about failure paths
This is where the review delivers the most value that the author can't easily get elsewhere. Most people reviewing a strategy will tell you what's wrong with the *document*. Failure path analysis tells you what could go wrong in the *real world* because of what's in (or missing from) the document.
Strategies rarely fail through dramatic, visible crisis. They fail through predictable, quiet patterns:
**The capability gap**: The load-bearing action requires a capability the organization doesn't have and underestimates the cost of building. "Ship a native mobile app by Q3" sounds like an action item; if the team has never built native mobile, it's a six-month research project dressed up as a deliverable.
**The slow drift**: Nothing forces adherence to the guiding policy's exclusions. Quarter by quarter, exceptions creep in. "We said no enterprise, but this one deal is really big." Within a year, the strategy has been silently reversed by a thousand small yeses.
**The unmodeled response**: The strategy assumes the competitive environment is static. But the competitor reads the same market signals. If the strategy's diagnosis is correct, others will reach similar conclusions — what happens when they respond?
**The political veto**: The strategy requires stopping something that a powerful stakeholder owns. The strategy document doesn't mention this because the author assumes buy-in will materialize. It usually doesn't.
**The assumption cascade**: One key assumption turns out to be wrong, and because the actions are tightly coupled, the failure propagates. The market doesn't grow as expected, so the unit economics don't work, so the funding case falls apart, so the capability never gets built.
**The execution bottleneck**: Multiple actions depend on the same scarce resource (a key team, a specific leader, a budget line) but the strategy treats them as independent. When the bottleneck binds, the author must choose which actions to delay — and the strategy didn't provide guidance on that priority call.
**The measurement void**: No leading indicators were defined, so the organization can't tell whether the strategy is working until lagging indicators arrive (revenue, market share) — by which time it's too late to course-correct.
For each failure path you identify, describe: the scenario in concrete terms, the likelihood, what the strategy currently does to mitigate it (often nothing), and what the author could do to reduce the risk. These are the findings authors most often say changed their thinking — because they expose risks the author knew existed but hadn't made explicit.
## Posture and calibration
### Be honest, not hostile
The author spent real thinking effort on this. Acknowledge what works — genuinely, not as a softener before the bad news. But don't pull punches on structural problems. A strategy that goes to the board with a flawed diagnosis will do more damage than a reviewer who was too direct.
Frame findings as structural observations, not personal criticism: "The diagnosis addresses market dynamics but doesn't identify why this company is specifically exposed" — not "you didn't think hard enough about the diagnosis."
### Calibrate to the document's maturity
- **Early draft / working document**: Focus on the big structural issues. Don't nitpick prose or completeness — the author knows it's rough. Emphasize what's promising and what needs the most work.
- **Near-final / about to be shared**: Higher bar. Flag everything that could undermine credibility with the audience. Check that the "At a glance" section accurately represents the full document. Verify that claimed exclusions in the guiding policy aren't quietly reintroduced in the actions.
- **Post-hoc / strategy already in execution**: Focus on assumption validation. Which assumptions can now be checked against reality? Flag any that have already been falsified by events.
### Don't rewrite the strategy
The review identifies problems and recommends fixes. It does not produce an alternative strategy. If the diagnosis is fundamentally wrong, say so and explain why — but the user needs to rethink it themselves (ideally using the `beagle-analysis:strategy-interview` skill). A reviewer who rewrites the strategy is doing the author's thinking for them, which means the author won't own it.
Exception: if the document has a *missing* element (no diagnosis at all, no stated exclusions), offer a concrete example of what a good version might look like, clearly marked as illustrative, to help the author see the gap.
### Respect different frameworks
If the document uses OKRs, V2MOM, SWOT, Porter's Five Forces, or any other framework — evaluate the *thinking*, not the *format*. The kernel question is always the same: is there a diagnosis, a directional choice, and coherent actions? These may appear under different labels. Map concepts, don't force vocabulary.
Flag genuine structural gaps ("this V2MOM has methods and obstacles but no diagnosis of which obstacle matters most") rather than framework translation ("you should use the kernel format instead").
## Source discipline
When the document makes claims about markets, competitors, or trends:
- Note which claims are sourced and which are asserted without evidence.
- Flag market claims that feel like conventional wisdom rather than verified data.
- If the review depends on external facts you can't verify, say so: "The diagnosis rests on the claim that [X] — I can't verify this, but if it's wrong, the entire strategy changes."
## Variant: reviewing a strategy-interview in progress
If the user is mid-interview (they have `strategy-notes.md` but no `strategy-draft.md`, or the draft is marked `[PROVISIONAL]`):
1. Review what exists so far — the notes capture the thinking even before the draft crystallizes.
2. Focus on the strongest and weakest elements of the emerging kernel.
3. Suggest questions the user should pressure-test before finalizing.
4. Don't produce a full review — produce a "mid-interview check" with the 3-5 most important observations.
## Variant: comparative review
If the user provides two strategy documents (e.g., competing proposals, before/after versions, or strategies from different teams):
1. Review each independently first using the seven dimensions.
2. Then compare: where do they agree on the diagnosis? Where do they diverge? Is one more specific, more coherent, or more falsifiable?
3. Don't declare a winner unless the structural quality is clearly different. Two strategies can be equally rigorous and still disagree — that's a judgment call for the decision-maker, not the reviewer.
## Durable review state
Long or complex reviews lose fidelity — evidence citations drift, dimension assessments reference passages from memory instead of the document, and the final review sounds more certain than the source material supports. For reviews that warrant it, maintain working state in a `.beagle/` directory.
### When to start
Create the directory when any of these appear:
- The input document exceeds roughly 2,000 words or spans multiple files.
- The review is comparative (two or more strategy documents).
- The strategy is near-final or about to be shared with stakeholders.
- Two or more review lenses are activated.
- The document includes sourced market claims that need separate tracking.
### Directory location
- **For interview-produced artifacts**: `.beagle/strategy/<subject-slug>/reviews/<date-or-slug>/` — nests the review alongside the interview state.
- **For standalone reviews**: `.beagle/strategy-review/<subject-slug>/reviews/<date-or-doc-slug>/` — supports re-reviewing the same subject or reviewing v1/v2 documents without mixing evidence.
### Working files
| File | Purpose | Created when |
|------|---------|-------------|
| `review-state.md` | Review ledger — document metadata, lenses triggered, current step | Always, once directory exists |
| `source-evidence.md` | Document quotes and claims with provenance tags | When the document is read (Step 2) |
| `kernel-extraction.md` | Extracted kernel elements with evidence | After Step 2 |
| `dimension-ratings.md` | Per-dimension assessment, evidence, and ratings | During Step 3 |
| `assumption-risk-map.md` | Load-bearing assumptions with impact × uncertainty | During Step 3/4 |
| `failure-paths.md` | Failure scenarios with likelihood and mitigation | After dimension evaluation |
| `lens-notes.md` | Lens-specific findings | When a lens is used |
| `review-composition.md` | Pre-composition outline for the final review | Before Step 5 |
### Evidence tagging (`source-evidence.md`)
Tag each entry to prevent the final review from sounding more certain than the source material supports:
- **`doc quote`** — direct quote from the strategy document, with section reference.
- **`reviewer inference`** — derived from the document, not directly stated.
- **`unverified assumption`** — claim the strategy depends on, not confirmed in the document.
- **`source-backed`** — claim in the document that cites a source.
- **`notes cross-ref`** — finding from strategy-notes.md or interview artifacts.
```markdown
- [doc quote] "We will focus exclusively on mid-market SaaS" (Guiding Policy, p.3)
- [reviewer inference] Mid-market focus implies exiting current enterprise contracts. (Dim 2)
- [unverified assumption] TAM estimate of $2B appears unsourced. (Dim 6)
- [source-backed] "Mobile inspection usage grew 68% YoY (Gartner 2025)" (Diagnosis, p.1)
- [notes cross-ref] Interview notes flagged enterprise exit as contested. (Notes)
```
### Review state ledger (`review-state.md`)
```yaml
document: [filename(s) being reviewed]
subject: [what the strategy is for]
maturity: [early-draft / near-final / post-hoc]
audience: [who reads the strategy]
current_step: [1-5]
lenses_triggered: [7S, scorecard, five-forces, hoshin-kanri, or none]
kernel_found: [yes / partial / no]
diagnosis_summary: [one sentence]
policy_summary: [one sentence]
critical_findings_count: [number]
top_risk: [one sentence]
```
## Optional: judge artifact mode
Normal reviews produce prose (`strategy-review.md`). When the user explicitly requests machine-readable output for evaluation pipelines, RL loops, or LLM-as-judge workflows, produce an additional structured JSON artifact alongside the prose.
### When to activate
Judge artifact mode activates **only** when the user's language signals it. Look for phrases like:
- "machine-readable," "structured output," "JSON report"
- "LLM judge," "LLM-as-judge," "judge loop," "evaluation artifact"
- "RL loop," "reward signal," "evaluation pipeline"
- "produce the JSON," "judge artifact," "structured review"
**Do not activate** for normal reviews, quick takes, or chat-only feedback. If in doubt, ask: "Want me to also produce a machine-readable JSON artifact for evaluation pipelines, or just the prose review?"
### What it produces
In judge artifact mode, produce both files unless the user asks for JSON-only:
1. **`strategy-review.md`** — the standard prose review (written first)
2. **`strategy-review.json`** — structured JSON following `references/judge-artifact-schema.md`
Write the prose review first, then derive the JSON from it. This ensures scores emerge from careful analysis, not from filling a schema. Every dimension label, critical finding, and evidence quote in the JSON must be consistent with the prose.
### Production rules
1. **Scores are projections, not new judgments.** The dimension ratings in JSON map directly from the prose labels: Strong=4, Adequate=3, Weak=2, Missing=1. Do not invent intermediate scores.
2. **Every score must cite evidence.** At least one evidence entry per dimension, each tagged with provenance (`doc_quote`, `reviewer_inference`, `unverified_assumption`, `source_backed`, or `notes_cross_ref`).
3. **Distinguish document facts from reviewer inferences.** A downstream consumer filtering by provenance should be able to separate what the document says from what the reviewer concluded. This is the single most important quality signal for evaluation pipelines.
4. **All seven dimensions and all five bad-strategy patterns are always present.** Use `detected: false` for absent patterns. This gives consumers stable field counts.
5. **Blocking findings gate the reward signal.** A strategy passes (`reward_signal.pass: true`) only if the aggregate score is at least Adequate AND no findings are marked blocking.
6. **Validate before emitting.** JSON must parse, required fields must exist, scores must match labels, weighted total must be arithmetically correct. See the validation checklist in the schema reference.
### Aggregate scoring
The aggregate is a weighted composite of the seven dimension scores:
| Dimension | Weight |
|-----------|--------|
| Diagnosis quality | 20% |
| Guiding policy strength | 20% |
| Action coherence | 15% |
| Kernel chain integrity | 15% |
| Bad-strategy patterns | 10% |
| Assumption exposure | 10% |
| Specificity / falsifiability | 10% |
`weighted_total = Σ(score × weight / 100)`. Range: 1.0–4.0. The aggregate label follows: Strong (≥3.5), Adequate (≥2.5), Weak (≥1.5), Missing (<1.5).
The reward signal normalizes to 0.0–1.0: `(weighted_total - 1.0) / 3.0`.
> **Source of truth:** The canonical weight definitions, threshold table, and validation
> rules live in `references/judge-artifact-schema.md`. If these values are updated,
> update both locations.
## Reference files
- `references/review-dimensions.md` — The seven evaluation dimensions with detailed criteria, examples, and common failure patterns.
- `references/review-lenses.md` — Four complementary review lenses (7S, Scorecard, Five Forces, Hoshin Kanri) with triggers, questions, and common gaps.
- `references/interview-lens-audit.md` — Audit criteria for checking whether interview lens findings (Wardley, Playing to Win, Blue Ocean) survived into the draft.
- `references/review-template.md` — Exact structure of the `strategy-review.md` output file.
- `references/judge-artifact-schema.md` — JSON schema for machine-readable judge artifacts. Defines field structure, score mapping, evidence provenance rules, validation checklist, and relationship to prose output.
- `references/pressure-tests.md` — Expected behaviors for common review scenarios. For skill validation.
FILE:references/interview-lens-audit.md
# Interview Lens Audit
Audit criteria for checking whether interview lens findings survived into the strategy draft. Use this during Step 3b when the strategy document or companion notes show evidence that landscape mapping, strategic choice cascade, or value innovation lenses were applied during the `beagle-analysis:strategy-interview` skill.
The audit is about **fidelity and impact** — did the lens findings make it into the draft, and did they actually sharpen the kernel? A lens finding that appears in the notes but vanishes from the draft is a red flag. A lens finding that appears in the draft but didn't change anything is decorative.
---
## 1. Landscape Mapping (Wardley)
### What to look for in the notes
Wardley-informed interview notes typically contain:
- A value chain or dependency structure (user need at the top, components underneath)
- Evolution stage assessments for key components (genesis, custom-built, product, commodity)
- Tension points: building custom what's becoming commodity, treating genesis as procurement, competitor further along the evolution curve
- Inertia patterns: attachment to a component whose evolution stage has shifted
- Value-capture dynamics: which layer of the chain captures margin, and whether that's shifting
### Audit checklist
Check each item against both the notes and the draft. A "present in notes, absent in draft" finding is the most important signal.
| Criterion | What to check | Dimension |
|-----------|--------------|-----------|
| **User need anchoring** | Does the diagnosis start from a clearly stated user need, or does it start from the company's internal perspective? Wardley mapping anchors everything to the user need at the top of the chain. | Dim 1 |
| **Value-chain / dependency structure** | Does the diagnosis identify which components the strategy depends on and how they relate? Not necessarily a formal map — but awareness of the dependency chain. | Dim 1 |
| **Evolution stage awareness** | Does the diagnosis distinguish between components at different evolution stages? Look for language about commoditization, maturity, or "table stakes" vs. novel/uncertain capabilities. | Dim 1 |
| **Control and dependency points** | Does the strategy identify which components it controls vs. depends on? A strategy built on a component controlled by someone else carries risk the document should acknowledge. | Dim 6 |
| **Value-capture or inertia insights** | Did the interview surface where margin sits in the chain, or where organizational inertia is masking a shift? Does the draft reflect these insights, or did it revert to conventional framing? | Dim 1, Dim 6 |
| **Diagnosis sharpening** | Did the landscape analysis actually change or sharpen the diagnosis — or was it acknowledged but left aside? A good test: does the diagnosis contain a claim that could only come from mapping the landscape (e.g., "we're building custom what has become commodity"), or does it read the same as it would without the mapping? | Dim 4 |
### Common drop patterns
These are the ways Wardley findings most often get lost between notes and draft:
- **Abstraction wash**: The notes contain specific evolution-stage findings ("our data pipeline is custom-built but three vendors now sell this as a product") but the draft abstracts them into generic competitive language ("we face increasing competitive pressure").
- **Inertia silence**: The notes identify organizational inertia around a commoditizing capability, but the draft treats the current approach as a strength rather than a liability.
- **Missing dependency risk**: The notes flag a critical dependency on a component controlled by another party, but the draft's assumption exposure doesn't mention it.
- **Value-chain flattening**: The notes describe a multi-layer value chain with different dynamics at each layer, but the draft collapses it into a single "our market" description.
---
## 2. Strategic Choice Cascade (Playing to Win)
### What to look for in the notes
Cascade-informed interview notes typically contain:
- A winning aspiration — concrete picture of what winning looks like
- Where-to-play choices — specific segments, geographies, or value-chain positions chosen (and excluded)
- How-to-win mechanism — the structural advantage, not just "better execution"
- Capability assessment — 3-5 reinforcing capabilities required, with gap identification
- Management systems — processes, metrics, and feedback loops to keep the strategy on track
### Audit checklist
| Criterion | What to check | Dimension |
|-----------|--------------|-----------|
| **Winning aspiration clarity** | Does the draft contain a concrete picture of success that two people would agree on, or has it softened into a generic vision statement? | Dim 7 |
| **Where-to-play specificity** | Does the guiding policy name specific segments *and* exclusions? If the cascade surfaced a narrow playing field, check whether the draft expanded it back to "everyone." | Dim 2 |
| **How-to-win mechanism** | Does the guiding policy name a structural advantage — not just "superior product" but the specific asymmetry? If the cascade identified why competitors can't copy the advantage, does that reasoning appear? | Dim 2 |
| **Capability gaps in assumptions** | Did capability gaps identified during the cascade appear in the assumption exposure or as caveats in the actions? Or were they silently assumed away? | Dim 6 |
| **Management systems in actions** | Did the cascade's management systems findings — measurement, reinforcement, drift prevention — make it into the coherent actions? This is the most commonly dropped cascade element. | Dim 3 |
| **Policy/action sharpening** | Did the cascade choices actually tighten the guiding policy and actions, or were they acknowledged in passing? Test: remove the cascade language — does the policy read differently? If not, the cascade was decorative. | Dim 4 |
### Common drop patterns
- **Aspiration inflation**: The cascade produced a specific, bounded winning aspiration, but the draft inflated it into a broad vision statement.
- **Exclusion erosion**: The where-to-play analysis excluded specific segments, but the draft hedges ("primarily mid-market, but we'll also serve enterprise when opportunities arise").
- **Capability optimism**: The cascade identified capability gaps, but the draft treats them as tasks rather than risks — "hire mobile team" instead of acknowledging a fundamental capability the org hasn't built.
- **Systems omission**: The most common drop. Management systems are almost always the weakest part of the cascade, and the first thing cut from the draft.
---
## 3. Value Innovation (Blue Ocean)
### What to look for in the notes
Blue Ocean-informed interview notes typically contain:
- A strategy canvas or convergence analysis — factors the industry competes on, with all players roughly similar
- ERRC moves — specific factors to eliminate, reduce, raise, or create
- Noncustomer tier analysis — who isn't buying and why
- A diagnosis reframe from "how to beat competitors" to "the competitive arena itself is the problem"
### Audit checklist
| Criterion | What to check | Dimension |
|-----------|--------------|-----------|
| **Strategy canvas / convergence recognition** | Does the diagnosis acknowledge competitive convergence — that all players compete on the same factors? Or did it revert to "we're behind on features"? | Dim 1 |
| **ERRC moves in actions** | Do the coherent actions reflect eliminate/reduce/raise/create moves — deliberately diverging from competitors? Or do they default to matching and incrementing? | Dim 3 |
| **Noncustomer awareness** | If noncustomer tiers were identified, does the strategy target them, or did the draft narrow back to existing customers? | Dim 2 |
| **Diagnosis reframe** | Did the diagnosis shift from competitive positioning ("we're losing share") to competitive convergence ("the category has converged and further investment in these factors produces declining returns")? This is the most important Blue Ocean contribution to the kernel. | Dim 1 |
| **Value-cost tradeoff** | Does the guiding policy break the value-cost tradeoff (simultaneously reducing cost in some areas and increasing value in others), or does it stay within the conventional differentiation-vs-cost frame? | Dim 2, Dim 4 |
| **Divergence from competitor matching** | Read the coherent actions as a whole: are they about catching up to competitors, or about going somewhere competitors aren't? If the Blue Ocean analysis surfaced a divergent path but the actions converge back, flag it. | Dim 3, Dim 4 |
### Common drop patterns
- **Convergence denial**: The notes document clear competitive convergence, but the draft frames the problem as "we need to execute better on the same dimensions" — reverting to a red ocean diagnosis.
- **ERRC flattening**: The notes contain specific eliminate/reduce/raise/create moves, but the draft turns them into conventional feature prioritization ("we'll focus on X and Y") without the deliberate elimination and reduction that makes the strategy distinctive.
- **Noncustomer retreat**: The notes identify a promising noncustomer tier, but the draft's playing field narrows back to the existing market — often because the existing market feels safer or more measurable.
- **Incremental creep**: The actions start divergent but accumulate "and also" items that pull the strategy back toward competitor matching. Check whether actions added after the Blue Ocean analysis dilute the ERRC moves.
---
## Reporting interview lens audit findings
Record each finding under the relevant dimension in the dimension ratings. In the Lens Findings section of the review output, use this format for the interview lens audit subsection:
```markdown
**Interview lens audit:**
- **[Lens name] — [Reflected / Partially reflected / Missing]**: [What survived, what was dropped or softened, and what was lost. If partially reflected, name the specific elements that made it and the ones that didn't.]
```
The most valuable finding is often the gap between the notes and the draft — the sharpest strategic thinking from the interview that got polished away during document writing. Name it specifically so the author can recover it.
FILE:references/judge-artifact-schema.md
# Judge Artifact Schema
Machine-readable JSON output for strategy reviews. Produced only when judge artifact mode is explicitly requested — never emitted during normal reviews.
The schema bridges the prose review (`strategy-review.md`) and automated evaluation pipelines. Every numeric score maps directly to a prose label the reviewer already assigned. No new judgments are invented for the JSON — it's a structured projection of the same review, not a separate evaluation.
---
## Output file
**Filename:** `strategy-review.json`
**Location:** Same directory as `strategy-review.md`. If durable review state is active, also write to the review state directory.
When the user requests both prose and JSON (the default in judge mode), produce `strategy-review.md` first, then `strategy-review.json`. The JSON must be consistent with the prose — if a dimension is rated "Weak" in the prose, the JSON score must be 2.
---
## Score mapping
Strategy-review uses four rating levels. Map them to integers:
| Label | Score | Description |
|-------|-------|-------------|
| **Strong** | 4 | Element is specific, testable, and structurally sound |
| **Adequate** | 3 | Element is present and functional but has notable gaps |
| **Weak** | 2 | Element has significant structural problems |
| **Missing** | 1 | Element is absent or so vague it provides no guidance |
This is a 1–4 scale, not 1–5. The strategy-review skill has four levels; the JSON reflects them faithfully rather than inventing intermediate granularity.
---
## Schema definition
> **Note:** The example below is a partial illustrative fragment. It shows two of seven required `dimension_ratings`, one of the minimum two `failure_paths`, and one of the required 2–4 `critical_findings` to demonstrate the structure without duplicating the pattern for every entry. A valid `strategy-review.json` must include all seven dimensions, at least two failure paths, and 2–4 critical findings per the field reference below.
```json
{
"schema_version": "1.0",
"evaluator": {
"skill": "beagle-analysis:strategy-review",
"version": "2.11.0"
},
"review_metadata": {
"reviewed_at": "2026-04-10T14:30:00Z",
"documents": [
{
"name": "strategy-draft.md",
"path": "relative/path/or/null",
"sha256": "hex digest or null",
"word_count": 1850
}
],
"subject": "Mobile-first inspection workflows for mid-market",
"maturity": "near-final",
"audience": "Board of directors",
"timeframe": "18 months"
},
"kernel_extraction": {
"diagnosis": {
"found": true,
"summary": "Growth stalled because competitors launched mobile-first while Fieldkit remained desktop-only.",
"evidence": [
{
"text": "68% of inspections happen on-site with a phone, but our mobile experience is a responsive web wrapper that drops offline.",
"provenance": "doc_quote",
"location": "Diagnosis, paragraph 2"
}
]
},
"guiding_policy": {
"found": true,
"summary": "Dominate mobile-first inspection workflows for mid-market; stop pursuing enterprise until core product is defensible.",
"exclusions": [
"Enterprise sales motion",
"SOC 2 / SSO / audit trail features this year",
"Matching BuildOps on feature breadth"
],
"evidence": [
{
"text": "Won't build enterprise compliance features this year.",
"provenance": "doc_quote",
"location": "Guiding Policy"
}
]
},
"coherent_actions": {
"found": true,
"count": 4,
"actions": [
"Ship native mobile app with full offline sync by Q3",
"Kill enterprise sales motion — reassign AEs to mid-market",
"Build HappyCo and Yardi integrations",
"Launch reliability guarantee"
],
"evidence": [
{
"text": "Kill enterprise sales motion — reassign two enterprise AEs to mid-market, cancel SOC 2 engagement ($180K → mobile engineering)",
"provenance": "doc_quote",
"location": "Coherent Actions, item 2"
}
]
},
"chain_paragraph": "Growth stalled because competitors launched mobile-first while Fieldkit remained desktop-only, and the company drifted into enterprise to compensate. Therefore, dominate mobile-first inspection workflows for mid-market and stop pursuing enterprise until the core product is defensible. Which means: ship native mobile with offline sync, kill enterprise sales, build mid-market integrations, and launch a reliability guarantee."
},
"dimension_ratings": [
{
"dimension": 1,
"name": "diagnosis_quality",
"score": 4,
"label": "Strong",
"confidence": "high",
"assessment": "Names a specific structural mistake with concrete evidence. The 68% mobile statistic and the drift-to-enterprise pattern are falsifiable claims.",
"evidence": [
{
"text": "Treating mobile as feature request, not delivery surface.",
"provenance": "doc_quote",
"location": "Diagnosis"
}
],
"recommendation": null
},
{
"dimension": 2,
"name": "guiding_policy_strength",
"score": 4,
"label": "Strong",
"confidence": "high",
"assessment": "Makes a painful cut (enterprise) to concentrate force where the org has an edge. Exclusions are explicit.",
"evidence": [
{
"text": "Won't build enterprise compliance features this year. Won't match BuildOps on breadth.",
"provenance": "doc_quote",
"location": "Guiding Policy"
}
],
"recommendation": null
}
],
"bad_strategy_patterns": [
{
"pattern": "fluff",
"detected": false,
"severity": null,
"evidence": [],
"resolution": null
},
{
"pattern": "failure_to_face_challenge",
"detected": false,
"severity": null,
"evidence": [],
"resolution": null
},
{
"pattern": "goals_as_strategy",
"detected": false,
"severity": null,
"evidence": [],
"resolution": null
},
{
"pattern": "bad_objectives",
"detected": false,
"severity": null,
"evidence": [],
"resolution": null
},
{
"pattern": "strategy_by_analogy",
"detected": false,
"severity": null,
"evidence": [],
"resolution": null
}
],
"assumption_risk_map": [
{
"assumption": "Team can ship native mobile with offline sync by Q3",
"stated_in_doc": false,
"impact_if_wrong": "high",
"uncertainty": "high",
"risk_level": "critical",
"provenance": "reviewer_inference",
"consequence": "Load-bearing action fails. Reliability guarantee becomes liability. Integrations built on unstable foundation."
}
],
"failure_paths": [
{
"name": "Capability gap stalls native mobile",
"pattern": "capability_gap",
"scenario": "Team has never built native mobile. Q3 deadline assumes execution speed the org hasn't demonstrated. Six-month research project dressed as a deliverable.",
"likelihood": "medium",
"current_mitigation": "Budget reallocation from SOC 2 engagement",
"suggested_mitigation": "Prototype sprint in month 1. If offline sync isn't working by week 6, the Q3 date is fiction — trigger contingency plan."
}
],
"review_lenses_applied": [
{
"lens": "balanced_scorecard",
"trigger": "Success criteria are financial-only (ARR growth target)",
"findings": [
"No leading indicators from customer or internal-process perspectives. Strategy has a 6-month blind spot between cause and detection."
]
}
],
"interview_lens_audit": [
{
"lens": "landscape_mapping",
"assessment": "partially_reflected",
"survived": [
"Evolution stage awareness for mobile delivery surface"
],
"dropped": [
"Notes identified data pipeline as custom-built when three vendors now offer it as product — draft omits this"
],
"impact_on_kernel": "Diagnosis sharpened by mobile evolution insight but missed the infrastructure commoditization finding",
"evidence": [
{
"text": "Our data pipeline is custom-built but three vendors now sell this as a product",
"provenance": "notes_cross_ref",
"location": "lens-notes.md, landscape mapping section"
},
{
"text": "We face competitive pressure in infrastructure",
"provenance": "doc_quote",
"location": "Diagnosis, paragraph 3"
}
]
}
],
"critical_findings": [
{
"title": "Native mobile capability assumed but unverified",
"severity": "critical",
"description": "The load-bearing action (native mobile app with offline sync by Q3) requires a capability the team has never demonstrated.",
"impact": "If this action slips, the reliability guarantee becomes a liability and the enterprise exit loses its rationale.",
"evidence": [
{
"text": "Ship native mobile app with full offline sync by Q3",
"provenance": "doc_quote",
"location": "Coherent Actions, item 1"
}
],
"recommendation": "Add a capability validation milestone in month 1. Define what 'on track' looks like at week 6."
}
],
"blocking_findings": [
"Native mobile capability assumed but unverified"
],
"unresolved_questions": [
"Has the team built native mobile before, or is this a first attempt?",
"What happens to existing enterprise customers during the transition?"
],
"aggregate_score": {
"weighted_total": 3.45,
"weights": {
"diagnosis_quality": 20,
"guiding_policy_strength": 20,
"action_coherence": 15,
"kernel_chain_integrity": 15,
"bad_strategy_patterns": 10,
"assumption_exposure": 10,
"specificity_falsifiability": 10
},
"label": "Adequate",
"confidence": "high"
},
"reward_signal": {
"pass": false,
"score": 0.82,
"rationale": "One critical blocking finding (unverified capability assumption on load-bearing action). Strategy is structurally sound but not ready to act on without resolving the capability question."
}
}
```
---
## Field reference
### Top-level fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `schema_version` | string | yes | Schema version. Currently `"1.0"`. |
| `evaluator` | object | yes | Skill identifier and version. |
| `review_metadata` | object | yes | Document metadata and review context. |
| `kernel_extraction` | object | yes | Extracted kernel elements with evidence. |
| `dimension_ratings` | array[7] | yes | All seven dimensions, each with score and evidence. |
| `bad_strategy_patterns` | array[5] | yes | All five patterns, each with detected flag. Always include all five — `detected: false` for absent patterns. |
| `assumption_risk_map` | array | yes | Load-bearing assumptions. May be empty if none found (unusual). |
| `failure_paths` | array | yes | Failure scenarios. Minimum 2, typically 3–5. |
| `review_lenses_applied` | array | no | Omit if no review lenses triggered. |
| `interview_lens_audit` | array | no | Omit if no interview lenses were used during the `beagle-analysis:strategy-interview`. |
| `critical_findings` | array | yes | The 2–4 highest-severity findings. |
| `blocking_findings` | array | yes | Titles of findings that should block the strategy from being finalized. May be empty. |
| `unresolved_questions` | array | yes | Questions the review couldn't answer. May be empty. |
| `aggregate_score` | object | yes | Weighted composite score. |
| `reward_signal` | object | yes | Pass/fail determination for evaluation loops. |
### Review metadata fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `reviewed_at` | string (ISO 8601) | yes | Timestamp of the review. |
| `documents` | array | yes | Documents reviewed, each with `name`, `path`, `sha256`, `word_count`. |
| `subject` | string | yes | What the strategy is about. |
| `maturity` | enum | yes | Document maturity stage. One of: `early-draft`, `near-final`, `post-hoc`. |
| `audience` | string | yes | Intended audience for the strategy. |
| `timeframe` | string | no | Strategy timeframe if stated in the document. |
### Evidence objects
Evidence entries appear throughout the schema. Every evidence object has:
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | string | yes | The quoted passage or stated finding. |
| `provenance` | enum | yes | One of: `doc_quote`, `reviewer_inference`, `unverified_assumption`, `source_backed`, `notes_cross_ref`. |
| `location` | string | no | Section or page reference in the source document. |
**Provenance rules** — these match the evidence tagging in `source-evidence.md`:
- **`doc_quote`**: Direct quote from the strategy document. Must be verifiable against the source.
- **`reviewer_inference`**: Derived from the document but not directly stated. The reviewer connected dots the author didn't.
- **`unverified_assumption`**: Claim the strategy depends on that isn't confirmed in the document or notes.
- **`source_backed`**: Claim in the document that cites an external source.
- **`notes_cross_ref`**: Finding from strategy-notes.md or interview durable state artifacts.
**Notation mapping:** The durable review state files (`source-evidence.md`) use a slightly different format for the same tags: `doc quote`, `reviewer inference`, `unverified assumption`, `source-backed`, `notes cross-ref`. The JSON enum values use underscores (`doc_quote`, `source_backed`, `notes_cross_ref`) per JSON convention. The semantics are identical.
### Dimension rating objects
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `dimension` | integer | yes | 1–7, matching the review dimensions. |
| `name` | string | yes | Snake_case dimension name (e.g., `diagnosis_quality`). |
| `score` | integer | yes | 1–4. See score mapping above. |
| `label` | string | yes | `Strong`, `Adequate`, `Weak`, or `Missing`. Must match score. |
| `confidence` | enum | yes | `high`, `medium`, or `low`. How much evidence supported the assessment. |
| `assessment` | string | yes | 2–3 sentence assessment (same content as the prose review). |
| `evidence` | array | yes | At least one evidence entry per dimension. |
| `recommendation` | string/null | yes | Null if `label` is `Strong`. Required otherwise. |
### Bad-strategy pattern objects
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `pattern` | enum | yes | One of: `fluff`, `failure_to_face_challenge`, `goals_as_strategy`, `bad_objectives`, `strategy_by_analogy`. |
| `detected` | boolean | yes | Whether the pattern was found. |
| `severity` | enum/null | yes | `null` if `detected: false`. One of: `critical`, `serious`, `moderate` when detected. |
| `evidence` | array | yes | Evidence entries. Empty array if `detected: false`. |
| `resolution` | string/null | yes | Suggested fix. `null` if `detected: false`. |
### Interview lens audit objects
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `lens` | string | yes | Lens identifier (e.g., `landscape_mapping`, `choice_cascade`, `value_innovation`). |
| `assessment` | enum | yes | One of: `reflected`, `partially_reflected`, `missing`. |
| `survived` | array[string] | yes | Specific findings that made it into the draft. May be empty. |
| `dropped` | array[string] | yes | Specific findings lost between notes and draft. May be empty. |
| `impact_on_kernel` | string | yes | How the lens findings affected the kernel elements. |
| `evidence` | array | yes | Evidence entries with provenance. |
### Aggregate score
| Field | Type | Description |
|-------|------|-------------|
| `weighted_total` | number | `Σ(dimension_score × weight / 100)`. Range: 1.0–4.0. |
| `weights` | object | Percentage weights per dimension. Must sum to 100. |
| `label` | enum | Overall label derived from `weighted_total`: Strong (≥3.5), Adequate (≥2.5), Weak (≥1.5), Missing (<1.5). |
| `confidence` | enum | `high`, `medium`, or `low`. Reflects the lowest confidence among the three highest-weighted dimensions. |
**Default weights:**
| Dimension | Weight | Rationale |
|-----------|--------|-----------|
| Diagnosis quality | 20 | Foundation — everything downstream depends on it |
| Guiding policy strength | 20 | The core strategic choice |
| Action coherence | 15 | Execution readiness |
| Kernel chain integrity | 15 | Overall logical coherence |
| Bad-strategy patterns | 10 | Anti-pattern detection |
| Assumption exposure | 10 | Risk awareness |
| Specificity / falsifiability | 10 | Testability and accountability |
### Reward signal
| Field | Type | Description |
|-------|------|-------------|
| `pass` | boolean | `true` if `weighted_total ≥ 2.5` AND `blocking_findings` is empty. |
| `score` | number | Normalized 0.0–1.0: `(weighted_total - 1.0) / 3.0`. |
| `rationale` | string | One-sentence explanation of pass/fail. Must cite blocking findings if `pass` is `false`. |
**Pass/fail logic:**
- A strategy passes if its aggregate is at least Adequate AND no findings are blocking.
- A critical finding with severity "critical" that affects a load-bearing element (diagnosis, guiding policy, or the load-bearing action) is blocking by default.
- The reviewer may mark additional findings as blocking based on judgment.
### Failure path pattern vocabulary
Use these values for the `pattern` field in `failure_paths`. These cover the most common failure modes. If a scenario doesn't fit any of these, use a descriptive snake_case value — but prefer these standard patterns for pipeline consistency:
| Pattern | Description |
|---------|-------------|
| `capability_gap` | Load-bearing action requires capability the org doesn't have |
| `slow_drift` | Guiding policy exclusions erode through accumulated exceptions |
| `unmodeled_response` | Strategy assumes static competitive environment |
| `political_veto` | Strategy requires stopping something a powerful stakeholder owns |
| `assumption_cascade` | One failed assumption propagates through tightly coupled actions |
| `execution_bottleneck` | Multiple actions depend on same scarce resource |
| `measurement_void` | No leading indicators; problems invisible until too late |
---
## Validation rules
Before emitting the JSON, verify:
1. **Parseable**: Output is valid JSON. No trailing commas, no comments, no markdown fencing in the file itself.
2. **Required fields present**: All fields marked required in the field reference exist.
3. **Score-label consistency**: Every `score` matches its `label` per the mapping table. A `score: 3` with `label: "Weak"` is invalid.
4. **All seven dimensions present**: `dimension_ratings` has exactly seven entries, dimensions 1–7.
5. **All five patterns present**: `bad_strategy_patterns` has exactly five entries, one per pattern.
6. **Evidence non-empty**: Every dimension rating has at least one evidence entry. Every critical finding has at least one evidence entry.
7. **Weighted total correct**: `aggregate_score.weighted_total` equals `Σ(score × weight / 100)` within ±0.01.
8. **Aggregate label correct**: Label matches the weighted_total per the threshold table.
9. **Reward signal consistent**: `pass` is `true` only if `weighted_total ≥ 2.5` AND `blocking_findings` is empty.
10. **Blocking findings reference real findings**: Every entry in `blocking_findings` matches a `title` in `critical_findings`.
11. **Provenance valid**: Every `provenance` value is one of the five allowed enum values.
12. **Recommendations present when needed**: `recommendation` is non-null for any dimension with `label` other than `Strong`.
13. **Confidence values valid**: Every `confidence` value is one of: `high`, `medium`, `low`.
14. **Bad-strategy pattern names valid**: The five `bad_strategy_patterns` entries use exactly these `pattern` values: `fluff`, `failure_to_face_challenge`, `goals_as_strategy`, `bad_objectives`, `strategy_by_analogy`.
15. **Failure paths minimum count**: `failure_paths` has at least 2 entries.
16. **Critical findings count**: `critical_findings` has 2–4 entries.
---
## Relationship to prose review
The JSON artifact is a structured projection of the prose review, not a replacement. The two must be consistent:
- Every dimension label in the JSON matches the rating in `strategy-review.md`.
- Every critical finding in the JSON appears in the prose Critical Findings section.
- The aggregate label in the JSON matches the overall assessment tone in the Review Summary.
- Evidence quotes in the JSON are verifiable against the same document passages cited in the prose.
When both are produced, write the prose first, then derive the JSON from it. This ensures the reviewer's judgment flows from careful prose analysis, not from filling in a schema.
---
## What downstream consumers can expect
External frameworks (evaluation pipelines, RL loops, comparison dashboards) can rely on:
- **Stable field names** within a `schema_version`. Breaking changes increment the version.
- **Complete dimension coverage**: All seven dimensions, all five bad-strategy patterns, always present.
- **Evidence provenance**: Every score traces to tagged evidence. Consumers can filter by provenance to distinguish document facts from reviewer inferences.
- **Deterministic pass/fail**: The `reward_signal.pass` field follows a documented formula — no hidden judgment.
- **Blocking findings as a gate**: A pipeline can use `blocking_findings.length === 0` as a quality gate.
FILE:references/pressure-tests.md
# Pressure-test scenarios
Expected behaviors for the strategy-review skill. Use these to validate that the skill handles common review entry points correctly.
| Scenario | Expected behavior |
|----------|-------------------|
| User provides a standalone goals document (all aspirations, no diagnosis) | Name it: "This reads more like a goals document than a strategy." Rate Diagnosis as Missing, Bad-Strategy Patterns as Weak (goals masquerading as strategy). Offer to help identify the missing strategic elements. |
| User provides `strategy-draft.md` only, notes exist but weren't offered | Ask for `strategy-notes.md` — the notes dramatically enrich the review. Proceed without if user declines. |
| Draft + notes mismatch: notes contain sharper thinking than draft | Flag in Notes Cross-Reference: "The interview produced a sharper diagnosis than the draft contains." Quote both versions. |
| Polished board deck with weak diagnosis | Calibrate to near-final maturity — higher bar, flag everything that undermines credibility. Focus on the diagnosis gap specifically because a polished presentation with a vague diagnosis is the most dangerous kind of bad strategy. |
| Comparative review: two competing proposals | Review each independently first using the seven dimensions, then compare. Don't declare a winner unless structural quality clearly differs. |
| Mid-interview provisional draft (`[PROVISIONAL]` tag) | Produce a "mid-interview check" with 3-5 observations, not a full review. Focus on strongest and weakest emerging kernel elements. Suggest questions to pressure-test before finalizing. |
| Strategy with sourced market claims | Check which claims are sourced vs. asserted. Flag conventional wisdom presented as fact. Note claims you can't verify: "I can't verify this, but if wrong, the strategy changes." If durable state is active, tag claims in `source-evidence.md`. |
| Long multi-appendix document (>3,000 words) | Activate durable review state. Extract evidence to `source-evidence.md` during reading. Compose final review from artifacts, not conversation memory. |
| User says "poke holes in this" or "quick take" | Deliver findings inline in chat — lead with Critical Findings, top failure path, recommendations. Offer to write full `strategy-review.md` afterward. Do not write the file without confirming. |
| Strategy already in execution (post-hoc review) | Focus on assumption validation — which assumptions can now be checked against reality? Flag any already falsified by events. |
| Strategy uses OKRs/V2MOM/SWOT instead of kernel language | Evaluate the thinking, not the format. Map concepts to kernel. Flag genuine structural gaps ("this V2MOM has methods but no diagnosis") rather than forcing vocabulary. |
| Notes contain Wardley findings (evolution stages, value-chain dependencies, inertia insight); draft drops them | Flag in interview lens audit: "Landscape mapping — Partially reflected" or "Missing." Identify the specific findings that were dropped — e.g., notes say "our data pipeline is custom-built but three vendors now offer this as a product" but the draft says "we face competitive pressure in infrastructure." Quote both versions. Record under Dim 1 (diagnosis quality) because the landscape insight was meant to sharpen the diagnosis. Do NOT substitute Five Forces questions to validate the Wardley findings — Five Forces tests competitive pressure, not value-chain mapping fidelity. |
| Notes contain cascade capability gaps; draft treats them as action items | Flag in interview lens audit: "Strategic choice cascade — Partially reflected." The cascade identified a capability the org doesn't have as a *risk*; the draft treats it as a *task* ("hire mobile team by Q3"). Record under Dim 6 (assumption exposure) — the gap between "we need this capability" and "we have this capability" is a load-bearing assumption. |
## Judge artifact mode scenarios
| Scenario | Expected behavior |
|----------|-------------------|
| Normal review — user says "review this strategy" with no judge/JSON language | Produce only `strategy-review.md` (or chat-only feedback). Do NOT emit `strategy-review.json`. No mention of JSON artifacts unless the user asks. |
| User says "produce a machine-readable review" or "LLM judge output" | Activate judge artifact mode. Produce `strategy-review.md` first, then `strategy-review.json`. JSON must parse as valid JSON, contain all seven dimension ratings and all five bad-strategy pattern entries, and every score must match its label per the mapping (Strong=4, Adequate=3, Weak=2, Missing=1). |
| User says "JSON only" or "just the evaluation artifact" | Produce only `strategy-review.json`, skip the prose file. All validation rules still apply. |
| Judge mode review of a strategy with missing diagnosis | JSON `dimension_ratings[0]` (diagnosis_quality) must have `score: 1, label: "Missing"`. The `critical_findings` array must include a finding about the missing diagnosis. That finding's title must appear in `blocking_findings`. The `reward_signal.pass` must be `false` because blocking findings are non-empty. The rationale must cite the missing diagnosis. |
| Judge mode review where interview notes contain Wardley lens findings that were dropped from the draft | JSON `interview_lens_audit` must include an entry with `lens: "landscape_mapping"` and `assessment: "partially_reflected"` or `"missing"`. The `dropped` array must name the specific findings lost between notes and draft. The `evidence` array must contain at least one `notes_cross_ref` entry (the original finding) and one `doc_quote` entry (what replaced it or the absence). The prose review's Lens Findings section must contain the same audit finding. |
| Judge mode — weighted total and reward signal arithmetic | `aggregate_score.weighted_total` must equal `Σ(score × weight / 100)` within ±0.01. `reward_signal.score` must equal `(weighted_total - 1.0) / 3.0` within ±0.01. `reward_signal.pass` must be `true` only when `weighted_total ≥ 2.5` AND `blocking_findings` is empty. |
| Judge mode — prose and JSON consistency | Every dimension label in JSON must match the rating in `strategy-review.md`. Every critical finding title in JSON must appear in the prose Critical Findings section. If the prose says "Diagnosis Quality — Weak" but the JSON has `score: 3, label: "Adequate"`, that is a validation failure. |
| Judge mode — adequate aggregate with blocking findings | Strategy scores Adequate or above on aggregate (`weighted_total ≥ 2.5`) but has a critical finding in `blocking_findings`. The `reward_signal.pass` must be `false` despite the adequate aggregate. The `rationale` must cite the blocking finding, not the score. This validates that blocking findings gate independently of the aggregate — the AND condition in the pass/fail logic requires BOTH adequate score AND empty blocking findings. |
FILE:references/review-dimensions.md
# Review Dimensions
Seven dimensions for evaluating strategy documents. Each dimension has criteria for each rating level, common failure patterns, and the kinds of pressure-test questions to apply.
The overall goal: find where the strategy could break — the gaps, unexamined risks, and failure paths that the author hasn't accounted for. A strategy that survives this review has a meaningfully better chance of surviving contact with reality.
---
## 1. Diagnosis Quality
**Core question:** Does the diagnosis name the actual challenge with enough specificity that you could imagine being wrong about it?
A diagnosis isn't a description of the situation — it's a *judgment* about what matters most. It picks from a complex reality and says "this is the thing." That act of picking is what makes it useful and what makes it testable.
### Rating criteria
**Strong**: Names a specific challenge. Uses the company's particular situation, not generic industry observations. You could argue against this diagnosis — it takes a position. Often uses analogy or identifies a structural pattern ("this is a classic disintermediation problem," "we're in the position Kodak was in when digital hit 5% market share").
**Adequate**: Identifies a real challenge but stays somewhat generic. Could apply to several companies in the same industry without modification. Correct but not sharp enough to generate a distinctive guiding policy.
**Weak**: Describes the situation without identifying what matters most. Lists multiple challenges without prioritizing. Uses language like "the market is changing" or "we face increasing competition" without specifying *how* or *why it matters for this company specifically*.
**Missing**: No diagnosis at all. The document jumps from context to goals or actions. The most common version: a SWOT analysis with no "therefore."
### Pressure-test questions
- Could a competitor read this diagnosis and say "yes, that's exactly our challenge too"? If so, it's not specific enough.
- Does the diagnosis identify *why now*? What changed that makes this challenge urgent or this opportunity available?
- Is there an unstated assumption about the external environment that, if wrong, would invalidate the diagnosis entirely?
- Does the diagnosis identify the *root* challenge, or a symptom? ("We're losing market share" is a symptom. "Our product architecture prevents us from serving the mobile-first workflow that 68% of inspections now use" is a root cause.)
- What would the diagnosis look like if the author's biggest fear were true? What if their most optimistic assumption were false?
### Common failure patterns
- **The comfortable diagnosis**: Identifies a challenge that the organization already knows how to solve, avoiding the harder truth that requires a different approach.
- **The diagnosis-by-data-dump**: Presents extensive market research and trend analysis but never synthesizes it into a judgment. The reader is left to figure out what it all means.
- **The inherited diagnosis**: Repeats the diagnosis from last year's strategy without checking whether the underlying situation has changed.
- **The consensus diagnosis**: So carefully worded to avoid offending any stakeholder that it says nothing specific. Usually produced by committee.
**Lens connection — Competitive Pressure Audit**: When the diagnosis addresses competitive dynamics, apply Five Forces thinking from `references/review-lenses.md`. Check whether the diagnosis accounts for all five forces (rivalry, new entrants, substitutes, supplier power, buyer power) or focuses narrowly on direct rivalry while ignoring structural pressures that could reshape the competitive environment. A diagnosis that says "we're losing to Competitor X" but doesn't consider substitution risk or eroding entry barriers has only seen part of the picture.
---
## 2. Guiding Policy Strength
**Core question:** Does the guiding policy make a real choice that rules things out — including things a reasonable person might choose?
A guiding policy that everyone agrees with hasn't decided anything. The value is in what it excludes. If a competitor could adopt the same policy verbatim without contradiction, it's not doing strategic work.
### Rating criteria
**Strong**: Makes a clear directional choice. Names what the organization will *not* do. The rejected alternatives are plausible — someone could reasonably argue for them. Exploits a specific asymmetry or pivot point. Short enough to remember.
**Adequate**: Makes a choice but the exclusions are implicit rather than stated. You can infer what's been ruled out but it hasn't been made explicit, which means different readers may interpret the scope differently.
**Weak**: Describes a desirable direction without excluding anything. Could be adopted by any company in the industry. Language like "focus on the customer," "drive innovation," "be the best" — these sound like policies but aren't, because they don't constrain decisions.
**Missing**: No guiding policy. The document has goals and actions but nothing connecting them — no principle that explains why *these* actions and not others.
### Pressure-test questions
- What does this policy *prevent* the organization from doing? Can the author name three specific initiatives or investments they would decline because of this policy?
- Is there a real alternative to this policy that a thoughtful person might advocate? If not, it's not a choice — it's a platitude.
- Does the policy address the diagnosis? Trace the line: the diagnosis says X is the challenge, the policy says we'll address it by Y. Does Y actually respond to X, or is it a non sequitur?
- Who in the organization would be uncomfortable with this policy? If nobody, it hasn't decided anything.
- What does this policy look like in a bad quarter? Does it still hold when resources are tight, or would the organization quietly abandon it?
### Common failure patterns
- **The umbrella policy**: Broad enough to accommodate any initiative anyone wants to pursue. "Invest in growth" covers everything from R&D to acquisitions to sales hiring — it constrains nothing.
- **The stealth goal**: "Our policy is to achieve market leadership in segment X" — that's a goal wearing a policy costume. The policy would be the *approach* to achieving market leadership: through price, through service depth, through platform lock-in, etc.
- **The both/and policy**: "We will be the cost leader AND the innovation leader AND the customer service leader." Strategies that refuse to choose between contradictory positions are not strategies.
- **The re-emerging rejected alternative**: The guiding policy explicitly rules something out, but one of the coherent actions quietly reintroduces it. Check for this specifically.
---
## 3. Action Coherence
**Core question:** Do the actions reinforce each other, and would they actually carry out the guiding policy?
Coherence is the tell. An incoherent action set — actions that pull in different directions, duplicate effort, or silently contradict each other — is the signature of a strategy that was assembled by committee rather than designed as a system.
### Rating criteria
**Strong**: Actions clearly reinforce each other — doing A makes B easier, B makes C cheaper, C protects A. The set has focus (3-6 actions, not 15). Each action has an owner, a timeframe, and a clear connection to the guiding policy. The author can trace the chain: if you remove any one action, the system degrades.
**Adequate**: Actions are consistent with the guiding policy and don't contradict each other, but the reinforcement is weak or unstated. They're a reasonable to-do list for someone pursuing this policy, but they haven't been designed as a system.
**Weak**: Actions include items that contradict or compete with each other for resources. Or the action list is too long (8+) and lacks prioritization — it's a project portfolio, not a coherent set. Or actions are so vague ("invest in technology") that coherence can't be assessed.
**Missing**: No actions, or actions that are disconnected from both the diagnosis and the guiding policy. The document ends with the policy and leaves execution to "the teams."
### Pressure-test questions
- For each pair of actions: does A make B easier or harder? Are there hidden resource conflicts?
- Which action is load-bearing? If it fails, does the rest of the strategy still work, or does it collapse? This identifies single points of failure.
- Are there actions that would happen regardless of this strategy — business-as-usual relabeled as strategic? Remove those and see what's left.
- Do the actions require capabilities the organization doesn't have? If so, is building those capabilities itself one of the actions?
- What's the *sequence*? Some actions must precede others. If the document treats them as parallel but they're actually sequential, the timeline is fiction.
- What happens if the most expensive or most difficult action is delayed by six months? Does the strategy degrade gracefully or break?
### Common failure patterns
- **The laundry list**: Every team's top priority became a "strategic action." No prioritization, no sequencing, no resource allocation.
- **The invisible dependency**: Action 3 depends on action 1 being complete, but this isn't stated. The actions look parallel but are actually sequential, and the timeline doesn't account for it.
- **The orphan action**: One action doesn't connect to any other. It was probably added to satisfy a stakeholder and wasn't part of the original strategic logic.
- **The missing action**: The guiding policy implies something that doesn't appear in the action list. For example, the policy says "focus on segment X" but no action addresses exiting segment Y.
**Lens connection — Execution Alignment**: When the strategy requires organizational change, apply 7S thinking from `references/review-lenses.md`. Actions that demand new capabilities, cross-functional coordination, or cultural shifts carry invisible requirements the strategy may not account for — check whether the actions address alignment across structure, systems, shared values, style, staff, and skills, or assume the organization will simply follow the new direction.
**Lens connection — Strategy Deployment**: When the strategy involves multiple organizational levels, apply Hoshin Kanri thinking from `references/review-lenses.md`. Check whether actions are stated at the right altitude and whether translation from executive intent to team-level execution has been accounted for. If the strategy adds actions without subtracting existing work, execution teams will make their own prioritization choices — and those choices may silently undermine the strategy.
---
## 4. Kernel Chain Integrity
**Core question:** Does the logical chain — "because [diagnosis], we will [guiding policy], which means we will [actions]" — actually hold?
This is the meta-dimension. Dimensions 1-3 evaluate each kernel element individually; this one evaluates whether they fit together. A strategy can have a good diagnosis, a good policy, and good actions that don't logically connect.
### Rating criteria
**Strong**: The "therefore" test passes cleanly. Reading "[diagnosis]. Therefore, [policy]. Which means [actions]." sounds like a logical argument, not a slide transition. Each link is tight — the policy responds to the specific diagnosis (not a different challenge), and the actions carry out the specific policy (not a generic version of it).
**Adequate**: The chain is traceable but requires some inference. The connections are present but not tight — you can see how they relate, but the policy could serve a somewhat different diagnosis, or the actions could serve a somewhat different policy.
**Weak**: Significant gaps in the chain. The diagnosis and policy address different topics. Or the actions seem disconnected from the policy — they might be good actions for a different strategy.
**Missing**: The document has components that don't form a chain at all — either because elements are missing, or because they were written independently and assembled without checking fit.
### Pressure-test questions
- Read the strategy as a single paragraph: "[Diagnosis]. Therefore, [policy]. Which means [actions]." Does it sound coherent, or do the transitions feel forced?
- If the diagnosis were different — say the opposite were true — would the guiding policy still make sense? If yes, the policy isn't responding to the diagnosis.
- If the guiding policy ruled something out but an action quietly reintroduces it, the chain is broken. Check the exclusion list against each action.
- Could someone read just the actions and correctly guess the diagnosis? If the actions don't contain the DNA of the diagnosis, the chain has broken by the time it reaches execution.
---
## 5. Bad-Strategy Pattern Detection
**Core question:** Are any of the four hallmarks of bad strategy (plus the additional anti-pattern) present in the document?
Bad strategy isn't the absence of strategy — it's the *presence of specific patterns* that look like strategy but aren't. These patterns are dangerous because they create confidence without substance. A document that contains them will feel "strategic" to casual readers while providing no actual guidance for decisions.
### The four Rumelt hallmarks (plus one additional anti-pattern)
**1. Fluff** — Buzzword-heavy language that sounds sophisticated but says nothing.
- Signal phrases: synergy, leverage, ecosystem, platform play, holistic, transformational, best-in-class, next-generation, customer-centric.
- The test: replace the buzzword with its plain-language definition. If the sentence now says nothing, it was fluff.
- Note: a single buzzword in an otherwise concrete document isn't a finding. The pattern is when fluff *replaces* specificity rather than supplementing it.
**2. Failure to face the challenge** — No clear statement of what the actual problem is. The document is all aspiration and no obstacle.
- The test: can you state the challenge in one sentence? If you can only state the desired outcome, the challenge has been avoided.
**3. Mistaking goals for strategy** — Revenue targets, market share goals, or outcome metrics presented as if they were strategy. "Grow 30% YoY" is a wish, not a plan.
- The test: does the document explain *how* and *why that how*? If the "strategy" section could be replaced by a spreadsheet target, it's goals.
**4. Bad strategic objectives** — Either a laundry list with no prioritization (the "dog's dinner") or blue-sky objectives that assume away the hard part ("we will eliminate tech debt").
- The test: if every objective is "high priority," none of them are. If the objective restates the problem as if naming it solved it, it's blue-sky.
**5. Strategy by analogy** (additional anti-pattern, not a Rumelt hallmark) — "Netflix did X, so we should too" without examining whether the conditions that made it work for Netflix exist here.
- The test: can the author name three conditions that made the analogy work for the original company, and verify they hold here?
### Rating criteria
**Strong**: No hallmarks detected, or minor traces that don't undermine the strategy's substance.
**Adequate**: One or two hallmarks present in mild form — some fluff in the executive summary, a section that reads more like goals than strategy. The core thinking is sound but the document could be tightened.
**Weak**: Multiple hallmarks present. The document has significant sections that sound strategic but lack substance. The reader would struggle to make a concrete decision based on this strategy.
**Missing** (meaning: the entire document is bad strategy): The document is primarily goals, fluff, or laundry lists with no underlying strategic logic. This is a "start over" finding.
---
## 6. Assumption Exposure
**Core question:** Has the strategy identified its load-bearing assumptions, and are those assumptions testable?
Every strategy is a bet. The question isn't whether it rests on assumptions — all strategies do — but whether the author *knows* which assumptions are load-bearing and has a plan for what happens if they're wrong.
### Rating criteria
**Strong**: Load-bearing assumptions are explicitly stated, distinguished from background assumptions. Each one is testable — the author could describe what evidence would confirm or refute it. The strategy acknowledges what changes if the biggest assumption is wrong. Risk exposure is proportional to the stakes.
**Adequate**: Some assumptions are stated but the list is incomplete. The most obvious ones are there; the subtle or uncomfortable ones are missing. Limited discussion of what happens if assumptions fail.
**Weak**: Assumptions are either absent or buried in optimistic language ("as the market continues to grow..."). The strategy reads as if the environment is certain and the plan will execute as designed.
**Missing**: No assumptions stated. The strategy presents its view of the world as fact. This is especially dangerous when the strategy depends on market trends, competitor behavior, or technology adoption curves.
### Pressure-test questions
- What are the three assumptions that, if wrong, would break this strategy entirely? Are they stated in the document?
- Does the strategy assume competitor behavior (e.g., "competitors will be slow to respond")? That's almost always optimistic.
- Does the strategy assume capability that doesn't yet exist (e.g., "once we build the new platform")? What happens if it takes twice as long?
- Does the strategy assume market conditions will continue (e.g., "the market is growing at 15% annually")? What if growth stalls?
- Are there second-order effects that haven't been considered? If action A succeeds, what does the competitor do in response? What does the customer do?
- Does the strategy assume organizational willingness? "We will stop doing X" — will the team that owns X agree? Is there political resistance the strategy pretends doesn't exist?
- What's the failure path that *doesn't* involve a dramatic crisis — the slow drift scenario where the strategy dies quietly because nobody noticed it wasn't working?
### Common failure patterns
- **Optimism as strategy**: The document's assumptions about the future consistently break in the org's favor. Markets grow, competitors stumble, technology works, talent arrives. No downside scenarios.
- **The missing "what if"**: No discussion of what changes if a key assumption fails. A strategy without contingency exposure is a strategy without humility.
- **Invisible assumptions**: The most dangerous kind. The strategy implicitly assumes things that nobody has stated — for example, that the team has the skills to execute a new approach, or that the board will fund the investment, or that customers will switch behavior.
- **The political assumption**: The strategy assumes organizational alignment that doesn't exist. "We will consolidate the three product lines" — has anyone told the three product line owners?
**Lens connection — Execution Alignment**: When the strategy requires organizational change, the most dangerous assumptions are often about the organization itself. Apply 7S thinking from `references/review-lenses.md` to surface unstated assumptions about whether current structure, systems, culture, and leadership style will support the new direction. The "slow drift" and "political veto" failure paths are frequently 7S misalignment that nobody named as an assumption.
---
## 7. Specificity and Falsifiability
**Core question:** Could you tell in 12 months whether this strategy worked, partially worked, or failed? If so, how?
A strategy that can't be evaluated can't be improved. Specificity isn't about adding metrics for the sake of metrics — it's about making the strategy's claims testable. If the diagnosis is vague, you can't tell whether the guiding policy addressed it. If the success criteria are vague, you can't tell whether the actions worked.
### Rating criteria
**Strong**: Each kernel element is specific enough to be falsified. The diagnosis names something concrete that can be verified. The guiding policy's exclusions are clear enough that you'd notice a violation. The actions have enough detail to tell whether they happened. Success indicators are observable, not aspirational.
**Adequate**: Mostly specific but with pockets of vagueness. The diagnosis is concrete but the success indicators are fuzzy. Or the actions are clear but the guiding policy is broad enough to accommodate almost any action set.
**Weak**: Significant vagueness throughout. You could read the strategy a year from now and have no way to assess whether it was followed, whether the diagnosis was correct, or whether the actions produced the intended effect.
**Missing**: The strategy is entirely unfalsifiable — so vague that no outcome could ever contradict it. "We will drive growth through innovation" cannot fail because it never committed to anything specific.
### Pressure-test questions
- Imagine you're reviewing this strategy in 12 months. What evidence would tell you it succeeded? What evidence would tell you it failed? If you can't describe the failure case, the strategy isn't specific enough.
- Are the success indicators leading indicators (detect problems early) or lagging indicators (confirm what already happened)? A strategy with only lagging indicators can't course-correct.
- Could a new employee read this document and understand what they should and shouldn't work on? If not, it's not specific enough to guide decisions.
- Take each action: what does "done" look like? Is there a clear difference between "we did this" and "we said we would do this but didn't"?
**Lens connection — Objective Translation**: When success criteria are one-dimensional or vague, apply Balanced Scorecard thinking from `references/review-lenses.md`. Check whether the strategy has leading indicators from customer and internal-process perspectives that would signal problems before financial metrics confirm them. A strategy measurable only in revenue and market share has a 6-12 month blind spot between cause and detection.
FILE:references/review-lenses.md
# Review Lenses
Four complementary frameworks that sharpen the review when the strategy document warrants them. Lenses load conditionally — most reviews use one or two; some use none. The reviewer weaves lens-specific questions into the existing seven dimensions rather than adding new sections.
**The lenses do not ask the author to adopt a new framework.** They give the reviewer sharper questions by applying structured thinking that the core dimensions alone might miss — particularly around execution feasibility and competitive environment accuracy.
---
## 1. McKinsey 7S — Execution Alignment
### When to load
When the strategy requires organizational change — new structure, new processes, cultural shift, or cross-functional coordination. Load this lens when you see:
- Actions that imply structural reorganization (new teams, new reporting lines, consolidated functions)
- A guiding policy that requires different skills or capabilities than the organization currently has
- Cross-functional dependencies that the actions treat as straightforward
- References to "culture change," "mindset shift," or "new ways of working"
### What to look for
The seven elements: strategy, structure, systems, shared values, style, staff, skills. The most common gap: the strategy changes the "strategy" element but assumes the other six will follow.
Check alignment between what the strategy demands and what the organization currently is:
- **Structure**: Does the current org structure support the new direction, or does it create friction? If teams need to collaborate differently, has the strategy addressed how?
- **Systems**: Do existing processes, tools, and decision-making systems support the change, or will they produce antibodies?
- **Shared values**: Does the strategy conflict with deeply held organizational beliefs? A strategy requiring "move fast" in an organization that values "measure twice" will face silent resistance.
- **Style**: Does leadership behavior reinforce the strategy? If the strategy says "empower teams" but leadership is command-and-control, the strategy loses.
- **Staff**: Does the organization have the people this requires? Not just headcount — the specific mix of experience, judgment, and domain knowledge.
- **Skills**: Does the organization have the capabilities? Not individual skills — organizational capabilities like "shipping mobile products" or "managing enterprise sales cycles."
### Pressure-test questions
1. Which of the six non-strategy elements (structure, systems, shared values, style, staff, skills) must change for this strategy to work? Does the document acknowledge any of them?
2. Where will current systems and processes produce friction against the new direction? Is that friction addressed or ignored?
3. If the strategy requires cross-functional coordination that doesn't exist today, what's the plan for creating it? "Better communication" is not a plan.
4. Does leadership style match what the strategy demands? A strategy requiring decentralized decisions won't work under centralized leadership.
5. What's the gap between the skills the organization has and the skills the strategy needs? Is closing that gap part of the action plan, or is it assumed away?
### Common gaps this reveals
- **The structural orphan**: A key action requires coordination between teams that don't currently work together, and no one owns making that coordination happen.
- **The systems antibody**: Existing approval processes, budgeting cycles, or performance metrics will actively undermine the new strategy because nobody thought to change them.
- **The cultural contradiction**: The strategy requires behaviors that conflict with "how we actually do things here." The document won't mention this; the reviewer should.
- **The skills gap dressed as a timeline**: "Hire a mobile team by Q2" treats a capability-building challenge as a procurement problem.
**Dimension connections**: Primarily strengthens Dimension 3 (Action Coherence) and Dimension 6 (Assumption Exposure). The "slow drift" and "political veto" failure paths are often 7S misalignment in disguise.
---
## 2. Balanced Scorecard — Objective Translation
### When to load
When the strategy's success criteria are one-dimensional or vague. Load this lens when you see:
- Success measured only in financial terms (revenue, margin, market share)
- "What success looks like" is aspirational rather than observable
- No leading indicators — only outcomes that take 6-12 months to appear
- Actions without clear measures of progress
### What to look for
Check whether the strategy translates into objectives across four perspectives — not as a formal scorecard, but as a completeness check on how success is defined:
- **Financial**: Revenue, margin, cost targets. Almost always present. The question is whether these are the *only* measures.
- **Customer**: What changes for the customer? How would you know the strategy is working from the customer's perspective before financial results arrive? Acquisition cost, retention, NPS, usage patterns, time-to-value.
- **Internal Process**: What processes must improve or be created? How do you measure whether the operational capabilities the strategy requires are actually being built?
- **Learning & Growth**: Is the organization developing the capabilities, knowledge, and culture the strategy needs? Staff development, knowledge management, innovation pipeline health.
A strategy with only financial success criteria can't course-correct — by the time revenue tells you something's wrong, the underlying causes are 6-12 months old.
Beyond the four perspectives, check for the deeper Balanced Scorecard elements that separate rigorous objective translation from surface-level "balanced KPI coverage":
- **Causal linkage**: Do the objectives across perspectives form a cause-and-effect chain? Learning & Growth improvements should drive Internal Process improvements, which should drive Customer outcomes, which should drive Financial results. If the objectives are just four independent lists, the scorecard thinking is decorative — it hasn't identified the causal theory of how the strategy creates value.
- **Targets**: Are there specific, time-bound targets for each objective — not just metrics to watch, but thresholds that distinguish success from failure? "Track NPS" is monitoring; "NPS above 45 by Q3" is a target the org can rally around and course-correct against.
- **Strategic initiatives**: For each objective gap (where-we-are vs. target), is there a funded initiative to close it? Objectives without initiatives are wishes. Check whether the coherent actions map to specific objective gaps or float independently.
- **Review cadence**: Does the strategy specify how often progress will be reviewed and by whom? The Balanced Scorecard's operational power comes from regular strategy review meetings where leading indicators trigger mid-course corrections. Without a review rhythm, the scorecard degrades into a dashboard nobody checks.
### Pressure-test questions
1. If the strategy is working as intended, what would you see in 90 days that isn't a financial metric? If there's no answer, the strategy has no early warning system.
2. What customer behavior would change first if the strategy is succeeding? Is that behavior being measured?
3. Which internal processes are load-bearing for this strategy? How would you know if they're improving or degrading?
4. What capabilities must the organization build? How would you measure progress on capability-building before end results arrive?
5. If the financial targets are missed in Q3, what non-financial indicators from Q1-Q2 should have flagged the problem? If none exist, the strategy is flying blind.
6. Can you trace a causal chain from a Learning & Growth objective through Internal Process and Customer objectives to a Financial outcome? If the objectives don't link causally, they're four independent wish lists.
7. For each success indicator — is there a specific target with a deadline, or just a metric to "track"? Monitoring without thresholds can't trigger corrective action.
8. For each gap between current state and target — is there a funded initiative to close it, or is the gap expected to close on its own?
9. How often will the strategy's leading indicators be reviewed, by whom, and what authority do they have to adjust course? If there's no review cadence, the scorecard is a one-time artifact, not an operating tool.
### Common gaps this reveals
- **The financial-only trap**: Success defined entirely in outcomes (revenue, market share) with no visibility into the drivers. Problems are discovered 6+ months after they start.
- **The activity-as-progress illusion**: Actions have timelines but no outcome measures. "Launch mobile app by Q3" tells you whether the action happened, not whether it worked.
- **The missing learning loop**: No measures for capability development. The strategy requires new skills but has no way to tell whether the organization is acquiring them.
- **The customer blind spot**: Financial projections assume customer behavior changes, but no customer-facing metrics would confirm whether those changes are happening.
**Dimension connections**: Primarily strengthens Dimension 7 (Specificity and Falsifiability). Also informs the "measurement void" failure path.
---
## 3. Porter's Five Forces — Competitive Pressure Audit
### When to load
When the diagnosis addresses competitive dynamics. Load this lens when you see:
- Market position or competitive standing as part of the diagnosis
- Pricing pressure, margin erosion, or commoditization discussed
- New market entrants or emerging competitors mentioned
- Technology substitution or category disruption referenced
- Supply chain dependencies or customer concentration acknowledged
### What to look for
Check whether the diagnosis has identified the full set of competitive pressures, not just the most visible one:
1. **Rivalry among existing competitors**: Most strategies address this. The question is whether the analysis goes beyond "we compete with X" to understand structural drivers of rivalry — industry growth rate, differentiation levels, switching costs, exit barriers.
2. **Threat of new entrants**: What barriers prevent new competitors from entering? Are those barriers eroding? Strategies that assume today's competitor set is fixed often miss that the real threat is someone who doesn't compete today.
3. **Threat of substitutes**: Not just direct competitors — alternative approaches to the same customer need. The strategy might be winning against Competitor X while a substitute technology makes the entire category irrelevant.
4. **Bargaining power of suppliers**: How dependent is the organization on specific suppliers, platforms, or partners? A strategy built on a platform you don't control carries supplier-power risk.
5. **Bargaining power of buyers**: How much leverage do customers have? Strategies that assume pricing power should check whether buyer concentration or switching-cost dynamics support that assumption.
### Pressure-test questions
1. The diagnosis addresses rivalry — but has it considered which other forces shape the competitive environment? Which forces are notably absent from the analysis?
2. What prevents a new entrant from pursuing the same guiding policy? If the answer is "nothing," the strategy's advantage window may be shorter than assumed.
3. Is there a substitute technology, business model, or approach that could make the strategy's value proposition irrelevant — not by competing directly, but by redefining the problem?
4. Does the strategy depend on a platform, supplier, or partner whose interests might diverge? What happens if they change terms, compete directly, or disappear?
5. Does the strategy assume customer loyalty or switching costs that may not hold? What happens if buyers gain more leverage through alternatives or consolidation?
### Common gaps this reveals
- **The rivalry-only diagnosis**: Competitive dynamics treated as a two-player game (us vs. main competitor) while ignoring structural forces shaping the entire environment.
- **The invisible substitute**: Direct competitors correctly identified, but a different category of solution is about to absorb the customer need.
- **The platform dependency**: Strategy built on a platform (cloud provider, app store, marketplace) treated as infrastructure but actually a supplier with increasing bargaining power.
- **The eroding moat**: Strategy assumes barriers to entry that are weakening — regulatory protection being removed, proprietary technology being commoditized, talent becoming more available.
**Dimension connections**: Primarily strengthens Dimension 1 (Diagnosis Quality). For each force the strategy mentions, check the analysis depth. For forces it doesn't mention, ask whether the omission is justified or a blind spot.
---
## 4. Hoshin Kanri — Strategy Deployment
### When to load
When the strategy involves multiple organizational levels — when actions need to cascade from executive intent to team-level execution. Load this lens when you see:
- Actions stated at different altitudes (some executive-level, some operational)
- Multiple teams or departments responsible for different parts of execution
- Initiatives that require translation from strategic intent to operational plans
- A gap between the strategy's ambition and the specificity of its actions
### What to look for
Strategy fails at the translation layers. The board-level strategy becomes a VP-level initiative becomes a team-level OKR, and at each translation, meaning is lost:
- **Altitude consistency**: Are actions stated at the right level? If they're all executive-level ("launch mobile app"), do they account for the team-level translation needed? If they're all operational ("implement offline sync"), is there a clear line back to strategic intent?
- **Catchball evidence**: Has the strategy been tested bidirectionally — top-down intent meeting bottom-up feasibility? If it reads as purely top-down, the actions may not survive contact with the teams who must execute them.
- **Translation gaps**: Between each organizational level, ask: could a team lead read this and know what to prioritize this quarter? If not, translation work is needed that the strategy doesn't acknowledge.
- **Breakthrough vs. operational**: Does the strategy distinguish between breakthrough objectives (change the trajectory) and operational objectives (keep the business running)? Breakthrough objectives need protection from being crowded out by operational demands.
### Pressure-test questions
1. If a team lead reads this strategy, can they derive their team's priorities for the next quarter? If not, where does translation break down?
2. Is there evidence that the actions were validated with the people who must execute them — or do they read as mandates from above?
3. Which actions are "breakthrough" (change the trajectory) and which are "operational" (keep the business running)? Are the breakthrough actions protected from being deprioritized when operational demands surge?
4. If the strategy requires coordination between teams, who owns the integration points? If nobody does, those are the points where execution will fragment.
5. What gets deprioritized to make room for these strategic actions? If the strategy adds without subtracting, execution teams will make their own subtraction choices — and those choices may undermine the strategy.
### Common gaps this reveals
- **The altitude mismatch**: Executive-level actions with no operational translation. "Enter the enterprise market" is a strategy-level statement; the teams need to know which segment, through which channel, with which product changes.
- **The missing catchball**: Actions designed without input from executors. The strategy assumes feasibility that hasn't been validated. This surfaces as "we had no idea this was coming" when strategy is announced.
- **The crowded-out breakthrough**: Breakthrough objectives that compete with operational demands for the same resources. Without explicit protection, operational urgency always wins — and the strategy quietly dies.
- **The orphaned integration**: Actions that depend on cross-team coordination, but no one owns the coordination itself. Each team optimizes their piece; the whole fragments.
**Dimension connections**: Primarily strengthens Dimension 3 (Action Coherence). Also feeds the "execution bottleneck" failure path.
---
## Using lenses in the review
### Selection
After reading the strategy document and extracting the kernel (Step 2), mentally check each lens trigger. Load the relevant lenses for Step 3. Most reviews activate one or two; activating all four suggests the strategy has broad structural gaps.
### Integration
Lens questions weave into the existing seven dimensions — they don't create separate evaluation sections. When a lens reveals a gap, record the finding under the relevant dimension (usually 1, 3, 6, or 7). The lens is the tool that found the gap; the dimension is where the gap lives.
### Reporting
If any lenses were applied, include a "Lens Findings" section in the review output (see review-template.md). This section captures what the lenses revealed that the core dimensions alone would likely have missed — particularly systemic issues that span multiple dimensions.
### What lenses don't do
- They don't ask the author to adopt a new framework or fill out a grid
- They don't add new dimensions to the review
- They don't replace the core kernel analysis
- They don't require the reviewer to explain the framework to the user — use the thinking, not the vocabulary
FILE:references/review-template.md
# Review Output Template
For formal or file-output reviews, produce one file: `strategy-review.md`. Write it in the user's working directory unless they specify otherwise. If the user asked for quick feedback or a chat-only take, deliver findings inline instead — see the chat-only branch in SKILL.md Step 5. This template applies only when file output is confirmed.
The tone is direct and specific. Every finding must point to evidence in the document. Every recommendation must be concrete enough that the author can act on it without asking "but what specifically should I do?"
---
```markdown
# Strategy Review: [subject from the strategy document]
_Review of [document name(s)]. Reviewed on [date]._
## Review Summary
[3-4 sentences. What this strategy is trying to do, the overall assessment of its structural integrity, and the single most important thing that needs attention. Don't bury the lede — if the strategy has a critical structural flaw, say it here.]
## What Works
[Specific strengths. Name what the author should protect and build on. This isn't a politeness section — genuinely good elements help anchor what "good" looks like for the rest of the review. 2-4 bullet points, each with a specific citation from the document.]
- **[Strength]**: [Why it works, with reference to the specific passage.]
## Critical Findings
_The 2-4 issues that most undermine the strategy's integrity. These should be addressed before the strategy is shared or acted on. Ordered by severity — most critical first. Lead with these because they are the highest-value part of the review._
### Finding 1: [Title — short, specific]
**Severity:** [Critical / Serious / Moderate]
**What's wrong:** [Specific description of the gap, risk, or failure path.]
**Why it matters:** [What goes wrong in execution if this isn't addressed.]
**Evidence:** [Passage from the document that demonstrates the issue.]
**Recommended fix:** [Concrete action to address it.]
### Finding 2: [Title]
...
## Dimension Ratings
_Supporting detail behind the critical findings. Each dimension evaluates one aspect of the strategy's structural integrity._
### 1. Diagnosis Quality — [Strong / Adequate / Weak / Missing]
**Assessment:** [2-3 sentences on what the diagnosis does well or poorly.]
**Evidence:** [Quote or cite the relevant passage.]
**Recommendation:** [If Adequate or below — specific guidance on what to change. Skip for Strong.]
### 2. Guiding Policy Strength — [Strong / Adequate / Weak / Missing]
**Assessment:** [2-3 sentences.]
**Evidence:** [Quote or cite.]
**Recommendation:** [If needed.]
### 3. Action Coherence — [Strong / Adequate / Weak / Missing]
**Assessment:** [2-3 sentences.]
**Evidence:** [Quote or cite. Name specific action pairs that reinforce or conflict.]
**Recommendation:** [If needed.]
### 4. Kernel Chain Integrity — [Strong / Adequate / Weak / Missing]
**Assessment:** [Read the strategy as "[Diagnosis]. Therefore, [policy]. Which means [actions]." Does it hold?]
**The chain:** [Write it out as a single paragraph. This makes gaps visible.]
**Recommendation:** [If needed. Name the weakest link.]
### 5. Bad-Strategy Patterns — [Strong / Adequate / Weak / Missing]
**Assessment:** [Which patterns, if any, are present.]
**Evidence:** [Quote specific passages for each pattern found. Name the hallmark.]
**Recommendation:** [For each pattern found — how to fix it.]
### 6. Assumption Exposure — [Strong / Adequate / Weak / Missing]
**Assessment:** [Are load-bearing assumptions identified?]
**Unstated assumptions found:** [List assumptions the reviewer identified that the document doesn't state.]
**Recommendation:** [If needed.]
### 7. Specificity and Falsifiability — [Strong / Adequate / Weak / Missing]
**Assessment:** [Could you evaluate this strategy in 12 months?]
**Evidence:** [Point to specific vague or unfalsifiable claims.]
**Recommendation:** [If needed.]
## Failure Path Analysis
_How this strategy could fail — not through dramatic crisis, but through the predictable patterns that kill most strategies. These are the risks the author may not have fully accounted for._
### [Failure path name — e.g., "Capability gap stalls the load-bearing action"]
**The scenario:** [Describe the failure path in concrete terms. What happens, in what order?]
**Likelihood:** [High / Medium / Low — and why.]
**Current mitigation:** [What, if anything, does the strategy do to prevent this? "None" is a valid answer.]
**Suggested mitigation:** [What would reduce the risk.]
### [Next failure path]
...
## Assumption Risk Map
_Load-bearing assumptions ranked by (impact if wrong) x (uncertainty). Focus on the assumptions that could break the strategy, not background assumptions._
| Assumption | Stated in doc? | Impact if wrong | Uncertainty | Risk |
|-----------|----------------|----------------|------------|------|
| [assumption] | Yes / No | High / Medium / Low | High / Medium / Low | [H x H = Critical, etc.] |
| ... | | | | |
[For the top 2-3 highest-risk assumptions, add a paragraph: what happens if this assumption is wrong, and what could the author do now to test it or reduce exposure.]
## Lens Findings
_Include this section when complementary review lenses (7S, Scorecard, Five Forces, Hoshin Kanri) were applied during the review, OR when interview lenses (landscape mapping, choice cascade, value innovation) appear in the strategy document or notes. Omit entirely if neither applies._
**Review lenses applied:** [List which review lenses were activated and why — the trigger condition observed in the document.]
**What the review lenses revealed:**
[For each applied review lens, describe what it found that the core seven dimensions alone would likely have missed. Focus on systemic gaps — the kind that span multiple dimensions or create failure paths the standard review might not catch. 2-4 findings total across all applied lenses.]
- **[Lens name] — [Finding title]**: [What the lens revealed. Point to the specific gap, unstated assumption, or misalignment. Reference the dimension where the finding was also recorded.]
**Interview lens audit:** [Include this subsection only when the strategy document or notes show that Wardley mapping, Playing to Win cascade, or Blue Ocean value innovation analysis was done during the interview.]
[For each interview lens that was applied, assess whether its findings survived into the draft. Did landscape mapping insights sharpen the diagnosis? Did cascade capability gaps appear in assumptions? Did ERRC moves shape the coherent actions? Flag interview lens findings that were dropped or softened — these often represent the sharpest strategic thinking that got polished away.]
- **[Interview lens] — [Assessment]**: [Whether the lens findings are reflected in the draft, partially reflected, or missing. If missing, what was lost.]
## Notes Cross-Reference
_Include this section only when strategy-notes.md or equivalent reasoning notes were available for review._
- **Open questions still open:** [Which questions from the notes remain unresolved in the draft? Are any of them show-stoppers?]
- **Patterns that crept back:** [Bad-strategy patterns caught during the interview that reappeared in the draft, possibly in softer language.]
- **Thinking that was sharpened then softened:** [Places where the notes show a sharper version of the diagnosis or policy than the draft contains.]
- **Lens findings not reflected:** [Landscape mapping, cascade, or value innovation findings from the notes that didn't make it into the draft.]
## Recommended Next Steps
_Ordered by impact. What should the author do with this review?_
1. **[Action]** — [Why this is the highest priority, what it addresses.]
2. **[Action]** — ...
3. **[Action]** — ...
```
---
## Notes on producing the review
- Write the review file in the user's working directory unless they specify another location.
- Quote the document. Don't make assertions about what it says — point to the passages. "The diagnosis states: '[quoted text]'" is credible. "The diagnosis is vague" without evidence is not.
- Keep the review proportional to the document. A short strategy memo doesn't need a 2000-word review. Match depth to depth.
- Critical Findings come before Dimension Ratings because they are the highest-value section. A reader who stops after Critical Findings should still walk away with the most important feedback. Dimension Ratings provide the supporting evidence and complete picture.
- The Failure Path Analysis section is where the most unique value lives. Most reviewers tell you what's wrong with the document; this section tells you what could go wrong in the *real world* because of what's in (or missing from) the document. Invest thought here.
- If the notes cross-reference reveals that the interview produced better thinking than the draft contains, say so directly. Drafts often smooth away the edges that made the thinking sharp.
- After writing, summarize in chat: overall assessment in one sentence, the single most important finding, and the recommended next action. Then stop.
Use when the user wants to build or think through a strategy via guided conversation — for a company, product, team, career, or initiative. Triggers on "help...
---
name: strategy-interview
description: "Use when the user wants to build or think through a strategy via guided conversation \u2014 for a company, product, team, career, or initiative. Triggers on \"help me figure out our direction\", \"what should we focus on\", strategic planning, competitive positioning, go-to-market strategy. Also catches indirect requests like prioritization struggles or \"we have too many priorities\". Does NOT review existing strategy documents (use strategy-review) or brainstorm project features (use brainstorm-beagle)."
user-invocable: true
---
# Strategy Interview
Turn Claude into a strategy interviewer who helps the user produce a strategy document grounded in the kernel framework (diagnosis, guiding policy, coherent action), enhanced with three complementary lenses applied when the conversation warrants them. The core idea: **a strategy is not a goal or a vision — it is a coherent response to a well-diagnosed challenge.**
The user runs this expecting a conversation, not a form. Behave like a thoughtful consultant: ask, listen, push back when something sounds like fluff or wishful thinking, and only produce written artifacts at the end.
<hard_gate>
Do NOT produce `strategy-draft.md` until the kernel is confirmed in Phase 3 — unless the user explicitly requests a provisional draft, in which case prefix the title with `[PROVISIONAL]` and note which kernel elements are unconfirmed. Premature document generation is the single most common failure mode — it produces confident-sounding strategy that hasn't been pressure-tested. Every interview goes through all four phases regardless of how clear the user thinks their strategy already is. "Clear" strategies are where unexamined assumptions do the most damage.
In-progress working notes under `.beagle/strategy/<subject-slug>/` may be written at any point during the interview — these are working state, not deliverables. Final `strategy-notes.md` is normally written at interview end. If the user stops mid-interview, update `.beagle/strategy/<subject-slug>/state.md` and optionally produce `strategy-notes.md` as a resume artifact, but do not write `strategy-draft.md`.
</hard_gate>
## What the framework requires
Before starting, load these into working memory. If anything feels fuzzy, read `references/kernel.md` and `references/bad-strategy.md` — they are the entire basis of the interview.
**The kernel of good strategy** has three parts:
1. **Diagnosis** — a judgment about what is actually going on. Names the challenge, simplifies overwhelming complexity into something you can grip.
2. **Guiding policy** — the overall approach chosen to cope with or overcome the obstacles identified in the diagnosis. Not a goal. A direction that rules things in and rules many things out.
3. **Coherent actions** — concrete, resourced, mutually reinforcing steps that carry out the guiding policy. Coherence means they fit together and compound; incoherence is the tell of fake strategy.
**The four hallmarks of bad strategy** (watch for these constantly):
- **Fluff** — abstract, buzzword-heavy language that sounds sophisticated but says nothing.
- **Failure to face the challenge** — no clear statement of what the actual problem is.
- **Mistaking goals for strategy** — "grow revenue 30%" is a goal. Strategy is *how*, and more importantly *why that how*.
- **Bad strategic objectives** — either a laundry list with no priority, or blue-sky objectives that restate the problem as if wishing made it so.
**Additional anti-pattern** (not a Rumelt hallmark, but equally dangerous in practice):
- **Strategy by analogy** — copying what worked for another company without examining whether the conditions match. "Spotify did squads" is not a strategy argument.
## Complementary lenses
The kernel is always the backbone. Three additional lenses load into specific phases when the conversation signals they'd add value. **Do not force them.** Most interviews use one or two; some use none. Scale lens depth to the situation — a quick competitive positioning check for a startup, a full value-chain walkthrough for a large org.
| Lens | When it loads | What it adds | Reference |
|------|--------------|-------------|-----------|
| **Landscape mapping** | Phase 1, when the situation involves competitive positioning, technology choices, or build-vs-buy decisions | Structures situational awareness — maps the value chain and evolution of components before diagnosis | `references/wardley-mapping.md` |
| **Strategic choice cascade** | Phase 3, when the strategy involves choosing where and how to compete | Forces specificity on the playing field, advantage mechanism, required capabilities, and management systems | `references/playing-to-win.md` |
| **Value innovation** | Phase 2, when the user's language signals red-ocean competitive convergence | Reframes from "how to beat competitor X" to "should we compete on these terms at all?" | `references/blue-ocean.md` |
**Lens selection happens organically, not as a menu.** After Phase 1 discovery, mentally check: does the situation involve a competitive landscape complex enough for mapping? Is the user locked in competitor-matching thinking? Will the kernel need a capabilities pressure-test? Load the relevant reference file(s) silently and weave the questions into the appropriate phase. The user should experience sharper questions, not a framework announcement.
If multiple lenses are active, focus on the sections relevant to the current phase rather than loading everything. Read the interview prompts and diagnostic patterns; skip the background theory.
## Interview workflow
Run the interview in four phases. **Do not skip to Phase 4.** The value is in Phases 1-3.
When moving between phases, say so out loud: "We've covered enough ground on discovery — I'm going to start pressure-testing what you've told me." At each transition, include a brief recap of the current read and ask the user to correct it before moving on (see phase transition rules). This anchors context and catches misunderstandings early.
### Phase 0 — Check for prior work
Before starting a new interview, check for prior state in two places:
- **Durable state**: Look for `.beagle/strategy/` directories. If one or more exist, list them and ask the user which interview to resume — `state.md` inside each directory has the full interview ledger.
- **Final artifacts**: Check if `strategy-notes.md` already exists in the working directory.
If prior state is found:
1. Read `state.md` (preferred) or `strategy-notes.md` silently.
2. Summarize where the interview left off: "Last time we got through [phase] — here's where we landed: [one-sentence kernel summary]. Want to pick up from there, or start fresh?"
3. If continuing and the `.beagle/strategy/<subject-slug>/` directory has substantial content, prefer spawning a subagent to read all files and produce a concise briefing — this preserves main-context budget. If subagents are not available, read `state.md` directly and skim other files for key entries. Jump to the appropriate phase with the context loaded.
4. If no files are found but the user references a prior interview, ask them to point to the notes file or `.beagle/strategy/` directory.
5. If both `strategy-notes.md` and `strategy-draft.md` exist, check whether they're from the same interview by comparing the subject lines. If they don't match, ask the user which interview they want to continue (or whether to start fresh).
This matters because strategy interviews frequently span sessions. Don't make the user re-explain what they already told you.
### Phase 1 — Discovery (broad, no kernel framing yet)
Start by explaining what's about to happen:
> "I'm going to ask you some open questions to understand the situation. I'll push back if things sound vague — that's the point. Once I understand the terrain, we'll shape it into a strategy. Sound good?"
After understanding the subject, calibrate depth. A personal career strategy needs 10-15 minutes of discovery; a business unit strategy for a large org might need 30+. Adjust the number of discovery questions and the rigor of Phase 2 accordingly — don't run a 45-minute interrogation for someone thinking through whether to pivot their side project.
Then ask discovery questions. Ask **one or two at a time**, not a wall. Adapt based on answers. Cover this ground in roughly this order, but let the user lead:
- **The subject**: What is the strategy *for*? (Company? Product line? Team? Career?) Scope and timeframe.
- **The audience**: "Who needs to read the final document, and what decision are they making with it?" Board presentation vs. engineering team vs. founder's own thinking — this shapes tone, detail level, and emphasis.
- **The trigger**: Why now? What changed, what's broken, what opportunity appeared? If "we just do this every year," that's a finding — note it.
- **The situation**: Landscape — competitors, customers, technology shifts, internal constraints, political reality.
- **Assets and constraints**: What do they actually have — money, people, brand, tech, relationships, time? What can't or won't they do?
- **What they've tried**: Past attempts and outcomes. Past failures are the most honest data.
- **What they think the answer is**: Ask this late, not early. The user often has a hunch — acknowledge it, then deliberately set it aside: "Good — I'm going to hold onto that but explore the space a bit more before we come back to it." Their intuition is data, not a conclusion.
You are looking for: the real challenge underneath the stated challenge, the one or two asymmetries they could exploit, and the things they're avoiding saying.
If the subject spans multiple entities (portfolio of products, multi-sided platform, several business units), scope to the one that matters most for this conversation, or agree to produce separate kernels. A single kernel that tries to cover a portfolio will be too vague to be useful.
#### Conflicting stakeholders
When the user represents multiple internal factions or is synthesizing input from several people, surface the disagreement explicitly rather than averaging it away. When `.beagle/strategy/<subject-slug>/` exists, capture both views in `evidence.md` with `contested` tags so the disagreement is preserved in durable state; otherwise note it for inclusion when final files are written. Either way, summarize contested points in the final `strategy-notes.md`. Help the user see where the real fork in thinking is — often the disagreement *is* the diagnosis.
#### Landscape mapping (when warranted)
If the situation involves competitive positioning, technology choices, or build-vs-buy decisions, load `references/wardley-mapping.md` and weave its questions into discovery. The goal is to understand which components in the user's value chain they control vs. depend on, and where those components sit on the evolution curve (genesis → custom → product → commodity). This surfaces structural insights — "you're building custom what's becoming commodity," "your competitor is further along this curve" — that dramatically sharpen the eventual diagnosis.
Skip this for personal strategies, career pivots, or situations with no competitive landscape. When in doubt, ask one or two probing questions about the value chain; if the user's answers reveal complexity worth mapping, continue. If not, move on.
### Phase 2 — Challenge and pressure-test
Before moving to the kernel, push on what you heard. Apply the bad-strategy filter in real time:
- **Abstract problems** ("we need to innovate more," "alignment issues"): *"Can you give me a specific example from the last 90 days?"*
- **Goals masquerading as strategy** ("we're going to double ARR"): *"Right — that's the goal. What's the theory of how that actually happens? What has to be true?"*
- **Laundry lists** (11 priorities): *"If you could only do three of these, which three, and why those?"*
- **Missing obstacles** (desired end state, no friction): *"What's stopping this from already being the case?"* — this forces the diagnosis.
- **Fluff** (synergy, leverage, ecosystem, platform, holistic, transformational): Reflect it back plainly: *"When you say 'platform play,' what would that literally look like on a Tuesday?"*
Be direct but not adversarial. Frame pushback as "let me make sure I understand" rather than "that's wrong." If the user resists, note the resistance and move on — surface it in the reasoning notes later.
See `references/bad-strategy.md` for more patterns and redirection scripts.
#### Value innovation challenge (when red ocean signals appear)
If the user's language during discovery and pressure-testing reveals competitive convergence — persistent competitor fixation, benchmarking-as-strategy, feature arms races, margin erosion framed as inevitable — load `references/blue-ocean.md` and deploy its challenge frame. The core question: "Are you fighting over existing, saturated demand when you could create new demand?"
Use the conversational strategy canvas (asking the user to name the 5-6 factors everyone competes on, then checking where all offerings converge) and the four actions framework (eliminate, reduce, raise, create) to test whether the user's problem is their position in the market or the market structure itself. If a value-innovation insight lands, it often rewrites the diagnosis entirely — loop back to Phase 1 briefly to explore the noncustomer landscape, then re-enter Phase 3 with a reshaped kernel.
**Do not force this.** If the user has a clear defensible advantage in existing space, or the problem is execution not positioning, the red ocean reframe is bad advice. See the reference file for explicit guardrails on when to skip it.
### Phase 3 — Map to the kernel
Once you have a grounded picture, make the kernel explicit. Walk through it collaboratively, one piece at a time:
1. **Diagnosis**: "Here's what I'm hearing as the actual challenge: [one or two sentences]. Does that land? What would you sharpen?"
A good diagnosis is specific, often uses analogy, and simplifies without lying. If you can't write it in three sentences, you don't have one yet — go back to Phase 1 on the fuzzy thread.
2. **Guiding policy**: "Given that diagnosis, what's the overall approach? Not the list of things to do — the principle that tells you which things to do and which to refuse."
Push for something that rules things out, not just in. Good guiding policies create advantage by focusing force on a pivot point.
3. **Coherent actions**: "If that's the approach, what are the 3-6 concrete actions that carry it out, and how do they reinforce each other?"
Ask explicitly: *"Does action B make action A easier or harder?"* Incoherent action sets are the most common failure mode.
If any piece is weak, say so and loop back. The kernel is only as strong as its weakest part.
See `references/kernel.md` for deeper guidance on each element.
#### Strategic choice cascade (when competitive positioning is central)
When the strategy involves choosing where and how to compete — a business picking segments, a product competing for users, a team positioning itself in a large org — load `references/playing-to-win.md` and use the cascade to pressure-test and extend the kernel:
- **Where to play** forces the guiding policy to name specific segments, geographies, or channels — and what's excluded. If the guiding policy works for every possible customer, it's missing a playing-field choice.
- **How to win** demands the structural advantage mechanism. Not "be better" — the asymmetry that makes this work for the user and not for a competitor who copies the strategy.
- **Capabilities** are the feasibility check on coherent actions. For each major action, ask: does the org actually have the capability to do this, or does the strategy assume it? Unfunded capability assumptions are where most strategies quietly break.
- **Management systems** answer "and then what?" — how does the org know the strategy is working, and what prevents slow drift back to the old way?
Fold cascade findings into the kernel output (sharper guiding policy, capability gaps as assumptions in the notes) rather than producing a separate cascade document. Skip the cascade for internal reorgs, personal strategies, or existential "should we exist" questions — the kernel handles those on its own.
### Coherence check (gate to Phase 4)
Before producing documents, read the kernel back as a single paragraph: "[Diagnosis]. Therefore, [guiding policy]. Which means we will [actions]." Say it to the user. If it doesn't read as a logical chain — if the "therefore" or "which means" feel forced — something is broken. Loop back to whichever element is weakest.
Then check the guiding policy's *exclusions* against the coherent actions. If the policy rules something out but an action quietly reintroduces it ("We said we're not pursuing enterprise, but action 4 is build SSO support"), surface the conflict.
**Pass condition (all required before Phase 4 file writes):** The user has confirmed the one-paragraph kernel readback (including corrections applied in-chat), and any exclusion-vs-action conflict is resolved or explicitly parked with rationale in `strategy-notes.md` (or `state.md` / `evidence.md` if durable state exists).
### Phase 4 — Produce the deliverables
Before writing files, confirm the user wants file output: "I'd like to write two files — a strategy draft and reasoning notes. Want me to write those, or would you prefer I summarize in chat?" If chat-only, deliver the "At a glance" block inline and offer to write files later. Confirm the output path — don't assume the working directory is correct.
**Compose from artifacts, not memory.** If `.beagle/strategy/<subject-slug>/` exists, use its files as the primary source for document composition rather than reconstructing from the chat transcript:
1. Update `state.md` and `evidence.md` with any final-phase findings.
2. Write or update `composition.md` with the confirmed kernel, explicit exclusions, success signals, unresolved assumptions, and selected evidence with source tags.
3. Compose `strategy-draft.md` and `strategy-notes.md` from the `.beagle/strategy/<subject-slug>/` artifacts. If subagents are available, prefer spawning one — a fresh context reading persisted files produces better documents than the main context reconstructing from a long transcript. Without subagents, re-read each artifact file before composing.
4. Before treating `strategy-draft.md` as done, run the **Source discipline** sequence (section **Source discipline (gate before market/competitor claims in files)**) and meet its **Pass condition** (inventory → classify → tag or remove unsupported claims).
When — and only when — the kernel feels solid, produce **two files** in the user's working directory (or wherever they indicate):
1. **`strategy-draft.md`** — the draft strategy document, following `references/output-template.md`. The artifact they share, revise, and eventually publish.
2. **`strategy-notes.md`** — reasoning notes: what you heard, what you pushed back on, things the user couldn't answer, assumptions that need testing, bad-strategy patterns caught, and open questions. For the user's own thinking, not for sharing.
After writing both files, give a short chat summary: diagnosis in one sentence, guiding policy in one sentence, top open question. Then stop.
## Style and posture
- **Interview, don't lecture.** The user knows their situation; you know the framework. Ask the questions the framework demands.
- **One or two questions per turn.** Walls of questions get walls of shallow answers.
- **Quote the user's own words back** when formalizing the kernel — builds trust and catches misinterpretation early.
- **Don't name-drop frameworks or sources.** The framework shows up in what you ask, not in citations.
- **It's okay to end inconclusively.** If the user doesn't have a diagnosis yet, say so in `strategy-notes.md` and recommend what they'd need to learn first. An honest "not yet" is far more valuable than a confident fake strategy.
- **Resist the urge to soften.** Your natural instinct will be to produce balanced, diplomatic language — exactly wrong for a diagnosis. A diagnosis that everyone is comfortable with isn't specific enough. The user can always soften later; your job is to find the sharp version first.
- **Lean on domain experts.** When the user is in a highly specialized domain (biotech, defense, regulated industries, deep tech), lean harder on their expertise — ask more "teach me" questions. Flag when a diagnosis rests on domain knowledge you can't verify. Never confidently diagnose in unfamiliar territory; use the user's own framing and push for specificity rather than substituting shallow knowledge.
## Source discipline (gate before market/competitor claims in files)
Strategy lenses can invite confident claims about markets, competitors, and trends. **Do not invent** statistics, competitor capabilities, or industry trends.
**Sequence before writing or materially editing `strategy-draft.md` (and when updating `composition.md`):**
1. **Inventory:** List each sentence that asserts market size, competitor behavior, industry trend, regulatory fact, or other externally verifiable claim.
2. **Classify each line:** `user-provided` (quote or close paraphrase from the user), `inference` (follows from user statements and is labeled as such in notes), or `unsupported`.
3. **Handle `unsupported`:** Mark as `[assumption — verify]` in both `strategy-draft.md` and `strategy-notes.md` (and in `evidence.md` if durable state is in use), or remove the claim.
4. **Pass condition:** Every such sentence in `strategy-draft.md` is either anchored to the user's words / agreed inference in notes **or** explicitly tagged `[assumption — verify]`. Zero unsourced numbers or "market facts" invented to sound credible.
If the user wants research-backed claims, they must supply sources or run a separate research pass — do not fabricate citations.
## Durable interview state
Long interviews lose fidelity — facts blur, contested points get averaged away, the final document drifts from what the user actually said. For interviews that grow beyond a few substantive turns, maintain working state in `.beagle/strategy/<subject-slug>/`.
### When to start
Create the directory when any of these appear:
- The interview exceeds roughly 5-7 substantive user replies.
- Multiple stakeholders or contested views surface.
- Multiple possible strategies or scopes appear.
- Any complementary lens is activated.
- Phase 4 is approaching and no state files exist yet.
`<subject-slug>` is kebab-case from the strategy subject — e.g., `platform-team-h1-2026`.
### Working files
| File | Purpose | Created when |
|------|---------|-------------|
| `state.md` | Compact interview ledger | Always, once directory exists |
| `evidence.md` | User quotes, facts, assumptions, contested points | When notable evidence appears |
| `lens-notes.md` | Wardley / cascade / value innovation findings | When a lens is used |
| `composition.md` | Pre-draft outline for Phase 4 | Before final document writing |
### State ledger (`state.md`)
A ledger, not a transcript. Update at phase boundaries and every 5-7 substantive user replies.
```yaml
subject: [what the strategy is for]
audience: [who reads the final document]
timeframe: [planning horizon]
current_phase: [0-4]
last_completed_phase: [0-4 or none]
trigger: [why now]
current_next_question: [question that would resume the interview]
diagnosis_candidate: [one sentence, or "none yet"]
guiding_policy_candidate: [one sentence, or "none yet"]
coherent_actions_candidate: [action names, or "none yet"]
explicit_exclusions: [what the strategy will not do]
lenses_used: [landscape-mapping, choice-cascade, value-innovation, or none]
decisions_made:
- [decision and rationale]
open_questions:
- [question]
unresolved_weak_spots:
- [weak spot and why it matters]
```
### Evidence tagging (`evidence.md`)
Tag each entry to prevent unsourced assumptions from becoming confident claims:
- **`user said`** — direct statement or close paraphrase.
- **`inference`** — derived from what the user said, not stated explicitly.
- **`assumption — verify`** — claim the strategy depends on, unconfirmed.
- **`contested`** — stakeholders or evidence disagree.
- **`decision`** — choice made during the interview, with rationale.
```markdown
- [user said] "We lose deals to X on onboarding time, not features." (Phase 1)
- [inference] Onboarding problem is downstream of product complexity. (Phase 2)
- [assumption — verify] Competitor X's onboarding is faster — unverified. (Phase 1)
- [contested] Engineering: platform scales. Sales: customers hit limits at 10k. (Phase 2)
- [decision] Scoped to platform team; parking enterprise expansion. (Phase 1)
```
### Using fresh context for artifact-heavy operations
The `.beagle/strategy/<subject-slug>/` directory exists so that fresh context can read it — not just the degraded main conversation thread. When the environment supports subagents, prefer using them for operations that need the full artifact set. When subagents are not available, re-read the artifact files directly before each operation — the structured persisted state is still a better source than raw chat memory, even without the fresh-context benefit.
**Phase 4 composition**: If subagents are available, spawn one to compose the final deliverables — it reads `state.md`, `evidence.md`, `lens-notes.md`, and `composition.md` with fresh context, then writes `strategy-draft.md` and `strategy-notes.md`. The main context provides the confirmed kernel and last-minute adjustments; the subagent does the document assembly. Without subagents, re-read each artifact file before composing.
**Evidence audit before composition**: Before entering Phase 4, optionally audit `evidence.md` — flag unresolved `assumption — verify` entries, surface `contested` points that were never resolved, and identify evidence gaps. A subagent is ideal for this; without one, scan the file directly and note findings before composing.
**Resumption briefing**: When resuming a long interview in Phase 0 and the `.beagle/strategy/<subject-slug>/` directory has substantial content, a subagent can read all files and produce a concise briefing (current phase, kernel candidates, open questions, unresolved weak spots). Without subagents, read `state.md` first (it has the ledger), then skim other files for key entries rather than loading everything into the main context.
## Phase transition rules
These gates prevent the most common failure mode: producing a strategy document before the thinking is done.
- **Phase 1 -> 2**: Move on when you have a concrete picture of the situation, the trigger, and the landscape. If you can't summarize the situation in a paragraph using the user's own words, you're not ready.
- **Phase 2 -> 3**: Move on when major bad-strategy patterns have been surfaced and addressed (or explicitly noted as unresolved). If the user's description of the problem is still mostly goals and aspirations, stay in Phase 2.
- **Phase 3 -> 4**: Move on when all three kernel elements exist and the user has confirmed each one. If the guiding policy doesn't clearly address the diagnosis, or the actions don't carry out the guiding policy, loop back.
**Recap checkpoints**: At each phase gate, briefly summarize the current read — diagnosis candidate, emerging policy direction, key facts — and ask the user to correct it before moving on. If `.beagle/strategy/<subject-slug>/` exists, update `state.md` at the same time.
**Incomplete or early exit:**
- If the user stops mid-interview, update `.beagle/strategy/<subject-slug>/state.md` (if it exists) with the current interview state and next question. Optionally produce `strategy-notes.md` as a resume artifact. Do not write `strategy-draft.md`.
- If the user explicitly asks for a provisional draft before the kernel is confirmed, write it but prefix the title with `[PROVISIONAL]` and note which kernel elements are unconfirmed. This is the only case where a partial draft is acceptable.
## Variant: improving an existing strategy through conversation
If the user brings an existing strategy document and wants to *improve* it through guided conversation:
> **Routing note:** If the user wants a standalone critique or evaluation of an existing strategy document — without an interactive interview to improve it — use the `beagle-analysis:strategy-review` skill instead. This variant is for when the user wants to use the document as a starting point for a collaborative improvement conversation.
1. Read the document first. If the document is in a format Claude can't read (PDF, slides, Figma), ask the user to paste the relevant sections as text.
2. Run the bad-strategy filter on it before any discovery questions — this is the primary value the user is looking for.
3. Lead with strengths before gaps. If the doc is partially good, say so — name what works and why before listing what's missing. Don't trash a document that's 70% solid just because you found problems.
4. Calibrate pushback intensity. A polished board deck that's about to ship needs precise, high-stakes feedback. A rough internal draft needs directional guidance and encouragement to keep going. Match the energy to the artifact's maturity.
5. Phase 1 becomes filling gaps: what context is missing from the doc that you'd need to evaluate it fairly?
6. Phases 2-4 proceed as normal, but the existing doc provides the starting kernel to pressure-test rather than building from scratch.
7. If the doc uses a different strategic framework (OKRs, V2MOM, SWOT-only), don't force a kernel translation. Work within their frame first — identify what's working in their terms. Then surface what the kernel would add: "Your OKRs name what you want to achieve, but I don't see the diagnosis — what's the challenge these objectives respond to?" Translate only when the user sees value in it.
## Reference files
- `references/kernel.md` — Detailed guidance on diagnosis, guiding policy, and coherent action with examples.
- `references/bad-strategy.md` — The five hallmarks of bad strategy, signal phrases, and redirection scripts.
- `references/wardley-mapping.md` — Landscape mapping: value chains, evolution stages, and diagnostic patterns. Load during Phase 1 when competitive/technology landscape is complex.
- `references/playing-to-win.md` — Strategic choice cascade: where to play, how to win, capabilities, and management systems. Load during Phase 3 for competitive strategy.
- `references/blue-ocean.md` — Value innovation: competitive convergence detection, four actions framework, noncustomer tiers. Load during Phase 2 when red ocean signals appear.
- `references/output-template.md` — Exact structure of the output files.
- `references/pressure-tests.md` — Expected behaviors for common entry points. For skill validation.
FILE:references/bad-strategy.md
# Bad Strategy: Patterns to Catch in Real Time
There are four hallmarks of bad strategy (Rumelt's canonical set), plus one additional anti-pattern. Listen for all five constantly during the interview and redirect when you hear them. The user will often not realize they're doing it — these patterns feel like strategy from the inside.
## 1. Fluff
**What it sounds like:** Abstract, buzzword-heavy language that creates an illusion of expertise without conveying actual content.
**Classic tells:**
- "Synergies," "leverage," "ecosystem," "platform play," "holistic approach"
- "Customer-centric," "data-driven," "world-class"
- "Transformational," "next-generation," "best-in-class"
- Any sentence that could appear unchanged in any other company's deck
**Why it's dangerous:** Fluff lets people feel like they've said something meaningful without committing to any claim that could be wrong. It's strategy-shaped language.
**Redirection scripts:**
- "Can you translate that into something concrete? What would a skeptic see on a Tuesday that tells them this is happening?"
- "If I recorded a video of someone doing this well, what would I see them doing?"
- "Pretend I'm a new engineer on the team — describe it in plain words."
## 2. Failure to Face the Challenge
**What it sounds like:** The conversation is about ambitions, targets, and aspirations, but nobody has said what the actual obstacle is. When you ask "what's the problem?" you get a description of the desired outcome instead.
**Classic tells:**
- "We want to be the leader in X." (What's stopping that?)
- "We need a strategy for growth." (Growth is prevented by what?)
- "Our strategy is to win in Europe." (Why aren't you winning already?)
- Long SWOT analyses with no prioritization, no narrative, no "therefore."
**Why it's dangerous:** If you don't name the obstacle, you can't tell whether any action would help. You end up doing everything, which is the same as doing nothing.
**Redirection scripts:**
- "What's stopping this from already being true?"
- "If this is so obviously good, why haven't you done it yet? What's in the way?"
- "Let's work backwards. What's the *problem* that this outcome would solve?"
- "What would have to be true for us to succeed — and which of those things is currently not true?"
## 3. Mistaking Goals for Strategy
**What it sounds like:** "Our strategy is to grow revenue 30% YoY while improving margins and expanding internationally." That's a goal stack with zero information about *how* or *why that how*.
**Classic tells:**
- Any "strategy" statement that is entirely numbers and outcomes
- OKRs presented as strategy
- "We will become the #1 player in..."
- Mission statements with targets bolted on
**Why it's dangerous:** Goals tell you where you want to end up. Strategy is the theory of how getting there is actually possible given real obstacles and finite resources. Without that theory, you're just wishing louder.
**Redirection scripts:**
- "Okay — that's the goal. What's the theory of how it happens?"
- "What has to be true about the world, or about us, for this to work?"
- "If you woke up tomorrow and this had happened, what would be the story of how? Who did what?"
- "A competitor has the same goal. What are we doing differently that makes us more likely to hit it?"
## 4. Bad Strategic Objectives
Two sub-flavors:
### 4a. The "dog's dinner" — a laundry list
**What it sounds like:** 11 priorities, each with 4 sub-priorities, all marked "high importance." Nothing has been traded off.
**Why it's dangerous:** A list without priority is not strategy; it's an inventory of wishes. When resources are scarce (always), everything-is-important means nothing-is-important.
**Redirection scripts:**
- "If you could only do three of these, which three?"
- "Which of these, if done well, makes the others easier?"
- "What's on this list because it's actually important, and what's on it because someone would be upset if it weren't?"
### 4b. Blue-sky objectives
**What it sounds like:** The "strategy" restates the problem as if naming it solved it. ("We will eliminate technical debt." "We will become a product-led organization.")
**Why it's dangerous:** It assumes away the work. If eliminating tech debt were easy, it would already be eliminated. The strategy needs to explain what changes about *how*, not just declare that it will happen.
**Redirection scripts:**
- "What's the mechanism? What changes that makes this possible now when it wasn't before?"
- "If this were easy, it would already be done. What's the thing that has to give?"
## 5. Strategy by Analogy (additional anti-pattern)
**What it sounds like:** References to what successful companies did, presented as self-evident justification. "Spotify did squads, so we should too." "Netflix has a culture deck, let's write one."
**Classic tells:**
- "X did it and it worked"
- Company names used as strategy arguments
- Copying visible practices without understanding invisible prerequisites
- "We should do what [company] does" without examining fit
**Why it's dangerous:** The success conditions of the original strategy are invisible. The adopter copies the *what* without the *why*, and the structural context that made it work. Spotify's squads worked because of Spotify's engineering culture, hiring bar, and product shape — not because someone drew a matrix on a whiteboard.
**Redirection scripts:**
- "That worked for [company] under specific conditions — what's similar about your situation, and what's different? Which of those differences could break the approach?"
- "What problem were they solving when they did that? Is that the same problem you have?"
- "What's the part of their approach you *can't* see from the outside? That's usually where the magic is."
## Posture when pushing back
Push back directly but stay on the user's side. You are not a critic; you are a careful reader of their thinking. Frame pushback as curiosity ("help me understand...") rather than judgment.
If the user pushes back on your pushback, don't escalate — note the disagreement in `strategy-notes.md` as an open question and keep moving. The goal is a better document at the end, not winning an argument in the middle.
If the user explicitly says they don't want to be challenged on a particular point right now, respect that and flag it in the notes file. Sometimes people need to think out loud before they're ready to stress-test.
FILE:references/blue-ocean.md
# Value Innovation: When the Real Problem Is the Competitive Arena Itself
This reference loads during Phase 2 when the user's language suggests they are locked in competitive thinking — fighting for share in a crowded, commoditizing space where every player converges on the same value factors. The intervention reframes the conversation: instead of "how do we beat competitor X," ask "should we be competing on these terms at all?"
Do not name-drop frameworks, books, or authors to the user. The concepts show up in what you ask, not in citations.
## When to deploy this lens
### Red ocean signals — patterns in the user's language
Listen for these during discovery and pressure-testing. Any cluster of three or more is a strong signal.
**Classic tells:**
- Persistent competitor fixation: "They launched X, so we need to match it." "We're losing deals to Y on feature Z."
- Benchmarking as strategy: "We need to be best-in-class on every dimension they compete on."
- Margin erosion framed as inevitable: "The market is commoditizing — everyone's cutting price."
- Feature arms race: "We need to close the gap on [feature list]." "We're behind on 4 of 7 evaluation criteria."
- Market share as the goal: "We need to take 3 points of share from the leader."
- Industry conventions treated as laws: "That's just how this category works." "Customers expect X, Y, Z."
- Customer segmentation that mirrors competitors exactly: same segments, same tiers, same buyer personas.
**What it sounds like as a whole:** The user is trying to be a slightly better version of their competitors rather than a fundamentally different kind of offering. They're optimizing within an accepted set of rules nobody has questioned.
**Why it's dangerous:** Incremental improvement in a crowded space produces incremental results at increasing cost. When every competitor converges on the same value factors, the only differentiator left is price — and that's a race nobody wins.
### When NOT to use this lens
Not every strategy needs a value-innovation reframe. Skip it when:
- The user has a clear, defensible advantage in the existing space (cost structure, regulatory moat, network effects, switching costs). Telling them to abandon working terrain is bad advice.
- The market is genuinely growing and the user's share problem is an execution problem, not a positioning problem. Blue ocean thinking applied to an execution gap produces distraction, not insight.
- The user is in a winner-take-all market where network effects or standards lock-in make creating a new space structurally impractical.
- The user is early-stage and hasn't established product-market fit yet. They need to find *one* market that works before they can think about redefining categories.
If you're unsure, test with one or two questions. If the user's situation clearly fits within an existing competitive frame, respect that and move on.
## Core concepts — how to use them in the interview
### 1. The competitive convergence problem
Every industry develops a shared mental model of "what we compete on." Over time, all players converge on the same factors: more features, lower price, better support, faster delivery. The strategy canvas is the diagnostic tool — plot your offering and competitors' offerings against the factors the industry competes on, and look for where everyone's curves overlap.
You won't literally draw this with the user. Instead, build the picture conversationally.
**Interview prompts:**
- "Walk me through the 5-6 factors that matter most when a customer evaluates you versus alternatives. Price, features, support — what's on that list?"
- "Now — for each of those factors, how different are you from the top two competitors? Where are you roughly the same?"
- "If I lined up your offering and your two closest competitors on those dimensions, where would a buyer actually see a difference?"
- "Which of those factors does your industry spend the most time and money competing on — and are customers actually willing to pay more for improvement there, or has it become table stakes?"
What you're listening for: if the user describes 5-6 factors and their offering is roughly similar to competitors on most of them, that's the convergence pattern. The entire industry is fighting over the same saturated, convention-bound space.
### 2. Breaking the value-cost tradeoff — Eliminate, Reduce, Raise, Create
The conventional assumption is that you can either differentiate (high cost) or lead on cost (low differentiation). Value innovation breaks this by changing *which* factors you compete on, not just *how hard* you compete on the existing ones.
Four moves:
- **Eliminate** — which factors does the industry take for granted that buyers don't actually value, or that serve the industry's ego more than the customer's need?
- **Reduce** — which factors have been over-designed well beyond what buyers use or need?
- **Raise** — which factors should be raised well above the industry standard because they solve real pain?
- **Create** — which factors should the industry offer that it has never offered — factors that would unlock entirely new demand?
The classic illustration: Cirque du Soleil eliminated animal acts and star performers (massive cost), reduced the thrill and danger emphasis, raised the artistic and theatrical production quality, and created an experience closer to theater than circus — attracting adults who would never buy a ticket to Ringling Brothers.
Southwest Airlines eliminated meals, lounges, seat assignments, and hub routing (cost and complexity), reduced turnaround time, raised departure frequency, and created a "fun" low-cost flying experience that pulled customers away from driving, not just from other airlines.
**Interview prompts:**
- "Think about what your industry competes hardest on. Which of those factors do your customers actually care about — and which ones do they tolerate because everyone offers them?"
- "Is there something your customers are over-served on? Something the industry keeps improving that buyers stopped caring about three iterations ago?"
- "What's a pain point in the buyer's experience that nobody in your space addresses because it's considered 'not our problem'?"
- "If you could strip out half of what you currently offer and redirect that effort into one or two things done dramatically better, what would you strip and what would you pour into?"
### 3. Three tiers of noncustomers — where new demand lives
Most strategy focuses on existing customers. But the biggest growth often comes from people who *aren't* buying — not because they don't have the need, but because the current offerings don't work for them.
Three tiers:
- **Soon-to-be noncustomers** — they buy minimally, out of necessity, and are actively looking for alternatives. They're on the edge of your market and would leave if they could.
- **Refusing noncustomers** — they've considered your category and consciously decided against it. They see the offerings and say "not for me." Their objections reveal what the category gets wrong.
- **Unexplored noncustomers** — they've never considered your category at all because it seems irrelevant to them. The largest pool, the hardest to see.
Nintendo Wii is the textbook case. Sony and Microsoft competed for hardcore gamers (existing customers) with better graphics and processing power. Nintendo asked: who isn't gaming? Families. Older adults. People intimidated by complex controllers. The Wii created a motion-based, simple, social experience — and outsold both competitors by pulling in people who'd never owned a console.
**Interview prompts:**
- "Who *almost* buys your product but doesn't quite get there? What's the friction that stops them?"
- "Is there a group that has consciously looked at what your industry offers and said 'no thanks'? What turned them off?"
- "Who has the underlying need your product serves but has never even considered your category as a solution? What would have to change for them to see it as relevant?"
- "Right now you're fighting for share of existing buyers. What would it look like to create new buyers instead?"
### 4. Buyer experience pain points
Map the buyer's full journey — from purchase evaluation through delivery, use, maintenance, and disposal. At each stage, ask: is there friction, waste, complexity, or risk that the industry accepts as normal but buyers would love to see eliminated?
This is the operational version of "Create" in the four actions framework. It finds specific places where value innovation can land.
**Interview prompts:**
- "Walk me through what it's like to buy and use your product, from the customer's first Google search to a year after purchase. Where do they get frustrated, confused, or surprised by cost?"
- "What do your customers complain about that every company in your space treats as 'just how it works'?"
- "Is there a stage of the experience — onboarding, support, renewal, switching — that nobody in your industry has tried to make dramatically better?"
## How blue ocean findings reshape the kernel
A value-innovation insight doesn't sit alongside the kernel — it transforms it. When you surface this pattern, help the user see how it rewrites what they already have:
**Diagnosis shift:** The diagnosis often moves from "we're losing to competitor X on features" to something structural: "The real challenge isn't that we're behind on features — it's that the entire category has converged on the same value factors, and further investment in those factors produces declining returns. We're in a commodity fight."
**Guiding policy shift:** Instead of "catch up on features and compete on execution," the policy becomes "redefine what we compete on — eliminate [factors], create [factors], and target [noncustomer tier] rather than fighting for existing market share."
**Coherent action shift:** Actions move from matching competitors to deliberately diverging: stopping investment in industry-standard factors, building capabilities that serve noncustomers, and pricing/packaging for a different buyer entirely.
**Redirection scripts for the kernel mapping:**
- "We started with a diagnosis about falling behind competitors. But what I'm hearing now is something bigger — the industry itself has converged, and catching up just puts you back in the same commodity fight. Is the real challenge the category structure, not your position in it?"
- "Your guiding policy was about competing harder on the same dimensions. But you've also told me that customers barely differentiate between you and the top three competitors on those dimensions. What if the policy was about competing on *different* dimensions entirely?"
- "These actions are all about closing gaps with competitors. What if instead of closing gaps, you deliberately opened new ones — in a direction nobody else is going?"
## Posture when introducing this lens
Don't lecture about value innovation theory. Use the interview prompts above to let the user discover the pattern themselves. If they describe 5 factors where everyone converges, reflect it back: "So on the dimensions your industry competes on, you and the top competitors are roughly similar. What's that tell you about the value of investing more in those same dimensions?"
If the user engages with it, run the four actions and noncustomer prompts. If they resist — "no, we just need to execute better on what we have" — note it in `strategy-notes.md` as an open question and move on. They may be right. They may also not be ready to hear it yet. Either way, forcing it helps nobody.
If the user does reframe, this changes the interview significantly. Go back to Phase 1 briefly to explore the noncustomer landscape, then re-enter Phase 3 with a new diagnosis. The kernel will likely look very different the second time through.
FILE:references/kernel.md
# The Kernel of Good Strategy
A strategy's kernel contains three elements — **diagnosis**, **guiding policy**, and **coherent action**. Everything else (vision, mission, values, goals, OKRs) is scaffolding around this or, more often, a distraction from it.
## 1. Diagnosis
**What it is:** A judgment about the nature of the situation. It names the challenge and simplifies an overwhelming reality into something the decision-maker can grip and act on.
**What makes it good:**
- Specific enough that you could imagine being wrong about it.
- Often uses a metaphor, analogy, or reference to a previously understood situation ("this is a classic commoditization problem," "we're in the position IBM was in around 1990").
- Reduces complexity by identifying the *one or two* things that matter most, not by listing everything.
- Uncomfortable. A diagnosis that makes everyone nod happily is probably not a real diagnosis — it's a restatement of the ambition.
**What makes it bad:**
- "We need to grow faster." (Goal, not diagnosis.)
- "The market is changing." (Vague. Changing *how*, and *why does that matter for us specifically*?)
- "We have execution problems." (Where? On what? Caused by what?)
- Long lists of challenges with no weighting. A good diagnosis picks.
**Interview prompts to sharpen a diagnosis:**
- "If you had to describe this situation to a smart outsider in three sentences, what would you say?"
- "What is the *one* thing that, if it were different, would change everything?"
- "What are people inside the org afraid to say out loud about this?"
- "Is there an analogy — another company, another industry, a historical moment — that this reminds you of?"
## 2. Guiding Policy
**What it is:** The overall approach chosen to cope with or overcome the obstacles identified in the diagnosis. A directional choice — a way of channeling action — not a specific plan and not a goal.
A useful test: a guiding policy **rules things out**. If it doesn't exclude any reasonable option, it isn't doing work.
**What makes it good:**
- Creates advantage by anticipating actions and reactions, reducing ambiguity, exploiting asymmetries, or creating coherent policies.
- Short. One or two sentences.
- A *choice* — there was a plausible alternative that was rejected.
- Focuses force on a pivot point where concentrated effort can break something loose.
**What makes it bad:**
- "Be the best in class." (Best at what? Via what mechanism?)
- "Focus on the customer." (Who isn't claiming this?)
- Anything a direct competitor could adopt verbatim without contradiction.
**Examples of real guiding policies:**
- "Compete on service depth in the segment the incumbents find unprofitable to serve well, and refuse work outside it."
- "Rebuild the platform around a single primary workflow before adding any new feature surface."
- "Trade near-term margin for distribution lock-in in the two geographies where the competitor is weakest."
**Interview prompts:**
- "What does this approach explicitly *not* do?"
- "If a competitor copied this word-for-word tomorrow, would it still be a good plan for us?"
- "What's the asymmetry we're exploiting? What do we have, or know, or can tolerate, that they can't?"
## 3. Coherent Action
**What it is:** The concrete, resourced, mutually reinforcing steps that carry out the guiding policy.
The operative word is **coherent**. Individual actions can be fine; a set of actions that pull in different directions is not a strategy, it's a to-do list with ambitions.
**What makes it good:**
- Each action is concrete enough to tell next quarter whether it happened.
- Actions reinforce each other: doing A makes B easier, B makes C cheaper, C protects A.
- The set has focus — usually 3 to 6 things, not 15.
- Resources (money, people, attention) are actually redirected to match. If the action list doesn't change how the budget or calendar looks, it isn't real.
**What makes it bad:**
- The "dog's dinner" — a long undifferentiated list of everything everyone wants.
- Actions that quietly contradict each other (e.g., "move upmarket" and "cut prices to win SMB deals").
- Actions with no owner, no resourcing, and no way to tell if they happened.
- Things that would have happened anyway, relabeled as strategic.
**Interview prompts:**
- "Of these actions, which one is load-bearing? If that one fails, does the rest still work?"
- "Does action B make action A easier or harder? Walk me through it."
- "What are we going to stop doing to make room for these?"
- "Who owns each of these, and what changes on their calendar on Monday?"
## The kernel as a whole
The three parts have to fit. A sharp diagnosis with a vague guiding policy is useless. A clever guiding policy with incoherent actions dies in execution. Coherent actions in service of no diagnosis is a well-run busywork factory.
When reviewing a draft kernel, ask: **does the guiding policy actually address the diagnosis, and do the actions actually carry out the guiding policy?** If you can't trace the line from action back to diagnosis, something is broken — go back and fix it before writing the document.
## Worked example: a complete kernel
**Company:** Fieldkit — a 90-person B2B SaaS company selling inspection-management software to mid-size commercial property firms. $14M ARR, growing 30% YoY until last year, when growth dropped to 11%.
**Diagnosis:** Fieldkit's growth stalled because its two largest competitors (BuildOps, FacilityIQ) launched mobile-first products while Fieldkit remained desktop-oriented. 68% of inspections happen on-site with a phone, but Fieldkit's mobile experience is a responsive web wrapper that drops offline and loses data. The company has been treating mobile as a feature request rather than the core delivery surface. Meanwhile, the sales team is compensating by moving upmarket to enterprise accounts where desktop workflows still dominate — but Fieldkit lacks SOC 2 certification, SSO, and audit trails that enterprise buyers require. The company is drifting into a segment it cannot win while abandoning the mid-market segment where its domain expertise is strongest.
**Guiding policy:** Dominate mobile-first inspection workflows for mid-market commercial property firms (50–500 properties) and stop pursuing enterprise deals until the core product is defensible. Fieldkit will not build enterprise compliance features this year. It will not try to match BuildOps on breadth of facility management. It will win on reliability and speed of the on-site inspection experience specifically.
**Coherent actions:**
1. Ship a native mobile app with full offline sync by Q3 — this is the load-bearing action; everything else depends on it.
2. Kill the enterprise sales motion: reassign the two enterprise AEs to mid-market and cancel the SOC 2 engagement ($180K saved, redirected to mobile engineering).
3. Build an integration with HappyCo and Yardi (the two property-management systems 70% of mid-market customers already use) to make Fieldkit's inspection data flow automatically into existing workflows.
4. Launch a "reliability guarantee" — if an inspection is lost due to sync failure, Fieldkit credits the account. This forces engineering accountability and becomes a sales differentiator.
**Why this holds together:** The diagnosis identifies a specific structural mistake — chasing enterprise to escape a mobile gap — not a generic "we need to grow." The guiding policy makes a painful cut (enterprise) to concentrate resources where the company actually has an edge (deep inspection-workflow knowledge in mid-market). Each action reinforces the others: the native app (1) makes the reliability guarantee (4) credible; killing enterprise (2) frees the budget and headcount for the app; the integrations (3) raise switching costs once customers adopt mobile, protecting the position the app creates. A competitor reading this strategy could not copy it without also abandoning their enterprise pipeline, which is exactly the kind of commitment that makes a strategy real.
FILE:references/output-template.md
# Output Templates
Produce two files at the end of the interview. Keep both concise — good strategy is short because it has genuinely decided what matters. A strategy document that sprawls is usually hiding unfinished thinking.
---
## `strategy-draft.md` — the shareable strategy document
Use this exact structure. Fill with the user's own words where possible; paraphrase only when the original was fluffy.
```markdown
# Strategy: [short, concrete subject — e.g., "Platform team H1 2026"]
_Draft produced via interview on [date]. Author: [user]. Status: draft for review._
## At a glance
> **Challenge:** [One sentence — the diagnosis in plain language.]
> **Approach:** [One sentence — the guiding policy.]
> **Key moves:** [Top 3 actions, comma-separated, no detail.]
## Diagnosis
[2-5 sentences. What is actually going on? What is the core challenge — the one or two things that matter most? Be specific enough to be wrong. If there's a useful analogy or metaphor, use it.]
## Guiding Policy
[1-3 sentences. The overall approach chosen to address the diagnosis. State what this approach does — and, crucially, what it rules out. This should be a directional choice, not a goal.]
**What this explicitly does not do:** [1-3 bullets naming the reasonable alternatives being declined, so the choice is visible.]
## Coherent Actions
[3-6 concrete actions that carry out the guiding policy. For each, one or two sentences. No sub-bullets — if something needs that much structure it belongs in a separate planning doc.]
1. **[Action name]** — [what it is, who owns it, what changes as a result. Note which other action(s) it reinforces.]
2. **[Action name]** — ...
3. **[Action name]** — ...
## How these actions reinforce each other
[One short paragraph tracing the coherence: how action 1 makes action 2 easier, how action 3 protects action 1, etc. If you can't write this paragraph, the actions aren't coherent — go back.]
## Decisions required
_Include this section when the strategy depends on approvals, budget reallocation, or organizational changes that someone other than the author must greenlight._
- **[Decision]** — [who needs to decide, by when, what happens if delayed]
## Required capabilities
_Include this section only when the strategic choice cascade was used during the interview._
[3-5 capabilities the organization must have to execute this strategy. For each: whether the org has it today, and what it takes to build if not. Flag any capability the strategy assumes but hasn't resourced — these are the most common silent points of failure.]
1. **[Capability]** — [status: have it / building / gap]. [One sentence on why it's load-bearing.]
2. ...
## What success looks like in [timeframe]
[3-5 observable indicators. Not metrics for their own sake — leading signals that the guiding policy is working.]
```
---
## `strategy-notes.md` — the reasoning companion (not for sharing)
The messy, honest file. For the user's own use — shared thinking, not a polished deliverable.
```markdown
# Strategy Notes — [subject]
_Companion to strategy-draft.md. Internal thinking, open questions, things that were pushed back on. Not for circulation._
## Interview state
- **Subject:** [what the strategy is for]
- **Last completed phase:** [Phase 1 / 2 / 3 / 4]
- **Confirmed diagnosis:** [yes/no — if yes, one-sentence summary]
- **Confirmed guiding policy:** [yes/no — if yes, one-sentence summary]
- **Confirmed coherent actions:** [yes/no — if yes, count]
- **Lenses used:** [landscape mapping / value innovation / choice cascade / none]
- **Next question to ask:** [the question that would resume the interview]
## What I heard in the interview
[A short narrative of the situation as the user described it. Use their phrasing where it was vivid. This is the raw material the kernel was built from.]
## How the thinking evolved
[Trace the arc from where the user started to where they ended up. What was their initial framing? Where did it shift? What question or pushback caused the biggest change in thinking? This section is often the most valuable part of the notes — it captures the reasoning journey, not just the destination.]
## Bad-strategy patterns caught during the interview
[For each: what the pattern was, how it showed up, how it was resolved or whether it's still unresolved. Be honest — if the user resisted a pushback and you let it go, say so here.]
- **[Pattern, e.g., "Mistaking goal for strategy"]**: [what was said, how it was redirected, final state.]
- ...
## Assumptions this strategy depends on
[Every strategy rests on claims about the world that could be wrong. List the load-bearing ones so they can be tested. Flag assumptions the user didn't state but the strategy implicitly requires.]
- ...
## Open questions
[Things the user couldn't answer yet, or that deserve follow-up before the draft becomes real. Phrase each as a question.]
- ...
## Alternatives considered and rejected
[Capture the paths the guiding policy rules out, so future-them can see why — and revisit if circumstances change.]
- ...
## Landscape analysis
_Include this section only when landscape mapping was used during the interview._
[Summary of the value chain as understood: which components the user controls, which they depend on, and where key components sit on the evolution curve. Note any structural findings — commodity components being built custom, genesis-stage problems treated as procurement decisions, competitors further along the evolution curve, impending disruptions. These findings should already be reflected in the diagnosis; this section preserves the reasoning behind them.]
## Cascade pressure-test
_Include this section only when the strategic choice cascade was used during the interview._
[Findings from the five cascade choices — where to play, how to win, capabilities, management systems. Focus on gaps: which choices were explicit and strong, which were implicit or missing. Especially note capability gaps (things the strategy assumes the org can do but hasn't verified) and management system gaps (nothing in the org's operating rhythm that would keep this strategy alive).]
## Value innovation findings
_Include this section only when the value innovation lens was used during the interview._
[What the competitive convergence analysis revealed: the factors everyone competes on, where offerings converge, the four-actions assessment (eliminate/reduce/raise/create), and any noncustomer tiers explored. Note whether the user accepted the reframe or resisted it, and whether the diagnosis was reshaped as a result.]
## What I'd sharpen next
[Candid take on the one or two parts of the kernel that still feel weakest, and what evidence or thinking would strengthen them. Don't skip this to be polite — it's the most useful paragraph in the file.]
```
---
## Notes on producing the files
- Write both files in the user's current working directory unless they've specified another location.
- Use the user's language where possible; don't over-polish into management-speak.
- If the kernel is genuinely incomplete — e.g., the diagnosis is still fuzzy — **do not write `strategy-draft.md` unless the user explicitly requests a provisional draft.** Instead, capture the current state in `strategy-notes.md` only. If the user does request a provisional draft, prefix the title with `[PROVISIONAL]` and mark each unconfirmed kernel element: `[UNCONFIRMED — diagnosis still under development, see notes]`. This prevents incomplete thinking from masquerading as a finished strategy. An honest `strategy-notes.md` is better than a `strategy-draft.md` with fake confidence.
- The "At a glance" section is for upward communication — a busy stakeholder should be able to read just this block and understand the strategy. Write it last, after the full document is done.
- **Composing from durable state**: When `.beagle/strategy/<subject-slug>/` files exist (`state.md`, `evidence.md`, `lens-notes.md`, `composition.md`), compose both `strategy-draft.md` and `strategy-notes.md` from those artifacts rather than reconstructing from the chat transcript. For `strategy-notes.md`, the "Interview state" section maps from `state.md`, "What I heard" and "Assumptions" draw from `evidence.md`, lens-specific sections pull from `lens-notes.md`, and the overall structure follows `composition.md`. For `strategy-draft.md`, the confirmed kernel, exclusions, and success signals come from `composition.md`, with supporting evidence from `evidence.md`. This prevents context-decay drift in long interviews.
- After writing, give a short chat summary: diagnosis in one sentence, guiding policy in one sentence, top open question. Then stop.
FILE:references/playing-to-win.md
# The Strategic Choice Cascade
A complementary lens to the kernel. Where the kernel excels at diagnosing a challenge and building a coherent response, the cascade adds structure the kernel underserves: explicitly defining the playing field, the basis of competitive advantage, and the organizational machinery required to execute.
Use this lens in Phase 3 when the strategy involves competitive positioning — choosing where and how to compete. It maps onto and extends the kernel:
- **Winning aspiration** sharpens what the diagnosis is *in service of*.
- **Where to play** and **how to win** together sharpen the guiding policy — they force specificity about the field and the advantage.
- **Capabilities** and **management systems** pressure-test whether the coherent actions are actually feasible.
## When to use this lens vs. the kernel alone
**Use the cascade** when the strategy involves a competitive market: a business choosing segments, a product competing for users, a team positioning itself inside a large org.
**Skip it** when the strategy is purely internal (reorg, process improvement), personal (career pivot), or existential (should we exist at all). The kernel handles these well on its own. Don't force "where to play" onto a problem that has no playing field.
---
## 1. Winning Aspiration
**What it is:** A concrete picture of what winning looks like — not a vision statement, not a revenue target. The state of the world when this strategy has succeeded.
**What makes it good:**
- Specific enough that two people in the org would agree on whether they'd arrived.
- Describes the *impact*, not the effort ("the default tool inspectors reach for" vs. "a market-leading platform").
- Time-bounded or at least stage-bounded.
**What makes it bad:**
- "Be the best." (Best at what? For whom? Measured how?)
- "Maximize shareholder value." (A result, not a picture.)
- Anything that could be pasted into a competitor's deck unchanged.
**Interview prompts:**
- "Forget the mission statement for a second. If this strategy works perfectly, what does the world actually look like in two years? Paint me the picture."
- "Who specifically notices that you've won? What changes for them?"
- "If I visited your company after you'd won, what would I see that's different from today?"
## 2. Where to Play
**What it is:** The choice of playing field — which customers, segments, geographies, channels, categories, or stages of the value chain. This is a *decision*, not a description of where you currently operate.
**What makes it good:**
- Names specific segments *and* names what's excluded.
- Grounded in something the org actually knows about those segments — not aspirational adjacency.
- The chosen field is small enough to dominate, not just participate in.
**What makes it bad:**
- "We serve everyone." (No playing-field choice was made.)
- Segments defined by demographics alone with no behavioral or needs-based logic.
- An attractive market the org has no real reason to win in.
**How it extends the kernel:** "Where to play" forces the guiding policy to be geographically or segment-specific. A guiding policy of "compete on service depth" becomes sharper as "compete on service depth *in the mid-market commercial property segment*." If the user's guiding policy works equally well for every possible customer, it's missing a "where to play" choice.
**Interview prompts:**
- "Who is your ideal customer — not the broadest definition, but the narrowest one where you'd crush it?"
- "What segments are you explicitly *not* going after, and what would change your mind?"
- "If you had to cut your addressable market in half tomorrow, which half do you keep?"
## 3. How to Win
**What it is:** The specific competitive advantage on the chosen playing field. Not "be great" — the mechanism by which you win against the alternatives the customer actually considers.
**What makes it good:**
- Names the advantage *and* why competitors can't easily copy it.
- Rooted in a real asymmetry — cost structure, proprietary data, switching costs, network effects, domain expertise, regulatory position.
- Testable: you could ask a customer "why us over them?" and hear this answer back.
**What makes it bad:**
- "Superior product." (Everyone claims this. What specifically, and why can't they match it?)
- "Better team." (A temporary fact, not a structural advantage.)
- An advantage that requires the competitor to be stupid or asleep.
**How it extends the kernel:** "How to win" is the mechanism inside the guiding policy. The kernel asks for a guiding policy that creates advantage; this forces you to name *what kind* of advantage and *why it holds*. If the user can't articulate the structural reason they win, the guiding policy is aspirational.
**Interview prompts:**
- "If your top competitor hired your best people tomorrow, could they replicate your advantage? If yes, it's not structural."
- "What do you do that's genuinely hard for someone else to copy — not hard to imagine, but hard to *build*?"
- "When a customer picks you over the alternative, what's the real reason? Not the marketing reason — the honest one."
## 4. Capabilities
**What it is:** The 3-5 reinforcing capabilities the organization must have to execute the "where to play" and "how to win" choices. These aren't aspirational — they're the things that must be true about the org.
**What makes it good:**
- Small set (3-5), not a capability wishlist.
- Each capability is specific and observable — you can tell whether the org has it or not.
- The capabilities reinforce each other: having capability A makes capability B stronger.
- At least one capability is hard to build from scratch, creating a barrier.
**What makes it bad:**
- "Innovation." (What kind? Applied where? Measured how?)
- A list of 12 capabilities with no prioritization.
- Capabilities that any well-run company would need — they don't differentiate.
**How it extends the kernel:** Capabilities are the feasibility check on coherent actions. If an action requires a capability the org doesn't have and can't build in the strategy's timeframe, the action is fiction. This is where most strategies quietly break — the actions sound good but assume capabilities that don't exist.
**Interview prompts:**
- "What are the three things your org must be *genuinely good at* for this strategy to work? Not nice-to-haves — load-bearing capabilities."
- "Of those, which ones do you already have, and which ones are you hoping will materialize?"
- "If I talked to someone on your team, would they say the org actually has this capability today — or is it something leadership believes but the front line doesn't experience?"
## 5. Management Systems
**What it is:** The processes, structures, metrics, and feedback loops that keep the strategy on track. The infrastructure of execution — how the org actually operationalizes the choices above.
**What makes it good:**
- Directly tied to the strategy — not generic "good management" practices.
- Includes both measurement systems (how you know it's working) and reinforcement systems (how you prevent drift).
- Addresses the most likely failure modes for *this specific* strategy.
**What makes it bad:**
- "Regular check-ins and dashboards." (Every company says this. What specifically are you measuring, and what changes when the number moves?)
- Systems borrowed from a different strategy that don't fit the current one.
- No system at all — the most common case. The strategy exists in a deck; nothing in the org's operating rhythm references it.
**How it extends the kernel:** The kernel stops at coherent actions. Management systems answer: "And then what? How do you know the actions happened? How do you course-correct?" Most strategies die here — not because the thinking was bad, but because nothing in the organization's daily life changed.
**Interview prompts:**
- "Six months from now, how will you know whether this strategy is working — before the revenue numbers tell you?"
- "What meeting, review, or ritual will force you to look at whether the strategy is on track? If it doesn't exist yet, when will you create it?"
- "What's the most likely way this strategy dies quietly — not with a dramatic failure, but by slow drift back to the old way? What would prevent that?"
---
## Reverse engineering: using the cascade on an existing strategy
When the user brings a strategy that already exists, use the cascade to find gaps. Walk through it top-down:
1. **State the winning aspiration implied by their strategy.** Often it's never been made explicit. Ask: "What does this strategy look like when it's won? Can you describe it concretely?"
2. **Identify the where-to-play choice.** If the strategy applies equally to all customers and segments, flag it: "I notice this doesn't choose a playing field — it reads like it's trying to serve everyone. Is that intentional?"
3. **Name the how-to-win mechanism.** If competitive advantage is implied but not stated, surface it: "What's the structural reason this works for you and not for [obvious competitor]?"
4. **Check capabilities against actions.** For each major action, ask: "Do you currently have the capability to do this, or does the strategy assume you'll build it?" List the assumed capabilities. Most strategies have 2-3 capabilities they assume but haven't resourced.
5. **Look for management systems.** This is almost always the biggest gap. Ask: "What changes in how the org operates day-to-day once this strategy is adopted?" If the answer is "nothing" — flag it clearly in the notes.
The typical pattern: strategies are strongest at the top of the cascade (aspiration, playing field) and weakest at the bottom (capabilities, management systems). The cascade's main diagnostic value is making this gap visible.
---
## Integrating cascade findings into the kernel output
Do not produce a separate cascade document. Instead, fold insights into the kernel:
- **Where to play** findings sharpen the guiding policy's specificity.
- **How to win** findings become the mechanism statement within the guiding policy.
- **Capability gaps** surface as assumptions in `strategy-notes.md` and, if severe, as caveats in the draft: "[DRAFT — assumes capability X exists; needs verification]."
- **Management systems gaps** become items in the "open questions" section of the notes.
In the strategy notes, add a section: "Cascade pressure-test" that captures any findings from the five choices — especially where the strategy was strong vs. where it had gaps.
FILE:references/pressure-tests.md
# Pressure-test scenarios
Expected behaviors for the strategy-interview skill. Use these to validate that the skill handles common entry points correctly.
| Scenario | Expected behavior |
|----------|-------------------|
| User says "write a strategy to grow 30%" | Enter discovery, do not produce a draft — "grow 30%" is a goal, not a strategy |
| User provides an existing strategy doc | Start with bad-strategy filter before discovery |
| User stops mid-interview (short interview, no `.beagle` state) | Write `strategy-notes.md` with resume state only, no `strategy-draft.md` |
| User describes a feature arms race | Deploy value innovation prompts from blue-ocean lens |
| User asks for career strategy | Skip Wardley/cascade unless conversation warrants them |
| User says "just critique this, don't rewrite it" | Use critique variant, respect chat-only if requested |
| Long interview with 15+ substantive exchanges | Create/update `.beagle/strategy/<subject-slug>/state.md`; compose final documents from artifacts, not raw transcript |
| Conflicting stakeholder views surface | Capture both sides in `evidence.md` with `contested` tags; surface the disagreement in notes rather than averaging it away |
| User stops mid-interview (`.beagle` state exists) | Update `.beagle/strategy/<subject-slug>/state.md` with resume state; optionally produce `strategy-notes.md`; no `strategy-draft.md` |
| Phase 4 after long interview | Write `composition.md` first; compose `strategy-draft.md` and `strategy-notes.md` from `.beagle` artifacts, not chat memory |
| Multiple possible strategy subjects emerge | Scope to one, park the rest in `state.md` open questions or `evidence.md` decisions |
FILE:references/wardley-mapping.md
# Landscape Mapping: Seeing the Terrain Before Choosing the Route
Use this reference when the user's situation involves **competitive positioning, technology choices, build-vs-buy decisions, or understanding where an industry is heading**. It adds a layer of situational awareness to Phase 1 (Discovery) that sharpens the diagnosis.
**When to skip this entirely:** Personal strategies (career pivots, side projects), team-level process improvements, or any situation where the user is the only actor and there is no competitive landscape to map. If nobody is competing for the same users or resources, landscape mapping adds weight without insight.
## Core concepts (for the interviewer's mental model, not for lecturing)
### Value chain
Every user need is served by a chain of components. The user sees the top; underneath are layers of capabilities, technologies, and activities that make it work.
**Example:** A customer books a hotel room (visible need). Underneath: search interface, availability engine, payment processing, room inventory database, channel manager, property management system.
The strategic question is always: **which components in the chain do you control, which do you depend on, and which are changing?**
### Evolution stages
Every component moves through a lifecycle, left to right:
| Stage | Character | Feels like | Example |
|---|---|---|---|
| **Genesis** | Novel, uncertain, requires exploration | Research project | First LLM agents (2023) |
| **Custom-built** | Understood but bespoke, no standard approach | Internal tooling | Most company data pipelines today |
| **Product** | Standardized, multiple providers, feature competition | Buying software | CRM systems, CI/CD platforms |
| **Commodity/Utility** | Invisible, pay-per-use, interchangeable | Turning on a tap | Cloud compute, email delivery, payment processing |
Components only move right. They never move back. The speed varies, but the direction doesn't.
**Why this matters for diagnosis:** If someone is building custom what has already become product, they're burning resources for no advantage. If they're treating a genesis-stage component as commodity (buying an off-the-shelf solution for something nobody has figured out yet), they'll get a mediocre version of something that requires invention.
### Movement patterns
These are the forces that move components rightward:
- **Competition** drives evolution. More suppliers, more users, more standardization.
- **Componentization** — yesterday's product becomes tomorrow's building block (AWS turned servers into API calls; that enabled a generation of startups that couldn't have existed before).
- **Co-evolution** — when an underlying component commoditizes, practices built on top of it change. Cheap compute didn't just make the same things cheaper; it made entirely new architectures possible (microservices, serverless, ML training at scale).
### Value capture dynamics
Knowing where components sit on the evolution curve is half the picture. The other half is understanding who captures the economic value at each layer. A company can control every component in its chain and still make no money if value concentrates at a layer controlled by someone else.
**The pattern:** As components commoditize, margin migrates. It moves away from the commodity layer and toward adjacent layers that are still differentiated. If you're operating in the commodity layer and your supplier operates in the product layer, they capture the margin — you compete on price while they compete on features.
**Why this matters for diagnosis:** Many strategies fail not because the company chose the wrong market but because they're positioned at the wrong layer of value capture. They built the best version of something that was becoming interchangeable, while the real margin moved upstream or downstream.
**Interview probes:**
- "Where in your value chain does the margin actually sit today? Who captures it — you, your suppliers, or your customers?"
- "If your layer commoditizes further, where does the value migrate? Who's positioned to capture it?"
- "Is your competitive advantage at the layer where value is currently captured, or a different one?"
- "When you look at how your industry's economics have shifted over the past five years, which layer gained pricing power and which layer lost it?"
## Interview application
### Extracting the value chain through conversation
Don't ask the user to "draw a map." Instead, ask questions that surface the chain and where things sit on the evolution axis.
**Start from the user need and work down:**
- "Who is the end user, and what are they actually trying to accomplish when they interact with you?"
- "What has to happen behind the scenes for that to work? Walk me through the chain — what depends on what?"
- "Which of those pieces do you build yourself, and which do you buy or rent?"
**Identify evolution stage for each important component:**
- "Is this something you had to figure out from scratch, or did you follow a known playbook?"
- "How many other companies solve this the same way you do? Could you swap in a vendor tomorrow, or is this deeply custom?"
- "If you stopped maintaining this, could you buy a replacement off the shelf? How close is that replacement to what you need?"
- "Is this something that felt novel three years ago but now feels like table stakes?"
**Find the tension points:**
- "Where are you building custom when you suspect you could be buying?"
- "Where are you buying something generic when you think your situation actually requires something purpose-built?"
- "Which piece of this chain keeps you up at night — the one where if it breaks, everything breaks?"
### Using the map to sharpen the diagnosis
Once you have a rough picture of the value chain and where components sit, use it to pressure-test the user's framing. These are the most common patterns that landscape mapping reveals:
#### Building custom what's becoming commodity
**What it sounds like:** The user describes heavy investment in a capability that multiple vendors now offer as a product or service.
**Interview probe:** "You mentioned you have a team of four maintaining your internal [X]. Three vendors now sell this as a service. What does your version do that theirs doesn't — and is that difference actually strategic, or is it historical accident?"
**Why it matters for diagnosis:** Resources locked into commodity work can't be deployed where the real differentiation lives. This is one of the most common sources of strategic incoherence — the org *says* its advantage is in area A but *spends* its engineering budget maintaining area B.
#### Treating genesis as commodity
**What it sounds like:** The user plans to "just buy" or "quickly implement" something that nobody in their industry has figured out yet.
**Interview probe:** "You're describing this as a procurement decision, but I'm not hearing about anyone who's solved it well yet. Is this actually a problem someone has productized, or are you going to end up building it anyway after the vendor disappoints?"
**Why it matters for diagnosis:** Buying something that doesn't exist yet as a reliable product leads to painful vendor lock-in with an immature solution, or an expensive rewrite when the off-the-shelf version doesn't fit.
#### Competitor is further along the evolution curve
**What it sounds like:** A competitor has already industrialized a component the user is still building by hand.
**Interview probe:** "Your competitor launched [X] as a self-serve product last quarter. You're still doing it with a manual process and three engineers. What does that gap cost you in speed, and how long before it costs you customers?"
**Why it matters for diagnosis:** If a competitor has already commoditized a component you're still building custom, matching them by building your own version is the wrong move — you'll always be behind. The strategic question becomes: can you leapfrog by adopting their commodity version (or a third-party equivalent) and competing on a different layer of the stack?
#### Inertia masking a shift
**What it sounds like:** The user describes their competitive advantage in terms of a component that is visibly commoditizing but they haven't acknowledged the shift.
**Interview probe:** "You described [X] as your core differentiator. Two years ago I'd agree — but now [competitor A] and [competitor B] offer something similar as a feature. If [X] becomes table stakes in the next 18 months, where does your advantage actually live?"
**Why it matters for diagnosis:** Inertia — emotional attachment to past investments, organizational identity built around a capability — is the most common reason strategies fail to adapt. The landscape moved; the strategy didn't.
#### A component is about to be disrupted
**What it sounds like:** A layer of the user's value chain is being commoditized by a new entrant or technology shift, and the user hasn't factored this into their planning.
**Interview probe:** "If [new technology / new entrant] makes [component] essentially free or trivially easy in the next two years, what happens to your business model? Which assumptions break?"
**Why it matters for diagnosis:** The most dangerous disruptions aren't the ones that destroy your product — they're the ones that destroy the *economics* of a component you depend on, changing who captures value in the chain.
## What good landscape-informed diagnosis sounds like
After working through the value chain, the diagnosis should be able to say something like:
- "We're spending 40% of engineering on infrastructure that has become commodity — freeing that up and redeploying it to [genesis-stage capability] is the structural move."
- "Our competitor commoditized [X] two years ahead of us. Competing on [X] is a losing game. The opening is in [Y], which they've neglected because they're organized around [X]."
- "The entire middle layer of our stack is about to be disrupted by [technology shift]. Our strategy needs to assume that layer becomes free and figure out where we create value above or below it."
- "We're treating [component] as a buy decision, but nobody sells what we actually need. This is a genesis problem disguised as a procurement problem, and it needs R&D investment, not a vendor eval."
These are concrete, falsifiable, and they lead directly to a guiding policy. That's the test.
## Posture
Use this framework to ask better questions during discovery, not to deliver a mapping tutorial. The user should never feel like they're being walked through a methodology — they should feel like you're asking unusually sharp questions about where their market is heading and where their resources are actually going.
If the user starts using evolution language on their own ("that's becoming commodity," "we're still in the early stages of figuring this out"), run with it. If they don't, that's fine — the insight shows up in the diagnosis, not in the vocabulary.
Turn ideas into comprehensive project specs through collaborative dialogue. Use BEFORE any planning or implementation — for new projects, features, or signif...
---
name: spec-brainstorm
description: "Turn ideas into comprehensive project specs through collaborative dialogue. Use BEFORE any planning or implementation — for new projects, features, or significant changes. Produces a system-agnostic spec document that can feed into any agentic workflow."
---
# Brainstorm: Ideas Into Specs
Turn a fuzzy idea into a comprehensive, implementation-free project spec through collaborative dialogue.
The output is a standalone spec document — structured enough for any agentic system to consume, clear enough for a human to act on. It captures WHAT and WHY, never HOW.
<hard_gate>
Do NOT write any code, create implementation plans, scaffold projects, or take any implementation action. This skill produces a SPEC DOCUMENT only. Every project goes through this process regardless of perceived simplicity — "simple" projects are where unexamined assumptions waste the most work.
</hard_gate>
## Workflow
Complete these steps in order:
1. **Explore context** — read project files, docs, git history, existing specs
2. **Assess scope** — is this one spec or does it need decomposition?
3. **Ask clarifying questions** — one at a time, follow the thread
4. **Propose 2-3 directions** — high-level product approaches with tradeoffs
5. **Draft spec** — write the structured spec document
6. **Self-review** — check for completeness, contradictions, implementation leakage (see `references/spec-reviewer.md`)
7. **User review** — present for approval, iterate if needed
8. **Write to disk** — save to `docs/specs/YYYY-MM-DD-<topic>.md`
```
Explore context → Assess scope ──→ Too large? → Decompose into sub-projects
→ Brainstorm first sub-project
→ Right size? → Clarifying questions
→ Propose directions
→ Draft spec
→ Self-review (fix inline)
→ User review ──→ Changes? → Revise
→ Approved? → Write to disk
```
**The terminal state is a written spec.** This skill does not transition to implementation, planning, or any other skill. The user decides what to do with the spec.
## Questioning
You are a thinking partner, not an interviewer. The user has a fuzzy idea — your job is to help them sharpen it.
**How to question:**
- **Start open.** Let them dump their mental model. Don't interrupt with structure.
- **Follow energy.** Whatever they emphasized, dig into that. What excited them? What problem sparked this?
- **Challenge vagueness.** Never accept fuzzy answers. "Good" means what? "Users" means who? "Simple" means how?
- **Make the abstract concrete.** "Walk me through using this." "What does that actually look like?"
- **Clarify ambiguity.** "When you say Z, do you mean A or B?"
- **Know when to stop.** When you understand what, why, who, and what done looks like — offer to proceed.
**Question mechanics:**
- One question per message. If a topic needs more, break it into multiple messages.
- Prefer multiple choice when possible — easier to react to concrete options than open-ended prompts.
- When the user selects "other" or wants to explain freely, switch to plain text. Don't force them back into structured choices.
- 2-4 options is ideal. Never use generic categories ("Technical", "Business", "Other").
**What to ask about:**
| Ask about | Examples |
|-----------|----------|
| Motivation | "What prompted this?" "What are you doing today that this replaces?" |
| Concreteness | "Walk me through using this" "Give me an example" |
| Clarification | "When you say X, do you mean A or B?" |
| Success | "How will you know this is working?" "What does done look like?" |
| Boundaries | "What is this explicitly NOT?" |
**What NOT to ask about:**
- Technical implementation details (that's for planning)
- Architecture patterns (that's for planning)
- User's technical skill level (irrelevant — the system builds)
- Success metrics (inferred from the work)
- Canned questions regardless of context ("What's your core value?", "Who are your stakeholders?")
**Background checklist** (check mentally, not out loud):
- [ ] What they're building (concrete enough to explain to a stranger)
- [ ] Why it needs to exist (the problem or desire driving it)
- [ ] Who it's for (even if just themselves)
- [ ] What "done" looks like (observable outcomes)
When all four are clear, offer to proceed. If the user wants to keep exploring, keep going.
## Scope Assessment
Before diving into questions, assess whether the idea is one project or several.
**Signs it needs decomposition:**
- Multiple independent subsystems ("build a platform with chat, file storage, billing, and analytics")
- No clear ordering dependency between parts
- Would take multiple months of work
**When decomposition is needed:**
1. Help the user identify the independent pieces and their relationships
2. Establish what order they should be built
3. Brainstorm the first sub-project through the normal flow
4. Each sub-project gets its own spec
**For right-sized projects**, proceed directly to clarifying questions.
## Exploring Directions
After understanding the idea, propose 2-3 high-level directions. These are product directions, not technical architectures.
**Good directions:**
- "A CLI tool that operates on single files vs. a daemon that watches directories"
- "A focused MVP with just the core loop vs. a broader first version with supporting features"
- "Optimized for speed of use (power users) vs. optimized for discoverability (new users)"
**Bad directions (implementation leaking in):**
- "React with a REST API vs. HTMX with server-side rendering"
- "PostgreSQL vs. SQLite for storage"
- "Monorepo vs. polyrepo"
Lead with your recommendation and explain why. Present tradeoffs conversationally.
## Scope Discipline
Brainstorming naturally generates ideas beyond the current scope. Handle this gracefully:
**When the user expands scope mid-brainstorm:**
> "That's a great idea but it's its own project/phase. I'll capture it in Future Considerations so it's not lost. For now, let's focus on [current scope]."
**The heuristic:** Does this clarify what we're building, or does it add a new capability that could stand on its own?
Capture deferred ideas in the spec's "Future Considerations" section. Don't lose them, don't act on them.
## Implementation Leakage
The spec must never prescribe implementation. This is the hardest discipline.
| Allowed (WHAT) | Not allowed (HOW) |
|-----------------|-------------------|
| "Users can filter results by date and category" | "Add a /api/filter endpoint that accepts query params" |
| "Must support 10k concurrent users" | "Use Redis for session caching" |
| "Data must persist across sessions" | "Store in PostgreSQL with a users table" |
| "Must work offline" | "Use a service worker with IndexedDB" |
| "Search must feel instant" | "Use Elasticsearch with debounced queries" |
**Exception — constraints:** When the user has genuine constraints ("must use PostgreSQL because that's what our infra runs"), those go in the Constraints section with rationale. A constraint is a boundary condition, not a design choice made during brainstorming.
## Spec Format
Use the template in `references/spec-template.md`. The spec has these sections:
1. **Core Value** — ONE sentence, the most important thing
2. **Problem Statement** — what problem, who has it, why now
3. **Requirements** — must have, should have, out of scope (with reasons)
4. **Constraints** — hard limits with rationale
5. **Key Decisions** — decisions made during brainstorming with alternatives considered
6. **Reference Points** — "I want it like X" moments, external docs, inspiration
7. **Open Questions** — unresolved items needing future research
8. **Future Considerations** — ideas that emerged but belong in later phases
Requirements must be concrete and testable:
| Good requirement | Bad requirement |
|-----------------|-----------------|
| "User can undo the last 10 actions" | "Good undo support" |
| "Page loads in under 2 seconds on 3G" | "Fast performance" |
| "Works with screen readers" | "Accessible" |
| "Export to CSV and JSON" | "Multiple export formats" |
## Self-Review
After drafting the spec, review it for:
1. **Placeholders** — any TBD, TODO, vague requirements? Fix them.
2. **Contradictions** — do any sections conflict? Resolve them.
3. **Implementation leakage** — does any requirement prescribe HOW? Rewrite as WHAT.
4. **Untestable requirements** — could someone verify this was met? Make it concrete.
5. **Missing rationale** — do constraints and out-of-scope items explain WHY? Add reasons.
6. **Scope** — is this focused enough for a single planning cycle?
Fix issues inline. Then present to the user for review.
See `references/spec-reviewer.md` for the detailed review checklist.
## Writing the Spec
- Save to `docs/specs/YYYY-MM-DD-<topic>.md` (user preferences override this path)
- Commit to git with message: `docs: add <topic> project spec`
- After writing, tell the user:
> "Spec written to `<path>`. Review it and let me know if you want changes."
- Wait for approval before considering the brainstorm complete.
## Key Principles
- **One question at a time** — don't overwhelm
- **Follow the thread** — don't walk a checklist
- **YAGNI ruthlessly** — remove anything that isn't clearly needed
- **Concrete decisions only** — "card-based layout" not "modern and clean"
- **No implementation** — WHAT and WHY, never HOW
- **Capture everything** — ideas outside scope go to Future Considerations, never lost
- **Incremental validation** — confirm understanding before moving on
- **The spec stands alone** — anyone should be able to read it and understand the project
FILE:references/spec-reviewer.md
# Spec Self-Review Checklist
Run this review after drafting the spec. Fix issues inline — don't flag them and move on.
## Review Dimensions
### 1. Completeness
| Check | What to look for |
|-------|-----------------|
| No placeholders | TBD, TODO, "to be determined", empty sections, ellipsis as content |
| Core Value exists | One sentence that resolves prioritization conflicts |
| Problem is concrete | Specific problem, specific people, specific pain — not abstract |
| Requirements are testable | Every requirement can be verified by observation |
| Out of Scope has reasons | Every exclusion explains WHY, not just WHAT |
| Constraints have rationale | Every constraint explains WHY it's a hard limit |
### 2. Consistency
| Check | What to look for |
|-------|-----------------|
| No contradictions | Requirements don't conflict with each other or with constraints |
| Scope alignment | Must-have requirements match the problem statement |
| Decision coherence | Key decisions don't undermine each other |
| Out of Scope respected | Nothing in requirements contradicts an explicit exclusion |
### 3. Implementation Leakage
**This is the most common failure mode.** Scan every requirement for:
| Leaked | Clean |
|--------|-------|
| "Create a REST API endpoint" | "Service exposes data to third-party integrations" |
| "Use WebSocket for real-time" | "Updates appear within 1 second without page refresh" |
| "Store in a relational database" | "Data persists across sessions and survives restarts" |
| "Build a React component" | "User sees a filterable list of results" |
| "Add a cron job" | "Report is generated daily and available by 9am" |
**Exception:** Constraints section may contain implementation-specific limits ("Must use PostgreSQL") when these are genuine external constraints with rationale.
**The test:** Could this requirement be satisfied by two completely different technical approaches? If yes, it's clean. If it implies exactly one approach, it's leaked.
### 4. Testability
Every requirement should pass the "how would you verify this?" test:
- **Good:** "User can undo the last action" — verify: perform action, press undo, confirm reversal
- **Bad:** "Intuitive undo support" — verify: ???
Rewrite any requirement where "verify" isn't obvious.
### 5. Atomicity
Each requirement should be one thing:
- **Bad:** "Users can search, filter, and sort results"
- **Good:** Three separate requirements — search, filter, sort
Compound requirements hide scope and make prioritization impossible.
### 6. Scope
Is this focused enough for a single planning cycle?
- More than 15 must-have requirements? Probably needs decomposition.
- Requirements spanning multiple independent subsystems? Decompose.
- Could you explain the core loop in 30 seconds? If not, it's too broad.
## Calibration
**Only fix issues that would cause real problems downstream.**
A downstream planning system acting on this spec should be able to:
- Understand what to build without asking the user again
- Decompose requirements into tasks
- Know what's in scope and what's not
- Understand the constraints they're working within
Minor wording preferences, stylistic consistency, and "sections that could be more detailed" are not issues. Ambiguity that could lead someone to build the wrong thing IS an issue.
**Approve the spec unless there are serious gaps.** Then present to the user for review.
FILE:references/spec-template.md
# Spec Document Template
Use this template when writing the final spec document. Save to `docs/specs/YYYY-MM-DD-<topic>.md`.
## Template
```markdown
# [Project Name]
**Created:** [YYYY-MM-DD]
**Status:** Ready for planning
## Core Value
[ONE sentence — the single most important thing this project delivers. If everything else fails, this must work. Drives prioritization when tradeoffs arise.]
## Problem Statement
[What problem exists, who has it, and why it matters now. 2-4 sentences. Include what people do today without this (the status quo) and why that's insufficient.]
## Requirements
### Must Have
[Concrete, testable, user-centric. These define v1 — the project isn't done without them.]
- [Requirement — observable, verifiable outcome]
- [Requirement — observable, verifiable outcome]
### Should Have
[Important but not blocking v1. Build these if time allows, defer if not.]
- [Requirement — observable, verifiable outcome]
### Out of Scope
[Explicit exclusions with reasoning. Prevents scope creep and re-litigation.]
- [Exclusion] — [why not]
- [Exclusion] — [why not]
## Constraints
[Hard limits on the solution space. Each must include WHY — constraints without rationale get questioned.]
- **[Type]:** [What] — [Why]
- **[Type]:** [What] — [Why]
Common types: Tech stack, Timeline, Compatibility, Performance, Security, Regulatory, Team
## Key Decisions
[Significant choices made during brainstorming. Captures the reasoning so downstream work doesn't relitigate.]
### [Decision Area]
- **Decision:** [What was decided]
- **Alternatives considered:** [What else was on the table]
- **Rationale:** [Why this choice]
### [Decision Area]
- **Decision:** [What was decided]
- **Alternatives considered:** [What else was on the table]
- **Rationale:** [Why this choice]
## Reference Points
[Inspiration, external docs, "I want it like X" moments, specific behaviors or patterns the user referenced. Not implementation instructions — direction and taste.]
[If none: "No specific references — open to standard approaches."]
## Open Questions
[Unresolved items that need research or future discussion before or during implementation. Flag what's unknown so downstream systems can investigate.]
- [Question — what needs to be figured out and why it matters]
[If none: "No open questions — spec is self-contained."]
## Future Considerations
[Ideas that emerged during brainstorming but belong in later phases or separate projects. Captured so they're not lost, explicitly deferred.]
[If none: "None — brainstorming stayed within scope."]
```
## Guidelines
**Core Value:**
- Rarely more than one sentence
- Not a tagline — a prioritization tool
- Should resolve ambiguity when two requirements compete
**Requirements — quality checklist:**
- Can someone verify this was met? (testable)
- Does it describe what the user experiences? (user-centric)
- Is it one thing, not a bundle? (atomic)
- Could it be interpreted two ways? If so, pick one.
**Requirements — what NOT to write:**
- Implementation tasks disguised as requirements ("Create a REST API")
- Vague qualities ("Good performance", "Modern UI")
- Compound requirements ("Users can search AND filter AND sort")
- Technical jargon the user didn't use
**Constraints vs Key Decisions:**
- Constraint: imposed externally, limits the solution space ("Must run on iOS 16+")
- Key Decision: chosen during brainstorming, shapes the direction ("Mobile-first, desktop secondary")
**Key Decisions — when to record:**
- When 2+ viable options existed and one was chosen
- When the user expressed a strong preference
- When a direction was explicitly rejected (alternatives section)
- When the user said "you decide" — record it as "Claude's discretion" with the area
**Reference Points — good content:**
- "I want the search to feel like Spotlight — instant, forgiving of typos"
- "The onboarding should be like Linear's — minimal, progressive, no tutorial walls"
- External docs the user wants to be followed: `path/to/spec.md` with what it covers
**Future Considerations — the redirect:**
When an idea is captured here during brainstorming, it was explicitly deferred. Note what it is and why it's deferred (too complex for v1, depends on other work, nice-to-have, etc.).
## Examples
### Example 1: Developer Tool
```markdown
# Logwatch — Spec
**Created:** 2026-03-26
**Status:** Ready for planning
## Core Value
Developers see exactly what happened in production without leaving their terminal.
## Problem Statement
When a production incident occurs, developers switch between Grafana, CloudWatch, and Slack to piece together what happened. This context-switching wastes 10-15 minutes per incident and often means important log lines are missed. Logwatch brings structured log tailing and search into the terminal where developers already work.
## Requirements
### Must Have
- User can tail logs from multiple services simultaneously in a split-pane view
- User can search logs by time range, service name, and log level
- User can bookmark a log line and return to it later in the same session
- Log output is syntax-highlighted by log level (error=red, warn=yellow, info=default)
- User can filter to a single service from the multi-service view without losing scroll position
- Connection drops are detected within 5 seconds and auto-reconnected
### Should Have
- User can save a search query and replay it in a future session
- User can export a time range of logs to a file
- User can share a permalink to a specific log line with teammates
### Out of Scope
- Alerting or notification — this is a viewing tool, not a monitoring tool
- Log aggregation or storage — reads from existing infrastructure
- Dashboard creation — Grafana already handles this well
- Mobile support — terminal-first, desktop only
## Constraints
- **Compatibility:** Must work with CloudWatch Logs and Datadog — these are our two log backends
- **Performance:** Must handle 10k log lines/second without dropping frames
- **Distribution:** Single binary, no runtime dependencies — devs install via brew or curl
- **Auth:** Must support SSO via existing Okta setup — no separate credentials
## Key Decisions
### Primary interface
- **Decision:** TUI with vim-style keybindings
- **Alternatives considered:** Web UI, VS Code extension, plain CLI with flags
- **Rationale:** Developers are already in the terminal during incidents. TUI keeps them in flow. Vim bindings match muscle memory for the team.
### Multi-service display
- **Decision:** Vertical split panes, one per service, synchronized by timestamp
- **Alternatives considered:** Interleaved single stream with service prefixes, tabbed view per service
- **Rationale:** Side-by-side makes cross-service correlation visual and immediate. Interleaved gets noisy above 3 services. Tabs hide context.
## Reference Points
- "I want it to feel like `lazygit` — fast, keyboard-driven, just works"
- "The search should be like ripgrep — regex by default, fast enough to feel instant"
- Okta SSO integration spec: `docs/infra/sso-integration.md` (sections 2-3 cover token flow)
## Open Questions
- Can we get sub-second latency from CloudWatch's API, or do we need a local cache layer? Needs benchmarking.
- Datadog's log query API has rate limits — need to verify they're sufficient for real-time tailing.
## Future Considerations
- Alerting integration (pipe to PagerDuty) — separate project, depends on core being stable
- Log annotation ("this line caused the outage") — great idea, v2 feature
- Team sharing of saved queries — requires backend, save for later
```
### Example 2: Minimal Project
```markdown
# Markdown Link Checker — Spec
**Created:** 2026-03-26
**Status:** Ready for planning
## Core Value
Broken links in docs are caught before they reach readers.
## Problem Statement
Our documentation has 200+ external links. We discover broken links when users report them, sometimes weeks after the target page moved. An automated checker in CI catches these before merge.
## Requirements
### Must Have
- Scans all `.md` files in a directory recursively
- Checks HTTP status of external links (follows redirects up to 3 hops)
- Reports broken links with file path, line number, URL, and HTTP status
- Exit code 1 if any links are broken (for CI gating)
- Completes scan of 200 files with 500 links in under 60 seconds
### Should Have
- User can exclude URLs by pattern (regex allowlist)
- User can set a custom timeout per link
### Out of Scope
- Checking anchor fragments (#section-name) — parsing target HTML is too complex for v1
- Fixing broken links — detection only
- Checking internal cross-file links — separate concern
## Constraints
- **CI:** Must run in GitHub Actions — no external service dependencies
- **Language:** Team prefers Go for CLI tools — aligns with existing tooling
## Key Decisions
### Concurrency model
- **Decision:** Concurrent link checking with configurable parallelism
- **Alternatives considered:** Sequential checking (simple but slow), unbounded parallelism (risks rate limiting)
- **Rationale:** Need to finish in 60s but also avoid hammering external servers. Default 10 concurrent, configurable via flag.
## Reference Points
- "Like `markdown-link-check` npm package but faster and no Node dependency"
## Open Questions
- Should rate-limited responses (429) be treated as broken or retried? Need to decide retry policy.
## Future Considerations
- Anchor fragment checking — worth a follow-up if teams request it
- Cache layer for recently-checked URLs — would speed up repeated CI runs
```