@clawhub-quochungto-93dad49abd
---
name: liking-factor-engineer
description: >
  Analyze and engineer liking to increase rapport, persuasion, and compliance in marketing, sales, and communication contexts.
  Use this skill when the user wants to improve how much an audience likes them, their brand, or their message — including writing sales copy,
  designing onboarding flows, crafting brand voice, building personal brand, creating relationship-based sales strategy, writing endorsement copy,
  structuring ad creative, designing UX that builds trust, or preparing for any high-stakes pitch or persuasion scenario.
  Also use when the user suspects they are being manipulated by a compliance professional through manufactured rapport, flattery, or contrived similarity.
  Trigger keywords: liking, rapport, trust, relationship building, brand personality, personal brand, similarity, compliments, familiarity, association,
  endorsement, halo effect, attractive design, influence, persuasion, likability, warm, friendly, relatable, connection.
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/influence-psychology-of-persuasion/skills/liking-factor-engineer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
source-books:
  - id: influence-psychology-of-persuasion
    title: "Influence: The Psychology of Persuasion"
    authors: ["Robert B. Cialdini"]
    chapters: [5]
    pages: "126-156"
tags: [persuasion-psychology, liking, rapport, persuasion, marketing, sales, brand, halo-effect, similarity, association, endorsement, compliance, defense]
depends-on: []
execution:
  tier: 1
  mode: hybrid
  inputs:
    - type: document
      description: "Brand voice guidelines, marketing content, sales pitches, or scenario description provided by the user"
  tools-required: [Read, Write]
  tools-optional: [Grep]
  mcps-required: []
  environment: "Any agent environment. Document inputs preferred but not required — user's verbal description is sufficient."
---
# Liking Factor Engineer
## When to Use
You are in one of two modes:
**APPLICATION mode** — The user wants to increase how much a target audience likes their brand, product, communication, or themselves as a salesperson or professional. This includes: brand messaging, sales copy, outreach emails, onboarding flows, endorsement strategy, ad creative, personal brand content, UX microcopy, pitches, and relationship-based selling.
**DEFENSE mode** — The user is evaluating a situation where they may be on the receiving end of manufactured liking — a sales conversation, a negotiation, or any context where a compliance professional is trying to get them to say yes.
Before starting, determine the mode by asking: "Are you trying to build liking (application) or protect against it (defense)?" Both modes can be run in the same session.
## Context and Input Gathering
### Required Context
- **Target audience or counterpart:** Who is this for? (Consumer segment, specific buyer, brand persona, the user themselves as a buyer)
→ Check prompt for: audience description, persona, customer type
→ If missing, ask: "Who is the person you need to like you / your brand / your message?"
- **What needs to be liked:** The brand, the product, the salesperson, the message, the content?
→ Check prompt for: existing copy, scenario description, content type
- **Mode (application vs defense):** Determined by the opening question in When to Use; if it is still unclear, apply the default below.
### Observable Context
- **Existing brand voice or content:** If files are provided, read them to identify which liking factors are already present and which are absent.
→ Look for: warmth/personality signals (compliments), visual/aesthetic polish (physical attractiveness), shared values with audience (similarity), partnership language (association), repeated touchpoints (familiarity)
→ If unavailable: proceed from user's verbal description
### Default Assumptions
- If no audience described → assume a consumer-facing marketing context
- If no existing content provided → generate strategy recommendations the user can apply
- If mode is ambiguous → default to APPLICATION, flag DEFENSE considerations at the end
## Process
### Step 1: Analyze the Scenario and Content
**ACTION:** Read all provided content (brand docs, copy, pitch, messages). Identify: Who is the requester? Who is the target? What is the compliance goal (buy, sign up, agree, trust, refer)?
**WHY:** Liking is a compliance tool — the endgame is a "yes." Understanding the specific yes being sought determines which liking factors carry the most leverage. A Tupperware-style social purchase is different from a B2B sales close; a personal brand play is different from a product endorsement. The goal shapes the strategy.
### Step 2: Audit Against All Five Liking Factors
**ACTION:** Evaluate the current scenario or content against each of the five research-validated liking factors. Score each factor as: **Active** (clearly present), **Weak** (present but underdeveloped), or **Absent** (no presence).
The five factors (see [references/liking-factors-evidence.md](references/liking-factors-evidence.md) for full research data):
1. **Physical Attractiveness** — Does the brand/product/person project quality through visual and aesthetic signals? Does the design, presentation, photography, or formatting create a halo effect (good-looking = good product)?
- The halo effect is automatic: attractive candidates get 2.5x more votes, attractive defendants are twice as likely to avoid jail, attractive job applicants are hired based on appearance even when interviewers deny it.
- In brand contexts: design quality, imagery polish, spokesperson attractiveness, and even grooming/professionalism of sales representatives all trigger this factor.
2. **Similarity** — Does the communication reflect shared opinions, values, background, personality, or lifestyle with the target audience?
- Compliance rates jump from under 50% to over 67% when a requester appears similarly dressed.
- Car salespeople are trained to scan trade-ins for camping gear, golf balls, and out-of-state plates — then mirror those interests in conversation.
- Even small, trivial similarities (same hometown, same name, same age) are effective.
3. **Compliments** — Does the communication express genuine appreciation for the audience? Does it make them feel seen, valued, or admired?
- Joe Girard sent 13,000+ "I like you" cards per month to former customers — and became the world's greatest car salesman (Guinness World Records). His formula: a fair price + someone they liked.
- Flattery works even when obviously insincere. North Carolina study: pure praise produced maximum liking even when the recipient knew the flatterer stood to benefit and even when the praise was objectively false.
4. **Familiarity and Contact** — Has the audience seen, heard, or experienced this brand/person/product repeatedly under positive conditions?
- Mere exposure increases liking — but ONLY under non-competitive, cooperative conditions.
- Critical distinction: contact under rivalry or frustration worsens liking (Robbers Cave experiment: competition between groups produced name-calling, raids, and fights). Contact under cooperation improves it (jigsaw classroom: cooperative tasks converted rivals into allies).
- Application: repeated positive touchpoints (email sequences, retargeting, helpful content) build familiarity. Competitive framing ("we're better than X") destroys it.
5. **Association** — Is the brand/person/message linked to things the audience already likes, admires, or values?
- Razran's luncheon technique (1930s): political statements presented during a meal became more liked — the positive feeling of food transferred to the ideas.
- Association works for both positive and negative connections. Weathermen are disliked for bad weather they didn't cause. Brand spokespersons transfer their personal qualities to products.
- Basking in reflected glory: people display connections to winners and hide connections to losers. After wins, fans say "we won"; after losses, "they lost."
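One way to keep the Step 2 audit consistent is to record it as structured data before writing the report. The sketch below is a hypothetical Python helper, not part of the source: `Status`, `FactorAudit`, and `LikingAudit` are invented names, and the factor labels simply mirror the five factors listed above. It rejects unknown factor names and reports which factors are still unscored and which are scored Weak or Absent.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Three-level scale used in Step 2."""
    ACTIVE = "Active"   # clearly present
    WEAK = "Weak"       # present but underdeveloped
    ABSENT = "Absent"   # no presence


# The five liking factors, in the order the audit lists them.
FACTORS = (
    "Physical Attractiveness",
    "Similarity",
    "Compliments",
    "Familiarity / Contact",
    "Association",
)


@dataclass
class FactorAudit:
    factor: str
    status: Status
    evidence: str = ""  # observation drawn from the user's content


@dataclass
class LikingAudit:
    scenario: str
    findings: dict = field(default_factory=dict)  # factor name -> FactorAudit

    def record(self, factor: str, status: Status, evidence: str = "") -> None:
        # Reject names outside the five factors so typos never reach the report.
        if factor not in FACTORS:
            raise ValueError(f"unknown liking factor: {factor!r}")
        self.findings[factor] = FactorAudit(factor, status, evidence)

    def missing(self) -> list:
        # Factors not yet scored; Step 2 asks for all five.
        return [f for f in FACTORS if f not in self.findings]

    def gaps(self) -> list:
        # Scored factors that are Weak or Absent, in canonical order.
        return [
            f for f in FACTORS
            if f in self.findings and self.findings[f].status is not Status.ACTIVE
        ]
```

Keeping the factor list fixed means an audit cannot silently skip a factor: `missing()` stays non-empty until all five are scored.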
### Step 3: Identify the Strongest Liking Factor Opportunities
**ACTION:** From the audit, identify which 1-3 factors are (a) currently weakest and (b) most relevant to the specific audience and goal. Prioritize interventions that will move the needle most.
**WHY:** Not all five factors are equally applicable in every context. Physical attractiveness matters enormously in consumer product ads but less in B2B whitepapers. Association dominates celebrity endorsement strategy but may be irrelevant for a solo freelancer. Similarity is the most reliable and universally applicable factor — it should almost always be a primary lever. Focusing improvement on the highest-leverage absent factors produces more ROI than polishing already-active ones.
**Prioritization heuristics:**
- Similarity is the most universally powerful factor — default priority if weak or absent
- Familiarity is a compounding factor — it grows over time; start early
- Compliments require authenticity to sustain — insincere praise works short-term but erodes trust long-term
- Physical attractiveness is most impactful at first impression (homepage, ad creative, LinkedIn profile)
- Association is most powerful for credibility transfer and social proof
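The heuristics above can be approximated as a ranking function. This is a minimal sketch under stated assumptions: the numeric gap weights, the 0 to 1 `relevance` judgment, and the fixed bonus for Similarity are illustrative choices of this example, not values from the source. Active factors are never proposed, matching the advice to work on gaps rather than polish what already works.

```python
# Illustrative weights (assumption): an absent factor counts double a weak one.
GAP_WEIGHT = {"Active": 0, "Weak": 1, "Absent": 2}
SIMILARITY_BONUS = 1.0  # assumption: encodes "Similarity is the default priority"


def prioritize(statuses: dict, relevance: dict, limit: int = 3) -> list:
    """Pick up to `limit` factors to improve.

    statuses:  factor name -> "Active" | "Weak" | "Absent" (from the audit)
    relevance: factor name -> 0.0..1.0 judgment of fit for this audience and goal;
               factors left out default to 0.5
    """
    def score(factor: str) -> float:
        value = GAP_WEIGHT[statuses[factor]] * relevance.get(factor, 0.5)
        if factor == "Similarity":
            value += SIMILARITY_BONUS
        return value

    # Only Weak or Absent factors are candidates; Active ones are left alone.
    candidates = [f for f, s in statuses.items() if GAP_WEIGHT[s] > 0]
    # sorted() is stable (also with reverse=True), so ties keep the audit's order.
    return sorted(candidates, key=score, reverse=True)[:limit]
```

With every relevance value left at the default, the SaaS onboarding audit from the Examples section ranks Similarity, Compliments, then Familiarity / Contact, which matches that example's top three.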
### Step 4: Design the Application Strategy
**ACTION:** For each prioritized liking factor, generate specific, actionable interventions. Map each intervention to a concrete touchpoint in the user's context (email, landing page, social post, sales call, product UI, etc.).
**WHY:** Abstract advice ("be more relatable") produces no change. Specific interventions at specific touchpoints do. The goal is to give the user something they can act on immediately.
**Intervention templates by factor:**
**Similarity interventions:**
- Research audience demographics, values, and lifestyle markers → reflect them in language, imagery, and examples
- Use audience-native vocabulary (industry jargon, regional phrases, generational references)
- Reference shared challenges, frustrations, or aspirations before presenting solutions
- In sales: scan for and mirror background, interests, values early in conversation
**Compliment interventions:**
- Open copy or conversations by acknowledging something specific and genuine about the audience
- Use second-person affirmations ("You're the type of person who...") that reflect audience self-image
- Send appreciation touchpoints that do not ask for anything (Joe Girard's monthly card model)
- Validate the audience's existing choices before presenting alternatives
**Familiarity interventions:**
- Design a repeated positive touchpoint cadence: helpful content, check-ins, updates that carry no ask
- Ensure every touchpoint is cooperative in tone — position brand as ally, not competitor
- Retargeting should feel like a friendly reminder, not a pursuit
- Create rituals (weekly newsletters, consistent brand voice, recurring formats) that become familiar patterns
**Association interventions:**
- Identify what the audience already loves/trusts → find authentic connection points
- Use testimonials and social proof from people the audience looks up to
- Present the product alongside aspirational contexts (events, lifestyles, values)
- Partnerships, co-marketing, and celebrity or expert endorsement explicitly transfer liking
- Avoid association with anything the audience dislikes — even tangential connections stick
**Physical attractiveness / halo interventions:**
- Invest in design quality — visual polish triggers automatic "good = good" inference
- Use professional, high-quality photography for products, teams, and spokespersons
- Set grooming and presentation guidelines for sales and brand representatives
- In UX: clean interfaces, consistent design systems, and premium aesthetic signals all trigger halo effects
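The familiarity interventions depend on a regular cadence of touchpoints that carry no ask. A minimal Python sketch of a planner for that cadence, assuming weekly sends and a hypothetical list of phrases (`ASK_MARKERS`) that signal a sales ask; the names and marker phrases are illustrative, not from the source, and a keyword check is only a rough first filter for tone.

```python
from datetime import date, timedelta

# Hypothetical phrases that signal a sales ask; adjust for your own copy.
ASK_MARKERS = ("buy now", "upgrade", "limited time", "book a demo")


def carries_ask(text: str) -> bool:
    """True if a draft touchpoint contains one of the ask markers (case-insensitive)."""
    lowered = text.lower()
    return any(marker in lowered for marker in ASK_MARKERS)


def weekly_cadence(start: date, weeks: int, drafts: list) -> list:
    """Schedule one touchpoint per week, cycling through the drafts.

    Raises ValueError if any draft contains an ask, because the cadence is meant
    to build familiarity through cooperative, no-ask contact.
    """
    if not drafts:
        raise ValueError("need at least one draft")
    for draft in drafts:
        if carries_ask(draft):
            raise ValueError(f"draft carries an ask: {draft!r}")
    return [(start + timedelta(weeks=i), drafts[i % len(drafts)]) for i in range(weeks)]
```

Rejecting the whole plan, rather than silently dropping the offending draft, forces the writer to rework copy that breaks the no-ask rule.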
### Step 5: Check Ethical Boundaries
**ACTION:** Review the proposed strategy against the distinction between authentic rapport and manufactured compliance. Flag any intervention that crosses from genuine relationship-building into deception.
**WHY:** Liking factors work even when artificial — but manufactured rapport that the audience later perceives as fake backfires catastrophically. Authentic rapport compounds over time and generates referrals (Tupperware's model relied on genuine friendship networks — the liking had to be real for the hostess relationship to hold). Manufactured rapport creates fragile compliance that dissolves under scrutiny.
**The authentic vs manufactured line:**
- Authentic: Genuinely finding and reflecting real shared values → builds lasting relationship
- Manufactured: Inventing false similarities ("I'm from Ohio too!" when you're not) → deception
- Authentic: Sending appreciation that reflects genuine positive feeling → builds goodwill
- Manufactured: Automated "I like you" messages with no genuine basis → hollow, erodes trust over time
- Authentic: Associating brand with values the brand genuinely embodies → credibility
- Manufactured: Celebrity endorsement where celebrity has no real connection to product → can work short-term but audiences increasingly see through it
Flag and revise any intervention that relies on false claims, invented similarities, or associations the brand cannot authentically sustain.
### Step 6: Defense — Separation Protocol (run in DEFENSE mode or append to APPLICATION)
**ACTION:** When evaluating whether you (the user) are being influenced by manufactured liking, apply the single-criterion detection test and the separation protocol.
**WHY:** There are too many liking tactics to detect each one individually — many operate unconsciously (physical attractiveness, familiarity, association all work below awareness). Trying to spot the specific tactic being used is a losing game. The elegant defense focuses on the effect, not the cause.
**The detection test (ask yourself exactly this):**
> "Have I come to like this person more than I would have expected given only the time I've spent with them and the circumstances?"
If the answer is yes — regardless of why — that feeling is the signal. You don't need to know whether it was the compliments, the similarity claims, the food they served, or their attractive appearance. The anomalous liking itself is the trigger.
**The separation protocol (three steps):**
1. **Notice the feeling.** Acknowledge that you like this person more than the circumstances warrant.
2. **Mentally separate the person from the deal.** Ask: "Would I buy this car / accept this offer / agree to this request if a stranger offered it to me?" Evaluate the offer on its independent merits — price, terms, quality, fit for your needs.
3. **Do not actively dislike the person.** The goal is not to reverse the liking — that would be unfair and counterproductive. The goal is to bracket the liking and make the decision based on the deal alone.
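The detection test and the separation protocol can be expressed as two small functions. A minimal Python sketch, assuming you can put rough self-ratings on your liking and look up a market price for comparable deals; `Offer`, `liking_flag`, and `decide` are hypothetical names, and "price at or below market" is an illustrative stand-in for a fair price. The design is structural: `decide` has no parameter for how much you like the seller, so the liking cannot leak into the decision.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    price: float            # price being offered to you
    market_price: float     # assumption: what comparable deals cost elsewhere
    meets_needs: bool       # does the product fit your stated needs?
    standard_terms: bool    # are the terms in line with industry norms?


def liking_flag(liking_now: int, liking_expected: int) -> bool:
    """Detection test: do I like this person more than time and circumstances warrant?

    Both values are self-ratings on one scale of your choosing (an assumption of this sketch).
    """
    return liking_now > liking_expected


def decide(offer: Offer) -> bool:
    """Separation protocol, step 2: would I accept this exact offer from a stranger?

    No liking score is passed in, by design. In line with step 3, there is also
    no penalty for liking the person: the deal is judged on its merits alone.
    """
    return offer.meets_needs and offer.standard_terms and offer.price <= offer.market_price
```

In use, run `liking_flag` first; if it returns `True`, make the call with `decide` alone.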
**Reference case:** Car salesman "Dealin' Dan" — in 25 minutes he fed you coffee and doughnuts, complimented your color choices, mirrored your interests, and cooperated with you against the sales manager. The question is not "did he use liking tactics?" — the question is "if a stranger offered this car at this price, would I take it?" That's the only question that matters.
## Inputs
- Brand voice guidelines, marketing copy, sales pitch scripts, or outreach messages (preferred but optional)
- User's verbal description of the scenario, audience, and goal
- In defense mode: description of the interaction the user is evaluating
## Outputs
### Liking Factor Audit Report
```
# Liking Factor Audit: {Brand/Scenario Name}
## Goal
{What compliance outcome is being sought — buy, sign up, agree, trust, refer?}
## Audience
{Who is the target? What do they value?}
## Factor Coverage
| Factor | Status | Evidence from content | Priority |
|---------------------------|---------|----------------------|----------|
| Physical Attractiveness | Active / Weak / Absent | {observation} | High / Med / Low |
| Similarity | Active / Weak / Absent | {observation} | High / Med / Low |
| Compliments | Active / Weak / Absent | {observation} | High / Med / Low |
| Familiarity / Contact | Active / Weak / Absent | {observation} | High / Med / Low |
| Association | Active / Weak / Absent | {observation} | High / Med / Low |
## Top 3 Intervention Opportunities
1. **{Factor}** — {specific intervention} → {touchpoint}
2. **{Factor}** — {specific intervention} → {touchpoint}
3. **{Factor}** — {specific intervention} → {touchpoint}
## Ethical Review
{Any interventions flagged as crossing authentic/manufactured line, with revision}
## Defense Check (if applicable)
{Detection test result + separation protocol application}
```
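If the audit is produced programmatically, the Factor Coverage table can be rendered from plain tuples. A minimal sketch; `coverage_table` is a hypothetical helper that covers only that one section of the template. It validates the Status and Priority vocabularies used above and escapes `|` so free-text evidence cannot split a table cell.

```python
def coverage_table(rows) -> str:
    """Render the Factor Coverage table of the audit report as Markdown.

    rows: iterable of (factor, status, evidence, priority) tuples;
    status is Active/Weak/Absent and priority is High/Med/Low.
    """
    allowed_status = {"Active", "Weak", "Absent"}
    allowed_priority = {"High", "Med", "Low"}
    lines = [
        "| Factor | Status | Evidence from content | Priority |",
        "|---|---|---|---|",
    ]
    for factor, status, evidence, priority in rows:
        if status not in allowed_status or priority not in allowed_priority:
            raise ValueError(f"bad row for {factor}: {status}/{priority}")
        # Escape pipes so free-text evidence cannot break the table layout.
        evidence = evidence.replace("|", "\\|")
        lines.append(f"| {factor} | {status} | {evidence} | {priority} |")
    return "\n".join(lines)
```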
## Key Principles
- **Liking operates on all five factors simultaneously** — compliance professionals use multiple factors at once (Tupperware parties deploy all six of Cialdini's principles, with liking as the centerpiece). Audit all five even if only one is the primary lever, because gaps in coverage leave liking on the table.
- **Similarity is the universal lever** — across all contexts, similarity is the most reliably applicable liking factor. When in doubt about where to start, mirror the audience's values, language, and lifestyle first.
- **Contact requires cooperation to build liking** — repeated exposure under competitive or frustrating conditions worsens liking, not improves it. This is the most commonly misapplied insight: simply exposing someone to your brand more often does not build liking unless every touchpoint is positive and cooperative in tone.
- **Detect the effect, not the cause** — in defense contexts, do not try to catalog which specific tactic was used. Ask only: "Do I like this person more than I should given the circumstances?" That single question bypasses the entire problem of unconscious influence.
- **Separate the person from the deal** — liking the salesperson is irrelevant to whether the deal is good. The separation protocol makes this distinction explicit and allows you to preserve the relationship while making a clear-headed decision.
- **Authentic rapport compounds; manufactured rapport collapses** — liking built on genuine similarity, real compliments, and true associations sustains itself and generates referrals. Liking built on artifice works once and leaves a bad taste. Design for the long game.
## Examples
**Scenario: SaaS onboarding email sequence lacks warmth**
Trigger: "Our trial-to-paid conversion is low. Users say the product is great but they don't feel connected to us."
Process: Audited email sequence — Physical Attractiveness (Active: good design), Similarity (Absent: generic copy doesn't reflect user persona), Compliments (Absent: no appreciation for user's choice to try), Familiarity (Weak: irregular cadence), Association (Weak: no social proof from similar companies). Top interventions: (1) Similarity — rewrite onboarding emails in the language of the target persona (startup founders), reflect their specific frustrations; (2) Compliments — add Day 1 email that genuinely appreciates them for joining and recognizes the courage of starting a company; (3) Familiarity — establish a consistent weekly cadence of helpful tips that carry no ask.
Output: Rewritten 5-email sequence with persona-specific language, appreciation touchpoint, and weekly value cadence. Ethical review: all similarities based on real persona research, not invented.
**Scenario: Sales rep preparing for a high-stakes enterprise pitch**
Trigger: "I have 45 minutes with the VP of Operations at a company I really want to close. What should I do?"
Process: Similarity — research the prospect's LinkedIn, company About page, and recent press; identify shared professional values or background to reference authentically. Compliments — open by acknowledging something specific and genuine about their organization's approach. Familiarity — confirm whether prior touchpoints have been positive; if not, warm up with a useful insight before the pitch. Association — identify which of their respected peers or competitors already use the product; lead with that reference. Physical Attractiveness — ensure professional presentation aligns with their company culture.
Output: Pre-call research checklist, opening conversational moves for each factor, and a reminder to evaluate the prospect's response to similarity/compliment moves as an indicator of receptivity.
**Scenario: User evaluating a car purchase after a charming sales conversation**
Trigger: "I spent 2 hours with this salesman and I genuinely love the guy. He's offering me a deal. Should I take it?"
Process: Defense mode. Detection test: "In 2 hours, did you come to like him more than the time and circumstances would normally produce?" Answer: yes. He fed you snacks, complimented your taste, mentioned he's also from your home state, and worked with you against the manager. Separation protocol: bracket the liking. Ask: "If this exact car at this exact price were being offered online by a stranger, would you take it?" Evaluate price against market comps, terms against industry standards, and car specs against stated needs, each independently of how you feel about the salesman.
Output: Defense evaluation report. Car decision based on deal merits, not on liking for the salesman. Explicitly note that the user can still like the salesman and even refer him later; the goal is not to dislike him, but to decide on the deal alone.
## References
- For research evidence behind all five factors with full statistics and study citations, see [references/liking-factors-evidence.md](references/liking-factors-evidence.md)
- For commercial model case studies (Joe Girard, Tupperware, Shaklee, MCI Calling Circle), see [references/commercial-models.md](references/commercial-models.md)
- For the Good Cop/Bad Cop mechanics as a multi-principle compliance model, see [references/good-cop-bad-cop.md](references/good-cop-bad-cop.md)
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Influence: The Psychology of Persuasion by Robert B. Cialdini.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/commercial-models.md
# Commercial Liking Models
Case studies of how compliance professionals systematically deploy the liking rule in commercial contexts. From Chapter 5 of *Influence: The Psychology of Persuasion* by Robert B. Cialdini.
---
## Model 1: Joe Girard — The Liking Formula Distilled
### Profile
- Guinness World Records: "world's greatest car salesman" for 12 consecutive years
- Income: more than $200,000/year as a showroom floor salesman
- Average: more than 5 cars and trucks sold every day he worked
- Base: a Chevrolet dealership in Detroit
### The Formula
Girard said the secret of his success consisted of offering people just two things: a fair price and someone they liked to buy from. "And that's it," he claimed in an interview. "Finding the salesman they like, plus the price; put them both together, and you get a deal."
### The Mechanism
- Fair price alone is not enough — dozens of salespeople can offer fair prices
- The differentiator was manufactured liking at scale
- Primary tactic: **Compliments at scale via mass direct mail**
### The Card System
- Every month, sent a holiday greeting card to every one of his 13,000+ former customers
- Card changed monthly (Happy New Year, Happy Thanksgiving, Happy Valentine's Day, etc.)
- Message on the face of the card: always and only "I like you"
- Nothing else: no upsell, no offer, no product mention — just his name and "I like you"
- Annual volume: more than 150,000 cards per year (13,000+ customers × 12 monthly mailings)
- Cost: significant printing and mailing expense
### Why It Works
- Demonstrates the flattery principle: people come to like even obviously impersonal, obviously commercial praise, and the person who gives it
- The recipients knew exactly what Joe was doing; they still liked him
- "Joe understands an important fact about human nature: We are phenomenal suckers for flattery."
- Even when the praise is clearly false or commercially motivated, humans tend to believe praise and like those who provide it
### Lesson for Application
The card system demonstrates that the liking rule does not require subtlety. A direct, repeated, mass-produced expression of positive feeling still works — because the automatic response to being liked is to like in return. The lesson is not to copy the method exactly but to understand the principle: sustained, regular expressions of genuine regard build liking that converts to compliance.
---
## Model 2: Tupperware Party — All Six Principles, Liking as the Center
### Structure
- Party hosted by a housewife (the hostess) in her home
- Tupperware demonstrator is physically present but is not the real requester
- The compliance pressure comes from the hostess — a friend of every person in the room
### Why It Dominates Retail
- Tupperware abandoned retail outlets entirely
- A party now starts somewhere every 2.7 seconds globally
- Estimated sales exceeding $2.5 million per day (at time of publication)
### The Social Bond as Compliance Engine
Consumer researchers Frenzen and Davis, who examined the social ties between hostess and partygoers, found that the strength of the social bond between hostess and partygoer is TWICE as likely to determine product purchase as is preference for the product itself.
Product preference counts for far less than the friendship pressure.
### All Six Principles in Play
1. **Reciprocity** — games are played, prizes are given; everyone receives a gift before buying begins
2. **Commitment** — each participant publicly describes the uses and benefits of Tupperware she already owns
3. **Social Proof** — once buying begins, each purchase signals that similar people want the product
4. **Liking** — the true requester is the hostess, a friend; the company provides her with financial motivation
5. **Authority** — the Tupperware demonstrator provides product expertise and credibility
6. **Scarcity** — implied by limited party availability and specific product assortments
### The Hostess Arrangement
- Tupperware Home Parties Corporation gives the hostess a percentage of every sale
- The hostess calls friends together for the demonstration
- Her financial interest is hidden behind the friendship frame
- Partygoers feel they are buying from a friend, not a stranger
- "By providing the hostess with a percentage of the take, the Tupperware Home Parties Corporation arranges for its customers to buy from and for a friend"
### Participant Awareness
Many guests comply even when they are fully aware of what is happening. One woman quoted: "It's gotten to the point now where I hate to be invited to Tupperware parties. I've got all the containers I need; and if I wanted any more, I could buy another brand cheaper in the store. But when a friend calls up, I feel like I have to go. And when I get there, I feel like I have to buy something. What can I do? It's for one of my friends."
### Lesson for Application
The Tupperware model reveals that liking transfers through social networks — the friend's relationship does the compliance work. The compliance professional (Tupperware demonstrator) doesn't need to manufacture personal liking; they leverage existing liking between the hostess and her friends. For product and brand strategy: giving trusted advocates financial or social incentive to promote a product to their own networks is structurally more powerful than direct persuasion.
---
## Model 3: Shaklee Corporation — The Endless Chain
### Profile
- Door-to-door sales of home-related products
- Proprietary method: "endless chain" for finding new customers
### The Method
1. Salesperson closes a sale with a customer
2. Asks the customer for the names of friends who would also appreciate learning about the product
3. Those friends are approached by the salesperson, who mentions that the mutual friend suggested the visit
4. Each new customer yields a new list of names — the chain extends indefinitely
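The endless chain is essentially a breadth-first walk over referral lists. The Python sketch below uses hypothetical referral data and a hypothetical cap on visits, and it assumes every visited prospect buys and names further friends, which real prospects will not always do. Each visit records the friend whose name the salesperson brings to the door, the element the method relies on.

```python
from collections import deque


def endless_chain(first_customer: str, referrals: dict, max_visits: int = 10) -> list:
    """Walk a referral graph in the order the endless chain describes.

    referrals: customer -> list of friends they name (hypothetical data).
    Returns (prospect, referring_friend) pairs in visit order.
    """
    visits = []
    seen = {first_customer}
    # Step 2: the first customer names friends; each is visited with that customer's name.
    queue = deque((friend, first_customer) for friend in referrals.get(first_customer, []))
    while queue and len(visits) < max_visits:
        prospect, referrer = queue.popleft()
        if prospect in seen:
            continue  # already visited or already a customer
        seen.add(prospect)
        visits.append((prospect, referrer))
        # Step 4 (assumed to succeed every time): the new customer names more friends.
        for friend in referrals.get(prospect, []):
            queue.append((friend, prospect))
    return visits
```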
### The Key Lever: Liking Transfers Through the Referral Name
- "The key to the success of this method is that each new prospect is visited by a salesperson armed with the name of a friend 'who suggested I call on you.'"
- Turning the salesperson away under those circumstances is almost like rejecting the friend
- The Shaklee sales manual: "It would be impossible to overestimate its value. Phoning or calling on a prospect and being able to say that Mr. So-and-so, a friend of his, felt he would benefit by giving you a few moments of his time is virtually as good as a sale 50 percent made before you enter."
### Why It Works
- The friend doesn't even need to be present
- Simply mentioning the friend's name invokes the liking the prospect has for that friend
- The compliance professional benefits from a relationship they didn't build and a liking they didn't earn
- Rejection of the salesperson feels like rejection of the friend
### Lesson for Application
Referrals work not because of endorsement logic but because of the liking transfer. A referral moves the liking the customer has for their friend onto the salesperson before they've even met. In modern terms: referral programs, testimonials from people the prospect knows or respects, and word-of-mouth marketing all operate on this mechanism. The friend's name (or the trusted figure's endorsement) is the active ingredient.
---
## Model 4: MCI Calling Circle — Liking Disguised as Savings
### The Scheme
- Long-distance phone company (MCI) created "MCI Friends and Family Calling Circle"
- A customer's friend is added to the calling circle without their consent
- MCI salesperson calls the non-customer and explains: "Your friend Brad placed you on his Calling Circle. He can save 20% on calls to you if you switch to MCI."
### The Pressure Structure
- The financial benefit is framed as being for the friend, not for the prospect
- Refusing to switch means refusing to help save Brad money
- "For me to say that I didn't want to be in his Calling Circle and didn't care about saving him money would have sounded like a real affront to our friendship when he learned of it."
- The prospect switches to MCI to protect the friendship, not because they wanted MCI
### Effectiveness
- MCI salesperson quoted by Consumer Reports: "It works nine out of ten times."
### Lesson for Application
The MCI model reveals the weaponization of liking through social obligation: by framing a compliance request as an act of friendship maintenance rather than a commercial transaction, compliance rates approach 90%. The "friend as the requester" frame is among the most powerful structures in commercial persuasion. This is also why any tactic that hijacks friendship networks to generate commercial compliance carries significant ethical risk — it exploits the trust built by genuine relationships.
---
## Summary: Common Patterns Across Commercial Models
| Model | Primary liking factor exploited | Key mechanism |
|-------|--------------------------------|---------------|
| Joe Girard | Compliments | Mass-scale personal appreciation at regular intervals |
| Tupperware | Familiarity + all five factors | Friendship network transfers existing liking to purchase decision |
| Shaklee | Familiarity + similarity | Referral name invokes existing liking before first contact |
| MCI Calling Circle | Familiarity | Frames commercial compliance as friendship maintenance |
All four models share one insight: **the friend doesn't need to be present for liking to transfer**. The association of a trusted name, a consistent reminder of appreciation, or a network that channels pre-existing friendship into commercial settings is sufficient. Compliance professionals don't have to earn liking themselves — they borrow it from existing relationships.
FILE:references/good-cop-bad-cop.md
# Good Cop / Bad Cop — Multi-Principle Compliance Model
Analysis of the Good Cop/Bad Cop interrogation technique as a case study in simultaneous deployment of contrast, reciprocity, and liking. From Chapter 5 of *Influence: The Psychology of Persuasion* by Robert B. Cialdini.
---
## Why This Is In the Liking Chapter
Cialdini introduces Good Cop/Bad Cop as an example of how compliance professionals "manufacture" cooperation — specifically to demonstrate the Contact and Cooperation factor (Factor 4: Familiarity). Good Cop positions himself as the suspect's cooperative teammate against the bad cop's hostility. The technique produces liking through manufactured cooperation.
However, the technique is analytically significant because it simultaneously triggers THREE distinct principles:
1. **Contrast** (from Chapter 1: the perceptual contrast principle)
2. **Reciprocity** (from Chapter 2)
3. **Liking** (Chapter 5: manufactured cooperation)
---
## How Good Cop/Bad Cop Works: The Scenario
A robbery suspect is brought in for questioning. He has been advised of his rights and maintains innocence.
**Phase 1 — Bad Cop establishes the threat:**
- Before the suspect even sits down, Bad Cop curses him for the robbery
- Uses snarls, growls, and kicks the prisoner's chair to emphasize points
- Threatens maximum prison sentence
- Claims to have friends in the district attorney's office who will prosecute aggressively
- The goal: create maximum perceived threat, establish emotional contrast baseline
**Phase 2 — Good Cop emerges as ally:**
- Has been sitting quietly in the background during Bad Cop's performance
- Begins to intervene: "Calm down, Frank, calm down"
- Bad Cop refuses to be calmed: "Don't tell me to calm down when he's lying right to my face!"
- Good Cop speaks on the suspect's behalf: "Take it easy, Frank, he's only a kid"
- Good Cop calls the suspect by first name, points out positive details of the case
- "I'll tell you, Kenny, you're lucky that nobody was hurt and you weren't armed. When you come up for sentencing, that'll look good."
**Phase 3 — Cooperation is manufactured:**
- Good Cop sends Bad Cop for coffee ("Okay, Frank, I think we could all use some coffee. How about getting us three cups?")
- With Bad Cop gone, Good Cop's big scene: "Look, man, I don't know why, but my partner doesn't like you, and he's gonna try to get you. He's right about the D.A.'s office going hard on guys who don't cooperate. You're looking at five years, man... I don't want to see that happen to you. If we work together on this, we can cut that five years down to two, maybe one. Do us both a favor, Kenny. Just tell me how you did it, and then let's start working on getting you through this."
- A full confession frequently follows
---
## The Three-Principle Analysis
### Principle 1: Contrast (Perceptual Contrast)
- Bad Cop's extreme hostility makes Good Cop appear especially reasonable and kind by comparison
- The contrast is manufactured — Good Cop is not inherently reasonable; he only appears so relative to Bad Cop's baseline
- "Compared to the raving, venomous Bad Cop, the interrogator playing Good Cop will seem like an especially reasonable and kind man"
- The contrast principle: the second stimulus appears more different from the first when the two are presented in succession than if they were presented alone
### Principle 2: Reciprocity
- Good Cop intervenes repeatedly on the suspect's behalf — a series of unsolicited favors
- "Has even spent his own money for a cup of coffee — the reciprocity rule pressures [the suspect] for a return favor"
- The suspect now psychologically owes Good Cop
- Reciprocity creates pressure to comply with Good Cop's request (confession) as a return favor for his advocacy
### Principle 3: Liking (Manufactured Cooperation)
- The "big reason" the technique is effective: Good Cop gives the suspect the idea that there is someone on his side, working WITH him, FOR him
- This manufactured alliance creates intense liking under conditions of perceived threat (the suspect needs an ally most when most threatened)
- "In most situations, such a person would be viewed very favorably, but in the deep trouble our robbery suspect finds himself, that person takes on the character of a savior. And from savior, it is but a short step to trusted father confessor."
- The cooperative framing: "we can cut that five years down to two" — "we," not "you"
---
## Why the Technique Is Structurally Powerful
The technique's effectiveness comes from simultaneous principle activation:
| Principle | What it contributes |
|-----------|---------------------|
| Contrast | Makes Good Cop appear maximally reasonable without him being objectively reasonable |
| Reciprocity | Creates psychological debt for Good Cop's interventions |
| Liking | Converts Good Cop into an ally / savior figure the suspect trusts and wants to please |
No single principle would produce the same effect. Contrast alone would produce resentment, not compliance. Reciprocity alone wouldn't overcome fear of self-incrimination. Liking alone without the threat baseline wouldn't produce the urgency. The combination creates a compliance cascade.
---
## Application for Non-Interrogation Contexts
The Good Cop/Bad Cop structure appears in many commercial and negotiation contexts. Recognizing the structure allows both deployment and defense.
### Deployment applications (ethical contexts):
- **Sales negotiation:** Two-person sales team where one takes hard positions (price, terms) and the other advocates for the customer — "Let me see what I can do for you with my manager"
- **Brand positioning:** Brand presents itself as the customer's ally against a frustrating industry ("We're fighting to end hidden fees on your behalf")
- **Customer service escalation:** First-tier agent (firm on policy) vs. second-tier resolution specialist (makes concession, becomes hero) — the customer's gratitude toward the specialist is proportionally greater because of the contrast
### Defense applications:
- When you feel sudden warmth toward someone who appears to be "on your side" against a threat or difficult party, check: was the threat real, or was it manufactured to make this person appear more cooperative than they actually are?
- The separation protocol applies: evaluate the offer independently of how much you like the person who made you feel safe
---
## Key Insight for Liking Factor Engineering
The Good Cop/Bad Cop model demonstrates that **manufactured cooperation is one of the most powerful liking triggers available**. Suppose you position yourself, authentically or strategically, as someone working for the other party against a common challenge. You can then generate intense liking even in a first meeting.
Brand applications: products that position themselves as the customer's ally against an industry, a frustrating problem, or an adversarial norm leverage this mechanism. This helps explain why challenger brand messaging ("we're fighting for you against Big X") can generate strong liking. It manufactures the cooperative frame that Cialdini identifies as the deep mechanism behind the Contact/Familiarity factor.
FILE:references/liking-factors-evidence.md
# Liking Factors — Research Evidence
Research summaries and study data supporting the five liking factors from Chapter 5 of *Influence: The Psychology of Persuasion* by Robert B. Cialdini.
---
## Factor 1: Physical Attractiveness
### The Halo Effect Mechanism
Physical attractiveness triggers a "click, whirr" automatic response — a halo effect where one positive characteristic (good looks) dominates how a person is perceived across ALL other dimensions. People automatically attribute favorable traits to good-looking individuals: talent, kindness, honesty, and intelligence. These judgments happen without awareness.
### Key Research Findings
**Canadian federal elections study:**
- Attractive candidates received more than 2.5 times as many votes as unattractive candidates
- Follow-up research: 73% of Canadian voters denied their choices were influenced by physical appearance; only 14% even allowed for the possibility of such influence
- Implication: the halo effect operates below conscious awareness — people cannot self-report it accurately
**Pennsylvania criminal trial study:**
- Researchers rated physical attractiveness of 74 male defendants at the start of criminal trials
- Checked court records for outcomes later
- Handsome men received significantly lighter sentences
- Attractive defendants were TWICE as likely to avoid jail as unattractive ones
**Negligence damages study (staged trial):**
- Defendant better-looking than the victim: average compensation to the victim = $5,623
- Victim better-looking than the defendant: average compensation = $10,051
- The victim's relative attractiveness raised the award roughly 1.8-fold
- Effect held for both male and female jurors
**Helping and persuasion studies:**
- Attractive people are more likely to obtain help when in need
- Attractive people are more persuasive in changing audience opinions
- Both sexes respond the same way; effect holds even for same-sex interactions
- Exception: when the attractive person is a direct romantic rival
**Hiring study:**
- Good grooming in a simulated employment interview accounted for more favorable hiring decisions than job qualifications
- Interviewers claimed appearance played a small role — their actual decisions showed otherwise
**Children:**
- Adults view aggressive acts by attractive children as less naughty
- Teachers presume good-looking children are more intelligent than less-attractive classmates
- Social benefits of good looks begin accumulating early in childhood
### Industry Applications
- Sales training programs include grooming hints
- Fashionable clothiers staff showrooms from attractive candidate pools
- Con men and con women are specifically selected for their appearance
- Car dealerships, real estate agencies, and financial services firms systematically hire based on appearance
---
## Factor 2: Similarity
### Mechanism
We like people who are similar to us. The similarity can be in opinions, personality traits, background, lifestyle, dress, values, age, religion, politics — virtually any dimension. The more dimensions of similarity, the stronger the liking.
### Key Research Findings
**Dress similarity study (early 1970s):**
- Experimenters dressed as hippies or "straight" students asked college students on campus for a dime to make a phone call
- Same-dressed requester: request granted more than two-thirds of the time (>67%)
- Different-dressed requester: request granted less than half the time (<50%)
- Result is automatic — subjects don't deliberate; they respond to the similarity signal directly
**Antiwar demonstration petition study:**
- Marchers signed petitions from similarly dressed requesters more readily
- AND signed without even reading the petition first
- Implication: similarity bypasses critical evaluation entirely
**Insurance company study:**
- Customers were more likely to buy insurance from a salesperson who was like them in age, religion, politics, and cigarette-smoking habits
- Small similarities (any dimension) can be effective — the category is broad
**Car salesman training practice:**
- Trained to scan trade-in vehicles for evidence of customer's background and interests:
  - Camping gear in trunk → mention camping interest later
  - Golf balls on back seat → mention scheduled golf game
  - Out-of-state purchase → ask if customer is from that state
- These manufactured similarities appear to work; the customer experiences them as genuine connection
**Mirror and match:**
- Modern sales training programs instruct trainees to mirror customer body posture, verbal style, and language
- Similarity in posture, mood, and verbal style has been shown to lead to positive results
### Caution: Manufactured Similarity
Cialdini explicitly warns: "Because even small similarities can be effective in producing a positive response to another and because a veneer of similarity can be so easily manufactured, I would advise special caution in the presence of requesters who claim to be 'just like you.'"
---
## Factor 3: Compliments
### Mechanism
Humans have an automatic positive reaction to flattery. Even when the praise is obviously insincere, even when the recipient knows the flatterer is trying to manipulate them, and even when the praise is objectively false — compliments still produce liking and compliance.
### Key Research Findings
**North Carolina men study:**
- Men received comments from another person who needed a favor from them
- Three conditions: (1) only positive comments, (2) only negative comments, (3) mix of positive and negative
- Three key findings:
  1. The evaluator who provided only praise was liked best
  2. This was true even though the men fully realized the flatterer stood to benefit from their liking
  3. Pure praise did not have to be accurate to work: positive comments produced just as much liking for the flatterer when they were untrue as when they were true
### Joe Girard Case Study
- Joe Girard held the Guinness World Record as "world's greatest car salesman" for 12 consecutive years
- Averaged more than 5 cars and trucks sold every day he worked
- Made more than $200,000/year as a floor salesman (not owner, not executive)
- His formula: "It consisted of offering people just two things: a fair price and someone they liked to buy from."
- Each month he sent every one of his 13,000+ former customers a holiday greeting card
- The message printed on the face of the card never varied: **"I like you."**
- Card changed monthly (Happy New Year, Happy Thanksgiving, etc.) but the message was always "I like you"
- "There's nothing else on the card. Nothin' but my name. I'm just telling 'em that I like 'em."
- Mailing volume: 13,000+ customers × 12 months = well over 150,000 cards per year
Cialdini's conclusion: "We are phenomenal suckers for flattery. Although there are limits to our gullibility — especially when we can be sure that the flatterer is trying to manipulate us — we tend, as a rule, to believe praise and to like those who provide it, oftentimes when it is clearly false."
---
## Factor 4: Familiarity and Contact
### Mechanism
We are more favorable toward things we have had contact with. Mere exposure increases liking. However — critically — this effect only holds under positive conditions. Contact under negative, competitive, or frustrating conditions WORSENS liking.
### Key Research Findings
**Photo preference study:**
- Subjects shown a photograph of their own face (true print) vs. mirror image of their face
- Subjects preferred the mirror image (the version they see in the mirror every day)
- Their friends preferred the true print (what the world sees)
- Both groups are responding favorably to the more familiar version of the face
**Face-flashing exposure study:**
- Faces flashed on a screen so briefly subjects couldn't consciously register them
- Subjects were unable to recall seeing any of the faces later
- But: the more frequently a face was flashed, the more the subjects came to like that person in a subsequent meeting
- AND: these subjects were also more persuaded by the opinion statements of individuals whose faces had appeared most frequently
**Ohio attorney general race:**
- A man given little chance of winning swept to victory when he changed his name to "Brown" shortly before the election — a name with strong Ohio political tradition
- Familiarity with the name, not familiarity with the person, was enough
### The Critical Caveat: Contact Requires Cooperation
**Robbers Cave experiment (Muzafer Sherif):**
- Boys at summer camp divided into two groups (Eagles and Rattlers)
- Competitive phase: treasure hunts, athletic contests → name-calling, raids, threatening signs, physical friction
- Mere contact + competition = escalating hatred
**Resolution (cooperative phase):**
- Series of situations where competition would harm everyone's interests; cooperation was necessary for mutual benefit
- Stuck truck (all had to push together for food run), interrupted water supply (required joint repair), pooled money for a desired movie
- Result: verbal baiting stopped, jostling ended, boys began to mix at meal tables, cross-group friendships formed, hostile attitude reversed
**Jigsaw classroom (Elliot Aronson, Texas schools):**
- Students formed into cooperative learning teams; each student held one piece of the information needed for the test
- Must take turns teaching each other; everyone needs everyone else to do well
- Results vs. traditional competitive classrooms: significantly more cross-group friendship, less prejudice between ethnic groups, improved self-esteem, higher test scores for minority students, equivalent performance for white students
- Carlos case study: a Mexican-American boy initially ridiculed became valued as a teammate when his peers needed his information to pass; they came to like him
**Why traditional school desegregation often fails:**
- School setting is primarily competitive (students competing for teacher approval)
- Contact under competition → increased prejudice, not decreased
- This explains why raw desegregation "so frequently produces increased rather than decreased prejudice"
### Application for Compliance Professionals
- Compliance professionals "manufacture" cooperation: the new-car salesman who "takes your side" and "does battle" with the sales manager is manufacturing cooperation
- Good Cop/Bad Cop (see separate reference) is the law enforcement version of manufactured cooperation
---
## Factor 5: Conditioning and Association
### Mechanism
People are connected to the things they are associated with, both positive and negative. We like/dislike people based on what they are linked to — even when the person had no causal role. This is classical Pavlovian conditioning applied to social liking.
### Key Research Findings
**Razran's luncheon technique (1930s):**
- Subjects were re-shown political statements they had rated earlier; some of the statements were presented while the subjects were eating
- Only statements presented while subjects were eating gained in approval
- Changes appeared to occur unconsciously — subjects could not remember which statements they had seen during the food service
- The positive feeling of eating transferred to the associated ideas
**Pavlov connection:**
- Razran was one of the earliest translators of Russian psychological literature into English — directly influenced by Pavlov's work
- Pavlov: a dog salivates at a bell if the bell was always paired with food
- Razran's insight: any normal response to food (including positive feelings) can transfer to anything associated with it
**Persian messengers:**
- Ancient Persia: messengers bringing news of military victory were treated as heroes (food, drink, women of their choice)
- Messengers bringing news of military defeat were summarily slain
- The messenger did not cause the news; they were merely associated with it
**TV weather forecasters:**
- Routinely disliked for bad weather they did not cause and cannot control
- Being connected with sunshine versus bad weather measurably affects forecaster popularity
- "The nature of bad news infects the teller" (Shakespeare)
**Negative association study (University of Georgia):**
- Students told to inform another student of either good or bad news
- When news was positive: "You just got a phone call with GREAT news — better see the experimenter for details"
- When news was negative: "You just got a phone call. Better see the experimenter for the details" (distanced themselves)
- People naturally manage association by connecting themselves to positive events and separating from negative ones
**Sports association — basking in reflected glory:**
- Cialdini study: school sweatshirts more common on Monday mornings at 7 universities when team won the prior Saturday
- "We won" vs. "They lost" pronoun study: students used "we" for team victories, distanced pronouns for losses
- After general knowledge test failure, students showed greatest need to proclaim team wins — image damage drives reflected glory seeking
- New Orleans Saints fans wearing paper bags over heads after losses; discarding bags as the team started winning
### Brand Applications
- Automobile ads: attractive young models photographed with cars → beauty traits transfer to the car
- Men who saw car ad with seductive model rated the car as faster, more appealing, more expensive, better designed — and denied the model had influenced their judgment
- Celebrity endorsement: professional athletes paid to connect themselves to products (relevant OR irrelevant — the connection is what matters, not its logic)
- Cultural moment association: during the first moon shot, everything was marketed with space program allusions
- "Natural" branding era (1970s): products associated with naturalness to transfer those values
- Radio station call-letter jingles played immediately before big hit songs → positive feeling of the song transfers to the station brand
- Tupperware: women yell "Tupperware!" instead of "Bingo!" so the prize celebration is conditioned to the brand name
### Positive vs Negative Association
- Mothers teach association avoidance: "you'll be known by the company you keep"
- People work to publicize their connections to winners and conceal connections to losers
- Stage mothers, name-droppers, and groupies all exploit positive association for personal prestige
---
name: influence-principle-selector
description: |
Identify which of Cialdini's 6 influence principles to apply for a persuasion scenario. Use when someone asks "which persuasion tactic should I use?", "how do I make this more persuasive?", "what's the best influence strategy for this situation?", or "which Cialdini principle applies here?" Also use for: persuasion audit of marketing copy, sales email, or landing page; choosing between reciprocity vs scarcity vs social proof for a campaign; mapping a persuasion scenario to compliance psychology; diagnosing why content isn't converting; identifying influence tactics being used against you in a negotiation; evaluating ethical boundaries of a persuasion approach. Applies Cialdini's master taxonomy of 6 principles (reciprocity, commitment and consistency, social proof, liking, authority, scarcity) plus the contrast principle and cross-principle interaction rules to produce a scored, rationale-backed recommendation. Classifies practitioners as ethical (real evidence) vs exploitative (manufactured triggers). Works on marketing strategy, sales psychology, copywriting, product onboarding, negotiation briefs, and any compliance scenario.
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/influence-psychology-of-persuasion/skills/influence-principle-selector
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
depends-on: []
source-books:
- id: influence-psychology-of-persuasion
title: "Influence: The Psychology of Persuasion"
authors: ["Robert B. Cialdini"]
chapters: [1]
tags: [persuasion, influence, cialdini, reciprocity, commitment, consistency, social-proof, liking, authority, scarcity, marketing-psychology, sales-psychology, compliance, persuasion-audit, influence-tactics, ethical-persuasion, contrast-principle]
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: "Persuasion scenario or marketing content to analyze — a situation brief, marketing copy, sales email, landing page text, or any persuasive communication"
tools-required: [Read, Write]
tools-optional: [Grep]
mcps-required: []
environment: "Any agent environment. Works with pasted text or document files."
discovery:
goal: "Identify the optimal influence principle(s) for a given persuasion scenario and produce an actionable recommendation with rationale"
tasks:
- "Score all 6 principles against a persuasion scenario"
- "Identify cross-principle interactions and stacking opportunities"
- "Evaluate ethical boundaries of proposed influence approach"
- "Produce a ranked recommendation with application guidance"
- "Audit existing content for principle usage and gaps"
audience:
roles: ["marketer", "salesperson", "copywriter", "product-manager", "negotiator", "entrepreneur", "UX-designer", "consultant"]
experience: "any — no psychology background required"
triggers:
- "User describes a persuasion or sales scenario and wants to know which principle to apply"
- "User has marketing copy, a sales email, or a landing page and wants it more persuasive"
- "User wants to audit content for compliance with influence principles"
- "User is in a negotiation and wants to understand the influence dynamics"
- "User wants to diagnose why their marketing isn't converting"
- "User has identified a Cialdini principle mentioned and needs help applying it"
not_for:
- "Building individual principle tactics in depth — use the dedicated principle skills (reciprocity-strategy-designer, scarcity-framing-strategist, etc.)"
- "Executing persuasion (writing copy, sending emails) — this skill selects the principle; execution skills implement it"
- "Defending against manipulation — use influence-defense-analyzer"
---
# Influence Principle Selector
## When to Use
You have a persuasion scenario — a marketing campaign, sales email, landing page, product launch, negotiation, or onboarding sequence — and need to identify which of Cialdini's 6 influence principles to apply, and in what combination.
This is the hub skill. Use it when you need principle selection and rationale. Once you have your recommendation, hand off to the dedicated principle skill (e.g., `reciprocity-strategy-designer`, `scarcity-framing-strategist`) for detailed implementation.
**Do not use this skill if:** You already know which principle applies and need implementation tactics. Go directly to the relevant principle skill.
---
## Context & Input Gathering
Before running the scoring process, collect:
### Required
- **The scenario:** What is the persuasion situation? (Cold outreach? Conversion page? Negotiation? Retention campaign?)
- **The audience:** Who is being persuaded? What do they want, fear, or value?
- **The goal:** What specific action or decision do you want the audience to take?
- **Your relationship to the audience:** First contact? Existing relationship? Warm referral?
### Important
- **What evidence you have:** Do you have real testimonials, genuine scarcity, actual credentials? (This determines ethical applicability of each principle.)
- **The medium:** Email, landing page, in-person pitch, ad copy, product UI?
- **Previous interactions:** Has the audience already taken any prior action or commitment?
### Optional (improves scoring precision)
- **Existing content:** If auditing existing copy, provide the text.
- **Competitor context:** Are you entering a crowded space or establishing a new category?
- **Conversion data:** If you have metrics, they can identify which principles are underperforming.
If required context is missing, ask for it before proceeding. A principle recommendation without audience and goal context is unreliable.
---
## Process
### Step 1: Map the Scenario to Principle Conditions
**Action:** Analyze the scenario against each principle's optimal activation conditions. Identify which conditions are present naturally vs. which would need to be created.
**WHY:** Each of the 6 principles activates through a specific trigger condition — a feature in the situation that fires the automatic compliance response. Some features may already exist in your scenario (e.g., you have genuine testimonials → social proof is ready to deploy). Others may need to be constructed (e.g., you need to establish a prior commitment before using consistency). Knowing which conditions are present vs. absent determines which principles are immediately available vs. which require setup.
Score each principle 1–5:
- **5** — All trigger conditions present. Real evidence exists. Strong fit.
- **4** — Most conditions present. Minor element missing; principle applies with small adjustments.
- **3** — Partial fit. Would need to set up conditions before deploying principle (e.g., create a commitment first).
- **2** — Key condition absent or scenario contradicts principle requirements.
- **1** — Principle does not fit scenario. Forcing it would be artificial or unethical.
See `references/principle-comparison-matrix.md` for the full activation condition checklist per principle.
---
### Step 2: Score Each Principle
**Action:** Run the scoring rubric against all 6 principles. Produce a score and a one-sentence rationale for each.
**WHY:** Scoring all 6, not just the obvious one, prevents premature convergence on familiar principles (many practitioners default to scarcity or social proof). Running the full set often surfaces a high-scoring principle that was overlooked. The rubric also surfaces ethical flags: if the only way to score a principle highly requires manufacturing fake evidence, that is a clear signal to deprioritize it.
**Scoring template:**
```
Reciprocity: [1-5] — [one-sentence rationale]
Commitment: [1-5] — [one-sentence rationale]
Social Proof: [1-5] — [one-sentence rationale]
Liking: [1-5] — [one-sentence rationale]
Authority: [1-5] — [one-sentence rationale]
Scarcity: [1-5] — [one-sentence rationale]
```
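You can track the Step 1 rubric and the Step 2 template programmatically; the sketch below shows one way. It is a minimal illustration, not part of Cialdini's method. `PrincipleScore` and `rank_principles` are hypothetical names, and the sketch assumes scores are integers on the 1–5 scale above.

```python
from __future__ import annotations

from dataclasses import dataclass

# The six principles from the scoring template above.
PRINCIPLES = ("Reciprocity", "Commitment", "Social Proof", "Liking", "Authority", "Scarcity")


@dataclass
class PrincipleScore:
    """Hypothetical record for one row of the scoring template."""
    principle: str
    score: int       # 1-5 per the Step 1 rubric
    rationale: str   # one-sentence rationale

    def __post_init__(self) -> None:
        if self.principle not in PRINCIPLES:
            raise ValueError(f"unknown principle: {self.principle!r}")
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be 1-5, got {self.score}")


def rank_principles(scores: list[PrincipleScore]) -> list[PrincipleScore]:
    """Check that all six principles were scored exactly once, then sort high to low."""
    names = [s.principle for s in scores]
    missing = sorted(set(PRINCIPLES) - set(names))
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if missing or duplicates:
        # Step 2 asks for every principle, scored once, to avoid premature convergence.
        raise ValueError(f"missing={missing}, duplicates={duplicates}")
    # sorted() is stable, so tied scores keep the order they were entered in.
    return sorted(scores, key=lambda s: s.score, reverse=True)


scores = [
    PrincipleScore("Reciprocity", 4, "A free audit offers genuine value up front."),
    PrincipleScore("Commitment", 3, "No prior commitment yet; needs a small first step."),
    PrincipleScore("Social Proof", 5, "Real customer testimonials are available."),
    PrincipleScore("Liking", 2, "Cold audience with no shared connection."),
    PrincipleScore("Authority", 4, "The founder has verifiable credentials."),
    PrincipleScore("Scarcity", 1, "No genuine deadline or limit exists."),
]
ranked = rank_principles(scores)
shortlist = [s.principle for s in ranked if s.score >= 3]  # Step 3 candidates
print(shortlist)
```

The shortlist keeps every principle scoring 3 or above. Entries that scored exactly 3 are the ones Step 3 says need setup before they can be deployed.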
---
### Step 3: Identify the Optimal Conditions and Sequencing
**Action:** For each principle scoring 3 or above, determine: (a) what makes it applicable now, and (b) if it scores 3 (setup required), what would need to happen first.
**WHY:** Sequencing matters enormously. Scarcity is far more powerful if commitment has been established first (Christmas toy tactic). Social proof is far stronger when the audience is uncertain. Authority suppresses skepticism before other principles are applied. Knowing the right sequence turns a mediocre single-principle approach into a layered strategy that compounds.
Key sequencing rules:
- **Commitment before Scarcity:** Establish desire/commitment first, then apply time/quantity pressure. Reverse order is weak.
- **Reciprocity before Request:** Give value before asking. Never reverse.
- **Authority before Technical Claims:** Establish credibility before making claims that require expertise.
- **Social Proof under Uncertainty:** Most powerful when the audience doesn't know what's normal or correct.
- **Contrast before Price:** Present expensive anchor before the actual price.
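The sequencing rules above are ordering constraints, so one way to apply them is a topological sort. This sketch uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The step labels and the names `SEQUENCING_RULES`, `sequence_steps`, and `missing_prerequisites` are hypothetical. The sketch encodes only the four ordering rules listed. "Social Proof under Uncertainty" is a condition rather than an ordering, so it is left out.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# step -> steps that must come BEFORE it (graphlib's predecessor convention).
# Labels are illustrative; only the ordering rules listed above are encoded.
SEQUENCING_RULES: dict[str, set[str]] = {
    "Scarcity": {"Commitment"},         # Commitment before Scarcity
    "Request": {"Reciprocity"},         # Reciprocity before Request
    "Technical claims": {"Authority"},  # Authority before Technical Claims
    "Price": {"Contrast anchor"},       # Contrast before Price
}


def missing_prerequisites(steps: list[str]) -> list[tuple[str, str]]:
    """List (step, prerequisite) pairs where the prerequisite is not in the plan."""
    chosen = set(steps)
    gaps = []
    for step in steps:
        for pred in sorted(SEQUENCING_RULES.get(step, set()) - chosen):
            gaps.append((step, pred))
    return gaps


def sequence_steps(steps: list[str]) -> list[str]:
    """Order the planned steps so every applicable rule is respected."""
    chosen = set(steps)
    sorter = TopologicalSorter()
    for step in steps:
        # Only constrain against prerequisites that are actually in the plan.
        sorter.add(step, *(SEQUENCING_RULES.get(step, set()) & chosen))
    return list(sorter.static_order())


plan = ["Request", "Scarcity", "Reciprocity", "Commitment"]
print(sequence_steps(plan))
print(missing_prerequisites(["Scarcity", "Request"]))
```

A reported gap such as `("Scarcity", "Commitment")` corresponds to the "reverse order is weak" warning. The plan deploys scarcity with no commitment established first.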
---
### Step 4: Check Cross-Principle Interactions
**Action:** Identify any stacking opportunities or interaction effects between the top-scoring principles. Flag any known interactions from the cross-principle map.
**WHY:** Principles interact: they can amplify or override each other. A practitioner who knows only individual principles misses the compounding effects. Reciprocity can override liking (a cold prospect who received genuine value is more likely to comply even without rapport). Commitment followed by scarcity compounds. Social proof amplifies any principle in uncertain situations. Identifying interactions turns a single-principle strategy into a layered one with compounding effect.
Check for:
- **Stacking opportunities:** Are the top 2–3 principles present simultaneously? (e.g., the Tupperware party, which layers reciprocity, commitment, social proof, and liking)
- **Override effects:** Does one principle (especially reciprocity) make another unnecessary or redundant?
- **Sequencing dependencies:** Does one principle need to fire before another will work?
- **Contrast amplification:** Can contrast framing make any of the top principles hit harder?
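The Step 4 checklist can be partly approximated as a lookup over the three interactions named in the WHY paragraph. This is a hedged sketch. `INTERACTIONS` and `interaction_flags` are hypothetical names, and the table holds only those three interactions. Contrast amplification and any other pairing still need judgment.

```python
from __future__ import annotations

# Pairwise interactions named in Step 4; unlisted pairings are simply not flagged.
INTERACTIONS: dict[frozenset[str], str] = {
    frozenset({"Reciprocity", "Liking"}): (
        "Override: genuine reciprocity can drive compliance without rapport; liking may be redundant."
    ),
    frozenset({"Commitment", "Scarcity"}): (
        "Stack: establish commitment first, then apply scarcity."
    ),
}


def interaction_flags(top: list[str], audience_uncertain: bool) -> list[str]:
    """Return notes on known interactions among the top-scoring principles."""
    chosen = set(top)
    flags = [note for pair, note in INTERACTIONS.items() if pair <= chosen]
    if audience_uncertain:
        # Social proof amplifies other principles when the audience is unsure what's correct.
        if "Social Proof" in chosen:
            flags.append("Amplifier: audience is uncertain, so social proof should hit hard.")
        else:
            flags.append("Audience is uncertain: consider adding social proof as an amplifier.")
    return flags


print(interaction_flags(["Commitment", "Scarcity", "Authority"], audience_uncertain=True))
```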
---
### Step 5: Evaluate Ethical Boundaries
**Action:** For each recommended principle, verify that the trigger evidence is real — not manufactured, falsified, or misrepresented.
**WHY:** The 6 principles work because they tap normally reliable decision shortcuts. When triggers are real, using them is legitimate persuasion: you're helping the audience make a correct decision efficiently. When triggers are manufactured (fake scarcity, paid testimonials presented as organic, fake credentials), you corrupt a shortcut the audience depends on for accurate decisions. The harm extends beyond the individual interaction, because it degrades the reliability of shortcuts that everyone uses. Beyond the ethical problem, it is also a liability risk.
Classify each recommended principle:
| Principle | Real evidence? | Classification |
|-----------|---------------|----------------|
| [Principle] | [Yes/No — describe] | Fair practitioner / Exploitative |
**Classification rule:**
- All trigger evidence is real → Fair practitioner. Proceed.
- Any trigger evidence is manufactured, falsified, or misrepresented → Exploitative. Revise or remove.
If a high-scoring principle requires fake evidence to activate, flag it explicitly and recommend the next-highest scoring principle with real evidence instead.
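The classification rule and fallback above amount to a small selection routine. This is a minimal Python sketch, assuming each principle has already been scored (Step 2) and checked for real evidence (this step); `PrincipleAssessment` and `pick_fair_principle` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PrincipleAssessment:
    name: str
    score: int              # 1-5 applicability score from Step 2
    evidence_is_real: bool  # result of the real-evidence check in Step 5


def pick_fair_principle(
    assessments: list[PrincipleAssessment],
) -> tuple[PrincipleAssessment | None, list[str]]:
    """Return the highest-scoring principle backed by real evidence,
    plus the names of principles flagged as exploitative."""
    ranked = sorted(assessments, key=lambda a: a.score, reverse=True)
    flagged = [a.name for a in ranked if not a.evidence_is_real]
    # Walk down the ranking until a principle with real evidence appears.
    for assessment in ranked:
        if assessment.evidence_is_real:
            return assessment, flagged
    return None, flagged  # no principle can be used ethically


choice, flagged = pick_fair_principle([
    PrincipleAssessment("Scarcity", 5, evidence_is_real=False),
    PrincipleAssessment("Social Proof", 4, evidence_is_real=True),
    PrincipleAssessment("Authority", 3, evidence_is_real=True),
])
print(choice.name, flagged)  # Social Proof ['Scarcity']
```

Here the top-scoring Scarcity trigger has no real evidence behind it, so it is flagged and the recommendation falls back to Social Proof.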
---
### Step 6: Produce the Recommendation
**Action:** Write a structured recommendation covering the top 1–3 principles, the recommended sequence, cross-principle interactions to exploit, and application guidance.
**WHY:** A ranked recommendation with explicit rationale enables confident decision-making. Surfacing the runner-up principle and why it was ranked lower prevents second-guessing. Including the sequence ensures the practitioner deploys principles in the order that maximizes effect — not just which principle, but when.
**Output format:**
```
## Influence Principle Recommendation
### Primary Principle: [Name] (Score: X/5)
**Why:** [1–2 sentences connecting activation conditions to scenario]
**Trigger to use:** [The specific feature to present]
**Application:** [1–2 concrete steps]
**Ethical check:** Fair practitioner — [real evidence available]
### Secondary Principle: [Name] (Score: X/5)
**Why:** [rationale]
**Sequence note:** [When to deploy relative to primary]
**Application:** [steps]
### Stacking Opportunity: [If applicable]
**Interaction:** [Which principles amplify each other and how]
**Recommended sequence:** [Order of deployment]
### Contrast Principle: [If applicable]
**Anchor:** [What to present first to create favorable contrast]
### Ruled Out: [Principle] (Score: X/5)
**Why not:** [Reason — missing condition, setup cost too high, or ethical flag]
### Next Step
→ Use [specific principle skill] to implement the primary recommendation.
```
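Filling the Primary, Secondary, and Ruled Out slots of this template starts from the Step 2 scores. The following is a rough Python sketch, assuming a hypothetical `build_recommendation` helper and an assumed cutoff where a score of 2 or lower (per the scoring rubric, "a key element is absent") counts as ruled out.

```python
from __future__ import annotations


def build_recommendation(
    scores: dict[str, int], ruled_out_max: int = 2
) -> dict[str, object]:
    """Split Step 2 scores into primary, secondary, and ruled-out principles.

    ruled_out_max is an assumed cutoff: in the scoring rubric a score of 2
    or lower means a key condition is absent.
    """
    # sorted() is stable, so tied scores keep their input order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    usable = [(name, score) for name, score in ranked if score > ruled_out_max]
    return {
        "primary": usable[0] if usable else None,
        "secondary": usable[1] if len(usable) > 1 else None,
        "ruled_out": [(name, score) for name, score in ranked if score <= ruled_out_max],
    }


# Scores from Example 1 (SaaS launch email).
rec = build_recommendation({
    "Scarcity": 5, "Social Proof": 5, "Authority": 4,
    "Commitment": 3, "Liking": 3, "Reciprocity": 2,
})
print(rec["primary"], rec["secondary"], rec["ruled_out"])
# ('Scarcity', 5) ('Social Proof', 5) [('Reciprocity', 2)]
```

Breaking ties by input order is an arbitrary choice in this sketch; in practice, the sequencing judgment in Step 3 decides between principles with equal scores.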
---
## Inputs / Outputs
### Inputs
- Persuasion scenario description (required)
- Target audience profile (required)
- Goal / desired action (required)
- Existing content to audit (optional)
- Evidence inventory (testimonials, credentials, scarcity data) (optional)
### Outputs
- Scored principle ranking (all 6 scored)
- Structured recommendation with primary and secondary principles
- Sequencing guidance
- Cross-principle stacking opportunities
- Ethical classification
- Pointer to next-step principle skill
---
## Key Principles
**All 6 principles operate on the same mechanism.** Each one activates an automatic compliance response by presenting a single trigger feature. The feature causes the audience to reach a "yes" decision without full analysis. This is why shortcuts are powerful — and why they can be exploited. Understanding the mechanism (not just the label) helps you select the right one for your situation.
**The trigger condition, not the principle name, drives effectiveness.** Naming "scarcity" is not enough. Scarcity works when items are genuinely limited and newly scarce (not always scarce), and it works best under competition. Missing a core condition reduces effectiveness sharply. Always trace back to the trigger condition.
**Reciprocity does not depend on liking.** Unlike liking or authority, which depend on how the audience regards the source, reciprocity obligation persists even when the recipient dislikes or distrusts the giver. This makes it the most reliable principle for cold interactions where no relationship exists yet.
**Stacking compounds compliance.** Principles applied in combination tend to compound rather than simply add up. The Tupperware party deploys all 6 simultaneously. Even 2–3 well-chosen, sequenced principles typically outperform any single principle alone.
**Contrast is a framing amplifier, not a principle.** The contrast principle (present expensive before cheap, hostile before reasonable) is not one of the 6 — it is a perceptual mechanism that can amplify any of them. Always check whether contrast framing can increase the impact of your primary principle.
**Ethical use requires real triggers.** Fair practitioners use real evidence (genuine scarcity, authentic testimonials, actual credentials). Exploitative practitioners manufacture fake triggers. The ethical test is simple: is the trigger feature true? Using real triggers is legitimate — you help the audience make a correct decision efficiently. Fake triggers corrupt a cognitive shortcut the audience relied on.
---
## Examples
### Example 1: SaaS Product Launch Email Campaign
**Scenario:** A B2B SaaS product is launching. The team wants to convert 200 beta users to paid. They have: 10 existing paying pilot customers who love the product, genuine limited launch pricing (first 50 seats at 40% off), and a credentialed founding team (ex-Google, Stanford PhD).
**Trigger:** "We have a launch email going out next week. How do we make it more persuasive?"
**Process:**
- Step 1: Conditions present — real testimonials (social proof ready), real limited pricing (scarcity ready), real credentials (authority ready), existing relationship from beta (liking/familiarity ready)
- Step 2: Scarcity=5, Social Proof=5, Authority=4, Commitment=3 (setup possible via free trial commitment), Liking=3, Reciprocity=2
- Step 3: Deploy Authority first (credibility reduces skepticism), then Social Proof (reduce uncertainty), then Scarcity (motivate action now)
- Step 4: Social Proof amplifies Scarcity when others are competing for limited seats
- Step 5: All triggers real — fair practitioner classification
**Output:**
```
## Influence Principle Recommendation
### Primary Principle: Scarcity (Score: 5/5)
Why: Launch pricing is real, genuinely limited to 50 seats, and time-bound.
Trigger: "First 50 seats at launch pricing — 23 remaining as of today."
Application: Show a counter or real number; update it. Combine with deadline.
Ethical check: Fair practitioner — pricing tier is real, seat count is genuine.
### Secondary Principle: Social Proof (Score: 5/5)
Why: 10 paying pilot customers with results provide authentic peer evidence.
Sequence: Lead with social proof before scarcity — first establish "this works
for people like you," then motivate urgency.
Application: Quote 2–3 pilot customers with specific results in the email body.
### Stacking: Social Proof + Scarcity
Interaction: Others competing for limited seats amplifies scarcity response.
Sequence: [1] Authority (founding team) → [2] Social Proof (pilot results) → [3] Scarcity (limited seats)
### Authority (Score: 4/5)
Sequence: Open with founding team credentials — one sentence.
Purpose: Suppress initial skepticism so social proof lands more strongly.
→ Next step: Use scarcity-framing-strategist for launch pricing copy.
Use social-proof-optimizer for testimonial selection and placement.
```
---
### Example 2: Negotiation — Enterprise Sales Close
**Scenario:** Enterprise sales rep is in final negotiation. The prospect has been evaluating for 3 months, has participated in 2 demos and a pilot, and is comparing to one competitor. Procurement wants a 15% discount the rep cannot give.
**Trigger:** "How do I handle the discount request and close this deal?"
**Process:**
- Step 1: Conditions — 3-month evaluation = strong commitment foundation, pilot = effortful commitment, no discount room but can offer value concessions
- Step 2: Commitment=5 (effortful prior investment is a major consistency anchor), Reciprocity=4 (concession exchange possible via door-in-the-face), Scarcity=3 (end-of-quarter pricing), Authority=3, Social Proof=2, Liking=3
- Step 3: Lead with Commitment (their 3-month investment creates consistency pressure), use Reciprocity via strategic concession (not discount — offer added value), apply Contrast before the concession (frame what they lose by not proceeding)
- Step 4: Commitment makes departure psychologically costly; Contrast frames the alternative as inferior; Reciprocity makes the value concession feel like a real give
**Output:**
```
## Influence Principle Recommendation
### Primary Principle: Commitment/Consistency (Score: 5/5)
Why: Three months of evaluation, two demos, and a completed pilot are all
active, effortful commitments that create strong consistency pressure.
Trigger: "You've invested three months validating this. The pilot confirmed X."
Application: Explicitly name the prior investment; connect it to the decision.
### Secondary Principle: Reciprocity — Door-in-the-Face (Score: 4/5)
Why: Cannot offer price discount but can make a value concession that feels
substantive. Offer something real (extended onboarding, dedicated CSM, extra
seats) as a give — not a discount. The concession creates reciprocity obligation.
Sequence: After anchoring commitment, make the value offer as a concession.
### Contrast Framing
Anchor: Frame the competitor alternative first — highlight what they'd lose
in switching costs, implementation time, feature gaps. Then present your offer.
Effect: Your terms appear more favorable after the competitor contrast.
### Ruled Out: Social Proof (Score: 2/5)
Why not: Procurement at this stage is not uncertain — they've evaluated.
Social proof would feel irrelevant to a procurement conversation about price.
→ Next step: Use commitment-escalation-architect for the specific framing.
```
---
### Example 3: Persuasion Audit — Landing Page Not Converting
**Scenario:** A landing page for a premium online course ($997) has a 1.2% conversion rate. The page has: a course description, instructor bio, curriculum list, and pricing. No testimonials, no enrollment deadline.
**Trigger:** "My landing page isn't converting. What influence principles am I missing?"
**Process:**
- Step 1: Page audit — social proof absent (no testimonials, no student count), scarcity absent (no deadline or limit), authority present but thin (bio exists but lacks specific credentials), commitment absent (no lead magnet or low-friction entry), reciprocity absent (no free value offered)
- Step 2: Social Proof=1 (absent, with no evidence yet to support it), Scarcity=1 (absent), Authority=3 (bio present but underbuilt), Reciprocity=2 (nothing given before the ask), Commitment=2 (no entry point), Liking=3 (instructor voice could be stronger)
- Step 3: Biggest gap is social proof — uncertainty is high at $997, and there's no peer evidence. Second gap is scarcity — no urgency to act now.
- Step 5: Every fix requires real evidence; genuine testimonials must be collected before social proof can be added
**Output:**
```
## Influence Principle Audit: Landing Page
### Critical Gap 1: Social Proof (Current Score: 1/5)
Why missing: No testimonials, no student count, no case results.
At $997, uncertainty is high — this is exactly when social proof is most needed.
Fix: Collect 3–5 student testimonials with specific outcome results. Add the actual enrollment
count (e.g., "1,200 students enrolled"). A/B test a results-focused testimonial above the fold.
Ethical check: Must use genuine testimonials — do not fabricate.
### Critical Gap 2: Scarcity (Current Score: 1/5)
Why missing: No enrollment window, no seat limit, page is always open.
Fix: Create a real enrollment period (cohort-based intake, or limited seats per month).
Do not manufacture fake countdown timers if enrollment never actually closes.
Ethical check: Scarcity must be real. If it's not, do not add it.
### Improvement: Authority (Current Score: 3/5 → target 5/5)
Current: Bio exists but generic.
Fix: Add specific credentials, outcomes delivered, named clients or publications.
Add a short video to increase familiarity and liking simultaneously.
### Quick Win: Reciprocity (Current Score: 2/5 → target 4/5)
Fix: Offer a free lesson, worksheet, or mini-course before the main ask.
This creates obligation before the purchase decision and demonstrates quality.
→ Next step: Use social-proof-optimizer for testimonial collection and placement.
Use scarcity-framing-strategist if enrollment windows can be made real.
```
---
## References
| File | Contents |
|------|----------|
| `references/principle-comparison-matrix.md` | Full 6-principle activation condition checklists; scoring rubric; cross-principle interaction map; scenario-to-principle routing guide; ethical boundary checklist; case study breakdowns (Tupperware, Good Cop/Bad Cop, Christmas toy tactic) |
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Influence: The Psychology of Persuasion by Robert B. Cialdini.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/principle-comparison-matrix.md
# Principle Comparison Matrix
> Reference for: influence-principle-selector
> Source: Influence: The Psychology of Persuasion — Robert B. Cialdini Ph.D.
---
## The 6 Principles at a Glance
| Principle | Core Mechanism | Trigger Condition | Primary Audience Motivation | Best Scenario Type |
|-----------|---------------|------------------|-----------------------------|-------------------|
| **Reciprocity** | Obligation to return favors or concessions | An unrequested gift, favor, or concession has been given | Avoiding social debt; fairness | Cold outreach, first-touch marketing, negotiation openers |
| **Commitment / Consistency** | Drive to act in line with prior statements and actions | A public, active, written, or effortful prior commitment exists | Avoiding cognitive dissonance; self-image integrity | Multi-step funnels, trial programs, onboarding sequences |
| **Social Proof** | Determining correct behavior by observing peers | Uncertainty about what to do + evidence of what similar others are doing | Risk aversion; desire for correct action | New product adoption, reviews, testimonials, conversion pages |
| **Liking** | Compliance with people we know, like, or identify with | A real or perceived connection (similarity, familiarity, attractiveness, cooperation) | Affinity; relationship maintenance | Referral programs, spokesperson selection, warm introductions |
| **Authority** | Deference to credible, legitimate experts | Credentials, titles, or symbols of expertise are present | Efficiency; reducing cognitive load of complex decisions | Technical products, health/finance decisions, B2B sales |
| **Scarcity** | Valuing what is rare or diminishing | Limited availability, time pressure, or exclusive access | Loss aversion; fear of missing out | Launches, limited editions, enrollment windows, negotiation deadlines |
---
## Principle Scoring Rubric
Use this rubric to score each principle's applicability to a given scenario (1–5 scale):
| Score | Meaning |
|-------|---------|
| **5** | All optimal conditions present. Principle will produce strong response. Genuine evidence exists to support the trigger. |
| **4** | Most conditions present. Minor element missing but principle still applies with adjustments. |
| **3** | Some conditions present. Principle could be applied but requires setup (e.g., need to CREATE a commitment first). |
| **2** | Conditions partially present but a key element is absent or contradictory. |
| **1** | Conditions absent. Forcing this principle would be artificial or unethical. |
---
## Optimal Activation Conditions Per Principle
### Reciprocity (Optimal Score: 5)
All of these increase reciprocity strength:
- [ ] Gift or favor is UNINVITED (not expected, not contractual)
- [ ] Gift is personalized (not generic)
- [ ] Gift creates a sense of obligation before the request is made
- [ ] The request is framed as smaller than what was given
- [ ] Relationship is ongoing (not one-time)
**Door-in-the-face amplifier:** Make a large request first (which will be refused), then retreat to the actual request. The retreat reads as a concession, which the target feels obligated to reciprocate by accepting the smaller request. Works even with strangers.
**Reciprocity strength floor:** Reciprocity has been shown to override personal dislike. Even when the recipient dislikes the giver, reciprocity obligation persists (Regan Coke study). This makes it the principle least dependent on an existing relationship.
---
### Commitment / Consistency (Optimal Score: 5)
Commitment quality is determined by four factors:
- [ ] **Active:** Commitment was made verbally or behaviorally (not just thought)
- [ ] **Public:** Commitment was made in front of others (social accountability)
- [ ] **Effortful:** Commitment required effort, cost, or sacrifice to make
- [ ] **Inner choice:** Commitment felt freely chosen (not coerced)
A commitment that meets 3 of these factors is powerful; one that meets all 4 is very powerful.
**Escalation path:** Small commitments create a foundation for larger ones (foot-in-the-door). Never skip the small-commitment stage in a multi-step funnel.
**Lowball mechanism:** Gain commitment to an offer, then change the offer's terms. The prior commitment persists even after terms change (buyers honor verbal agreements after price increases).
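The four-factor rule above can be sketched as a simple count. This Python sketch is illustrative only: `commitment_strength` is a hypothetical helper, and the "weaker" label for commitments meeting fewer than 3 factors is an assumption, because the matrix does not rate that case.

```python
from __future__ import annotations

COMMITMENT_FACTORS = frozenset({"active", "public", "effortful", "inner_choice"})


def commitment_strength(present: set[str]) -> str:
    """Rate a commitment by how many of the four quality factors it meets."""
    unknown = present - COMMITMENT_FACTORS
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if len(present) == 4:
        return "very powerful"
    if len(present) == 3:
        return "powerful"
    # The matrix does not rate commitments below 3 factors; "weaker" is an assumed label.
    return "weaker"


print(commitment_strength({"active", "public", "effortful"}))  # powerful
```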
---
### Social Proof (Optimal Score: 5)
Two-factor amplifier model:
- [ ] **Uncertainty factor:** Audience is uncertain what the correct action is (new product, unfamiliar situation, high-stakes decision)
- [ ] **Similarity factor:** Social proof comes from people like the audience (same demographic, problem, situation)
Both factors must be present for maximum effect. Social proof from dissimilar others has weak influence. Social proof in low-uncertainty situations is redundant.
**Werther effect warning:** Social proof can amplify negative behaviors as well as positive ones. Publicizing suicide statistics increases copycat suicides. Publicizing non-compliance ("most visitors don't donate") decreases compliance. Never use negative social proof.
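The two-factor model above is a small decision table. The following minimal Python sketch assumes a hypothetical `social_proof_strength` helper, and its output labels are assumptions chosen to mirror the text: both factors give maximum effect, dissimilar sources are weak, and low uncertainty makes the proof redundant.

```python
def social_proof_strength(audience_uncertain: bool, source_similar: bool) -> str:
    """Apply the two-factor amplifier model to one piece of social proof."""
    if audience_uncertain and source_similar:
        return "maximum"
    if audience_uncertain:
        return "weak: proof comes from dissimilar others"
    if source_similar:
        return "redundant: audience is not uncertain"
    return "weak and redundant"


# A $997 course buyer (high uncertainty) reading testimonials from similar students.
print(social_proof_strength(audience_uncertain=True, source_similar=True))  # maximum
```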
---
### Liking (Optimal Score: 5)
Five liking factors (any combination strengthens compliance):
- [ ] **Physical attractiveness:** Halo effect — attractive people seen as more capable, honest, kind
- [ ] **Similarity:** Shared background, interests, opinions, or identity
- [ ] **Familiarity:** Repeated exposure to a person or brand builds liking (mere-exposure effect)
- [ ] **Association:** Being associated with positive events, people, or symbols increases liking
- [ ] **Compliments:** Flattery increases liking, even when it is untrue or recipients know it is strategic
**Tupperware stacking:** Several liking factors are typically present at once in referral sales contexts (friend-sold products). The social relationship alone supplies familiarity, similarity, and association.
---
### Authority (Optimal Score: 5)
Three authority signal types:
- [ ] **Titles:** Professional credentials, degrees, certifications, job titles
- [ ] **Clothing/appearance:** Uniforms, formal dress, lab coats, professional appearance
- [ ] **Trappings:** Office environment, symbols of status (luxury car, prestigious address)
**Nurse/phone study:** 95% of nurses complied with a phone call from someone claiming to be a doctor — even when the requested drug dosage was dangerous. Authority alone overrode professional judgment.
**Defense test (2-question protocol):**
1. Is this person actually an expert? (Verify credentials)
2. Can this expert be trusted to be honest here? (Check for conflicts of interest)
---
### Scarcity (Optimal Score: 5)
Three-level amplifier hierarchy:
1. **Baseline scarcity:** Limited quantity or time ("Only 3 left")
2. **Newly scarce:** Item was previously available but is now restricted — more powerful than always-scarce ("Was widely available; now limited")
3. **Competition + exclusive information:** Others also want it AND you have inside knowledge about the scarcity ("Limited release, not widely announced")
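**Double whammy formula:** Newly scarce + competitive demand = maximum reactance. Cookie study: cookies from a jar holding 2 were rated more desirable than identical cookies from a jar holding 10. Cookies that had just become scarce because other people wanted them were rated most desirable of all.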
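The hierarchy and the double whammy can be combined in one classification step. This is a rough Python sketch, assuming that levels 2 and 3 presuppose baseline limited availability; `scarcity_level` is a hypothetical helper, not part of the source.

```python
from __future__ import annotations


def scarcity_level(
    limited: bool, newly_scarce: bool, competition: bool, exclusive_info: bool
) -> tuple[int, bool]:
    """Place a scarcity trigger on the three-level hierarchy.

    Returns (level, double_whammy); level 0 means no scarcity trigger.
    Assumes levels 2 and 3 require baseline limited availability.
    """
    if not limited:
        return 0, False
    double_whammy = newly_scarce and competition
    if competition and exclusive_info:
        return 3, double_whammy
    if newly_scarce:
        return 2, double_whammy
    return 1, double_whammy


# The cookie condition: jar reduced from 10 to 2 because others wanted them.
print(scarcity_level(limited=True, newly_scarce=True, competition=True, exclusive_info=False))
# (2, True)
```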
---
## Cross-Principle Interaction Map
```
Reciprocity
└── overrides liking (Regan study: obligation survives dislike)
└── amplifies door-in-the-face when contrast principle is used first
Commitment/Consistency
└── stacks with Scarcity (toy tactic: commit before holiday, scarcity after)
└── enables escalation to larger requests (foot-in-the-door)
Social Proof
└── amplifies ALL principles under uncertainty
└── requires similarity match to work at full strength
Liking
└── is overridden by Reciprocity (see Regan)
└── amplifies Commitment (public commitments are stronger with liked audiences)
└── is the core mechanism in Tupperware-style stacking
Authority
└── suppresses skepticism, reducing resistance to Scarcity and Reciprocity
└── combines with Commitment (authority endorses the commitment = amplified)
Scarcity
└── stacks with Commitment (pre-holiday toy tactic)
└── amplified by Social Proof (competition for scarce item = double whammy)
└── is most powerful when newly scarce (not always scarce)
Contrast Principle (not one of the 6, but an amplifier)
└── frames Scarcity (small remaining quantity after large original = bigger contrast)
└── enables Reciprocity via door-in-the-face (large request → concession feels bigger)
└── enables Good Cop / Bad Cop (authority + contrast + reciprocity + liking simultaneously)
```
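The map above is essentially a set of pairwise edges plus one global rule (social proof under uncertainty), which makes the Step 4 stacking check a lookup. This minimal Python sketch covers only the pairwise edges between the 6 principles and omits the Contrast edges. It uses short names (for example, "Commitment" for Commitment/Consistency) and condensed wording. `INTERACTIONS` and `find_interactions` are hypothetical names.

```python
from __future__ import annotations

from itertools import combinations

# Pairwise edges transcribed (condensed) from the interaction map; Contrast omitted.
INTERACTIONS: dict[frozenset[str], str] = {
    frozenset({"Reciprocity", "Liking"}): "Reciprocity overrides liking (obligation survives dislike)",
    frozenset({"Commitment", "Scarcity"}): "Stack: commitment first, scarcity after (toy tactic)",
    frozenset({"Liking", "Commitment"}): "Liking amplifies public commitments",
    frozenset({"Authority", "Scarcity"}): "Authority suppresses skepticism toward scarcity",
    frozenset({"Authority", "Reciprocity"}): "Authority suppresses skepticism toward reciprocity",
    frozenset({"Authority", "Commitment"}): "Authority endorsement amplifies commitment",
    frozenset({"Social Proof", "Scarcity"}): "Competition for a scarce item (double whammy)",
}


def find_interactions(selected: list[str]) -> list[str]:
    """List known interactions among the principles selected for a scenario."""
    notes = []
    for a, b in combinations(selected, 2):
        note = INTERACTIONS.get(frozenset({a, b}))
        if note:
            notes.append(f"{a} + {b}: {note}")
    if "Social Proof" in selected:
        notes.append("Social Proof: amplifies all principles under uncertainty")
    return notes


# The Example 1 sequence: Authority, then Social Proof, then Scarcity.
for line in find_interactions(["Authority", "Social Proof", "Scarcity"]):
    print(line)
```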
---
## Scenario-to-Principle Routing Guide
| Scenario Type | Primary Principle | Secondary Principle | Notes |
|--------------|------------------|--------------------|-------|
| Cold outreach (email, direct mail) | Reciprocity | Liking | Give value first. Personalize. |
| Landing page / conversion page | Social Proof | Authority | Testimonials + credentials |
| Launch campaign | Scarcity | Social Proof | Limited time + early adopter proof |
| Multi-step sales funnel | Commitment | Reciprocity | Small ask → escalation |
| Referral / word-of-mouth | Liking | Social Proof | Friends selling to friends |
| Technical product sale | Authority | Social Proof | Expert validation + peer use |
| Negotiation | Reciprocity | Contrast | Concessions + anchoring |
| Retention / win-back | Commitment | Scarcity | Past investment + limited window |
| Onboarding | Commitment | Liking | Early commitment points + relationship |
| Crowdfunding | Social Proof | Scarcity | Momentum + deadline |
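The routing guide works as a default lookup before full scoring. The following is a small Python sketch, assuming shortened scenario keys (for example, "referral" for "Referral / word-of-mouth"). It also assumes that an unlisted scenario should fall back to scoring all 6 principles. `ROUTING` and `route` are hypothetical names.

```python
from __future__ import annotations

# Scenario type -> (primary, secondary), transcribed from the routing table.
ROUTING: dict[str, tuple[str, str]] = {
    "cold outreach": ("Reciprocity", "Liking"),
    "landing page": ("Social Proof", "Authority"),
    "launch campaign": ("Scarcity", "Social Proof"),
    "multi-step sales funnel": ("Commitment", "Reciprocity"),
    "referral": ("Liking", "Social Proof"),
    "technical product sale": ("Authority", "Social Proof"),
    "negotiation": ("Reciprocity", "Contrast"),
    "retention": ("Commitment", "Scarcity"),
    "onboarding": ("Commitment", "Liking"),
    "crowdfunding": ("Social Proof", "Scarcity"),
}


def route(scenario_type: str) -> tuple[str, str] | None:
    """Return the default (primary, secondary) pair, or None when the
    scenario is not in the table and needs full scoring instead."""
    return ROUTING.get(scenario_type.strip().lower())


print(route("Launch campaign"))  # ('Scarcity', 'Social Proof')
```

A routed default is a starting point only; the scores and the real-evidence check can still overturn it.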
---
## Ethical Boundary Checklist
Before applying any principle, verify:
- [ ] **Real trigger:** The trigger feature is real, not manufactured (genuine scarcity, real credentials, authentic testimonials)
- [ ] **Proportionate:** The compliance request is proportionate to the trigger used
- [ ] **Reversible:** If the trigger turns out to be false, you're prepared to reverse or refund
- [ ] **Audience benefit:** The audience is better off complying (not just the practitioner)
**Classification test:**
- If all 4 boxes are checked → Fair practitioner. Proceed.
- If trigger is manufactured (counterfeit, falsified, misrepresented) → Exploitative. Do not proceed.
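The classification test above can be sketched as a four-field check. This Python sketch labels one case as an assumption: the checklist does not say what to do when the trigger is real but another box is unchecked, so the sketch reports it as unresolved. `EthicsCheck` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EthicsCheck:
    real_trigger: bool
    proportionate: bool
    reversible: bool
    audience_benefit: bool


def classify(check: EthicsCheck) -> str:
    """Apply the classification test to one planned use of a principle."""
    if not check.real_trigger:
        return "Exploitative: do not proceed"
    if check.proportionate and check.reversible and check.audience_benefit:
        return "Fair practitioner: proceed"
    # The checklist does not cover this case; treating it as unresolved is an assumption.
    return "Unresolved: revise before proceeding"


print(classify(EthicsCheck(True, True, True, False)))  # Unresolved: revise before proceeding
```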
---
## Principle Interaction Case Studies
### Tupperware Party (all 6 simultaneous)
| Principle | How deployed |
|-----------|-------------|
| Reciprocity | Hostess gifts given before sales pitch |
| Commitment | Public product testimonials from attendees |
| Social Proof | Other guests buying during the party |
| Liking | Buyer knows and likes the seller (friend) |
| Authority | Product demos demonstrate expertise |
| Scarcity | Party-only special offers |
### Good Cop / Bad Cop (contrast + reciprocity + liking)
1. Bad Cop establishes hostile anchor (contrast baseline)
2. Good Cop appears sympathetic by contrast (liking through contrast)
3. Good Cop offers concession from Bad Cop's position (reciprocity via door-in-the-face)
4. Subject reciprocates the "reasonable" offer
### Christmas Toy Tactic (commitment + scarcity)
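1. Pre-holiday ads create desire in children, who extract a promise (commitment) from their parents
2. Stores are undersupplied at Christmas, so parents cannot keep the promise and buy substitute gifts (scarcity)
3. After Christmas, renewed ads and restocked shelves prompt parents to keep the original promise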
4. Net result: two peak sales periods from one commitment
---
name: influence-defense-analyzer
description: |
Detect and counter manipulation attempts using Cialdini's 6 influence principles. Use when you feel pressured to comply with a request, sense a sales tactic at work, want to audit a document for manipulation, or ask "is this legitimate or am I being played?" Also use for: analyzing a sales pitch, marketing email, negotiation transcript, or contract for exploitative influence tactics; identifying which compliance trigger is being activated and whether it's real or manufactured; deciding whether to comply with a request you feel uneasy about; training yourself to recognize manipulation in consumer, negotiation, or organizational contexts. Applies all 6 per-principle defense protocols (reciprocity, commitment/consistency, social proof, liking, authority, scarcity) plus the epilogue meta-framework to classify practitioners as fair (real evidence) or exploitative (manufactured triggers) and prescribe a principle-specific response strategy. Works on document sets (sales pitches, marketing claims, negotiation transcripts, contracts, advertising) as well as live compliance scenarios described in text.
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/influence-psychology-of-persuasion/skills/influence-defense-analyzer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
depends-on:
- influence-principle-selector
source-books:
- id: influence-psychology-of-persuasion
title: "Influence: The Psychology of Persuasion"
authors: ["Robert B. Cialdini"]
chapters: [2, 3, 4, 5, 6, 7, "Epilogue"]
tags: [persuasion, defense, protect, manipulation, detect, recognize, resist, compliance, persuasion-tactics, sales-pressure, say-no, ethical-boundary, exploitation, reciprocity, commitment, social-proof, liking, authority, scarcity, cialdini, consumer-protection, negotiation-defense]
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: "The compliance situation to analyze — a sales pitch, marketing email, negotiation transcript, advertisement, contract excerpt, or a plain-text description of the situation you are in"
tools-required: [Read, Write]
tools-optional: [Grep]
mcps-required: []
environment: "Any agent environment. Works with pasted text or document files."
discovery:
goal: "Identify which influence principle(s) are active in a compliance situation, classify them as legitimate or exploitative, apply the principle-specific defense protocol, and produce an actionable response strategy"
tasks:
- "Detect which of the 6 principles are being activated in the situation"
- "Classify each activation as legitimate (real evidence) or exploitative (manufactured/falsified triggers)"
- "Apply the per-principle defense protocol with its specific diagnostic questions and response steps"
- "Produce a response strategy calibrated to legitimate vs exploitative classification"
- "Identify vulnerability conditions (rushed, stressed, uncertain, distracted, fatigued) that heighten risk"
audience:
roles: ["consumer", "negotiator", "employee", "citizen", "procurement-officer", "investor", "donor", "anyone in a compliance situation"]
experience: "any — no psychology background required"
triggers:
- "User feels pressured to comply and wants to understand whether the pressure is legitimate"
- "User is reviewing a sales pitch, marketing message, or contract and suspects manipulation"
- "User wants to audit a document for exploitative influence tactics"
- "User has already complied and wants to understand what happened and why"
- "User wants to train themselves to recognize manipulation patterns across all 6 principles"
not_for:
- "Applying influence principles to persuade others — use influence-principle-selector or the dedicated principle skills"
- "Auditing your own content for ethical compliance — use persuasion-content-auditor"
- "Deep understanding of how a single principle works offensively — use the dedicated principle skill"
---
# Influence Defense Analyzer
## When to Use
You are in a compliance situation — someone is trying to get you to say yes — and you want to know whether the pressure is legitimate or exploitative, and how to respond.
This skill covers all 6 compliance principles from the defensive side. It is the counterpart to `influence-principle-selector`, which helps you apply influence; this skill helps you resist it.
**Do not use this skill if:** You want to apply influence tactics yourself. Use `influence-principle-selector` or the dedicated principle skills for that.
---
## Context and Input Gathering
Before running the defense analysis, collect:
### Required
- **The situation:** What is being asked of you? What happened before the request?
- **The document or pitch:** If analyzing written material, provide the text.
### Important
- **Timeline and pressure cues:** Is there a deadline? A countdown? A stated limited supply?
- **The relationship:** Do you know this person or organization? How did the relationship start?
- **Your current state:** Are you rushed, stressed, uncertain, or fatigued? (These conditions increase vulnerability — see Step 1.)
### Optional
- **Prior interactions:** Was there an initial gift, favor, or concession before the request?
- **What you already committed to:** Any prior statements, signatures, or small actions taken?
If the situation is not yet described, ask for it before proceeding.
---
## Process
### Step 1: Assess Vulnerability Conditions
**Action:** Check whether any of these conditions are present: rushed, stressed, uncertain, distracted, fatigued, facing a deadline.
**WHY:** These are the conditions under which people are most likely to fall back on single-trigger shortcut responses instead of careful analysis. Compliance practitioners, both legitimate and exploitative, deploy their tactics knowing that these conditions make their triggers most effective. A rushed buyer, a stressed negotiator, a fatigued employee: all are more likely to comply automatically. Recognizing that you are in a vulnerable state is the first defensive move, because it prompts you to deliberately override the automatic response.
**Output:** Flag any present vulnerability conditions. If multiple are present, note that the risk of automatic compliance is elevated and that extra deliberation is warranted.
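The flagging rule in this step can be sketched in a few lines. This Python sketch interprets "multiple" as two or more conditions, which is an assumption; `assess_vulnerability` is a hypothetical helper.

```python
from __future__ import annotations

VULNERABILITY_CONDITIONS = (
    "rushed", "stressed", "uncertain", "distracted", "fatigued", "facing a deadline",
)


def assess_vulnerability(present: set[str]) -> dict[str, object]:
    """Flag vulnerability conditions; two or more means elevated risk (assumed threshold)."""
    unknown = present - set(VULNERABILITY_CONDITIONS)
    if unknown:
        raise ValueError(f"unknown conditions: {sorted(unknown)}")
    # Keep the canonical order so reports are stable.
    flagged = [c for c in VULNERABILITY_CONDITIONS if c in present]
    return {"flagged": flagged, "elevated_risk": len(flagged) >= 2}


print(assess_vulnerability({"fatigued", "rushed"}))
# {'flagged': ['rushed', 'fatigued'], 'elevated_risk': True}
```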
---
### Step 2: Identify Which Principles Are Active
**Action:** Scan the situation for trigger features associated with each of the 6 principles. For each principle detected, note the specific trigger element present.
**WHY:** Each principle activates via a single trigger feature. Identifying the trigger — not just naming the principle — is what enables a targeted defense. "They are using scarcity" is less actionable than "They introduced a deadline after I showed interest, which activates scarcity; I need to check whether the deadline is real." Multiple principles may be active simultaneously, which amplifies compliance pressure; stacked principles require recognizing each one separately.
**Principle trigger checklist:**
| Principle | What to look for in the situation |
|-----------|-----------------------------------|
| Reciprocity | An initial gift, favor, concession, or "free" offer before the main request |
| Commitment/Consistency | A prior statement, action, or small agreement being referenced to justify the current request |
| Social Proof | Claims about what others are doing, buying, or believing — testimonials, numbers, crowd behavior |
| Liking | Unusual warmth, flattery, shared interests, or friendliness from the requester |
| Authority | Credentials, titles, uniforms, trappings of expertise, or claims of superior knowledge |
| Scarcity | Deadlines, limited supply, competition for the item, or phrases like "act now" or "only X left" |
Note every principle trigger present. Do not stop at the first one found.
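A crude first pass over a written scenario can be sketched as a keyword scan that reports every principle it finds rather than stopping at the first. The cue phrases in `TRIGGER_CUES` are illustrative assumptions, not a vetted lexicon. Substring matching misses implicit triggers and produces false hits (for example, "free" inside "freedom"), so treat the output as prompts for the manual checklist above, not as a verdict.

```python
# Hypothetical keyword scan for Step 2. The cue phrases are illustrative
# assumptions loosely drawn from the checklist; real triggers are often
# implicit and still need human judgment.
TRIGGER_CUES = {
    "Reciprocity": ["free", "gift", "complimentary", "on the house"],
    "Commitment/Consistency": ["you said", "you already agreed", "as you committed"],
    "Social Proof": ["customers", "testimonial", "people are", "most popular"],
    "Liking": ["just like you", "you look great", "we have so much in common"],
    "Authority": ["dr.", "expert", "certified", "analyst"],
    "Scarcity": ["only", "left", "expires", "act now", "deadline"],
}


def scan_triggers(text: str) -> dict[str, list[str]]:
    """Return every principle whose cues appear, with the matching cues.

    Does not stop at the first match: stacked principles are all reported.
    """
    lowered = text.lower()
    found = {}
    for principle, cues in TRIGGER_CUES.items():
        hits = [cue for cue in cues if cue in lowered]
        if hits:
            found[principle] = hits
    return found
```

Run on the booking-site text from Example 3, `scan_triggers("Only 2 rooms left! 17 people are looking at this right now.")` reports both Scarcity and Social Proof, the two-principle stack that example analyzes.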
---
### Step 3: Classify Each Trigger — Legitimate or Exploitative
**Action:** For each active principle, apply the classification test: Is the trigger evidence real or manufactured?
**WHY:** This is the single most important step. The 6 principles work because they are reliable signals — social proof, authority, scarcity, and the rest normally do indicate a correct decision. Fair practitioners present real evidence (genuine scarcity, authentic testimonials, actual credentials) and help you make a correct decision efficiently. Exploitative practitioners manufacture fake triggers (fake deadlines, paid testimonials presented as organic, counterfeit authority symbols) to fire the compliance response when the evidence does not support it. Distinguishing these two is the core of defense: cooperation is appropriate for fair practitioners; counter-aggression is warranted against exploitative ones.
**Classification rule:**
- Trigger evidence is real and verifiable → **Fair practitioner.** Principle is being used legitimately.
- Trigger evidence is manufactured or falsified → **Exploitative practitioner.** Treat the trigger as void.
- Trigger evidence cannot yet be verified → **Uncertain.** Withhold compliance until the evidence can be checked.
---
### Step 4: Apply the Per-Principle Defense Protocol
**Action:** For each identified principle, apply the specific defense protocol. See `references/defense-protocols.md` for the complete per-principle quick-reference.
**WHY:** Each principle exploits a different automatic response, and each requires a different defense move. A generic "be skeptical" instruction does not work because each principle operates through a distinct mechanism. The stomach-signal approach that defeats commitment/consistency does nothing against manufactured social proof. The per-principle protocols give you the exact diagnostic question and response step for each trigger type.
#### Reciprocity Defense
The trigger is a prior gift, favor, or concession. The mechanism is obligation — a deep social rule that says favors are to be met with favors.
**Detection question:** Was this initial offer genuinely given, or was it a sales device designed to create obligation?
**Response steps:**
1. Accept the initial offer for what it fundamentally is — not for what it was presented as.
2. If you determine it was a sales device (the "favor" opened a pitch rather than being a genuine gift), mentally redefine it as a compliance tactic. Favors are rightly met with favors; sales tactics are not.
3. Once redefined, the reciprocity obligation is void. You may decline the request freely without the pull of obligation.
**Key principle:** Redefining the initial "gift" as a sales device is not cynicism — it is accurate perception. The reciprocity rule creates obligation between people who give genuine favors to each other. It does not require that tricks be met with favors.
For deeper understanding of how reciprocity is applied offensively, see `reciprocity-strategy-designer`.
---
#### Commitment and Consistency Defense
The trigger is a prior statement, action, or agreement being used to pressure continued compliance.
**Detection signals — register both:**
1. **Stomach signal:** A tight feeling in the gut when you realize you are being pushed to comply with something you do not actually want to do. This is the clearest early warning that the consistency mechanism is being exploited.
2. **Heart signal:** A sense, when you examine your true feelings, that the reasons you have been giving yourself for staying committed do not match your actual preferences.
**Diagnostic question:** "Knowing what I know now, if I could go back in time, would I make the same commitment?"
**Response steps:**
1. Listen to the stomach signal. It registers when automatic consistency is pulling you toward an unwanted action. Do not dismiss it.
2. If uncertain, ask the diagnostic question and attend to the first feeling that surfaces — before rationalization engages.
3. If the answer is no, you have detected foolish consistency. Wise consistency (aligned with your true values and current knowledge) is worth maintaining. Foolish consistency (continuing a course because you started it, not because it is correct) is worth breaking.
4. Name what is happening. Stating explicitly that you recognize the commitment-as-trap frequently disrupts the exploit.
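The commitment steps reduce to a small decision rule, sketched here under the assumption that both signals can be recorded as yes/no answers, with "unsure" allowed for the stomach signal. `assess_commitment` and the verdict names are hypothetical. One design choice is worth noting: once the diagnostic question has been answered, that answer decides, because the heart signal is the deeper check.

```python
from enum import Enum
from typing import Optional


class CommitmentVerdict(Enum):
    NO_ALARM = "no stomach signal: continue, staying alert"
    ASK = "ask the diagnostic question and note the first feeling"
    WISE = "wise consistency: the commitment still fits; keep it"
    FOOLISH = "foolish consistency: name the trap and reconsider"


def assess_commitment(stomach_signal: Optional[bool],
                      would_recommit: Optional[bool] = None) -> CommitmentVerdict:
    """Sketch of the commitment defense steps.

    stomach_signal: True (felt), False (absent), None (unsure).
    would_recommit: first-feeling answer to "Knowing what I know now, would I
    make the same commitment?"; None if the question has not been asked yet.
    """
    if would_recommit is not None:
        # Once asked, the diagnostic answer decides, whatever the gut said.
        return CommitmentVerdict.WISE if would_recommit else CommitmentVerdict.FOOLISH
    if stomach_signal is False:
        return CommitmentVerdict.NO_ALARM
    # Signal felt or uncertain: consult the heart signal via the question.
    return CommitmentVerdict.ASK
```

In the procurement case of Example 2, `assess_commitment(True, would_recommit=False)` returns `CommitmentVerdict.FOOLISH`.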
**Key principle:** Distinguishing wise consistency from foolish consistency is the entire defense. Automatic, unthinking consistency is the vulnerability; deliberate, knowledge-updated reconsideration is the protection.
For deeper understanding of how commitment is applied offensively, see `commitment-escalation-architect`.
---
#### Social Proof Defense
The trigger is evidence of what others are doing or believing, presented to suggest you should do the same.
**Two failure modes — identify which is present:**
1. **Falsified evidence:** Social proof manufactured by exploiters: canned laughter, paid actors posing as ordinary customers, staged crowds, fake testimonials. It exists only to create the impression of popularity.
2. **Incorrect data:** Genuine social evidence that is nonetheless misleading. In pluralistic ignorance, everyone privately doubts but publicly complies, creating a false picture of consensus. The evidence is real, but the conclusion it suggests is wrong.
**Detection question:** Is this social evidence genuinely emergent (arising organically from independent behavior), manufactured (staged or paid), or a collective false signal (pluralistic ignorance)?
**Response steps:**
1. Check whether the social evidence source can be verified independently. Can you identify who these people are? Did they choose to endorse independently, or were they paid or prompted?
2. For online reviews, testimonials, and crowd behavior: look for the structural hallmarks of manufactured evidence — uniform tone, absence of specific detail, timing patterns, incentivized submission.
3. If evidence appears falsified: treat the trigger as void and proceed as if the social proof data point did not exist.
4. If evidence appears genuine but potentially driven by pluralistic ignorance: seek independent information sources to verify that the consensus reflects actual experience, not collective uncertainty.
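When you have a batch of timestamped reviews, the structural hallmarks in step 2 can be screened mechanically. The sketch below is a heuristic under stated assumptions: the `Review` fields, the 24-hour burst window, the 50% share threshold, and the 8-word cut-off for "little specific detail" are all illustrative, and "uniform tone" is approximated as repeated identical wording. A clean result does not prove the evidence is genuine.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Review:
    text: str
    posted: datetime
    incentivized: bool  # assumed field: disclosed as paid, gifted, or prompted


def manufactured_signals(reviews: list[Review],
                         window: timedelta = timedelta(hours=24),
                         share: float = 0.5,
                         min_words: int = 8) -> list[str]:
    """Flag hallmarks of manufactured social proof in a set of reviews.

    Each pattern check fires when at least `share` of reviews show it.
    Thresholds are illustrative assumptions, not validated cut-offs.
    """
    n = len(reviews)
    if n < 3:
        return []  # too few reviews to see a pattern
    flags = []

    # Timing pattern: many reviews clustered inside one short window.
    times = sorted(r.posted for r in reviews)
    busiest = max(sum(1 for t in times[i:] if t - times[i] <= window)
                  for i in range(n))
    if busiest / n >= share:
        flags.append("timing burst")

    # Uniform tone, approximated as identical wording after normalizing case/spaces.
    counts = Counter(" ".join(r.text.lower().split()) for r in reviews)
    repeated = sum(c for c in counts.values() if c > 1)
    if repeated / n >= share:
        flags.append("uniform wording")

    # Absence of specific detail, approximated as very short reviews.
    short = sum(1 for r in reviews if len(r.text.split()) < min_words)
    if short / n >= share:
        flags.append("little specific detail")

    # Incentivized submission: any disclosed payment or prompting.
    if any(r.incentivized for r in reviews):
        flags.append("incentivized submissions")
    return flags
```

Three near-identical "Great product!" reviews posted within two hours would trip the timing, wording, and detail checks together, which is the kind of stacked pattern that justifies treating the trigger as void.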
For deeper understanding of how social proof is applied offensively, see `social-proof-optimizer`.
---
#### Liking Defense
The trigger is the practitioner's personal appeal — attractiveness, flattery, similarity, familiarity, association with pleasant things — creating a sense of warmth toward them.
**Universal detection criterion:**
"Have I come to like this person more than I would have expected given the circumstances and the amount of time we have spent together?"
**Response steps:**
1. Do not attempt to prevent liking from happening. Liking operates unconsciously through attractiveness, familiarity, and association, and cannot be reliably blocked in advance.
2. Instead, redirect vigilance to the effect, not the cause. The signal to act on is the feeling of unexpected or disproportionate liking — not the specific tactic that produced it.
3. When you detect unexpected liking: consciously separate the person from the proposal. The question is not whether you like the practitioner but whether the deal itself is good.
4. Do not actively dislike the practitioner as a counter-move. Some people are genuinely likable, and an unfair negative reaction is both unjust and harmful to you. Simply bracket the liking and evaluate the offer on its merits.
**Key principle:** The defense is not "stop liking people." It is "do not let liking substitute for evaluating the deal." Maintain the distinction between the requester and the request.
For deeper understanding of how liking is applied offensively, see `liking-factor-engineer`.
---
#### Authority Defense
The trigger is the appearance of expertise or legitimate authority — credentials, titles, uniforms, specialized knowledge — creating automatic deference.
**Two-question sequence — apply in order:**
**Question 1:** "Is this authority truly an expert?" Check credentials against the domain at hand. A medical doctor recommending a financial product is an authority in medicine, not finance. A celebrity endorsing a supplement has no relevant expertise. Distinguish between the label "authority" and actual domain-relevant knowledge.
**Question 2:** "How truthful can we expect this authority to be?" Even genuine experts may not present information honestly if they have conflicts of interest. Ask: What does this authority gain from my compliance? Does their incentive align with giving me accurate information, or with getting me to say yes?
**Red flag — strategic self-deprecation:** When an authority figure mentions a minor flaw or limitation before making their main claim, they may be using a credibility-building tactic (establishing honesty on a small point to be believed on a large one). Recognize this pattern: the conceded flaw is always minor and easily overcome; the claim it sets up is the one they want you to accept.
**Response steps:**
1. Identify the claimed authority and verify domain relevance (Question 1).
2. Check for conflicts of interest (Question 2).
3. If both tests pass, the authority input is trustworthy and may be weighted accordingly.
4. If either test fails, treat the authority claim as an unverified signal and seek independent verification before complying.
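The two-question sequence can be sketched as an ordered check. This assumes credentials can be summarized as a set of domain labels and the conflict of interest as a single yes/no, both simplifications. `AuthorityClaim` and `assess_authority` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AuthorityClaim:
    domains: set[str]     # fields the credentials actually cover (simplified)
    topic: str            # field the claim or request is about
    gains_from_yes: bool  # conflict of interest: benefits from my compliance


def assess_authority(claim: AuthorityClaim) -> str:
    """Apply the two questions in order and return the response step."""
    # Question 1: is this authority truly an expert on *this* topic?
    if claim.topic not in claim.domains:
        return "unverified: credentials do not cover this topic; seek independent verification"
    # Question 2: how truthful can we expect this authority to be?
    if claim.gains_from_yes:
        return "unverified: conflict of interest; seek a second, independent opinion"
    return "trustworthy: weight the authority input accordingly"
```

The medical doctor recommending a financial product fails Question 1, so the conflict-of-interest question is never reached: `assess_authority(AuthorityClaim({"medicine"}, "finance", False))` returns the "credentials do not cover this topic" response.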
For deeper understanding of how authority is applied offensively, see `authority-signal-designer`.
---
#### Scarcity Defense
The trigger is limitation — a deadline, limited quantity, or competition for the item — creating urgency and desire.
**Two-stage response — execute in sequence:**
**Stage 1 — Recognize arousal as a warning signal, not a decision driver.**
When you feel a rising sense of urgency, the desire to act before missing out, or competitive agitation when others want the same thing: recognize that feeling as a warning signal, not a guide to action. The physiological arousal produced by scarcity suppresses deliberate analysis. Stop before acting.
**Stage 2 — Ask the possession vs. utility question.**
Once you have paused: "Do I want this item for its utility (to use it, eat it, drive it, deploy it) or for its possession value (to own something rare)?"
- If utility: remember that scarce items do not perform better because they are scarce. A rare product delivers the same function as an abundant one. The scarcity adds no value to what you will actually experience from it.
- If possession value: scarcity is a legitimate signal. The item's rarity is genuinely part of what you value about it.
**Key principle:** In the cookie experiment Cialdini describes, cookies taken from a jar holding only two were rated as more desirable than identical cookies from a jar holding ten, but not as better-tasting. Scarcity affects perceived value, not actual utility. Keeping this distinction clearly in mind during the arousal state dissolves most of scarcity's power over anyone who is not a collector.
For deeper understanding of how scarcity is applied offensively, see `scarcity-framing-strategist`.
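The two stages can be sketched as a function that refuses to reach Stage 2 until the pause has happened. `scarcity_check` and its string inputs are hypothetical, and the sketch assumes the motive has already been reduced to either "utility" or "possession".

```python
def scarcity_check(aroused: bool, paused: bool, wants_for: str) -> str:
    """Two-stage scarcity defense; wants_for is "utility" or "possession"."""
    # Stage 1: arousal is a warning signal; nothing is decided until paused.
    if aroused and not paused:
        return "stop: urgency detected, pause before deciding"
    # Stage 2: possession vs. utility.
    if wants_for == "possession":
        return "rarity is part of the value: scarcity is a legitimate signal"
    if wants_for == "utility":
        # Scarce items do not perform better because they are scarce.
        return "ignore scarcity: judge on function and price as if abundant"
    raise ValueError("wants_for must be 'utility' or 'possession'")
```

The ordering is the point of the design: while the arousal flag is set and no pause has occurred, the motive is never even consulted.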
---
### Step 5: Classify the Practitioner and Formulate the Response
**Action:** Based on the classification in Step 3, determine the appropriate response category and formulate the specific response.
**WHY:** The appropriate response to a fair practitioner differs fundamentally from the response to an exploitative one. Cooperating with fair practitioners is not weakness — they are offering real value that a shortcut correctly identifies. Counter-aggression against exploitative practitioners is not hostility — it is an appropriate response to deliberate fraud that corrupts the cognitive shortcuts everyone depends on.
**Response framework:**
| Classification | Response | Rationale |
|----------------|----------|-----------|
| Fair practitioner (real triggers) | Cooperate. Use the shortcut as intended. | Real scarcity, genuine social proof, actual authority — these are valuable signals. Complying is efficient and correct. |
| Exploitative practitioner (manufactured triggers) | Reject the trigger. Name the tactic. Withdraw compliance. | Manufactured triggers corrupt the shortcut. Counter-aggression is warranted — boycott, challenge, refusal, or public naming of the tactic. |
| Uncertain (cannot verify) | Pause. Seek independent verification before deciding. | Do not comply under time pressure when trigger legitimacy cannot be verified. Request time to check. |
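Steps 3 and 5 combine into one aggregation rule, sketched below. It assumes each trigger's evidence has already been labelled real, manufactured, or not yet verified, and it follows Example 1's rule that a single manufactured trigger makes the practitioner exploitative. Returning "fair" for an empty trigger set is an assumption of this sketch. `Evidence` and `classify_practitioner` are hypothetical names.

```python
from enum import Enum


class Evidence(Enum):
    REAL = "real and verified"
    MANUFACTURED = "manufactured or falsified"
    UNVERIFIED = "not yet verified"


# Response column of the table above.
RESPONSES = {
    "fair": "Cooperate. Use the shortcut as intended.",
    "exploitative": "Reject the trigger. Name the tactic. Withdraw compliance.",
    "uncertain": "Pause. Seek independent verification before deciding.",
}


def classify_practitioner(triggers: dict[str, Evidence]) -> tuple[str, str]:
    """Aggregate per-trigger evidence into a classification and response.

    Any manufactured trigger makes the practitioner exploitative; otherwise
    any unverified trigger means pause-and-verify; only all-real is fair.
    """
    statuses = set(triggers.values())
    if Evidence.MANUFACTURED in statuses:
        label = "exploitative"
    elif Evidence.UNVERIFIED in statuses:
        label = "uncertain"
    else:
        label = "fair"
    return label, RESPONSES[label]
```

With all four of Example 1's triggers still pending, the function returns "uncertain" and the pause-and-verify response, matching that example's output.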
---
## Inputs / Outputs
### Inputs
- Compliance situation description or document (required)
- Description of what was asked and what preceded the request (required)
- Any prior interactions, gifts, or commitments (optional)
### Outputs
- Vulnerability condition flags
- Active principle identification with trigger elements named
- Fair vs. exploitative classification per principle
- Per-principle defense protocol application
- Response strategy (cooperate, counter, or pause-and-verify)
---
## Key Principles
**The real opponent is the rule, not the requester.** When someone uses a compliance principle against you, they are deploying a social rule — reciprocity, consistency, authority — that has genuine force. The person is a jujitsu warrior who has aligned themselves with that force. Defusing the exploit means defusing the rule's energy, not attacking the person.
**Fair practitioners are allies.** Compliance professionals who present real trigger evidence are helping you use a reliable shortcut correctly. They are not the target of counter-aggression. The appropriate response is cooperation. Counter-aggression is warranted only when triggers are manufactured or falsified.
**Vulnerability conditions multiply risk.** Being rushed, stressed, uncertain, distracted, or fatigued suppresses the deliberate processing that defenses depend on. These states are exactly when compliance is most automatic and when manipulators most prefer to push. Recognizing your own vulnerability state is the first defense.
**Detect the trigger, not just the principle.** "They used scarcity" is insufficient. The operative question is: what is the specific trigger element, and is it real? Finding the trigger element directs you to the right defense protocol.
**Liking works unconsciously; defend at the effect, not the cause.** Unlike authority symbols or false testimonials — which can be checked — liking often operates below awareness through physical attractiveness and association. You cannot reliably block it. Defend by noticing the effect: unexpected, disproportionate warmth toward the requester.
**The commitment defense is internal.** Stomach signal (gut discomfort when trapped) and heart signal (recognizing stated reasons don't match true feelings) are the only reliable detectors of commitment exploitation. External analysis alone is insufficient. The diagnostic question "Would I make this same commitment knowing what I know now?" must be answered at the feeling level, before rationalization runs.
---
## Examples
### Example 1: Sales Pitch with Multiple Active Principles
**Scenario:** A software vendor sends a free consultation report analyzing your company's "inefficiencies" (unsolicited), followed by a pitch deck. The pitch includes logos of 40 named customers, a quote from a Gartner analyst, and an "end-of-quarter pricing" that expires Friday.
**Trigger:** "Should I take this deal? The analyst quote seems credible and a lot of companies I recognize are using them."
**Process:**
- Step 1: Deadline (Friday) creates time pressure — vulnerability condition flagged.
- Step 2: Reciprocity (free consultation = initial gift), Social Proof (40 customer logos), Authority (Gartner analyst), Scarcity (end-of-quarter pricing expiry). Four principles active simultaneously.
- Step 3: Reciprocity — was the consultation genuinely valuable, or designed to create obligation? Check: did they tailor it to your specific situation or was it templated? If templated, redefine as a sales device, not a gift. Social proof — are the 40 logos verifiable reference customers or logos placed without permission? Can you call two of them? Authority — is the Gartner analyst quote from a paid research relationship or an independent analysis? Scarcity — is end-of-quarter pricing real (sales team quota pressure) or a manufactured false deadline?
- Step 4: Apply per-principle protocols for each.
- Step 5: If consultation is genuine AND logos are verifiable AND analyst quote is independent AND pricing expiry is real → fair practitioner; engage on merits. If any trigger is manufactured → classify as exploitative; request extended timeline and verify independently before committing.
**Output:**
```
Active principles: Reciprocity (free report), Social Proof (logos), Authority (analyst), Scarcity (deadline)
Vulnerability: Friday deadline = time pressure — elevated risk of automatic compliance
Reciprocity: Pending classification — verify if report is templated or tailored
Social Proof: Pending — call 2 reference customers before Friday
Authority: Pending — verify Gartner relationship is independent, not paid
Scarcity: Pending — ask directly whether pricing can be extended; test the deadline's reality
Response: Pause-and-verify. Request a 2-week extension. A readily granted extension suggests the deadline was manufactured; a refusal alone proves nothing, since quarter-end pricing can be real.
```
---
### Example 2: Commitment Trap in a Negotiation
**Scenario:** You are in a procurement negotiation. Three months ago you signed a letter of intent. The vendor is now presenting terms 30% above the original estimate, citing "scope changes." You feel obligated to continue because of the time invested and the letter you signed.
**Trigger:** "We've put so much into this already. And we did sign the letter of intent. I feel like we have to see this through."
**Process:**
- Step 1: Sunk cost framing and prior commitment — stomach signal check warranted.
- Step 2: Commitment/consistency is the primary active principle. The letter of intent + 3 months of time = effortful prior commitment being used as a consistency anchor.
- Step 3: Is the commitment legitimate? The letter of intent was real — but the terms presented now differ substantially from those anticipated when it was signed.
- Step 4: Apply commitment defense. Diagnostic question: "Knowing what I know now about the final pricing, would I have signed the letter of intent on these terms?" Register the first honest answer. If no, this is foolish consistency: the commitment to the original terms does not extend to accepting a 30% price increase without renegotiation.
- Step 5: Fair practitioner if scope genuinely changed and documentation supports it. Exploitative if scope was understated deliberately to lock in the commitment first.
**Output:**
```
Active principle: Commitment/Consistency
Stomach signal: Yes — discomfort about the gap between original and current terms
Diagnostic question answer: No — would not have signed at current terms
Classification: Pending — request scope change documentation
Defense: The commitment was to the original terms, not to any terms the vendor chooses to present after the letter is signed. Renegotiate or withdraw.
```
---
### Example 3: Consumer Scarcity + Social Proof Stack
**Scenario:** You are shopping for a hotel for a family trip. The booking site shows "Only 2 rooms left at this price!" and "17 people are looking at this right now."
**Trigger:** "I need to book now before it's gone."
**Process:**
- Step 1: Rushing to book = time pressure induced by the display — vulnerability condition present.
- Step 2: Scarcity (2 rooms, price expiry implied) + Social Proof (17 people looking) — two principles stacked.
- Step 3: Both are frequently manufactured on booking platforms. "17 people looking" figures are often fabricated or algorithmically inflated. "2 rooms left" may reflect inventory management, not genuine scarcity.
- Step 4: Scarcity defense. Stage 1: recognize the arousal (urgency to book) as a warning signal and pause. Stage 2: utility question. Do I want this hotel to stay in it, or to "win" it? Answer: utility. A scarce hotel room provides the same night's sleep as an abundant one. To test the scarcity claim, open the hotel's own site and at least one other platform and compare availability for the same dates. Social proof defense: the "17 people looking" figure cannot be independently verified, so treat it as void.
- Step 5: Likely exploitative if the hotel's own site shows ample availability for the same dates. Counter: book directly or on an alternative platform. Do not reward manufactured urgency with a booking.
**Output:**
```
Active principles: Scarcity ("2 rooms left"), Social Proof ("17 looking")
Vulnerability: Urgency to book — elevated risk
Classification: Likely exploitative — verify via direct hotel site
Defense: Check direct. If available there, the platform manufactured urgency. Book direct or on a platform without the pressure display.
```
---
## References
| File | Contents |
|------|----------|
| `references/defense-protocols.md` | Per-principle defense quick-reference: detection criteria, diagnostic questions, response steps, and classification tests for all 6 principles |
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Influence: The Psychology of Persuasion by Robert B. Cialdini.
## Related BookForge Skills
Install related skills from ClawHub:
- `clawhub install bookforge-influence-principle-selector`
Or install the full book set from GitHub: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/defense-protocols.md
# Per-Principle Defense Protocols — Quick Reference
Use this reference alongside the `influence-defense-analyzer` skill. Each section provides the compact detection criteria, diagnostic question, response steps, and classification test for one principle.
---
## 1. Reciprocity Defense
**What to detect:** An initial gift, favor, service, or concession delivered before the main compliance request.
**Core insight:** The reciprocity rule creates genuine obligation — but only for genuine favors. A sales device masquerading as a favor does not activate the rule's moral force.
**Detection question:**
> "Was this initial offer genuinely given with no expectation of return, or was it the opening move of a compliance sequence?"
**Indicators of a sales device (not a genuine favor):**
- The "gift" was accompanied by or quickly followed by a pitch
- The value of the gift is proportional to the value of the compliance being sought
- The favor was unsolicited and arrived from a stranger
- The provider had no prior relationship that would motivate genuine generosity
**Response steps:**
1. Accept the initial offer for what it actually is: a favor if it was genuinely given, a tactic if not.
2. If it was a compliance device: mentally redefine it. It is a sales strategy, not a favor.
3. After redefinition: the reciprocity obligation is void. Decline freely.
4. If it was genuinely given: reciprocate appropriately in the future — but not by complying with an unrelated, unwanted request now.
**Classification test:**
- Real favor → Fair practitioner → Cooperate with the reciprocity exchange; it is a legitimate social process.
- Sales device → Exploitative practitioner → Obligation is void; decline or counter-exploit.
**Source:** Ch. 2, pp. 38–40
---
## 2. Commitment and Consistency Defense
**What to detect:** A reference to your prior statements, actions, or agreements being used to pressure current compliance ("But you said…", "You already agreed to…", "After everything we've done together…").
**Core insight:** Consistency is generally valuable. The target of defense is *foolish* consistency — automatic, unthinking continuation of a course that no longer serves you. Wise consistency (updated to current knowledge) is worth keeping.
**Two-signal detection system:**
**Signal 1 — Stomach signal:**
A tightening in the gut when you realize you are being pushed to do something you do not want to do. This is the body's early warning that automatic consistency is being triggered. Cialdini calls this the most reliable first signal. Do not dismiss it.
**Signal 2 — Heart signal:**
A deeper recognition that the reasons you are giving yourself for compliance do not match your actual preferences. This occurs when prior commitments have accumulated supporting rationalizations that paper over the true feeling. Accessible via the diagnostic question.
**Diagnostic question:**
> "Knowing what I know now, if I could go back, would I make the same commitment?"
Attend to the *first feeling* that arises in response — before rationalization engages. The heart signal is a pure, basic feeling that precedes the cognitive apparatus.
**Response steps:**
1. Register the stomach signal if present. Treat it as a genuine alert, not an irrational discomfort.
2. Ask the diagnostic question. Trust the initial flash of feeling.
3. If the answer is no: you have identified foolish consistency. Name it explicitly — either internally to yourself or aloud to the requester. "I recognize that this is using my prior statement to lock me in. Given what I now know, that commitment no longer applies."
4. Distinguish between the commitment content (the original terms) and the consistency pressure (the social/psychological force to stay the course regardless of whether those terms still apply).
**Classification test:**
- Commitment being honored on its original terms → Fair. Consistency with what you genuinely agreed to is appropriate.
- Commitment being stretched beyond original scope OR used to trap you into something you would not have chosen → Exploitative. Foolish consistency is a vulnerability, not a virtue.
**Source:** Ch. 3, pp. 79–84
---
## 3. Social Proof Defense
**What to detect:** Claims or displays about what other people are doing, buying, or believing — testimonials, customer counts, crowd behavior, "most popular" labels, peer endorsements.
**Core insight:** Social proof is normally a reliable shortcut. The defense is not to reject it wholesale, but to distinguish between genuine emergent consensus (reliable) and manufactured consensus (fake data feeding a real mechanism).
**Two failure modes:**
**Mode 1 — Falsified evidence (exploitative):**
Social proof manufactured by exploiters to create the *impression* of popularity. Examples: canned laughter, paid actors in "unrehearsed interview" commercials, hired applause, staged crowds, fake reviews, testimonials from actors posing as customers.
Detection: The evidence was constructed rather than arising from independent behavior. Look for: uniform tone, absence of identifying specifics, incentivized sourcing, patterns inconsistent with genuine independent opinion.
**Mode 2 — Incorrect data (potentially misleading even when genuine):**
Real social behavior that nonetheless gives inaccurate guidance — typically pluralistic ignorance, where each individual privately doubts but publicly conforms, and the apparent consensus reflects collective uncertainty rather than genuine endorsement.
Detection: Everyone is going along but no one seems to have independently evaluated the claim. "Everyone is doing it" but no one can explain why.
**Detection question:**
> "Is this social evidence genuinely emergent — arising from independent decisions by independent people — or was it manufactured to create the impression of consensus?"
**Response steps:**
1. Identify the source of the social evidence. Can the testimonials be verified independently? Are review sources incentivized or organic?
2. Check for the hallmarks of manufactured evidence: paid placement, actor-sourced testimonials, algorithmically inflated activity figures.
3. For pluralistic ignorance scenarios: seek at least one independent, uncoordinated data point that reflects genuine assessment (not herd following).
4. If evidence is genuinely falsified: treat the social proof trigger as void. Consider active counter-measures: refuse to engage with the manipulative content, note the falsification publicly if appropriate.
**Classification test:**
- Genuine organic social evidence → Fair. Use it as the reliable shortcut it normally is.
- Manufactured or falsified evidence → Exploitative. Trigger is void; counter-aggression is warranted.
**Source:** Ch. 4, pp. 117–120
---
## 4. Liking Defense
**What to detect:** Unexpected or disproportionate personal warmth toward a compliance practitioner — the feeling that you like this person more than the circumstances would normally warrant.
**Core insight:** Liking operates largely unconsciously through physical attractiveness, familiarity, and association. It cannot be reliably blocked in advance. The defense is not to prevent liking but to catch its effect and bracket it from the compliance decision.
**Universal detection criterion (the single question):**
> "Have I come to like this practitioner more than I would have expected given the circumstances and the time we have spent together?"
If the answer is yes, the liking is potentially working as a compliance trigger — whether the practitioner intended it or not.
**Why this single criterion works:**
Liking can be produced by many specific tactics (flattery, finding shared interests, mirroring, physical attractiveness, providing food, casual humor) — too many to monitor individually, some of which operate below awareness. Monitoring the *effect* (unexpected liking) catches all of them with a single check.
**Response steps:**
1. Let liking proceed without attempting to block it. This is both more effective and more fair.
2. Apply the single detection criterion. Note whether the feeling of warmth is proportional to circumstances.
3. If unexpected liking is detected: consciously separate the person from the deal. Ask: "If someone I did not particularly like were offering me these exact terms, would I accept?"
4. Do not actively dislike the practitioner. Some people are genuinely likable, and punishing them for it is unfair and counterproductive. Simply bracket the warmth and evaluate the offer on its own merits.
5. Base the compliance decision entirely on the proposal's merits — not on how much you like the person presenting it.
**Classification test:**
- The practitioner is simply likable and there are no other exploitative triggers → Not an exploit per se. The liking is incidental. Evaluate the deal.
- The practitioner has systematically engineered warmth through flattery, manufactured similarity, or induced obligation in order to lower your guard → Exploitative use of the liking principle. Bracket the liking explicitly.
**Source:** Ch. 5, pp. 153–155
---
## 5. Authority Defense
**What to detect:** The display of credentials, titles, uniforms, specialized knowledge, or association with prestigious institutions being used to prompt automatic deference to a claim or request.
**Core insight:** Genuine authority is a reliable shortcut — experts usually do know more, and deferring to them is efficient and correct. The vulnerability is in the symbols of authority, which can be faked independently of the substance they represent.
**Two-question defense sequence (apply in order):**
**Question 1: "Is this authority truly an expert?"**
- Check credentials against domain relevance. An expert in one domain is not automatically an authority in adjacent or unrelated domains.
- Distinguish between the label "authority" (title, uniform, trappings) and actual domain-relevant knowledge.
- Ask: Would this person's credentials qualify them to give reliable guidance on *this specific topic*?
**Question 2: "How truthful can we expect this authority to be?"**
- Even genuine experts may not present information honestly if they have conflicts of interest.
- Ask: What does this authority gain from my compliance? Does their incentive structure align with giving me accurate information, or with getting me to say yes?
- Experts who stand to benefit directly from a particular decision should be held to a higher verification standard.
**Red flag — strategic self-deprecation:**
When an authority figure concedes a minor flaw before making their main claim, they may be building credibility on a small point to be more believable on a large one. This is a genuine compliance technique. Recognize the pattern: the admitted limitation is always minor and easily overcome; the advantage being established is the one they want you to act on. This does not make the authority untrustworthy — but it means their self-deprecation should not be taken as proof of objectivity.
**Response steps:**
1. Apply Question 1: verify domain relevance of credentials.
2. Apply Question 2: check for conflicts of interest.
3. If both tests pass: accept the authority input as a trustworthy signal and weight it accordingly.
4. If either test fails: treat the claim as unverified. Seek a second, independent expert opinion before complying.
5. Be alert to authority symbols (titles, uniforms, trappings) used in the absence of corresponding expertise.
**Classification test:**
- Genuine domain-relevant expert with no conflicting incentives → Fair. Defer appropriately.
- Genuine expert with significant conflict of interest → Verify independently before complying.
- Fake or irrelevant credentials → Exploitative. Void the authority signal.
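The classification test above can be read as a small decision procedure. The Python sketch below is a minimal illustration. It assumes each question has already been reduced to a yes/no judgment. The names `classify_authority` and `Verdict` and the three boolean inputs are hypothetical, not from the source.

```python
from enum import Enum


class Verdict(Enum):
    FAIR = "Fair: defer appropriately"
    VERIFY = "Verify independently before complying"
    EXPLOITATIVE = "Exploitative: void the authority signal"


def classify_authority(credentials_genuine: bool,
                       domain_relevant: bool,
                       conflict_of_interest: bool) -> Verdict:
    """Apply the two-question defense sequence in order."""
    # Question 1: is this authority truly an expert on *this* topic?
    # Fake credentials, or real ones from an unrelated domain, fail here.
    if not (credentials_genuine and domain_relevant):
        return Verdict.EXPLOITATIVE
    # Question 2: how truthful can we expect them to be?
    # A stake in your compliance raises the verification bar.
    if conflict_of_interest:
        return Verdict.VERIFY
    return Verdict.FAIR


# A genuine, on-topic expert who profits from your "yes":
print(classify_authority(True, True, True).value)
# Verify independently before complying
```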
**Source:** Ch. 6, pp. 172–174
---
## 6. Scarcity Defense
**What to detect:** Limitation signals — deadlines, limited quantity, competitive bidding, phrases like "only X left," "offer expires," "others are looking at this now."
**Core insight:** Scarcity increases *perceived value* but does not increase *functional utility*. A scarce item performs the same as an abundant one for any purpose other than possession of something rare. This distinction, held clearly during arousal, dissolves most of scarcity's power for utility-driven decisions.
**Two-stage response (execute in sequence — do not skip Stage 1):**
**Stage 1 — Recognize arousal as a WARNING signal, and pause.**
When you feel urgency rising, competitive agitation, or the pull to act before missing out:
- Name it: "I am experiencing scarcity arousal."
- Treat it as a warning signal, not a go signal. The arousal itself — not the item — is what needs attention first.
- Stop before acting. Panicky, feverish compliance decisions are reliably poor decisions.
**Stage 2 — Ask the possession vs. utility question.**
Once calmed:
> "Do I want this primarily to possess something rare, or primarily to use it for its function?"
- If **possession value**: scarcity is a legitimate signal. The item's rarity is genuinely part of what you value.
- If **utility value**: remember that scarce items do not taste, ride, work, or perform any better than abundant ones. The scarcity adds no value to what you will actually experience. Re-evaluate the item on its functional merits, independent of availability.
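The order of the two stages matters more than either question alone, because Stage 2 should not run while arousal is still high. The sketch below is a minimal illustration of that ordering. It assumes the reader can honestly report whether they have calmed down. `scarcity_response` and `Motive` are illustrative names, not from the source.

```python
from enum import Enum, auto


class Motive(Enum):
    POSSESSION = auto()  # you want to own something rare
    UTILITY = auto()     # you want to use it for what it does


def scarcity_response(urgency_felt: bool, calmed: bool, motive: Motive) -> str:
    """Two-stage scarcity defense; Stage 2 is never reached while aroused."""
    # Stage 1: rising urgency is a warning signal, not a go signal.
    if urgency_felt and not calmed:
        return "pause: name the scarcity arousal and do not act yet"
    # Stage 2: the possession vs. utility question.
    if motive is Motive.POSSESSION:
        return "rarity is part of the value: scarcity is a legitimate signal"
    return "ignore availability: judge the item on its functional merits"


print(scarcity_response(urgency_felt=True, calmed=False, motive=Motive.UTILITY))
# pause: name the scarcity arousal and do not act yet
```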
**Classification test:**
- Genuine, verifiable scarcity (actual limited stock, real deadline with documented rationale) → Fair. Scarcity is a real signal about availability.
- Manufactured scarcity (false deadline, algorithmically inflated "low stock" claims, perpetually renewing "limited time" offers) → Exploitative. Counter by requesting a deadline extension and judging how the request is handled. A quiet, easy extension points to manufacture, and so does a refusal out of proportion to any real constraint.
**Verification check for deadlines:**
Ask for a deadline extension. A real deadline (end of quarter, actual expiry of a contract rate) may or may not be extendable, but the response will be reasoned. A manufactured deadline — constructed only to create urgency — will often be quietly extended when pushed, or the refusal to extend will be out of proportion to any legitimate operational constraint.
**Source:** Ch. 7, pp. 199–202
---
## Epilogue Meta-Framework: When to Cooperate vs. Counter-Aggress
**Source:** Epilogue, pp. 206–210
The 6 principles operate as cognitive shortcuts — single-trigger signals that normally counsel correct decisions efficiently. They are not inherently manipulative. The ethical status of their use depends entirely on whether the trigger evidence is real or manufactured.
### Fair Practitioners
Compliance professionals who present real trigger evidence are cooperating partners in an efficient decision-making process. They help you use a reliable shortcut correctly. The appropriate response is cooperation — not resistance.
Examples:
- A vendor with genuine customer testimonials using social proof → Fair.
- A seller with an actual inventory limit using scarcity → Fair.
- An expert with domain-relevant credentials and no conflict of interest using authority → Fair.
### Exploitative Practitioners
Compliance professionals who falsify, counterfeit, or misrepresent trigger evidence are corrupting the shortcuts that everyone depends on. They profit by making a reliable signal unreliable. The appropriate response is counter-aggression.
Examples:
- Fake testimonials from actors → Exploitative.
- Manufactured "act now" deadlines that reset automatically → Exploitative.
- Fake credentials or irrelevant authority symbols → Exploitative.
- Staged crowd behavior to simulate popularity → Exploitative.
### Conditions That Favor Exploitation
Exploitative practitioners prefer to operate when targets are:
- **Rushed** — time pressure suppresses deliberate analysis
- **Stressed** — emotional load reduces cognitive bandwidth
- **Uncertain** — ambiguity makes social proof maximally powerful
- **Distracted** — divided attention prevents trigger detection
- **Fatigued** — mental depletion eliminates resistance
When you are in any of these states, treat compliance requests with elevated suspicion regardless of which principle appears to be active.
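The Epilogue's rule has two independent parts. Whether the trigger evidence is real decides cooperate vs. counter-aggress. The reader's own state decides how much scrutiny that judgment deserves. The sketch below is hypothetical and keeps the two parts separate. `assess_request`, `Assessment`, and the state strings are illustrative assumptions.

```python
from dataclasses import dataclass

# The five states the Epilogue says exploitative practitioners prefer.
VULNERABLE_STATES = frozenset(
    {"rushed", "stressed", "uncertain", "distracted", "fatigued"}
)


@dataclass(frozen=True)
class Assessment:
    response: str             # "cooperate" or "counter-aggress"
    elevated_suspicion: bool  # True if any vulnerable state is present


def assess_request(evidence_is_real: bool, my_states: set[str]) -> Assessment:
    # Real trigger evidence -> fair practitioner -> cooperate.
    # Falsified or manufactured evidence -> exploitative -> counter-aggress.
    response = "cooperate" if evidence_is_real else "counter-aggress"
    # Vulnerable states raise scrutiny whichever principle is active;
    # they are a reason to re-check evidence_is_real before acting on it.
    suspicious = not VULNERABLE_STATES.isdisjoint(my_states)
    return Assessment(response, suspicious)


print(assess_request(evidence_is_real=True, my_states={"rushed", "hungry"}))
# Assessment(response='cooperate', elevated_suspicion=True)
```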
### The Principle of Retaliation
Tolerating exploitative use of shortcuts degrades the reliability of those shortcuts for everyone. Each successful manipulation makes the signal less trustworthy in future uses. Counter-aggression — refusing, naming, boycotting, or publicly calling out exploitative tactics — is not pugnacity. It is maintenance of the decision infrastructure that everyone depends on.
---
name: commitment-escalation-architect
description: Design commitment escalation sequences and detect when consistency pressure is being used against you. Use this skill when planning onboarding sequences, activation flows, user engagement funnels, conversion funnels, habit formation programs, behavioral change campaigns, sales sequences, negotiation strategies, written commitment campaigns, foot-in-the-door campaigns, progressive commitment ladders, lowball tactics, escalation ladders, self-image engineering, inner choice cultivation, consistency-based persuasion, commitment amplification, or defending against manufactured consistency pressure and commitment traps.
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/influence-psychology-of-persuasion/skills/commitment-escalation-architect
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
source-books:
- id: influence-psychology-of-persuasion
title: "Influence: The Psychology of Persuasion"
authors: ["Robert B. Cialdini"]
chapters: [3]
tags: [persuasion, commitment, consistency, foot-in-the-door, lowball, escalation, onboarding, activation, habit-formation, user-engagement, conversion-funnel, behavioral-change, self-image, written-commitment, inner-choice]
depends-on: []
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: "Scenario, campaign plan, onboarding sequence, or sales funnel — a description of the behavior change or compliance you are trying to produce"
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: "Any agent environment. Document set preferred: campaign plans, onboarding sequences, sales funnels, negotiation briefs."
---
# Commitment Escalation Architect
## When to Use
You are designing a sequence to produce lasting behavior change or compliance, or you are evaluating whether consistency pressure is being exploited against you. Typical triggers:
- Designing an onboarding or activation sequence where users must build a habit
- Building a conversion funnel that moves prospects from low-stakes to high-stakes actions
- Planning a sales sequence where a small initial commitment leads to a larger purchase
- Writing a negotiation strategy that uses progressive agreement-building
- Designing a written commitment campaign (goal-setting, pledges, testimonials)
- Evaluating a situation where you feel pressure to act consistently with something you said or did earlier
- Detecting foot-in-the-door or lowball patterns being used on you
Before starting, identify:
- **Application or Defense mode** (designing a sequence vs. detecting manipulation)
- **What behavior do you want?** (The terminal action — purchase, habit, advocacy, agreement)
- **Who is the target?** (Individual, cohort, customer segment)
- **What small first step can open the sequence?**
---
## Context
### Required Context
- The terminal behavior you want to produce (the end of the commitment ladder)
- The starting point (what the target currently does or believes)
- The relationship stage (cold, warm, existing customer, adversarial)
### Observable Context
Read the provided document or scenario for:
- Existing onboarding steps, sales stages, or campaign touchpoints
- Any commitments already made (prior agreements, purchases, stated positions)
- The gap between current behavior and the desired terminal behavior
### Default Assumptions
- If no mode specified → run Application first, then Defense assessment
- If no starting commitment identified → design the smallest credible first ask
- If no existing sequence → build from scratch using the 6-step escalation template
---
## Process
### Step 1: Map the Commitment Ladder
**ACTION:** Define the terminal behavior and work backward to the smallest possible first step. Fill in the intermediate steps connecting them; the template below uses six steps in total.
**WHY:** Commitment operates through self-image — each action updates how the target sees themselves, making the next larger action feel consistent with who they now are. The gap between any two adjacent steps must be small enough that the target accepts each one, but the cumulative drift from step 1 to step 6 can be enormous. The Chinese prisoner-of-war escalation sequence demonstrates this precisely: from "The United States is not perfect" (trivially agreeable) to publicly broadcasting anti-American propaganda — all through individually small and apparently reasonable steps.
**Ladder template:**
```
Step 1 → [Trivial ask — almost no one refuses]
Step 2 → [Small but recorded commitment]
Step 3 → [Public or written version of step 2]
Step 4 → [Behavioral investment — some effort required]
Step 5 → [Social expression of the commitment]
Step 6 → [Terminal behavior]
```
---
### Step 2: Score Each Commitment for Amplifier Strength
**ACTION:** For each step in the ladder, score it against the four amplifiers. Record the score. Use this to identify which steps need strengthening.
**WHY:** Not all commitments are equally durable. The four amplifiers determine whether a commitment produces lasting self-image change or merely temporary compliance. Inner choice is the most important, outweighing the other three combined. A commitment made under strong external pressure (large reward, strong threat) produces compliance only while the pressure is present. A commitment made with inner choice produces identity change that persists after the pressure is gone.
**Amplifier scoring rubric:**
| Amplifier | 0 (absent) | 1 (partial) | 2 (full) |
|---|---|---|---|
| **Active** | Unspoken agreement | Verbal statement | Written, signed, or personally recorded |
| **Public** | Known only to target | Shared with 1–2 people | Witnessed by a group or published |
| **Effortful** | Requires no effort | Requires some effort or sacrifice | Requires significant effort, cost, or discomfort |
| **Inner choice** | Strong external pressure or reward | Mild incentive | No apparent external pressure; target believes they chose freely |
**Total score per step: 0–8.** Steps scoring below 4 should be redesigned. Inner choice score of 0 disqualifies the step regardless of other scores — external pressure prevents self-image internalization.
**Detailed amplifier mechanics are in:** [references/commitment-amplifiers.md](references/commitment-amplifiers.md)
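The scoring rule, including the inner-choice veto, can be sketched as follows. This is a minimal illustration of the rubric above, not a validated instrument. `StepScore`, `verdict`, and the three sample steps are hypothetical.

```python
from dataclasses import dataclass

AMPLIFIERS = ("active", "public", "effortful", "inner_choice")


@dataclass(frozen=True)
class StepScore:
    """Rubric scores for one ladder step: 0 absent, 1 partial, 2 full."""
    action: str
    active: int
    public: int
    effortful: int
    inner_choice: int

    def __post_init__(self) -> None:
        for name in AMPLIFIERS:
            if getattr(self, name) not in (0, 1, 2):
                raise ValueError(f"{name} must be 0, 1 or 2")

    @property
    def total(self) -> int:
        return sum(getattr(self, name) for name in AMPLIFIERS)

    def verdict(self) -> str:
        # Inner choice of 0 disqualifies the step whatever the total is.
        if self.inner_choice == 0:
            return "disqualified: external pressure blocks internalization"
        if self.total < 4:
            return "redesign: total below 4"
        return "keep"


# Hypothetical steps, chosen to show each branch of the rule.
for step in (
    StepScore("verbal yes to a trivial ask", 1, 0, 0, 2),
    StepScore("signed pledge for a large cash bonus", 2, 2, 1, 0),
    StepScore("written goal shared with a partner", 2, 1, 1, 2),
):
    print(f"{step.action}: {step.total}/8 -> {step.verdict()}")
# verbal yes to a trivial ask: 3/8 -> redesign: total below 4
# signed pledge for a large cash bonus: 5/8 -> disqualified: external pressure blocks internalization
# written goal shared with a partner: 6/8 -> keep
```

The second sample step shows that a passing total does not rescue a step with no inner choice.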
---
### Step 3: Select the Escalation Technique
**ACTION:** Determine which technique best fits the scenario — foot-in-the-door, lowball, or a full progressive escalation sequence. A single campaign may use more than one.
**WHY:** The techniques have different mechanics and different use cases. Choosing the wrong one wastes the initial commitment.
| Technique | Mechanism | Best for | Key risk |
|---|---|---|---|
| **Foot-in-the-door** | Small commitment → self-image shift → large request compliance | Cold outreach, onboarding, habit formation, advocacy | Second request must be recognizably consistent with first |
| **Lowball** | Favorable offer → decision → self-generated justifications → remove advantage | Sales closing, subscription upgrades, negotiation | Target must make and begin internalizing the decision before advantage is removed |
| **Progressive escalation** | Multi-step ladder building cumulative identity change | Long sales cycles, behavioral change programs, community building | Steps must stay below the refusal threshold |
**Technique comparison detail:** [references/technique-comparison.md](references/technique-comparison.md)
---
### Step 4A: Design Foot-in-the-Door Sequence
*Skip to Step 4B if using lowball. Skip to Step 4C for full progressive escalation.*
**ACTION:** Design the small initial request and the large follow-up request. Apply the four amplifiers to the initial request.
**WHY:** The foot-in-the-door technique works because complying with a small request changes how the target sees themselves — they become the kind of person who does this. When the large request arrives, refusing it would be inconsistent with their new self-image. Freedman and Fraser demonstrated 76% compliance with a large, intrusive request (billboard on front lawn) from homeowners who had agreed to a trivial related request two weeks earlier — versus 17% from those asked directly. The self-image shift, not the prior agreement per se, is the mechanism.
**Design steps:**
1. **Define the large (terminal) request.** This is your anchor — what you actually want.
2. **Design the small initial request.** It must be:
- Small enough that nearly everyone agrees (target ≥80% acceptance)
- Genuinely related to the large request — related on topic, value, or identity dimension
- Designed for inner choice: no strong incentive attached; the target must feel they chose to agree
3. **Wait.** Allow 1–2 weeks between small and large request in sequential campaigns. The self-image shift needs time to consolidate. (Freedman and Fraser used a two-week gap.)
4. **Frame the large request as consistent with who they now are.** "Given that you already [small commitment], it makes sense that you'd want to [large request]."
5. **Amplify the initial commitment.** Ask the target to write it down, share it publicly, or take a visible action, even a small one. In Freedman and Fraser's study the trivial first request was itself a visible act: agreeing to display a small "Be a Safe Driver" sign.
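Design steps 3 and 4, the waiting period and the consistency framing, can be captured in a small plan object. The sketch below is minimal and assumes a fixed gap inside the 1–2 week window. `FootInTheDoorPlan`, its fields, and the sample dates and requests are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Step 3 above: wait 1-2 weeks (Freedman and Fraser used two).
MIN_GAP = timedelta(weeks=1)
MAX_GAP = timedelta(weeks=2)


@dataclass(frozen=True)
class FootInTheDoorPlan:
    small_commitment: str   # what the target did, phrased in the past tense
    large_request: str      # the terminal ask, phrased as a verb phrase
    small_request_on: date
    gap: timedelta = MAX_GAP

    def __post_init__(self) -> None:
        if not MIN_GAP <= self.gap <= MAX_GAP:
            raise ValueError("gap should be between one and two weeks")

    @property
    def large_request_on(self) -> date:
        return self.small_request_on + self.gap

    def framing(self) -> str:
        # Step 4: present the large request as consistent with the new self-image.
        return (f"Given that you already {self.small_commitment}, "
                f"it makes sense that you'd want to {self.large_request}.")


plan = FootInTheDoorPlan(
    small_commitment="displayed a small safe-driving sign",
    large_request="host a safe-driving billboard on your lawn",
    small_request_on=date(2024, 3, 1),
)
print(plan.large_request_on)  # 2024-03-15
print(plan.framing())
```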
---
### Step 4B: Design Lowball Sequence
*Skip if using foot-in-the-door or full progressive escalation.*
**ACTION:** Design the favorable initial offer, the decision point, and the advantage removal.
**WHY:** Lowball works because once a person makes a decision, they immediately begin generating their own reasons to support it. By the time the original incentive is removed, the person has built a self-supporting structure of new justifications. The commitment now stands on multiple legs — removing the original one does not collapse it. Car dealers who offer below-market prices to induce a purchase decision, then "discover" calculation errors before signing, reliably close at the higher price because buyers have already committed mentally and generated supporting reasons.
**Design steps:**
1. **Offer a genuine advantage** that makes the decision easy (price break, added feature, favorable terms, public naming).
2. **Get the decision.** The target must actively agree, not merely express interest. Get them invested: fill out forms, make arrangements, try the product, tell others.
3. **Allow self-justification to build.** Do not rush. The longer the target spends in the committed state, the more legs their decision grows.
4. **Remove the original advantage** with a plausible explanation (calculation error, manager override, policy change). Frame it neutrally — do not pressure.
5. **Observe.** The majority of committed targets will proceed even without the original incentive, because the commitment now stands on its own newly generated supports.
**Critical constraint:** The advantage must be real enough that the target would not have decided without it. A trivial incentive does not produce the decision energy needed for self-justification to take hold.
---
### Step 4C: Design Full Progressive Escalation Sequence
*Use when building a long-term behavioral change program, community, or multi-stage campaign.*
**ACTION:** Execute the full 6-step ladder from Step 1, applying amplifier scoring to each step and ensuring inner choice at every stage.
**WHY:** The Chinese prisoner-of-war indoctrination program is the best-documented case of full progressive escalation. It moved American soldiers from name-rank-serial-number compliance to voluntary collaboration — not through coercion, but through carefully graded small commitments, each building on the last. The program's success was specifically attributed to the absence of strong external pressure: small prizes for essay contests, not large rewards, so that participants took inner responsibility for what they wrote.
**6-step sequence template:**
| Step | Type | Example (POW program) | Example (product onboarding) |
|---|---|---|---|
| 1 | Trivial verbal agreement | "The US is not perfect" | "Yes, I want to build better habits" |
| 2 | Written private statement | Written list of "problems with America" | Fill out a personal goal form |
| 3 | Written signed statement | Sign the list | Sign up for a 7-day challenge |
| 4 | Public reading or sharing | Read list in discussion group | Share goal with an accountability partner |
| 5 | Extended public expression | Write essay expanding on list | Publish a progress update in community |
| 6 | Terminal behavior | Broadcast on radio; "collaborator" identity | Full product activation; advocate identity |
**At each step:**
- Keep the outer pressure minimal (small prize, gentle nudge — not coercion)
- Let the target attribute the action to their own values ("It's what I really believe")
- Use each step as the foundation for the next ("Since you wrote X, it makes sense to now Y")
---
### Step 5: Verify Inner Choice Conditions
**ACTION:** Review every step in your sequence and eliminate any strong external pressure — large rewards, significant threats, or coercive framing.
**WHY:** This is the most critical check in the entire process. Freedman's robot study provides the quantified evidence. Boys forbidden to play with a robot under a severe threat complied while the threat was active. Six weeks later, with no threat in place, 77% of them played with it. Boys given only a mild reason not to play with it complied just as well at first, but only 33% played with it six weeks later: they had internalized the belief that playing with the robot was wrong. Strong pressure produces temporary compliance. Mild pressure produces lasting identity change.
**Inner choice checklist:**
- [ ] Each step uses the minimum pressure needed to elicit the action
- [ ] Incentives, if any, are small enough that the target cannot attribute their action entirely to the reward
- [ ] Threats, if any, are mild enough that the target cannot attribute their compliance entirely to fear
- [ ] The target is given latitude to feel they chose freely
- [ ] No step has an amplifier score of 0 on inner choice
**Rule:** If you cannot remove a strong external pressure from a step, that step will not produce durable commitment. Either redesign it or accept that its effect is temporary.
---
### Step 6 (Defense): Detect and Break Consistency Traps
**ACTION:** Apply the two-signal defense system to evaluate whether you are caught in a manufactured consistency sequence.
**WHY:** The consistency drive operates automatically — it fires before conscious analysis. The defense is not to abandon consistency (that would be disastrous; consistency is generally adaptive) but to distinguish genuine consistency from foolish consistency manufactured by someone else's influence sequence.
**Two-signal system:**
**Signal 1 — Stomach signal:** A gut tightening when you recognize you are being steered toward a commitment you don't actually want. This is the faster signal. It fires when the trap is relatively obvious.
**Signal 2 — Heart signal:** A quieter signal from the part of you that cannot be fooled by your own rationalizations. It surfaces when you ask the right question before your cognitive justifications engage. Access it by asking: *"Knowing what I know now, would I make this same commitment again?"* The first flash of feeling before you start constructing reasons is the signal from your heart of hearts.
**Defense checklist:**
- [ ] Am I doing this only because I said I would — and would I say it again now?
- [ ] Has the situation changed since I made this commitment?
- [ ] Is someone using my prior statement or action as a lever to extract something larger?
- [ ] Was the initial commitment obtained under favorable conditions that no longer apply? (Lowball pattern)
- [ ] Did a trivial initial agreement lead me here? (Foot-in-the-door pattern)
**If yes to any:** Say directly: "I realize I agreed to X earlier, but knowing what I now know, I would not make that commitment. I'm not willing to proceed on that basis." You are not obligated to honor a commitment manufactured to extract your consistency.
**Stomach signal counter-move:** When Cialdini felt his stomach tighten being steered toward a compliance he didn't want, he told the requester exactly what they were doing. The tactic stops working the moment the target names it.
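The defense checklist's "if yes to any" rule is a simple disjunction. The sketch below is hypothetical and uses one boolean per checklist item. `ConsistencyTrapCheck` and its field names are illustrative. The answers themselves still come from the stomach and heart signals, not from code.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ConsistencyTrapCheck:
    """One flag per defense-checklist item; True means the answer was "yes"."""
    acting_only_on_prior_word: bool  # only because I said I would, and wouldn't say it now
    situation_changed: bool          # circumstances differ from when I committed
    prior_word_used_as_lever: bool   # my earlier statement is extracting something larger
    advantage_withdrawn: bool        # favorable conditions no longer apply (lowball)
    trivial_first_step: bool         # a small agreement led here (foot-in-the-door)

    def recommendation(self) -> str:
        # "If yes to any": one red flag is enough to decline.
        return "decline" if any(astuple(self)) else "proceed"


# One hypothetical reading of the vendor scenario in Example 3.
check = ConsistencyTrapCheck(
    acting_only_on_prior_word=True,
    situation_changed=False,
    prior_word_used_as_lever=True,
    advantage_withdrawn=False,
    trivial_first_step=True,
)
print(check.recommendation())  # decline
```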
---
## Outputs
For Application mode, produce a **Commitment Escalation Plan:**
```
## Commitment Escalation Plan
**Terminal Behavior:** [What you want the target to do at the end]
**Starting Point:** [What the target currently does/believes]
**Technique:** Foot-in-the-door / Lowball / Progressive escalation / Combined
### Commitment Ladder
| Step | Action | Active | Public | Effortful | Inner Choice | Score |
|---|---|---|---|---|---|---|
| 1 | | | | | | /8 |
| 2 | | | | | | /8 |
| ... | | | | | | /8 |
### Inner Choice Verification
- [Step N]: [External pressure present? Redesign if strong pressure identified]
### Timing and Sequencing
- [Gap between steps, delivery method, follow-up framing]
### Expected Compliance Path
- Baseline: [% without sequence]
- After step 1: [% estimated]
- Terminal behavior: [% estimated]
```
For Defense mode, produce a **Consistency Trap Assessment:**
```
## Consistency Trap Assessment
**Situation:** [What is being asked and what prior commitment is being leveraged]
### Signal Check
- Stomach signal: [Present/Absent — describe]
- Heart signal: [Result of "Would I make this commitment now?"]
### Technique Detected
- Foot-in-the-door: [Yes/No — what was the initial small commitment?]
- Lowball: [Yes/No — what advantage was offered and removed?]
- Progressive escalation: [Yes/No — how many steps back does this go?]
### Recommended Response
- [Proceed / Decline / Renegotiate]
- [Exact framing language]
```
---
## Examples
### Example 1: SaaS Onboarding Activation Sequence
**Scenario:** A project management tool has 60% trial-to-cancellation rate. Users sign up but never invite their team or create a project.
**Trigger:** "Our activation rate is too low. Users aren't getting to the 'aha moment.'"
**Process:**
- Terminal behavior: team adoption (3+ members, 1+ active project)
- Starting point: solo signup, no project created
- Technique: foot-in-the-door + progressive escalation
- Step 1: "Name your first project": trivial, written, inner choice (no pressure); score 4/8 (active 2, public 0, effortful 0, inner choice 2)
- Step 2: "Add one task" — active, minor effort, inner choice, score 5/8
- Step 3: "Invite one teammate": public (one other person now knows), effortful (must think of who), inner choice; score 6/8 (active 2, public 1, effortful 1, inner choice 2)
- Step 4: Teammate joins and responds — social proof reinforces identity as "a team that uses this tool"
- Amplifier check: no strong external pressure at any step; users attribute activation to their own need, not to coercion
- Inner choice: onboarding uses progress indicators (mild nudge), not threats or large rewards
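**Output:** Activation sequence spec, with a planning target of 40% → 65%+ activation. The target is an estimate motivated by foot-in-the-door mechanics (Freedman/Fraser: 76% compliance after a trivial first commitment vs. 17% when asked directly). The study's figures do not carry over directly to software onboarding.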
---
### Example 2: Sales Closing via Lowball
**Scenario:** A sales rep needs to close an annual subscription. Prospect has been evaluating for 6 weeks but hesitates on price.
**Trigger:** "The prospect likes the product but keeps delaying on signing."
**Process:**
- Terminal behavior: signed annual contract
- Technique: lowball
- Advantage offered: "If you sign this week, I can include premium onboarding (normally $2,000) at no cost."
- Decision obtained: verbal yes + contract sent for review
- Investment period: prospect reviews contract, involves their IT team, discusses with manager (multiple people now aware of the decision)
- Advantage condition changes: "I checked with my manager — the onboarding package is allocated to another account this quarter. We can still hold the annual price for you, just without the onboarding."
- Self-justifications already built: prospect has validated the product internally, budgeted for it, and told colleagues
- The commitment now stands on its own supports
**Note:** This technique is ethically sound only if the original offer was genuine. Engineering a fake advantage purely to remove it is deceptive. Apply the ethical check: did you intend to honor the advantage when you offered it?
---
### Example 3: Defense — Recognizing a Foot-in-the-Door Trap
**Scenario:** A product manager is asked by a vendor to participate in a "quick 5-minute survey." Three weeks later, the vendor calls asking for a reference call with a prospect.
**Trigger:** "I feel obligated to do this reference call but I'm not sure I should."
**Process:**
- Technique detected: foot-in-the-door (trivial survey → significant ask)
- Stomach signal: slight discomfort — "I don't really know this vendor well enough to recommend them"
- Heart signal: "Would I agree to be a reference for this vendor if asked cold today?" — No
- Assessment: the survey created a relationship entry point; the reference call is the real ask; the three-week gap closely mirrors the two-week delay in Freedman and Fraser's design
- Response: Decline the reference call directly. "I appreciated the chance to give feedback in the survey, but I'm not in a position to serve as a reference — I don't have enough experience with your product to speak credibly on your behalf."
- No obligation exists: the small commitment (survey) was not a commitment to become a commercial reference
---
## Key Principles
- **Inner choice is the master amplifier.** Written, public, and effortful commitments all fail to produce lasting change if the target attributes their action to external pressure. The Chinese POW program did not work through coercion. It worked because small prizes and social contexts let prisoners believe they acted freely. Strong pressure produces temporary compliance. Minimal pressure produces lasting identity change.
- **Self-image is the mechanism, not the agreement.** The commitment is not binding because of a social contract; it is binding because the target has updated how they see themselves. Once someone sees themselves as the kind of person who does X, they do X in many contexts and for a long time — far beyond the original situation.
- **Commitment grows its own legs.** Once a decision is made, the human mind generates new, independent reasons to support it. These new reasons do not disappear when the original incentive is removed; they persist and often intensify. This is why lowball works, and why onboarding sequences that secure early commitments should reduce churn compared with those that don't.
- **Foot-in-the-door and lowball are structurally different.** Foot-in-the-door works through self-image shift (the target becomes someone who does this). Lowball works through self-generated justification (the target builds reasons that outlast the original incentive). Use foot-in-the-door for identity change campaigns; use lowball for purchase or agreement contexts where a decision must be locked in before conditions shift.
- **The defense is a question, not a refusal.** Blanket refusal to honor any commitment is socially corrosive and personally chaotic. The correct defense is accurate classification: ask "Would I make this commitment knowing what I know now?" Trust the first flash of feeling before cognitive justifications begin. If the answer is no, you are not obligated to proceed — you are resisting foolish consistency, not genuine commitment.
## References
- [references/commitment-amplifiers.md](references/commitment-amplifiers.md) — Full 4-amplifier scoring rubric with all quantified evidence
- [references/technique-comparison.md](references/technique-comparison.md) — Detailed foot-in-the-door vs. lowball mechanics, POW sequence breakdown, compliance data
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Influence: The Psychology of Persuasion by Robert B. Cialdini.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/commitment-amplifiers.md
# Commitment Amplifiers — Full Evidence Reference
Source: Influence: The Psychology of Persuasion, Ch. 3 (Cialdini)
---
## The Four Amplifiers
Commitments vary dramatically in their ability to produce lasting behavior change and self-image updating. Four properties determine amplifier strength. **Inner choice is more important than the other three combined.**
---
## Amplifier 1: Active (Written > Verbal > Unspoken)
**Mechanism:** Physical action — especially writing — creates irrefutable evidence of the act. It is harder to deny or forget what you have written than what you have thought or said.
**Evidence:**
- **Deutsch and Gerard (line-length study):** College students estimated line lengths, then were given contradicting evidence and the chance to change their estimate. Three groups: (1) wrote down and submitted their estimate publicly, (2) wrote it on a Magic Pad and erased it before anyone saw, (3) kept the estimate only in mind. Group 1 — publicly written — was by far the most loyal to the original estimate under pressure. Group 2 — privately written — was significantly more resistant to change than Group 3. Unwritten estimates were the most susceptible to new information.
- **Amway Corporation:** Required sales staff to set individual sales goals and commit to them in writing. Internal training materials stated: "There is something magical about writing things down." The company's premise was that written goals are kept more reliably than unwritten intentions.
- **Encyclopedia company sales training:** Trained salespeople to have customers personally fill out the sales agreement instead of filling it out for them. This personal commitment alone has proved to be "a very important psychological aid in preventing customers from backing out of their contracts." It was used specifically to reduce cancellations during the legally mandated cooling-off period.
- **Chinese POW indoctrination:** The Chinese were never satisfied with verbal agreement. Written statements were always sought. If a prisoner refused to write a statement voluntarily, he was asked to copy it from a notebook — "which must have seemed like a harmless enough concession." The written record created physical evidence that made it impossible to forget or deny the act, and it could be shown to others as proof.
- **Testimonial contests (Procter & Gamble, General Foods):** "Why I like [Product] in 25 words or less" contests required participants to write favorable product essays. Contestants voluntarily wrote favorable testimonials, which inclined them to believe what they had written, whether or not they won. The purpose was identical to that of the Chinese political essay contests: to get people on record, in writing, as supporters.
**Quantified uplift:** Written commitments produce significantly more resistance to change than verbal commitments, and verbal commitments more than none. Deutsch and Gerard: publicly recorded estimates were the most stubborn of the three conditions.
---
## Amplifier 2: Public (Witnessed > Private)
**Mechanism:** When a commitment is witnessed by others, the social cost of backing down increases. The desire to appear consistent to others compounds the internal drive to be consistent with oneself.
**Evidence:**
- **Deutsch and Gerard:** The group that wrote down estimates AND submitted them publicly was the most resistant of all three groups to changing under pressure from contradicting evidence. "Public commitment had hardened them into the most stubborn of all."
- **Hung jury study:** Six- and twelve-person experimental juries deciding close cases produced significantly more hung juries when jurors expressed initial opinions with a public show of hands vs. a secret ballot. Once jurors had publicly stated their positions, they were reluctant to change even when presented with strong counterarguments.
- **Weight-loss clinics:** Many clinics require clients to write down their immediate weight-loss goal and show it to as many friends, relatives, and neighbors as possible. "Clinic operators report that frequently this simple technique works where all else has failed."
- **San Diego woman (smoking cessation):** A woman who had tried and failed to quit smoking multiple times wrote "I promise you that I will never smoke another cigarette" on a business card and gave one to every person on a list of people whose respect she wanted. Within a week she had distributed cards to dozens of people. She quit cold turkey. The public commitment sustained her through moments when she wanted to relapse.
- **Chinese POW program:** Political essays were not kept private. They were posted around camp, read aloud in discussion groups, or broadcast on the camp radio. "As far as the Chinese were concerned, the more public the better." Public visibility served dual purposes: it reinforced the writer's own commitment AND persuaded other prisoners that the statement reflected the author's genuine beliefs.
- **Voting prediction study (Columbus, Ohio):** Residents who were asked to predict whether they would vote (a commitment stated to an interviewer) showed significantly higher actual turnout than residents who were not asked. Stating a prediction to another person functioned as a public commitment.
**Quantified uplift:** Public vs. private: Deutsch and Gerard found the public group the most resistant to change. Jury study: significantly more hung juries with a public show of hands than with a secret ballot.
---
## Amplifier 3: Effortful (Harder = More Valued, More Owned)
**Mechanism:** When a person expends significant effort to attain or express something, they value it more highly. The effort cannot be attributed to external pressure; the person must generate internal justification (cognitive dissonance reduction). This makes the commitment feel like a genuine expression of the self.
**Evidence:**
- **Aronson and Mills (1959 — initiation study):** College women who had to endure a severely embarrassing initiation ceremony to gain access to a sex discussion group rated the group and its discussions as extremely valuable and interesting — even though Aronson and Mills had arranged for the group to be as "worthless and uninteresting" as possible. Women who went through a mild initiation or no initiation were decidedly less positive about the same group.
- **Follow-up (electric shock vs. embarrassment):** When coeds were required to endure pain (electric shocks) rather than embarrassment to enter a group, the more intense the shock, the more they later valued the group. The mechanism is not specific to embarrassment — effort and suffering of any kind amplifies commitment to the goal it was expended to reach.
- **54 tribal cultures study:** Researchers found that those tribes with the most dramatic and stringent initiation ceremonies had the greatest group solidarity. The correlation held across radically different cultures and geographies.
- **Fraternity hazing / military boot camp:** Hazing practices are phenomenally resilient to elimination attempts. The reason: the severity of the initiation significantly heightens the newcomer's commitment to the group. "The loyalty and dedication of those who emerge will increase to a great degree the chances of group cohesion and survival." Groups defend their hazing practices fiercely because they understand (often intuitively) that the suffering is what creates the bond.
- **West Point cadet John Edwards:** Expelled for refusing to expose new cadets to what he called "absurd and dehumanizing" initiation. His refusal was treated as a serious offense — because the institution understood that demanding initiation creates lasting commitment.
**Quantified uplift:** Aronson and Mills: severe-initiation group rated the worthless discussion group significantly more positively than mild- or no-initiation groups. The correlation across 54 tribal cultures is the largest-scale evidence.
---
## Amplifier 4: Inner Choice (No Strong External Pressure)
**THIS IS THE MOST IMPORTANT AMPLIFIER — more important than the other three combined.**
**Mechanism:** A commitment produces lasting self-image change only when the person believes they chose it freely. Strong external pressure — large reward or significant threat — gives the person an external explanation for their behavior. They attribute their action to the pressure, not to who they are. No identity change occurs. When external pressure is minimal or absent, the person cannot explain their action by pointing to outside force. They must look inward: "I did this because it's what I believe." The behavior becomes part of the self.
**Evidence:**
- **Freedman robot study (the definitive experiment):**
  - Group 1 (strong threat): Boys were told not to play with a robot under threat of severe punishment. 21 of 22 complied in the initial session. Six weeks later, observed with Freedman absent, 77% played with the robot. The threat had taught them it was unwise to play when they might get caught, not that it was wrong.
  - Group 2 (mild reason): Boys were told not to play with the robot "because it's wrong to play with it." Same short-term compliance. Six weeks later, only 33% played with the robot. They had internalized the belief: the mild instruction had produced an identity-level change, and the severe threat had not.
  - **Key ratio: 77% vs. 33% played with the forbidden robot six weeks later** (so 23% vs. 67% still complied). Under strong external pressure the boys were more than twice as likely to break the prohibition as under minimal external pressure.
- **Chinese political essay contests:** The Chinese specifically chose small prizes (a few cigarettes, a bit of fruit) for essay contests rather than large rewards. The purpose: if the prize was large, a prisoner could attribute his pro-Communist writing to the reward. With a small prize, he could not — he had to attribute it to his own beliefs. "It was not enough to wring commitments out of their men; those men had to be made to take inner responsibility for their actions."
- **Fraternity community-service refusal:** Fraternity chapters refuse to incorporate community service into initiation because community service would allow pledges to attribute their effort to charitable motives. The ordeal must not have a socially acceptable external explanation; the initiates must own it as their choice to join.
- **Pallak lowball study (Iowa energy conservation):** Homeowners promised public newspaper recognition for energy conservation cut their gas use by 12.2% in the first month. Then a letter told them their names would not be published after all. Instead of reverting to previous usage, they raised their savings to 15.5% for the rest of the winter. Removing the original external incentive (public recognition) did not diminish commitment; it eliminated the one external reason that prevented full ownership. Homeowners could now say they conserved because they cared, not to get their names in the paper, and their savings increased.
- **Pallak replication (summer, air conditioning):** Same result. Homeowners promised publicity cut electricity use 27.8% in July. After the publicity promise was canceled, August savings rose to 41.6%.
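A quick check of the arithmetic behind the Freedman comparison, using only the two percentages reported above (the share of boys who played with the forbidden robot six weeks later). The dictionary names are illustrative.

```python
# Freedman robot study: share of boys who PLAYED with the forbidden robot
# six weeks later, i.e. who broke the earlier prohibition.
violation_rate = {
    "severe_threat": 0.77,
    "mild_reason": 0.33,
}

# The complement is the share who still complied (did not touch the robot).
compliance_rate = {group: round(1 - rate, 2) for group, rate in violation_rate.items()}

# Relative likelihood of breaking the rule under strong vs. minimal pressure.
relative_risk = violation_rate["severe_threat"] / violation_rate["mild_reason"]

print(compliance_rate)          # {'severe_threat': 0.23, 'mild_reason': 0.67}
print(f"{relative_risk:.2f}x")  # 2.33x
```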
**Design implication:** In commitment escalation sequences, use the smallest nudge that produces the desired action. The goal is an action the target cannot fully explain by pointing to your incentive. That unexplainability forces identity attribution. That identity attribution produces lasting change.
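The "smallest nudge" rule can be read as a selection problem: among candidate incentives expected to produce the action, pick the one with the least external pressure. This is an illustrative sketch only; the `Nudge` type, the 0-10 `pressure` scale, and the `expected_uptake` estimates are hypothetical inputs you would have to supply from your own testing, not figures from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nudge:
    name: str
    pressure: int           # hypothetical 0-10 scale of external pressure (reward or threat)
    expected_uptake: float  # hypothetical estimated share of targets who take the action


def smallest_sufficient_nudge(candidates: list[Nudge], min_uptake: float) -> Optional[Nudge]:
    """Return the lowest-pressure nudge whose expected uptake meets the target.

    None means no candidate clears the bar, a sign that the requested
    action itself should be made smaller rather than the incentive larger.
    """
    sufficient = [n for n in candidates if n.expected_uptake >= min_uptake]
    if not sufficient:
        return None
    # Least external pressure wins: it leaves the target the least room to
    # credit the incentive instead of themselves.
    return min(sufficient, key=lambda n: n.pressure)


options = [
    Nudge("large cash bonus", pressure=9, expected_uptake=0.80),
    Nudge("small token gift", pressure=2, expected_uptake=0.55),
    Nudge("plain request with a reason", pressure=1, expected_uptake=0.40),
]
choice = smallest_sufficient_nudge(options, min_uptake=0.5)
print(choice.name)  # small token gift
```

Note the design choice: the function never escalates the incentive to hit a target; when nothing qualifies it returns `None`, pushing the designer back toward a smaller ask.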
---
## Amplifier Interaction Effects
All four amplifiers interact. A written public effortful commitment made under inner choice is maximally durable. The amplifiers are somewhat additive, but inner choice is a necessary condition — its absence undermines the others.
**Hierarchy:**
1. Inner choice — necessary for any lasting effect
2. Active (written) — creates physical evidence, enables public sharing, resists forgetting
3. Public — social accountability; hardens into stubbornness
4. Effortful — increases perceived value; eliminates cheap external attributions
**Scoring guidance:** Use the 0/1/2 rubric in the main skill. Flag any step with inner choice = 0 for redesign regardless of other scores.
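The scoring guidance can be sketched as a pass over the steps of an escalation sequence. The 0/1/2 rubric itself is defined in the main skill and is not reproduced here; this sketch assumes only that each of the four amplifiers receives an integer score of 0, 1, or 2. The `Step` type, field names, and example steps are hypothetical.

```python
from dataclasses import dataclass

AMPLIFIERS = ("inner_choice", "active", "public", "effortful")


@dataclass
class Step:
    label: str
    scores: dict[str, int]  # amplifier name -> 0, 1, or 2 (rubric lives in the main skill)


def audit(steps: list[Step]) -> list[tuple[str, int, bool]]:
    """Return (label, total score, needs_redesign) for each step."""
    results = []
    for step in steps:
        for amp in AMPLIFIERS:
            if step.scores.get(amp) not in (0, 1, 2):
                raise ValueError(f"{step.label}: {amp} must be scored 0, 1, or 2")
        total = sum(step.scores[amp] for amp in AMPLIFIERS)
        # Inner choice is a necessary condition: a zero flags the step for
        # redesign no matter how high the other three scores are.
        needs_redesign = step.scores["inner_choice"] == 0
        results.append((step.label, total, needs_redesign))
    return results


plan = [
    Step("sign pledge card", {"inner_choice": 2, "active": 2, "public": 1, "effortful": 1}),
    Step("paid testimonial", {"inner_choice": 0, "active": 2, "public": 2, "effortful": 2}),
]
for row in audit(plan):
    print(row)
# ('sign pledge card', 6, False)
# ('paid testimonial', 6, True)
```

Both example steps total 6, which is the point: the total alone would rate them equally, while the inner-choice gate separates them.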
FILE:references/technique-comparison.md
# Technique Comparison — Foot-in-the-Door vs. Lowball vs. Progressive Escalation
Source: Influence: The Psychology of Persuasion, Ch. 3 (Cialdini)
---
## Foot-in-the-Door
### Mechanism
A small initial commitment → self-image shift → dramatically increased compliance with a large subsequent request.
The mechanism is self-image updating, not agreement obligation. When a person complies with a small request, they subtly update how they see themselves — they become "the kind of person who does this sort of thing." When a larger consistent request arrives later, they comply to remain consistent with their updated self-image, not because they feel contractually bound.
### Primary Evidence
**Freedman and Fraser (1966) — billboard study:**
- Researchers posing as volunteer workers knocked on doors in a residential California neighborhood
- Large request: allow a large "DRIVE CAREFULLY" billboard to be installed on the front lawn (an eyesore that nearly obscured the house in photographs)
- Direct ask group: 17% compliance
- Prior small commitment group: 76% compliance — after having agreed two weeks earlier to display a small 3-inch "BE A SAFE DRIVER" sign
**Cross-domain transfer (same study):**
- A second group was first asked to sign a "Keep California Beautiful" petition — completely unrelated to driving safety
- Two weeks later, asked to display the "DRIVE CAREFULLY" billboard
- ~50% compliance — even though the initial commitment was on a different topic
- The mechanism transferred across topics because what shifted was not knowledge of driver safety but self-image as a civic-minded person
**Explanation (Freedman and Fraser):**
> "What may occur is a change in the person's feelings about getting involved or taking action. Once he has agreed to a request, his attitude may change, he may become, in his own eyes, the kind of person who does this sort of thing, who agrees to requests made by strangers, who takes action on things he believes in, who cooperates with good causes."
**Sherman (Bloomington, Indiana — prediction study):**
- Residents asked to predict what they would say if asked to volunteer three hours for the American Cancer Society
- Most said they would volunteer (social desirability)
- When ACS called for actual volunteers days later: 700% increase in volunteers compared to control group
- The verbal prediction functioned as a small commitment
### Design Rules
1. First request must be small enough that refusal rate is below 20%
2. First request must share a thematic, identity, or behavioral domain with the large request
3. Inner choice must be preserved — no strong incentive attached to the first request
4. Self-image shift must consolidate before the large request arrives (minimum 1-2 weeks in sequential campaigns)
5. The large request should be framed as consistent with who the target now is
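Rules 1, 3, and 4 are concrete enough to check mechanically before a sequence ships; rules 2 and 5 (thematic fit and framing) remain judgment calls. A minimal sketch, assuming you have an estimated refusal rate for the first request, a flag for whether a strong incentive is attached, and the planned gap between requests. The `FootInTheDoorPlan` type and its fields are hypothetical, and the 7-day floor is the low end of the rule's "1-2 weeks".

```python
from dataclasses import dataclass


@dataclass
class FootInTheDoorPlan:
    small_request: str
    large_request: str
    est_refusal_rate: float      # estimated share who decline the small request
    strong_incentive: bool       # large reward or threat attached to the small request?
    days_between_requests: int


def check_plan(plan: FootInTheDoorPlan) -> list[str]:
    """Return rule violations; an empty list means rules 1, 3, and 4 pass."""
    problems = []
    if plan.est_refusal_rate >= 0.20:    # Rule 1: refusal rate below 20%
        problems.append("Rule 1: first request is too large; shrink it")
    if plan.strong_incentive:            # Rule 3: preserve inner choice
        problems.append("Rule 3: strong incentive lets targets credit the reward, not themselves")
    if plan.days_between_requests < 7:   # Rule 4: minimum 1-2 weeks to consolidate
        problems.append("Rule 4: too little time for the self-image shift to consolidate")
    return problems


plan = FootInTheDoorPlan(
    small_request="display a 3-inch 'BE A SAFE DRIVER' sign",
    large_request="install a 'DRIVE CAREFULLY' billboard",
    est_refusal_rate=0.10,
    strong_incentive=False,
    days_between_requests=14,
)
print(check_plan(plan))  # []
```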
### When to Use
- Cold-to-warm conversion sequences (leads who don't yet know you)
- Onboarding flows where you need to build a habit identity
- Advocacy programs where you want users to become promoters
- Any scenario where you need to bridge a large gap between current behavior and target behavior
### Compliance Data Summary
| Condition | Compliance |
|---|---|
| Direct large request (billboard) | 17% |
| After a small, related commitment (3-inch sign) | 76% |
| After an unrelated petition (cross-domain) | ~50% |
| After a verbal prediction commitment (Sherman) | 700% increase in volunteer rate vs. control |
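The last row of the table is already a relative change, while the billboard rows are absolute rates. The arithmetic below puts the billboard results on the same footing by expressing them as lift over the direct-ask baseline; it uses only numbers from the table, taking the ~50% figure at face value.

```python
# Compliance rates from the Freedman and Fraser billboard study.
direct_ask = 0.17
after_related_small_commitment = 0.76
after_unrelated_petition = 0.50  # reported as roughly 50%

lift_related = after_related_small_commitment / direct_ask
lift_unrelated = after_unrelated_petition / direct_ask

print(f"related small commitment: {lift_related:.1f}x the direct-ask rate")    # 4.5x
print(f"unrelated petition:       {lift_unrelated:.1f}x the direct-ask rate")  # 2.9x
```

Even the cross-domain condition nearly triples compliance, which is what the self-image explanation predicts.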
---
## Lowball
### Mechanism
A favorable offer induces a decision → the decision triggers self-generated justifications → the original advantage is removed → the commitment stands on its newly created support structure.
The key insight: commitment is not the agreement — it is the decision. The moment a person decides, they start generating their own reasons to support the decision. These reasons are independent of the original incentive. When the incentive is removed, the reasons remain. The commitment stands on multiple legs; removing one does not collapse it.
### Primary Evidence
**Car dealership observations (Cialdini):**
- Salesperson offers price ~$400 below competitors to induce a purchase decision
- Customer decides, fills out forms, arranges financing, sometimes drives the car ("get the feel of it, show it around")
- Before signing: "error" discovered in calculation — air conditioning not included, adds $400; or manager overrules the deal; or trade-in estimate was too high
- Customer, having built personal investment and justifications, typically accepts the revised terms
- "It seems almost incredible that a customer would buy a car under these circumstances. Yet it works."
**Pallak Iowa energy conservation study:**
- Homeowners promised newspaper publicity for conservation conserved 12.2% more gas (Month 1)
- Letter arrived removing the publicity promise
- Rather than reverting: homeowners conserved 15.5% more for the rest of the winter
- The original incentive was gone; the commitment had grown its own legs (new self-image as conservation-minded citizens, new reasons about energy independence, monetary savings, capacity for self-denial)
**Summer replication (air conditioning):**
- Promised-publicity group: 27.8% electricity reduction in July
- After publicity cancellation: 41.6% savings in August
- Pattern held: removing the external reason intensified rather than diminished commitment
**Sara/Tim case (behavioral illustration):**
- Sara decided to break her engagement and reunite with Tim under a specific condition (Tim would stop drinking)
- Tim did not fulfill the condition
- Sara remained more devoted than ever — she had built new reasons to support her decision (Tim cares more, makes great omelets, etc.)
- The original reason (Tim would stop drinking) had been removed; the commitment stood on new legs
### Design Rules
1. Offer must be attractive enough that the target would not have decided without it
2. Decision must be active: verbal yes + investment of time, energy, or social capital
3. Allow self-justification period before removing the advantage
4. Remove the advantage with a plausible, non-coercive explanation
5. Do not pressure after removal — the commitment should hold on its own
**Ethical constraint:** The advantage must be real at the time of offering. Engineering a fake advantage purely to remove it is deception. The technique is ethically permissible only when the initial offer was genuine and circumstances changed.
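Rules 2 to 5 and the ethical constraint can be combined into a pre-flight check on a planned lowball sequence (rule 1, whether the offer is attractive enough, needs market judgment and is not checked). A sketch under stated assumptions: the `LowballPlan` fields are hypothetical, and the source gives no specific length for the self-justification period, so `min_justification_days` is a number you would have to choose.

```python
from dataclasses import dataclass


@dataclass
class LowballPlan:
    offer_is_genuine: bool          # was the advantage real when offered? (ethical constraint)
    decision_is_active: bool        # verbal yes plus time, energy, or social capital invested (rule 2)
    justification_days: int         # time allowed for self-justification before removal (rule 3)
    min_justification_days: int     # hypothetical threshold; the source gives no number
    removal_reason_plausible: bool  # plausible, non-coercive explanation for the change (rule 4)
    pressure_after_removal: bool    # any pushing once the advantage is withdrawn? (rule 5)


def lowball_issues(plan: LowballPlan) -> list[str]:
    """Return problems with the plan; an empty list means every check passes."""
    if not plan.offer_is_genuine:
        # Hard stop: an advantage engineered only to be removed is deception.
        return ["Ethics: the initial advantage must be real; do not proceed"]
    issues = []
    if not plan.decision_is_active:
        issues.append("Rule 2: secure an active decision, not a passive maybe")
    if plan.justification_days < plan.min_justification_days:
        issues.append("Rule 3: allow more time for self-justification")
    if not plan.removal_reason_plausible:
        issues.append("Rule 4: explain the change plausibly and without coercion")
    if plan.pressure_after_removal:
        issues.append("Rule 5: do not pressure after removal")
    return issues


plan = LowballPlan(
    offer_is_genuine=True,
    decision_is_active=True,
    justification_days=10,
    min_justification_days=7,  # hypothetical choice
    removal_reason_plausible=True,
    pressure_after_removal=False,
)
print(lowball_issues(plan))  # []
```

The ethics check returns early on purpose: no amount of procedural correctness rescues a fake advantage.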
**Insidious variant — inflated trade-in:** Salesperson offers an inflated trade-in value. Customer commits. Used car manager "corrects" it to blue-book value. Customer accepts it as "the fair one" and sometimes feels guilty for having "tried to take advantage."
### When to Use
- Sales closing where a moderate incentive can seal a decision
- Subscription or membership sign-ups with introductory offers
- Any scenario where a decision must be locked in before conditions shift
- Negotiation: offer something valuable to get a binding agreement, then restructure
### Compliance Data Summary
| Condition | Effect |
|---|---|
| Lowball decision (direct car purchase) | Majority proceed after advantage removal |
| Conservation, Month 1 (publicity promised) | 12.2% savings |
| Conservation, after publicity withdrawn (rest of winter) | 15.5% savings (higher than Month 1) |
| Summer replication (post-removal) | 41.6% savings (up from 27.8% with incentive) |
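The conservation rows are easier to compare as relative changes. The arithmetic below computes how much larger the savings became after the publicity promise was withdrawn, relative to the savings while it was still in place; it uses only the percentages in the table.

```python
# Pallak conservation results, in percent savings.
winter_with_incentive, winter_after_removal = 12.2, 15.5
summer_with_incentive, summer_after_removal = 27.8, 41.6


def relative_gain(before: float, after: float) -> float:
    """Percentage change in savings after the publicity promise was withdrawn."""
    return (after - before) / before * 100


print(f"winter: +{relative_gain(winter_with_incentive, winter_after_removal):.1f}%")  # +27.0%
print(f"summer: +{relative_gain(summer_with_incentive, summer_after_removal):.1f}%")  # +49.6%
```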
---
## Full Progressive Escalation — Chinese POW Template
### Mechanism
Multi-step ladder that produces cumulative identity change through individually small and apparently reasonable steps. Each step:
1. Builds on the identity shift of the prior step
2. Stays below the refusal threshold
3. Preserves inner choice (no strong external pressure)
The terminal behavior would be refused outright if asked directly. The escalation sequence makes the terminal behavior feel like a natural extension of who the person has become.
### The 6-Step POW Sequence (documented by Dr. Edgar Schein)
| Step | Action | Amplifiers |
|---|---|---|
| 1 | Trivial verbal anti-American statement: "The United States is not perfect" | Active (verbal), inner choice (obviously true) |
| 2 | Written list of "problems with America" | Active (written), inner choice (small prizes) |
| 3 | Sign the list | Active (written + signed), beginning public |
| 4 | Read list aloud in discussion group with other prisoners | Public, effortful (social courage required) |
| 5 | Write an essay expanding on the list | Active (written, extended), effortful |
| 6 | Essay broadcast on camp radio to entire camp, other POW camps, American forces | Fully public, maximum amplification |
**Outcome:** Many prisoners changed their actual beliefs. Post-war psychological evaluation by Dr. Henry Segal found "war-related beliefs had been substantially shifted." Most men believed the Chinese story that the US had used germ warfare. Some expressed positive views of communism.
**Key design insight from the Chinese:** Prizes for essay contests were kept deliberately small — "a few cigarettes or a bit of fruit." Large prizes would have allowed prisoners to attribute their writing to the reward. Small prizes meant they had to attribute it to their own beliefs. Inner choice was structurally engineered.
**Fraternity parallel:** The fraternity refusal to substitute community service for hazing is the same logic inverted. Community service would allow pledges to attribute their effort to charity. The suffering must have no acceptable external explanation — so the pledge owns it as a choice.
### Design Rules for Progressive Escalation
1. Define the terminal behavior first (this is the anchor)
2. Work backward: what is the smallest first step the vast majority will take?
3. Each step must be thematically and identity-logically connected to the prior step
4. Each step must stay just below the refusal threshold
5. Apply inner choice at every step — minimum external pressure
6. Use each completed step as the foundation for framing the next: "Since you already X, it makes sense to now Y"
7. Allow consolidation time between steps — do not rush the ladder
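Rules 4 and 6 describe a concrete procedure once the ladder has been designed backward from the terminal behavior: confirm each step stays under the refusal threshold, then frame each ask by reference to the step just completed. The sketch assumes you already have an ordered list of steps with an estimated refusal rate for each (hypothetical inputs, e.g. from pilot testing) and borrows the 20% threshold from the foot-in-the-door rules as a hypothetical default. Thematic connection (rule 3) and consolidation time (rule 7) are left to human judgment.

```python
from dataclasses import dataclass


@dataclass
class LadderStep:
    ask: str                 # the request, phrased as a verb phrase
    done: str                # the same action in past tense, used to frame the next ask
    est_refusal_rate: float  # hypothetical: refusal rate when asked right after the prior step


def frame_ladder(steps: list[LadderStep], refusal_threshold: float = 0.20) -> list[str]:
    """Validate an ordered ladder (first step to terminal behavior) and phrase each ask."""
    if not steps:
        raise ValueError("the ladder needs at least one step")
    too_big = [s.ask for s in steps if s.est_refusal_rate >= refusal_threshold]
    if too_big:
        # Rule 4: every step must stay below the refusal threshold.
        raise ValueError(f"split these steps into smaller ones: {too_big}")
    asks = [f"Would you {steps[0].ask}?"]
    for prev, curr in zip(steps, steps[1:]):
        # Rule 6: frame each new ask as consistent with the step just completed.
        asks.append(f"Since you already {prev.done}, it makes sense to now {curr.ask}.")
    return asks


ladder = [
    LadderStep("log one workout", "logged a workout", 0.05),
    LadderStep("log workouts for a week", "logged a full week", 0.12),
    LadderStep("share your streak with a friend", "shared your streak", 0.18),
]
for ask in frame_ladder(ladder):
    print(ask)
```

Raising an error rather than silently dropping an oversized step forces the designer to split it, which is what rule 4 asks for.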
### Compliance Data
Schein documented: "only a few men were able to avoid collaboration altogether; the majority collaborated at one time or another by doing things which seemed to them trivial but which the Chinese were able to turn to their own advantage." Achieving near-total collaboration from men trained to resist made this the most systematic demonstration of progressive commitment escalation in documented history.
---
## Technique Selection Guide
| Scenario | Recommended Technique | Reason |
|---|---|---|
| Cold outreach, no prior relationship | Foot-in-the-door | No existing commitment to leverage; need to build self-image from scratch |
| Sales closing with willing-but-hesitant prospect | Lowball | Prospect is interested; needs a decision trigger; self-justification will do the rest |
| Long-term habit formation | Progressive escalation | Identity change required; takes multiple steps; cannot be rushed |
| Community or group identity building | Progressive escalation + public amplifier | Group identity is built through shared investment |
| Onboarding activation sequence | Foot-in-the-door (short version) | 3–4 step ladder sufficient to move from signup to activation |
| Negotiation | Lowball or foot-in-the-door | Depends on whether you can offer a genuine initial advantage |
| Behavioral change programs | Progressive escalation | Long-term change requires identity updating, not just compliance |
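The selection guide is effectively a lookup table with one conditional row. A minimal sketch: the shorthand scenario keys are hypothetical, and the negotiation branch is one reading of "depends on whether you can offer a genuine initial advantage" (lowball only when a genuine advantage exists, otherwise foot-in-the-door).

```python
# Scenario -> recommended technique, transcribed from the selection guide.
SELECTION_GUIDE = {
    "cold outreach": "Foot-in-the-door",
    "sales closing": "Lowball",
    "habit formation": "Progressive escalation",
    "community building": "Progressive escalation + public amplifier",
    "onboarding activation": "Foot-in-the-door (short version)",
    "behavioral change program": "Progressive escalation",
}


def recommend(scenario: str, genuine_initial_advantage: bool = False) -> str:
    """Return the recommended technique for a scenario key."""
    if scenario == "negotiation":
        # Lowball is only appropriate when the initial advantage is real.
        return "Lowball" if genuine_initial_advantage else "Foot-in-the-door"
    try:
        return SELECTION_GUIDE[scenario]
    except KeyError:
        raise ValueError(f"no guidance for scenario {scenario!r}") from None


print(recommend("sales closing"))                                # Lowball
print(recommend("negotiation", genuine_initial_advantage=False)) # Foot-in-the-door
```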
---
## Distinguishing Foot-in-the-Door from Lowball
The most important distinction:
| | Foot-in-the-door | Lowball |
|---|---|---|
| **Primary mechanism** | Self-image updating | Self-generated justification building |
| **Time of commitment** | Commitment to new identity builds before large ask | Commitment to specific decision builds between decision and advantage removal |
| **What holds commitment in place** | "I am the kind of person who does X" | "I have good reasons to proceed even without the original incentive" |
| **When it locks in** | After the small commitment is made and identity shifts | After the decision is made and investment begins |
| **Risk of reversal** | Low once identity shifts | Low once enough justifications accumulate |
| **Ethical exposure** | Low (no deception required) | Moderate (original offer must be genuine) |
---
name: authority-signal-designer
description: |
  Design and audit authority signals in content, credentials, bios, and landing pages. Use this skill when building expert positioning, thought leadership content, professional bios, about pages, or any content where trust and credibility must be established quickly. Covers three authority symbol types — titles, clothes/uniforms, and trappings — with compliance data showing their real-world persuasion impact. Also covers the two-question defense framework for evaluating authority claims you encounter. Applies strategic self-deprecation to build credibility. Use when: writing a professional bio, building a consultant's about page, crafting thought leadership content, designing a speaker or expert landing page, positioning credentials in marketing copy, adding expert testimonials or social proof of expertise, auditing whether your content actually conveys authority, or when evaluating whether an authority claim you're reading is genuine. Relevant for: authority, credibility, expertise, thought leadership, credentials, trust signals, expert positioning, professional bio, about page, social proof of expertise, testimonials from experts, Milgram, obedience, compliance, persuasion.
model: sonnet
context: 200k
execution:
  tier: 1
  mode: hybrid
  inputs:
    - type: document
      description: "Marketing content, professional bios, landing pages, credentials list, or about pages to design or audit"
    - type: none
      description: "Skill also works from a verbal description of the expert or context"
  tools-required: [Read, Write, TodoWrite]
  tools-optional: [Grep]
  environment: "Run from any directory; document access enables concrete rewrites"
depends-on: []
---
# Authority Signal Designer
## When to Use
Use this skill when you are:
- **Writing or editing a professional bio or about page** — need to convey credentials and expertise in a way that activates trust, not just lists facts
- **Building a consultant, coach, or thought leader brand** — choosing which authority signals to emphasize across a website, social profiles, or marketing collateral
- **Creating expert-led marketing content** — articles, case studies, webinars, or campaigns where the author's or brand's expertise needs to be felt, not just stated
- **Auditing existing content for authority gaps** — reviewing a bio, landing page, or content set to identify where authority is being undersold or miscommunicated
- **Evaluating incoming authority claims** — receiving a pitch, reading a recommendation, or vetting an expert where you need to apply the two-question defense protocol
Preconditions: you have at least one of:
- A draft bio, landing page, about page, or content set to work from
- A description of the person, role, credentials, and audience
- An authority claim (from an expert, vendor, influencer, or advisor) you want to evaluate
**Agent:** Before starting, confirm whether you are in APPLICATION mode (designing authority signals for content) or DEFENSE mode (evaluating an authority claim someone is making). You can also do both in sequence if relevant.
## Context & Input Gathering
### Input Sufficiency Check
```
User prompt → Extract: whose authority? what content? what audience? which mode?
    ↓
Environment → Scan for: bio drafts, landing page copy, credential lists, existing content
    ↓
Gap analysis → Do I know: (1) whose authority is being designed, (2) what the content is for,
               (3) who the audience is, (4) application vs defense mode?
    ↓
Missing critical info? ──YES──→ ASK (one question at a time)
    │
    NO
    ↓
PROCEED
```
### Required Context (must have — ask if missing)
- **Whose authority, and in what domain:**
→ Check prompt for: name, title, role, field, credentials, years of experience
→ Check environment for: existing bios, LinkedIn content, website copy
→ If still missing, ask: "Who is this for, and what is their domain expertise? For example: 'a cybersecurity consultant with 12 years in enterprise security' or 'a marketing agency specializing in B2B SaaS.'"
- **Audience and context:**
→ Check prompt for: target reader, content format (landing page, bio, article), decision being made
→ If still missing, ask: "Who is the audience for this content, and what action should they take after encountering this person's authority? For example: 'enterprise buyers deciding whether to hire this consultant.'"
### Observable Context (gather from environment)
- **Existing content:** Read any draft bio, about page, or credential list present in the files.
→ Look for: how credentials are currently presented, what titles are used, what symbols of authority appear
→ If unavailable: work from user description
- **Competitor or peer positioning:** If other experts in the same field are named or linked, observe their authority signals for comparison.
### Default Assumptions
- If no audience specified: assume a skeptical professional evaluating this person for the first time, with no prior exposure
- If no content format specified: assume a professional bio or about page is the target output
- If no mode specified: assume APPLICATION mode (designing signals) with a brief defense check at the end
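The input-gathering rules above can be sketched as a single resolution function. This sketch follows the Required Context section (ask, one question at a time, for whose authority and the audience) and falls back to the Default Assumptions for content format and mode. The key names and return shape are hypothetical, not part of any agent API.

```python
from typing import Optional

# Facts the agent must ask for if missing (from "Required Context").
REQUIRED_QUESTIONS = {
    "whose_authority": "Who is this for, and what is their domain expertise?",
    "audience": "Who is the audience for this content, and what action should they take?",
}
# Fallbacks taken from "Default Assumptions".
DEFAULTS = {
    "content_format": "professional bio or about page",
    "mode": "APPLICATION",
}


def resolve_context(known: dict[str, Optional[str]]) -> tuple[str, dict[str, str]]:
    """Return ("ASK", {"question": ...}) for the first missing required fact,
    otherwise ("PROCEED", context) with defaults filled in."""
    for key, question in REQUIRED_QUESTIONS.items():
        if not known.get(key):
            return "ASK", {"question": question}  # one question at a time
    context = {k: v for k, v in known.items() if v}
    for key, default in DEFAULTS.items():
        context.setdefault(key, default)
    return "PROCEED", context


status, ctx = resolve_context({
    "whose_authority": "cybersecurity consultant with 12 years in enterprise security",
    "audience": "enterprise buyers deciding whether to hire this consultant",
})
print(status, ctx["mode"])  # PROCEED APPLICATION
```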
## Process
Use `TodoWrite` to track steps before beginning.
```
TodoWrite([
  { id: "1", content: "Analyze scenario: mode, audience, goal, existing signals", status: "pending" },
  { id: "2", content: "Audit existing authority signals across all three symbol types", status: "pending" },
  { id: "3", content: "Design authority strategy: titles, visual/contextual signals, trappings", status: "pending" },
  { id: "4", content: "Apply strategic self-deprecation where appropriate", status: "pending" },
  { id: "5", content: "Ethical check: are signals genuine or manufactured?", status: "pending" },
  { id: "6", content: "Defense mode: apply two-question protocol if evaluating a claim", status: "pending" }
])
```
---
### Step 1: Analyze the Scenario
**ACTION:** Determine mode (APPLICATION / DEFENSE / BOTH), identify the audience's knowledge state and trust threshold, and clarify the specific goal.
**WHY:** Authority signals work through automatic, nearly unconscious compliance — what Cialdini calls the "click, whirr" response. The audience rarely deliberates about whether to trust an authority; they react. Designing effective signals means understanding what will trigger that automatic response in THIS audience. A title that commands authority for physicians ("MD") carries no weight in venture capital. A luxury office that signals success to retail clients may signal excess to lean startup operators. The signal must match the audience's conditioned deference patterns.
**Key questions to resolve:**
- What does this audience automatically defer to? (academic titles, corporate rank, media credentials, peer recognition, track record data?)
- Is the goal initial credibility (getting read or engaged) or sustained trust (getting hired, followed, recommended)?
- Is this a single-channel deployment (one bio) or multi-channel (website, LinkedIn, speaking intro, byline)?
Mark Step 1 complete in TodoWrite.
---
### Step 2: Audit Existing Authority Signals
**ACTION:** Review all existing content through the lens of the three symbol types. Identify what is present, what is absent, and what is being communicated unintentionally.
**WHY:** Authority is communicated whether you plan it or not. An unpolished website, a vague title ("consultant"), or a sparse bio all communicate something — usually uncertainty or low status. The audit surfaces both gaps and noise. Noise is as harmful as gaps: listing irrelevant credentials (a food science degree for a marketing consultant) dilutes focus and signals a failure to curate.
**Symbol Type 1 — Titles:** What titles, designations, certifications, or role labels appear?
- Are titles specific or generic? ("Principal Consultant" vs "Consultant")
- Are academic or professional designations present where relevant?
- Is the title front-loaded (appears early, prominently) or buried?
- Audit for: titles that are undersold (real credential hidden behind modesty), titles that are overstated (vague claims), missing titles that would be recognized by this audience
**Symbol Type 2 — Visual and Contextual Authority Signals** (the clothes/uniform equivalent in digital content):
- In written content: does the language register match expert status? (precise terminology, confident assertion, absence of hedging)
- In visual content: professional photography, publication logos, event photography, stage/keynote images
- In digital presence: website design quality, publication outlets where work appears, association with known institutions
- Audit for: language that hedges authority away ("I think," "in my opinion," "maybe"), generic stock photography that signals inauthenticity, poor visual design undermining strong credentials
**Symbol Type 3 — Trappings:** Status markers that signal success and achievement.
- Client logos, press mentions, award badges, bestseller labels
- Social proof metrics (followers, subscribers, downloads) where significant
- Association with prestigious institutions, events, or publications
- Audit for: absent trappings that could be added, trappings that are outdated or no longer accurate, trappings that signal the wrong domain
**Output:** A structured gap and noise analysis table.
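The hedging check under Symbol Type 2 is the most mechanical part of the audit. A minimal sketch that counts the three hedging phrases named above; treat it as a first-pass filter only, since a word like "maybe" can be legitimate in context and the rest of the audit still needs human review. The phrase list and example bio are illustrative.

```python
import re

# Hedging phrases named in the Symbol Type 2 audit; extend as needed.
HEDGES = ["I think", "in my opinion", "maybe"]


def find_hedges(text: str) -> list[tuple[str, int]]:
    """Return (phrase, count) for each hedging phrase found, case-insensitively."""
    found = []
    for phrase in HEDGES:
        # Word boundaries stop "maybe" from matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            found.append((phrase, count))
    return found


bio = ("I think I can help B2B SaaS teams grow. In my opinion, most companies "
       "under-invest in retention, and maybe that's where I come in.")
print(find_hedges(bio))
# [('I think', 1), ('in my opinion', 1), ('maybe', 1)]
```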
Mark Step 2 complete in TodoWrite.
---
### Step 3: Design the Authority Strategy
**ACTION:** Based on the audit, prescribe specific additions, removals, and repositioning across all three symbol types. Produce revised copy or specific recommendations for each channel.
**WHY:** The three symbol types work together as a system, not independently. A strong title with weak trappings reads as self-proclaimed. Strong trappings with a weak title read as successful but undefined. The goal is convergence: all three types pointing to the same conclusion about expertise and trustworthiness. In Milgram's obedience research, 65% of participants continued to the maximum 450-volt shock when a lab-coated researcher told them to, and replications have found comparable obedience. The symbols do real persuasive work, but only when they form a coherent signal.
**Titles strategy:**
- Lead with the most recognized, domain-relevant title — not the most impressive one to the person themselves
- For consultants and independents: specificity beats seniority ("Customer Acquisition Strategist for B2B SaaS" beats "Marketing Consultant")
- Include certifications, publications, or institutional affiliations that the target audience will recognize and respect
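- Height-perception principle: the more specific and credentialed the title sounds, the more competence the audience fills in. In the Cambridge visitor study, each step up in introduced status added roughly half an inch of perceived height, about 2.5 inches between "student" and "professor"; the same status-to-size distortion plausibly extends to how deep the audience judges the expertise to be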
**Visual and contextual signals strategy:**
- Rewrite hedging language to authoritative assertion: "I believe companies should..." → "Companies that scale past $10M need..."
- Identify one or two institutional associations (past employers, academic institutions, known clients) that serve as shorthand credibility markers for this audience
- If original content (articles, talks, frameworks) exists, name and reference it — having a named methodology or published work activates the "expert has a system" authority heuristic
- For visual assets: professional context photography (speaking, consulting, in environment) beats portrait studio shots — context signals the expert in their domain
**Trappings strategy:**
- Select three to five highest-signal trappings for this audience — prioritize recognizability over impressiveness
- Format trappings as social proof near the primary authority claim: "Featured in [X]" or "Trusted by [Y] companies" placed close to the bio, not buried in a separate section
- Use specificity to strengthen trappings: "10,000 newsletter subscribers" > "large newsletter"; "Led growth from $2M to $18M ARR" > "drove significant growth"
- For new experts with thin trappings: community leadership, event roles (speaker, organizer, panelist), and peer recognition from respected names substitute effectively
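The trappings rules above can be sketched as a ranking step: rank by recognizability, keep three to five, and prefer specific wording. The 1-5 scores are hypothetical ratings the auditor assigns for this specific audience. The vague-word list is a crude, assumed heuristic. It catches "large newsletter" and "drove significant growth" but not every vague claim (for example, "Fortune 500 clients" passes).

```python
import re
from dataclasses import dataclass

# Assumed heuristic: words that usually replace a concrete number or name.
VAGUE_WORDS = re.compile(r"\b(large|significant|many|leading|various|top)\b", re.IGNORECASE)


@dataclass
class Trapping:
    text: str
    recognizability: int  # hypothetical 1-5: how well THIS audience knows the source
    impressiveness: int   # hypothetical 1-5: how impressive it sounds in general


def select_trappings(candidates: list[Trapping], k_min: int = 3, k_max: int = 5):
    """Rank by recognizability first and impressiveness only as a tiebreaker; keep at most k_max."""
    ranked = sorted(candidates, key=lambda t: (t.recognizability, t.impressiveness), reverse=True)
    chosen = ranked[:k_max]
    # Fewer than k_min usable trappings: fall back on community, event, and peer-recognition signals.
    thin = len(chosen) < k_min
    # Chosen trappings whose wording should be made more specific.
    vague = [t.text for t in chosen if VAGUE_WORDS.search(t.text)]
    return chosen, thin, vague


if __name__ == "__main__":
    chosen, thin, vague = select_trappings([
        Trapping("Featured in an obscure trade blog", 1, 4),
        Trapping("Trusted by Nike, Salesforce, and Shopify", 5, 4),
        Trapping("Drove significant growth", 3, 2),
        Trapping("10,000 newsletter subscribers", 4, 3),
    ])
    print([t.text for t in chosen], thin, vague)
```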
**AGENT: EXECUTES** — produce revised bio copy or a marked-up version of the existing content with specific changes called out.
Mark Step 3 complete in TodoWrite.
---
### Step 4: Apply Strategic Self-Deprecation
**ACTION:** Identify one or two places where acknowledging a genuine limitation, counterargument, or constraint will increase overall credibility more than omitting it.
**WHY:** Compliance professionals use strategic self-deprecation to establish truthfulness on minor points so that their major claims receive less scrutiny. The mechanism is trust calibration: an expert who acknowledges limitations signals that their positive claims are honest, not sales-motivated. Cialdini documents this with Listerine ("the taste you hate three times a day"), Avis ("We're #2, but we try harder"), and L'Oreal ("a bit more expensive and worth it"). The waiter who says "the special isn't as good tonight as it usually is" and recommends a less expensive dish gets trusted when they later recommend expensive wine and desserts. The key rule: the limitation must be genuine, secondary, and easily overcome by the benefits being claimed.
**Where to apply:**
- In a bio or about page: acknowledge one constraint on scope ("I focus exclusively on [narrow domain], so if you need [adjacent area], I'm not your person")
- In a case study: briefly note where an approach has limits before explaining why it was right for this client
- In a pitch or proposal: acknowledge one alternative approach and why it was rejected — this signals the recommendation was considered, not reflexive
**What to avoid:** Self-deprecation on core competence claims, primary value propositions, or anything the audience might actually be deciding about. The minor concession must be clearly minor.
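The Step 4 rule reduces to three yes/no judgments about each candidate concession. The sketch below encodes them. The field names are hypothetical, and the human reviewer supplies the booleans, because no code can tell whether a limitation is genuine.

```python
from dataclasses import dataclass


@dataclass
class Concession:
    text: str
    genuine: bool             # is the limitation actually true?
    touches_core_claim: bool  # does it concern core competence or the primary value proposition?
    outweighed: bool          # do the benefits being claimed clearly outweigh it?


def concession_problems(c: Concession) -> list[str]:
    """Return the reasons a candidate concession fails the Step 4 rule (empty list = usable)."""
    problems = []
    if not c.genuine:
        problems.append("not genuine: theatrical modesty")
    if c.touches_core_claim:
        problems.append("concedes a core claim the audience is deciding on")
    if not c.outweighed:
        problems.append("not clearly outweighed by the benefits claimed")
    return problems


if __name__ == "__main__":
    scope = Concession("I focus exclusively on cloud-native environments.", True, False, True)
    print(concession_problems(scope))
```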
**AGENT: EXECUTES** — identify the best candidate location in the content and draft the self-deprecation language.
Mark Step 4 complete in TodoWrite.
---
### Step 5: Ethical Check
**ACTION:** Verify that all authority signals being designed represent genuine credentials, real experience, or actual achievements. Flag any that are manufactured, exaggerated, or misleading.
**WHY:** Authority symbols are the most easily faked of all influence mechanisms — Cialdini explicitly notes that con artists specifically exploit titles, uniforms, and trappings precisely because they're so powerful and so easy to counterfeit. The 95% nurse compliance rate in the Hofling hospital study happened in response to a phone caller claiming to be a doctor — nothing verified. Designing manufactured authority signals is not only unethical but creates compounding risk: audiences who discover the gap between signal and substance lose trust entirely and permanently. The goal is amplifying genuine authority, not fabricating it.
**Checklist:**
- [ ] Every title listed is a current or verifiable past credential
- [ ] Every trapping (award, press mention, client logo) is accurate and current
- [ ] Institutional affiliations are genuine and not misrepresented in scope
- [ ] No implied expertise in domains where the expert has thin actual credentials
- [ ] Self-deprecation in Step 4 is a real limitation, not theatrical modesty over an actual strength
**IF** any signal fails this check → remove it or reframe it accurately
**IF** the genuine credentials are thin → recommend building real authority assets (publish content, seek speaking opportunities, pursue relevant certifications) rather than manufacturing symbols
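The checklist and its two IF rules map onto a short function. In this sketch, any unanswered item is treated as failing, which is an assumed conservative default. `credentials_thin` is a hypothetical flag set by the reviewer.

```python
CHECKLIST = [
    "Every title listed is a current or verifiable past credential",
    "Every trapping (award, press mention, client logo) is accurate and current",
    "Institutional affiliations are genuine and not misrepresented in scope",
    "No implied expertise in domains where the expert has thin actual credentials",
    "Self-deprecation in Step 4 is a real limitation, not theatrical modesty",
]


def ethical_check(results: dict[str, bool], credentials_thin: bool) -> list[str]:
    """Turn checklist answers into Step 5 actions; an empty list means every check passed."""
    # Items missing from `results` count as failures (assumed conservative default).
    actions = [
        f"Failed: {item} -> remove the signal or reframe it accurately"
        for item in CHECKLIST
        if not results.get(item, False)
    ]
    if credentials_thin:
        actions.append(
            "Credentials are thin -> build real authority assets "
            "(publish, speak, certify) instead of manufacturing symbols"
        )
    return actions


if __name__ == "__main__":
    answers = {item: True for item in CHECKLIST}
    answers[CHECKLIST[1]] = False  # e.g. an outdated award badge
    for action in ethical_check(answers, credentials_thin=False):
        print(action)
```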
Mark Step 5 complete in TodoWrite.
---
### Step 6: Defense — Two-Question Protocol
**ACTION:** If in DEFENSE mode, apply the two-question framework to evaluate the authority claim being assessed. If in APPLICATION mode, run this as a brief self-check on the content just designed.
**WHY:** People grossly underestimate how much authority affects them. In Milgram's experiments, predictions about how many subjects would comply with the full shock sequence fell in the 1-2% range — the actual rate was 65%. In the luxury car study, students predicted they would honk at the prestige car faster than the economy car — the opposite was true. This systematic underestimation is the core vulnerability. The two questions force conscious deliberation into what would otherwise be automatic deference, making it far harder for symbols to substitute for substance.
**Question 1: "Is this authority truly an expert?"**
- What specific credentials are claimed, and are they verifiable?
- Is the expertise domain-matched to this specific claim? (A cardiologist claiming authority on nutrition policy is an authority mismatch — genuine credentials in the wrong domain)
- Is the title the substance or just the symbol? (The "M.D." in the Sanka commercial was an actor playing a doctor — the credential was the signal, not the reality)
- Decision: Confirmed expert in this domain / Credentials unverified / Credentials real but domain mismatch
**Question 2: "How truthful can we expect this authority to be here?"**
- What does this authority stand to gain if you comply with their recommendation?
- Is there a conflict of interest between their expertise and their recommendation?
- Are they arguing against their own apparent interest anywhere? (If yes — this is evidence of trustworthiness; if never — be more skeptical)
- Are they presenting the full picture, or only the evidence that supports compliance?
- Decision: High trustworthiness / Conflict of interest present / Truthfulness uncertain
**Output:** A clear verdict — "Follow this authority on this claim" / "Verify before acting" / "Authority signal is present but substance is unconfirmed" — with the specific reasoning.
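A small lookup can combine the two answers into the three verdicts. The mapping below is an assumption, not a rule stated in the source. Question 1 acts as a gate: unverified or domain-mismatched credentials yield "substance is unconfirmed." A confirmed expert with a conflict of interest or uncertain truthfulness yields "Verify before acting," which matches the data-stack consultant example in this skill. The function only proposes a verdict; the human still decides.

```python
from enum import Enum


class Expertise(Enum):  # Question 1 decisions
    CONFIRMED = "Confirmed expert in this domain"
    UNVERIFIED = "Credentials unverified"
    DOMAIN_MISMATCH = "Credentials real but domain mismatch"


class Truthfulness(Enum):  # Question 2 decisions
    HIGH = "High trustworthiness"
    CONFLICT = "Conflict of interest present"
    UNCERTAIN = "Truthfulness uncertain"


FOLLOW = "Follow this authority on this claim"
VERIFY = "Verify before acting"
UNCONFIRMED = "Authority signal is present but substance is unconfirmed"


def verdict(q1: Expertise, q2: Truthfulness) -> str:
    """Proposed verdict under the assumed mapping; the human makes the final call."""
    # Gate: without confirmed, domain-matched expertise, truthfulness is moot.
    if q1 is not Expertise.CONFIRMED:
        return UNCONFIRMED
    if q2 is Truthfulness.HIGH:
        return FOLLOW
    # Genuine expert, but a conflict of interest or unclear truthfulness.
    return VERIFY


if __name__ == "__main__":
    print(verdict(Expertise.CONFIRMED, Truthfulness.CONFLICT))
```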
**HANDOFF TO HUMAN** — the defense protocol produces an assessment; the human makes the final judgment call.
Mark Step 6 complete in TodoWrite.
---
## Inputs
- **Content to design or audit:** bio, about page, landing page, credentials list, marketing copy, pitch document
- **Expert description:** name, role, credentials, experience, domain, notable achievements
- **Audience context:** who reads this, what decision they're making, what authority markers they recognize
- **Mode:** APPLICATION (building signals), DEFENSE (evaluating a claim), or BOTH
## Outputs
- **Authority Signal Audit:** gap and noise table identifying missing, weak, or misdirected signals across all three symbol types
- **Revised content:** rewritten bio, about page copy, or annotated original with specific changes
- **Strategic self-deprecation:** one or two drafted language additions that increase credibility through honesty
- **Defense assessment** (if in DEFENSE mode): two-question verdict on the authority claim with reasoning
## Key Principles
- **Symbols work as powerfully as substance — but only on audiences who haven't checked** — the nurse compliance study (95%) and Milgram experiments (65%) demonstrate that authority symbols bypass deliberate evaluation. This is both the mechanism to use ethically and the vulnerability to guard against.
- **Convergence across all three symbol types beats any single strong signal**: a title without trappings reads as self-proclaimed. Trappings without a clear title read as successful but undefined. Design all three to point at the same conclusion.
- **Specificity is authority** — vague titles and generic credential lists communicate low status. Specific, granular credentials signal that the expert has done enough real work to have earned precision. "Principal consultant" signals more than "consultant"; "10,400 newsletter subscribers in fintech operations" signals more than "large audience."
- **Audiences underestimate authority's effect on themselves** — Cialdini documents this consistently: people in Milgram-style studies, luxury car studies, and uniform experiments all predicted they would be less affected than they actually were. When designing or evaluating authority signals, assume the effect is larger than it appears from the inside.
- **Strategic self-deprecation earns permission to claim** — acknowledging one genuine limitation buys credibility for all the positive claims that follow. The limitation must be real, minor, and easily outweighed. Theatrical modesty over a real strength is dishonest and backfires when detected.
- **Authority in the wrong domain is worthless, or worse**: the first defense question explicitly targets domain match because this is a frequent authority misfire. Real experts in adjacent fields carry a halo that misleads audiences. Design and evaluate authority signals with domain specificity as a primary filter.
## Examples
**Scenario: Cybersecurity consultant bio with thin authority signals**
Trigger: "I'm a freelance cybersecurity consultant. My bio just says 'I help companies stay secure.' Can you make it better?"
Process:
1. Mode: APPLICATION. Audience: enterprise IT buyers and CISOs.
2. Audit: No title specificity, no credentials mentioned, no trappings, no institutional associations, hedged language.
3. Design: Lead with a specific title ("Penetration Testing Specialist, Enterprise Cloud Infrastructure"). Add CISSP or another relevant certification, if actually held. Identify two verifiable trappings (past employer names, notable client sectors). Rewrite language from hedged to authoritative assertion.
4. Self-deprecation: "My work focuses specifically on cloud-native environments; for legacy on-premise infrastructure audits, I typically partner with specialists who have deeper experience there."
5. Ethical check: Confirm all claims are real.
6. Defense: Not applicable.
Output: Revised bio with a specific title, front-loaded credentials, two added trappings, and self-deprecation for scope clarity. Estimated authority-signal improvement: from 0/3 symbol types represented to 3/3.
---
**Scenario: Marketing agency building a new website about page**
Trigger: "We're redoing our website. We're a 6-person B2B content marketing agency. What authority signals should go on our About page?"
Process:
1. Mode: APPLICATION. Audience: B2B marketing directors at mid-market companies evaluating agencies.
2. Audit (from description): No existing about page content provided; designing from scratch.
3. Design strategy: Titles — name the founders' specific backgrounds and past brand affiliations (not just "former marketers"). Visual signals — replace generic headshots with context photography showing client work or team in action. Trappings — identify three recognizable client logos to feature, one press mention or industry award, and a specific result metric ("73 clients, average 40% increase in qualified pipeline").
4. Self-deprecation: "We focus exclusively on long-form content and thought leadership — we don't do social media management or paid ads." This positions the agency as a specialist, not a generalist, which is a form of authority in itself.
5. Ethical check: Confirm all client logos are approved for use and all metrics are accurate.
Output: About page copy with all three symbol types covered, specialist framing as authority signal, self-deprecation that doubles as positioning.
---
**Scenario: Evaluating a consultant's recommendation**
Trigger: "A consultant is recommending we switch our entire data stack to a specific vendor. He has impressive credentials. Should I trust this recommendation?"
Process:
1. Mode: DEFENSE.
2. Question 1 — Is this authority truly an expert? Credentials confirmed: 15 years in data engineering, published in relevant trade publications. Domain match: yes, data infrastructure is the specific domain. Verdict: genuine expert.
3. Question 2: How truthful can we expect them to be here? The consultant is a certified implementation partner of the vendor they're recommending. This is a direct financial conflict of interest, because implementation partners typically earn referral or implementation fees. They have not disclosed this relationship proactively. Nothing in the recommendation argues against their own interest.
4. Verdict: "Genuine expert, but significant undisclosed conflict of interest. Recommendation deserves scrutiny. Request vendor-agnostic analysis, ask directly about partner relationships, and obtain a second opinion from a consultant with no vendor ties before deciding."
Output: Verification recommendation with specific follow-up questions.
## References
- For detailed evidence tables and study citations for all three authority symbol types, see [authority-symbol-evidence.md](references/authority-symbol-evidence.md)
- For the two-question defense protocol with extended examples across domains, see [two-question-defense.md](references/two-question-defense.md)
- Source: *Influence: The Psychology of Persuasion*, Robert B. Cialdini, Chapter 6 "Authority: Directed Deference," pages 157–177
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills), based on *Influence: The Psychology of Persuasion* by Robert B. Cialdini.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/authority-symbol-evidence.md
# Authority Symbol Evidence
Detailed evidence, study data, and application notes for each of the three authority symbol types identified in Cialdini's Chapter 6.
---
## Symbol Type 1: Titles
### Mechanism
Titles are the most difficult symbol to legitimately earn and the easiest to fake. Earning a real title typically requires years of work and achievement. Yet claiming a title — or implying one — requires nothing. This asymmetry is why titles are both powerful and vulnerable to exploitation.
The mechanism is automatic: when a person carries a recognized title, others defer to their directives without consciously deliberating. The title functions as a trigger that activates a pre-learned response pattern. The response was originally adaptive — deferring to recognized experts usually produces better outcomes than not — but the trigger can be pulled by a label alone.
### Compliance Data: The Hospital Nurse Study
**Study:** Researchers (Hofling et al., referenced in Cialdini Chapter 6) contacted 22 separate nurses' stations across surgical, medical, pediatric, and psychiatric wards via phone.
**Procedure:** A caller identified himself as "Dr. Smith" (a physician the nurses had never met, seen, or spoken with before). He directed each nurse to administer 20mg of a drug called Astrogen to a specific ward patient.
**Four reasons the order should have been refused:**
1. The prescription was transmitted by phone, violating hospital policy
2. Astrogen was unauthorized — not cleared for use and not on the ward stock list
3. The prescribed dosage (20mg) was dangerously excessive; the maximum daily dose printed on the container was 10mg — half what was ordered
4. The directive came from a person the nurse had never interacted with
**Result:** 95% of nurses went directly to the medicine cabinet, obtained the Astrogen, and started toward the patient's room to administer it. They were stopped only by a hidden observer who revealed the experiment.
**Key conclusion from researchers:** "One of these intelligences [the doctor's or the nurse's] is, for all practical purposes, nonfunctioning." Highly trained professionals effectively suspended their own judgment in the presence of a title.
**Application implication:** The title "doctor" was conveyed by nothing more than a voice on a phone claiming it. No visual verification, no prior relationship, no credentials reviewed. This is the minimum viable authority signal — and it achieved a 95% compliance rate in a high-stakes professional context.
### Height Perception Study
**Study:** A man was introduced to five separate classes of Australian college students with different status designations (student, demonstrator, lecturer, senior lecturer, professor).
**Result:** Perceived height increased by an average of half an inch per status level. As "professor," he was seen as 2.5 inches taller than as "student" — without any change in his actual height.
**Mechanism:** Size and status are cognitively linked. Higher status objects (coins with higher monetary value, for example) are perceived as physically larger. The brain conflates importance with size, so higher authority triggers literal perceptual distortion.
**Application implication:** A more specific, senior-sounding title may do more than communicate credentials. The study suggests that status cues can make a person seem physically more substantial. By extension (an inference the study did not test), "Principal Consultant" may register as weightier than "Consultant" in the reader's automatic processing.
### The Milgram Obedience Experiments
**Setting:** Yale University, psychology department. Researcher wore a gray lab coat and carried a clipboard.
**Procedure:** Participants ("Teachers") were directed by the researcher to deliver increasingly severe electric shocks to a "Learner" (actually an actor) whenever the Learner answered incorrectly. Shocks incremented by 15 volts per error, reaching a maximum of 450 volts. The researcher simply wore a lab coat and used verbal prompts ("Please continue," "The experiment requires that you continue") when participants hesitated.
**Predicted compliance:** Groups of colleagues, graduate students, and psychology majors at Yale predicted that 1-2% of subjects would proceed to the maximum shock. A panel of 39 psychiatrists predicted that only 1 in 1,000 subjects would comply.
**Actual result:** 65% of subjects delivered the maximum 450-volt shock when instructed by the lab-coated researcher — despite the Learner's audible screaming, demands to be released, and finally falling silent. Not one of the 40 subjects quit when the victim first demanded release. None quit when begging began. None quit when the Learner described having a heart condition.
**Key control that confirms authority as the mechanism:**
- When the researcher and victim switched scripts (the researcher told Teachers to stop; the victim insisted they continue), 100% of subjects refused to give another shock. The demand to continue now came only from a fellow subject with no authority. Once the authority figure called for a stop, the compliance effect vanished.
- When two researchers issued conflicting orders, subjects became paralyzed ("Wait, wait. Which is it going to be?") and eventually followed their own judgment. Conflicting authority neutralized the automatic compliance mechanism.
**Milgram's conclusion:** "It is the extreme willingness of adults to go to almost any lengths on the command of an authority that constitutes the chief finding of the study."
**Application implication for content:** Authority symbols in content work through the same mechanism — not by forcing evaluation but by triggering automatic deference. The audience is not thinking "is this person credible?" — they are reacting. Designing authority signals means engineering the trigger, not writing an argument.
### Systematic Underestimation
Consistently across all authority studies, people predict they will be less influenced than they actually are. This underestimation:
- Lulls people into false confidence when evaluating authority claims (they believe their own scrutiny will protect them when it often won't)
- Makes authority signals more effective than audiences would consciously admit
- Means the real persuasive value of authority credentials in content is higher than most marketers assume
---
## Symbol Type 2: Clothes and Visual Authority Signals
### Mechanism
Clothing and visual presentation function as authority signals because they are observable before any content is processed. They prime the audience's cognitive frame before a word is read. In digital content, the equivalent signals are: language register, institutional associations, publication outlets, visual design quality, and context photography.
### Compliance Data: The Uniform Study (Bickman)
**Study:** Social psychologist Leonard Bickman asked passersby on the street to comply with odd requests (pick up a discarded paper bag, stand on the other side of a bus-stop sign).
**Conditions:**
- Requester dressed in normal street clothes
- Requester dressed in a security guard's uniform
**Result:** "Nearly all" pedestrians complied with the uniformed requester; fewer than half complied with the same person in street clothes.
**Specific sub-study:** The requester stopped pedestrians and pointed to a man at a parking meter 50 feet away, asking them to give the man a dime because he was overparked. The requester then walked away. Even after the requester was out of sight and could no longer monitor compliance, pedestrians who had received the instruction from the uniformed requester almost all complied. Fewer than half complied when the requester had been in street clothes.
**Quantified compliance rates:**
- Street clothes: 42% compliance (actual). College students estimated 50% — overestimating by 8 points.
- Uniform: 92% compliance (actual). College students estimated 63% — underestimating by 29 points.
**Key finding:** Students could roughly estimate the street-clothes rate but massively underestimated the uniform effect — consistent with the systematic underestimation pattern.
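The Bickman figures above make the underestimation pattern concrete. This short calculation uses only the four percentages listed. It shows that students expected the uniform to add 13 percentage points of compliance (63 vs. 50). In fact it added 50 (92 vs. 42).

```python
# Predicted vs. actual compliance (percent) for the parking-meter dime request, as listed above.
BICKMAN = {
    "street clothes": {"predicted": 50, "actual": 42},
    "uniform": {"predicted": 63, "actual": 92},
}


def prediction_gap(predicted: float, actual: float) -> float:
    """Predicted minus actual, in percentage points: positive = overestimate, negative = underestimate."""
    return predicted - actual


# Per-condition error: +8 for street clothes, -29 for the uniform.
for condition, rates in BICKMAN.items():
    print(f"{condition}: {prediction_gap(rates['predicted'], rates['actual']):+} points")

# How much the uniform was expected to add vs. how much it actually added.
predicted_lift = BICKMAN["uniform"]["predicted"] - BICKMAN["street clothes"]["predicted"]  # 63 - 50
actual_lift = BICKMAN["uniform"]["actual"] - BICKMAN["street clothes"]["actual"]           # 92 - 42
print(f"uniform lift: predicted {predicted_lift} points, actual {actual_lift} points")
```

The lift comparison is the sharper statement of the finding: observers sensed that the uniform would help, but they missed most of its effect.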
### Compliance Data: The Business Suit Jaywalker
**Study:** A 31-year-old man crossed a street against traffic signals repeatedly. In half the instances, he wore a freshly pressed business suit and tie. In the other half, a work shirt and trousers.
**Result:** Three and a half times as many pedestrians followed the suited jaywalker into traffic as followed the casually dressed one.
**Application to written content:** Language register functions as the content equivalent of a suit. Precise, confident, expert-register language creates the same authority signal that a well-tailored suit creates in person. Hedging language ("I think," "it seems like," "in my opinion") is the equivalent of a work shirt. Authoritative assertion is the equivalent of the suit.
### The Bank Examiner Fraud Pattern
A documented fraud scheme illustrates how clothing symbols (uniform + business suit together) compound:
1. A man in a conservative three-piece business suit presents as a bank examiner
2. He later sends a message via a uniformed "bank guard"
3. Victims comply with large financial requests without verification
Cialdini: "A pair of bunco artists who have recognized the capacity of carefully counterfeited uniforms to click us into mesmerized compliance with 'authority.'"
The two types of authority apparel — guard uniform and business suit — were "combined deftly by confidence men." The compound effect of multiple authority symbols working together is greater than any single one.
---
## Symbol Type 3: Trappings
### Mechanism
Trappings are status objects — luxury goods, prestigious office environments, expensive accessories, high-status vehicles — that signal success and achievement indirectly. They say "this person is important enough to have valuable things" rather than making a direct claim. They activate status-based deference through environmental cues.
### Compliance Data: The Luxury Car Study
**Study:** Conducted in the San Francisco Bay Area. Researchers stopped a car at a green traffic light. In some trials it was a new luxury car, and in others an economy car. They measured how long motorists behind waited before honking.
**Condition 1 — Economy car:** Nearly all motorists honked quickly. Two motorists rammed the car's rear bumper when it didn't move.
**Condition 2 — Luxury car:** 50% of motorists waited without honking at all until the luxury car drove on. The rest honked, but waited significantly longer than they did with the economy car.
**College student predictions:** Male students consistently predicted they would honk faster at the luxury car than the economy car, the opposite of what motorists actually did. This is one of the starkest examples of the systematic underestimation effect, since the prediction got even the direction of the effect wrong.
**Mechanism:** The luxury car served as a status symbol that triggered automatic deference. Motorists did not think "this person deserves more time at the light." They reacted. The trapping short-circuited the usual irritation response.
### Application to Content Trappings
The digital equivalent of luxury trappings: prestigious client logos, notable press mentions, bestseller labels, award badges, event photography from high-status venues (main stage at recognized conferences), and association with recognized institutions.
**Why specificity matters for trappings:**
- "Fortune 500 clients" → some deference (vague signal)
- "Nike, Salesforce, and Shopify" → strong deference (specific, recognizable brands the audience already trusts)
The trapping's signal strength is proportional to how well the audience recognizes and already defers to the associated symbol. A press mention in a trade publication the audience has never heard of contributes nothing. A mention in the publication the audience reads weekly functions as a third-party authority endorsement.
### Compound Trappings Effect
When multiple trappings point at the same conclusion (expert in X domain), they create convergent social proof. The audience's implicit reasoning: "All of these independent signals are saying the same thing — this person is an authority." Multiple weak trappings from a coherent domain are more persuasive than one strong trapping from a mismatched domain.
---
## Compound Authority: How All Three Work Together
The most effective authority presentations use all three symbol types in convergence. Cialdini notes that con artists specifically combine title, clothes, and trappings because each amplifies the others:
> "Con artists, for example, drape themselves with the titles, clothes, and trappings of authority. They love nothing more than to emerge elegantly dressed from a fine automobile and to introduce themselves to their prospective 'mark' as Doctor or Judge or Professor or Commissioner Someone. They understand that when they are so equipped, their chances for compliance are greatly increased."
**For legitimate expert positioning:** The same convergence principle applies ethically. A consultant who:
- Leads with a specific, domain-matched title (Symbol Type 1)
- Uses expert-register language and is photographed in context (Symbol Type 2)
- Features three recognizable client logos and a major press mention (Symbol Type 3)
...creates a coherent authority signal that activates automatic trust across all three channels simultaneously.
---
*Source: Robert B. Cialdini, Influence: The Psychology of Persuasion, Chapter 6 "Authority: Directed Deference," pages 157–177*
FILE:references/two-question-defense.md
# Two-Question Defense Protocol
A structured framework for evaluating authority claims to avoid automatic deference to authority symbols that may not represent genuine expertise or trustworthy counsel.
---
## Why a Defense Protocol Is Necessary
The core problem with authority influence is not that it works, but that it works automatically — even on people who know about it. In study after study, both the amount of influence and the direction of its effect were misestimated by the people being influenced:
- Milgram's study: colleagues, graduate students, and psychology majors predicted 1-2% compliance with full shock delivery; the actual rate was 65%
- Luxury car study: students predicted they would honk faster at the prestige car, not slower
- Uniform study: students estimated 63% compliance for uniformed requester; actual rate was 92%
These are people who were told about the study design and asked to predict their behavior. They still underestimated the effect. This means that awareness of authority influence alone is insufficient protection. Deliberate cognitive intervention — a protocol — is required.
The two-question framework gives that intervention a structure. It converts automatic deference into a brief deliberation by forcing two specific questions that cannot be answered without actual thinking.
---
## The Two Questions
### Question 1: "Is this authority truly an expert?"
**What this question does:** It redirects attention from the symbols of authority (the title, the lab coat, the luxury car, the impressive bio) to the substance — actual credentials and their relevance to this specific claim.
**Two components to evaluate:**
**Component A — Credentials:** Are the claimed credentials real and verifiable?
- Can the title, degree, certification, or institutional affiliation be independently confirmed?
- Is this a genuine credential or a symbol of a credential? (The actor Robert Young had the title "M.D." associated with him through his TV role — he was not a physician. The credential was a cultural association, not a real qualification.)
- Is the authority claiming something beyond their credentialed area without acknowledging it?
**Component B — Domain match:** Are the credentials relevant to this specific claim?
- A cardiologist claiming authority on dietary policy for the general population: credentials are real, but dietary science is not cardiology — this is an authority mismatch
- A successful entrepreneur claiming authority on the specific technical architecture decisions of a regulated financial institution: business success is real, but this domain requires specific regulatory and technical expertise
- A cybersecurity expert claiming authority on the marketing strategy of a cybersecurity company: the domain mismatch goes the other way — real expertise in the technical domain does not automatically transfer to marketing
**The Marcus Welby trap:** When Robert Young appeared in Sanka coffee commercials as a doctor figure, the campaign was extraordinarily successful, "selling so much coffee it was played for years in several versions." Young had no medical credentials. The audience's click-whirr response was triggered by the symbol (the cultural association with a doctor character), not the substance.
**The business-suited jaywalker version:** Three and a half times as many pedestrians followed a business-suited jaywalker into traffic as followed a casually dressed jaywalker. Even if the suited man were a genuine business authority, he was not a greater authority on crossing streets than other pedestrians. The authority symbol activated compliance in a domain where it carried no actual relevance.
**Verdict options for Question 1:**
- Confirmed expert in this specific domain — proceed to Question 2
- Credentials unverified — pause and verify before acting, or proceed with explicit uncertainty
- Credentials real but domain mismatch — weight their input appropriately to their actual domain; do not defer on the mismatch
---
### Question 2: "How truthful can we expect this authority to be here?"
**What this question does:** It accounts for the fact that even genuine, domain-relevant experts have incentives that may not align with honest recommendation. An expert who is telling the truth about their credentials may still be misleading you about the substance of their advice.
**What to assess:**
**Conflict of interest:**
- What does this authority stand to gain if you follow their recommendation?
- Are they a paid vendor, certified partner, or commission-paid advisor for what they're recommending?
- Have they disclosed their financial interest proactively, or is it only visible through investigation?
- The consultant recommending a vendor whose implementation they get paid to do has a direct conflict. The celebrity doctor endorsing a product for payment has a direct conflict. These are not disqualifying — experts can have conflicts and still give honest advice — but they require conscious accounting.
**Evidence completeness:**
- Is the authority presenting the full landscape of options, or only the evidence that supports their recommendation?
- Are they acknowledging limitations, edge cases, or contexts where their recommendation would not apply?
- A trustworthy expert volunteers uncertainty; a compliance-motivated or otherwise self-interested expert presents only the confirming evidence
**The self-deprecation signal:**
Paradoxically, one of the best signals of trustworthiness is voluntary acknowledgment of limitations. Cialdini identifies this as a deliberate tactic used by sophisticated compliance professionals — arguing against their own interest on minor points to establish credibility for major claims. But it is also a genuine signal when not staged:
- An expert who says "my approach works well in X context, but if you're doing Y, I'd actually recommend someone else" is signaling that their recommendation is not purely self-interested
- A vendor who volunteers disadvantages of their own product before you ask demonstrates confidence in their full value proposition and signals they're not hiding things
- The waiter who recommends the cheaper dish (Vincent in Cialdini's account) earns trust on wine and dessert selections later
The absence of any self-deprecation or acknowledged limitation is a mild negative signal, especially in high-stakes recommendations.
**Verdict options for Question 2:**
- High trustworthiness: no material conflict of interest, acknowledges limitations, presents full landscape → weight recommendation heavily
- Conflict of interest present: genuine expert but incentive misalignment exists → seek a second opinion or explicitly probe the conflict ("Given that you're an implementation partner, how would you advise us to evaluate vendor-neutral alternatives?")
- Truthfulness uncertain: incomplete picture, no acknowledgment of limitations, or undisclosed interests → treat as advocacy rather than expertise; verify independently
---
## Combined Verdicts
| Q1 Result | Q2 Result | Recommended Action |
|-----------|-----------|-------------------|
| Confirmed expert, domain match | High trustworthiness | Follow the authority's guidance on this claim |
| Confirmed expert, domain match | Conflict of interest | Weight advice; seek vendor-neutral second opinion |
| Confirmed expert, domain match | Truthfulness uncertain | Treat as one data point; verify independently |
| Credentials unverified | Any | Pause; do not act until credentials are confirmed |
| Domain mismatch | Any | Discount recommendation proportionally to domain gap |
| Symbol only (no real credentials) | N/A | Disregard authority signal; evaluate on merits only |
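The combined-verdict table can be read as a small decision function. The sketch below is a minimal Python illustration, assuming the two questions have already been answered and encoded as the hypothetical `Q1` and `Q2` enums; the action strings are copied from the table, and rows marked "Any" or "N/A" ignore the Question 2 answer.

```python
from __future__ import annotations

from enum import Enum


class Q1(Enum):
    """Outcome of Question 1 (credentials and domain match)."""
    CONFIRMED_MATCH = "confirmed expert, domain match"
    UNVERIFIED = "credentials unverified"
    DOMAIN_MISMATCH = "domain mismatch"
    SYMBOL_ONLY = "symbol only (no real credentials)"


class Q2(Enum):
    """Outcome of Question 2 (expected truthfulness)."""
    HIGH_TRUST = "high trustworthiness"
    CONFLICT = "conflict of interest"
    UNCERTAIN = "truthfulness uncertain"


def recommended_action(q1: Q1, q2: Q2 | None = None) -> str:
    """Look up the recommended action from the combined-verdict table."""
    # Rows whose Q2 column is "Any" or "N/A": Question 1 alone decides.
    if q1 is Q1.SYMBOL_ONLY:
        return "Disregard authority signal; evaluate on merits only"
    if q1 is Q1.UNVERIFIED:
        return "Pause; do not act until credentials are confirmed"
    if q1 is Q1.DOMAIN_MISMATCH:
        return "Discount recommendation proportionally to domain gap"
    # Only a confirmed, domain-matched expert reaches Question 2.
    if q2 is None:
        raise ValueError("answer Question 2 for a confirmed, domain-matched expert")
    return {
        Q2.HIGH_TRUST: "Follow the authority's guidance on this claim",
        Q2.CONFLICT: "Weight advice; seek vendor-neutral second opinion",
        Q2.UNCERTAIN: "Treat as one data point; verify independently",
    }[q2]
```

For example, `recommended_action(Q1.CONFIRMED_MATCH, Q2.CONFLICT)` returns the "seek vendor-neutral second opinion" row. Checking Question 1 first mirrors the protocol: a failed credential or domain check short-circuits Question 2.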
---
## Common Authority Defense Situations
### Medical or health advice
Q1: Is this a physician? Is the claim within their specific specialty, or outside it? (A dermatologist recommending a sleep supplement is a domain mismatch.)
Q2: Are they paid by a pharmaceutical company for this recommendation? Does their practice benefit financially from this specific recommendation?
### Vendor or consultant recommendations
Q1: Do their credentials specifically cover the domain of their recommendation, or are they generalists with adjacent experience?
Q2: Are they a reseller, implementation partner, or referral beneficiary of what they're recommending? Have they disclosed this?
### Thought leader or influencer content
Q1: What is the specific basis of their claimed expertise? (A large audience is not a credential — it is a measure of distribution, not expertise.)
Q2: Are they paid by brands whose products they recommend? Do they disclose sponsorships or conflicts?
### Investment or financial advice
Q1: Are they a licensed financial professional? Is the investment type within their licensed scope?
Q2: Do they earn commissions on the products they're recommending? Are they selling what they're recommending?
### Expert witness or authority cited in argument
Q1: What are their specific credentials, and are those credentials in the exact domain of the claim they're being cited for?
Q2: Who engaged them and paid for their testimony or opinion? What position were they hired to support?
---
## What Defense Does Not Mean
Defense against authority influence does not mean reflexive skepticism toward all authority. Cialdini is explicit on this: "We shouldn't want to resist altogether, or even most of the time. Generally, authority figures know what they are talking about. Physicians, judges, corporate executives, legislative leaders and the like have typically gained their positions because of superior knowledge and judgment. Thus, as a rule, their directives offer excellent counsel."
The goal of the defense protocol is:
- To recognize when automatic deference is appropriate (verified expert, no conflict, relevant domain)
- To recognize when it is not (unverified symbols, domain mismatch, conflict of interest)
- To make the decision consciously rather than automatically
A positive result from the two questions — genuine expert, domain match, no material conflict — is an authorization to defer. The questions are a gate, not a barrier.
---
*Source: Robert B. Cialdini, Influence: The Psychology of Persuasion, Chapter 6 "Authority: Directed Deference," pages 172–177*
---
name: viral-loop-designer
description: "Use this skill to design a viral or referral loop for a post-PMF product by extracting the PATTERN (not the tactic) from canonical case studies — Dropbox bilateral-incentive referral, Hotmail email-signature instrumented virality, Airbnb Craigslist cross-posting, LinkedIn public-profile SEO virality — and adapting it to the user's product. Classifies the mechanism type (word-of-mouth amplification, instrumented virality, embedded referral), models K-factor (invitations × conversion rate) and cycle time, and produces a viral-loop-design.md with the loop diagram plus a referral-test-plan.md for experiment execution. Flags the viral spam trap (when sharing becomes annoying and hurts the brand). Triggers when a growth PM asks 'how do I build a referral program like Dropbox?', 'viral loop design', 'how do I make my product viral', 'referral program', 'word of mouth growth', 'K-factor', 'viral coefficient', 'invite mechanism', 'should I offer referral rewards', 'bilateral incentive', 'Hotmail email signature', 'Airbnb Craigslist hack', 'LinkedIn public profiles', or 'viral loop testing'. Also activates for 'our referral program isn't working', 'viral spam trap', or 'refer a friend program design'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/viral-loop-designer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
  - id: hacking-growth
    title: "Hacking Growth"
    authors: ["Sean Ellis", "Morgan Brown"]
    chapters: [5, 9]
tags:
  - growth
  - viral-growth
  - referral
  - acquisition
  - startup-ops
depends-on:
  - acquisition-channel-selection-scorer
  - north-star-metric-selector
execution:
  tier: 1
  mode: plan-only
  inputs:
    - type: document
      description: >
        Product brief (product-brief.md) with description and ideal customer profile.
        Optional: current-referral-data.md describing any existing referral mechanism,
        referral send rate, activation rate, and opt-out/spam complaint rate.
  tools-required: [Read, Write]
  tools-optional: []
  mcps-required: []
  environment: >
    Document set. Plan-only: produces a viral loop design document and a
    referral test plan. No code execution required.
discovery:
  goal: >
    Produce a viral-loop-design.md with a loop diagram, K-factor model, and
    mechanism classification, plus a referral-test-plan.md for experiment
    execution, tailored to the user's product, not a generic referral template.
  tasks:
    - "Read product brief and optional referral data"
    - "Assess viral capability"
    - "Classify best mechanism type"
    - "Map matching case-study pattern"
    - "Draft loop diagram with K-factor model"
    - "Design first experiment"
    - "Flag the viral spam trap"
    - "Emit deliverables"
---
# Viral Loop Designer
Structured design of a viral or referral loop for a post-PMF product. Classifies the
mechanism type, maps the closest canonical pattern (Dropbox, Hotmail, Airbnb, LinkedIn)
to the user's product, models K-factor and cycle time, and produces a loop diagram with
a test plan — all grounded in pattern extraction, not tactic copying.
---
## When to Use
Use this skill when:
- You want to design or redesign a referral or viral growth mechanism
- Your team is asking "how do we grow like Dropbox?" or "should we build a referral program?"
- A referral program exists but isn't producing measurable results
- You want to evaluate whether your product has viral potential before investing in a loop
- You need to choose between word-of-mouth amplification, instrumented virality, and
explicit bilateral incentives
**Prerequisites:**
- Product/market fit confirmed (must-have score ≥ 40% "very disappointed" or stable
retention curve). Virality on a product that doesn't deliver the aha moment accelerates
churn, not growth — users arrive, experience nothing, leave, and tell no one to join.
See `product-market-fit-readiness-gate` to confirm before proceeding.
- A defined North Star Metric. The viral loop must compound the metric that reflects real
value delivered — not invites sent or accounts created. See `north-star-metric-selector`.
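The must-have survey threshold in the first prerequisite is a simple proportion. A minimal sketch, assuming survey answers arrive as plain strings; the answer labels and the `pmf_gate` helper are hypothetical, and the check covers only the survey branch, not the retention-curve alternative.

```python
from __future__ import annotations


def pmf_gate(responses: list[str], threshold: float = 0.40) -> bool:
    """True when the share of "very disappointed" answers meets the threshold."""
    if not responses:
        raise ValueError("need at least one survey response")
    very = sum(1 for answer in responses if answer == "very disappointed")
    return very / len(responses) >= threshold
```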
---
## Context and Input Gathering
Read the following before beginning:
1. **product-brief.md** (required) — Product description, ideal customer profile,
core value proposition, current growth stage, and any referral history.
2. **current-referral-data.md** (optional) — Existing referral mechanism description,
referral send rate, activation rate from referrals, and opt-out or spam complaint rate.
If not provided, note its absence and proceed with product-brief analysis only.
If either document is missing critical information (no ICP, no value proposition, no
growth stage), ask one targeted question before proceeding. Do not design a loop for
an underspecified product — mechanism choice depends on product structure, not generic
best practices.
---
## Process
### Step 1 — Read the Product Brief
Read product-brief.md and optional referral data. Extract:
- What the product does and for whom
- What triggers the aha moment (the first experience of core value)
- Whether the product has any existing sharing or invite behavior
- Current acquisition channels and growth stage
**Why:** Viral loop design is downstream of product understanding. The mechanism type
you choose (Step 3) depends entirely on whether the product naturally creates sharing
occasions, whether it benefits from more users joining, and whether its core value can
be embedded in an invite.
---
### Step 2 — Assess Viral Capability
Evaluate whether the product has structural conditions for viral growth. Answer these
three questions explicitly:
1. **Network effect?** Does the product become more valuable as more users join?
(Social networks, marketplaces, and messaging apps: yes. Grocery store apps,
single-player tools: typically no.)
2. **Natural sharing occasions?** Does using the product create moments where users
would naturally tell others or share an output? (File sharing, event tickets, payments,
content — yes. Personal productivity tools — often no.)
3. **Incentive alignment?** Can a referral incentive be tied directly to the product's
core value rather than bolted on as unrelated cash?
**Verdict:** If all three are weak, flag this explicitly. Virality is possible through
word-of-mouth amplification, but embedded referral or instrumented virality will
require heavy investment for modest returns. Recommend focusing on organic and paid
channels (see `acquisition-channel-selection-scorer`) and returning to viral design
after retention has stabilized further.
**Why:** Not every product goes viral. The mechanism must match the product's structure.
Designing an embedded referral loop for a product with no network effect and no natural
sharing occasion wastes engineering cycles and can damage user trust if the incentive
feels misaligned.
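The three questions can be recorded as yes/no answers and turned into the GREEN / YELLOW / RED verdict that the deliverables call for. The mapping below is an assumption, not a rule from the source: it reproduces the two worked examples (three strong answers give GREEN, a single strong answer gives YELLOW) and treats the all-weak case as RED.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ViralCapability:
    network_effect: bool        # Q1: does the product gain value as users join?
    sharing_occasions: bool     # Q2: does normal use create moments to share?
    incentive_alignment: bool   # Q3: can a reward be tied to core value?

    def strong_answers(self) -> int:
        return sum([self.network_effect, self.sharing_occasions, self.incentive_alignment])

    def verdict(self) -> str:
        # Assumed mapping: 3 strong -> GREEN, 0 strong -> RED, anything else -> YELLOW.
        strong = self.strong_answers()
        if strong == 3:
            return "GREEN"
        if strong == 0:
            return "RED"
        return "YELLOW"
```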
---
### Step 3 — Classify the Mechanism Type
Based on the viral capability assessment, classify the best-fit mechanism:
**Word-of-Mouth Amplification**
- Organic sharing driven by genuine product delight — not engineered
- Can be accelerated through Net Promoter Score programs, testimonials, community
building, and public-facing content seeding (e.g., Upworthy's catchy headline system)
- Best for: products with exceptional product/market fit but few natural sharing occasions
- K-factor impact: low to moderate, but high lifetime value per referred user
- Cycle time: long (unpredictable)
**Instrumented Virality**
- Product features engineered to mechanically expose new users to the product as a
side-effect of core usage
- The invite is embedded in the product's output — users need do nothing extra
- Best for: products whose output is inherently shareable or distributable
(email, documents, event listings, payments, design files)
- K-factor impact: can be very high due to high frequency; payload is often low
- Cycle time: short (continuous, passive)
- Examples: Hotmail email signature, Airbnb Craigslist cross-posting
**Embedded Referral**
- Explicit "invite friends" program with a designed incentive — bilateral (both parties
rewarded) or unilateral (only the referrer rewarded)
- Requires active user participation — users must consciously invite
- Best for: products with network effects or storage/capacity mechanics where users
genuinely benefit from more people joining
- K-factor impact: moderate, controlled by incentive quality and program visibility
- Cycle time: moderate (days to weeks from invite to activation)
- Example: Dropbox free storage for both referrer and referee
**Why:** Mechanism classification determines the build complexity, the experiment
sequence, and the failure modes to watch for. Choosing instrumented virality for a
product without shareable output produces expensive engineering work with zero result.
Choosing embedded referral for a product without incentive alignment produces low
conversion and user annoyance.
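A rough sketch of the classification as ordered rules, assuming the Step 2 answers are reduced to booleans. The rule order (embedded referral first, then instrumented virality, with word-of-mouth amplification as the fallback) is a hypothetical reading of the "Best for" lines above; it agrees with both worked examples later in this skill.

```python
def classify_mechanism(network_effect: bool, shareable_output: bool,
                       incentive_alignment: bool) -> str:
    """Hypothetical ordered rules distilled from the three mechanism profiles."""
    # Embedded referral: users benefit from others joining AND the reward
    # can be tied to the product's core value.
    if network_effect and incentive_alignment:
        return "embedded referral"
    # Instrumented virality: normal use produces output that reaches non-users.
    if shareable_output:
        return "instrumented virality"
    # Otherwise the realistic lever is amplifying organic word of mouth.
    return "word-of-mouth amplification"
```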
---
### Step 4 — Map the Closest Case-Study Pattern
Match the product to the canonical pattern that is structurally closest. Extract the
pattern — the structural logic — not the tactic.
**Dropbox Pattern (Bilateral Embedded Referral)**
- Structural logic: collaborative product + network effect + product-native incentive
(more storage = more value to the user) + near-zero marginal cost of the incentive.
- Apply when: users collaborate with non-users using your product; giving more product
resource as a reward costs you little; the incentive's value is hard to compare against cash or effort.
- Key insight: the incentive must be product-native. Storage feels more generous than
its cost because users can't easily price it. Cash is easy to benchmark against effort.
**Hotmail Pattern (Passive Instrumented Virality)**
- Structural logic: every use generates output that reaches non-users; embed the
conversion invitation in that output; make signup require one click and thirty seconds.
- Apply when: the product's output (email, document, invoice, form, booking) is
inherently visible to non-users.
- Key insight: friction at the invitation collapses the funnel. The Hotmail link resolved
to immediate value — free email — in under a minute. Zero user action required to share.
**Airbnb Pattern (Cross-Platform Distribution)**
- Structural logic: your content lives on your platform; insert it into a higher-traffic
adjacent platform where the target audience already searches; no user action required.
- Apply when: your product has user-generated listings or content; an adjacent platform
hosts your target audience's searches; cross-posting carries manageable platform risk.
- Key insight: instrumented virality through platform arbitrage. Build it, measure it,
but treat it as a channel — not a permanent acquisition architecture.
**LinkedIn Pattern (SEO-Driven Profile Virality)**
- Structural logic: users create data inside your product; making it publicly indexable
converts your user base into a permanent SEO acquisition surface.
- Apply when: your product has user-generated data others search for by name, expertise,
or topic; making it public does not violate privacy expectations or compliance.
- Key insight: this is a distribution architecture decision, not a referral program.
Loop: user creates data → indexed → non-user finds via search → converts. Cycle time
is long (months) but compounding is durable and zero marginal cost.
**Why:** Copying tactics without understanding structural logic produces failure.
Dropbox worked not because bilateral incentives are magic, but because file storage
benefits from more users joining, the incentive was product-native, and marginal cost
was near zero. Map the structural logic — then validate fit — before committing to build.
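The "Apply when" conditions can be checked mechanically to shortlist patterns before the structural-logic review. In this sketch each field of the hypothetical `ProductTraits` record stands for one condition from the pattern descriptions above; a product can match more than one pattern, which is how a hybrid such as LinkedIn plus Hotmail comes out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProductTraits:
    collaborates_with_non_users: bool   # Dropbox: users work with people not yet on the product
    cheap_product_native_reward: bool   # Dropbox: extra product resource costs little to give away
    output_seen_by_non_users: bool      # Hotmail: emails, documents, invoices reach outsiders
    user_generated_content: bool        # Airbnb / LinkedIn: users create listings or data
    adjacent_platform_audience: bool    # Airbnb: target audience already searches elsewhere
    cross_posting_risk_ok: bool         # Airbnb: platform risk of cross-posting is manageable
    searched_by_name_or_topic: bool     # LinkedIn: others look this data up via search
    public_is_acceptable: bool          # LinkedIn: exposure fits privacy and compliance


def candidate_patterns(t: ProductTraits) -> list[str]:
    """Return every canonical pattern whose 'Apply when' conditions all hold."""
    matches = []
    if t.collaborates_with_non_users and t.cheap_product_native_reward:
        matches.append("Dropbox")
    if t.output_seen_by_non_users:
        matches.append("Hotmail")
    if t.user_generated_content and t.adjacent_platform_audience and t.cross_posting_risk_ok:
        matches.append("Airbnb")
    if t.user_generated_content and t.searched_by_name_or_topic and t.public_is_acceptable:
        matches.append("LinkedIn")
    return matches
```

A match only says the preconditions hold; it does not replace the validation step the paragraph above calls for.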
---
### Step 5 — Draft the Loop Diagram with K-Factor Model
Produce a written loop diagram (steps with arrows) and a K-factor model.
**Loop diagram format:**
```
[Trigger] → [Share Action] → [Recipient Exposure]
→ [Recipient Conversion] → [New User Activates] → [Loop Repeats]
```
Label each step with: who acts, what they do, where friction exists, and the drop-off
risk at each transition.
**K-factor model (fill in estimates before any engineering is committed):**
```
K = (avg invites per active user) × (invitation-to-signup conversion rate)
Virality = Payload × Conversion Rate × Frequency
K > 1.0 — compounding (rare); K 0.5–1.0 — strong supplement; K < 0.1 — redesign first
```
**Cycle time:** How many days from signup to first invite sent? Shorter cycles compound
faster than a higher K with long cycles.
**Why:** Teams that skip modeling discover after building that K = 0.048 and wonder why
growth did not change. Model first — the estimates reveal whether the mechanism is worth
building or whether the incentive needs redesigning before engineering begins.
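The K-factor model above reduces to a few multiplications. This minimal sketch (function names are hypothetical) computes K and Sean Parker's virality product, then maps K onto the bands in the model; the 0.1 to 0.5 range is not named there, so its label here is a placeholder.

```python
def k_factor(invites_per_active_user: float, invite_conversion: float) -> float:
    """K = (avg invites per active user) x (invitation-to-signup conversion rate)."""
    return invites_per_active_user * invite_conversion


def virality(payload: float, conversion_rate: float, frequency: float) -> float:
    """Virality = Payload x Conversion Rate x Frequency."""
    return payload * conversion_rate * frequency


def k_band(k: float) -> str:
    """Map K onto the bands in the model; 'modest supplement' is a placeholder label."""
    if k > 1.0:
        return "compounding (rare)"
    if k >= 0.5:
        return "strong supplement"
    if k >= 0.1:
        return "modest supplement"
    return "redesign first"
```

With Example A's estimates (3.2 invites per inviter, 22% conversion), `f"{k_factor(3.2, 0.22):.2f}"` gives `"0.70"`, which `k_band` places in the strong-supplement band.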
---
### Step 6 — Design the First Experiment
Choose the lowest-cost test that validates the most important assumption in the loop.
Hierarchy of testability (cheapest first):
1. **Incentive/message test** — Does the incentive resonate before building? Test with
a landing page or email offer. Compare product-resource reward vs. cash discount.
2. **Invite mechanic stub** — Can you test the sharing flow manually before engineering
it? Have users email friends manually; track conversion.
3. **Minimum loop build** — One sharing path, one incentive, one recipient landing page.
No gamification, no optimization. Just measure K.
Specify: hypothesis ("If we offer [incentive] to both parties, [X]% of active users will
send at least one invite within 14 days"), success K threshold, and three metrics to
track: referral send rate, referral-to-activation rate, time-to-first-invite.
**Why:** Dropbox discovered the collaboration framing outperformed the storage framing
only through testing — not predictable in advance. LinkedIn found four invites optimal
vs. two or six through experiment. Building the full system before testing the incentive
produces an expensive loop with an unvalidated conversion assumption at its core.
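The three tracking metrics, plus the K they imply, can be computed directly from per-user test data. A minimal sketch, assuming a hypothetical `ActiveUser` record whose counts all cover the test window (for example, the 14 days in the hypothesis).

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class ActiveUser:
    """One active user in the test cohort; all counts cover the test window."""
    invites_sent: int
    referees_activated: int                 # invitees who signed up and activated
    days_to_first_invite: float | None      # None if the user never sent an invite


def experiment_metrics(users: list[ActiveUser]) -> dict[str, float | None]:
    """Compute the three tracking metrics plus the K they imply."""
    if not users:
        raise ValueError("need at least one active user")
    senders = [u for u in users if u.invites_sent > 0]
    invites = sum(u.invites_sent for u in users)
    activated = sum(u.referees_activated for u in users)
    delays = [u.days_to_first_invite for u in senders if u.days_to_first_invite is not None]
    return {
        "referral_send_rate": len(senders) / len(users),
        "referral_to_activation_rate": activated / invites if invites else 0.0,
        "median_days_to_first_invite": median(delays) if delays else None,
        # (invites / users) x (activated / invites) simplifies to activated / users
        "k_estimate": activated / len(users),
    }
```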
---
### Step 7 — Flag the Viral Spam Trap
Explicitly assess whether the proposed loop design risks crossing from helpful to
annoying. Check all four signals:
**Spam trap indicators:**
- The mechanism increases payload by requiring or tricking users into sending invites
to their full contact list rather than selected people
- The recipient experience hits a hard authentication wall before delivering value
- Referral activation rate is below 5% (most invites generate no engagement)
- App store reviews or social media mention "spam" in connection with the product
**Rules:**
- Never increase payload by adding friction-removing dark patterns (pre-checked contact
lists, repeated prompts, misleading invite copy). Short-term volume gains produce
long-term brand damage that is expensive to recover from.
- Measure the recipient experience explicitly before scaling invite volume. An invite
that annoys the recipient destroys two relationships: the recipient's potential
conversion and the referrer's trust in the brand.
- If opt-outs or spam complaints rise while referral volume rises, cut invite frequency
immediately. Growth at the cost of brand is negative-value growth.
**Why:** BranchOut bypassed Facebook's invite limit and grew from 4M to 25M users in
three months through engineered viral spam. It then lost 4% or more of its monthly active
users per day as users experienced a hollow product and their spammed contacts never engaged.
Despite $50M in funding, BranchOut never recovered. The viral spam trap compounds:
every spammed contact is a burned conversion opportunity and a brand impression that
signals low quality.
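The indicators and rules above can be turned into a pre-flight check. A minimal sketch, assuming each signal has already been measured; the `LoopSignals` field names are hypothetical, the 5% activation floor is the threshold from the indicator list, and the last check encodes the rule to cut invite frequency when complaints rise with volume.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LoopSignals:
    invites_whole_contact_list: bool      # bulk or tricked sends instead of chosen recipients
    auth_wall_before_value: bool          # recipient must sign up before seeing any value
    referral_activation_rate: float       # activated invitees / invites sent
    spam_mentions_in_reviews: bool        # app-store or social posts call it "spam"
    complaints_rising_with_volume: bool   # opt-outs/complaints up while referral volume is up


def spam_trap_flags(s: LoopSignals, min_activation: float = 0.05) -> list[str]:
    """Return the tripped indicators; an empty list means the loop passes."""
    flags = []
    if s.invites_whole_contact_list:
        flags.append("payload inflated via full contact list")
    if s.auth_wall_before_value:
        flags.append("hard authentication wall before value")
    if s.referral_activation_rate < min_activation:
        flags.append(f"referral activation below {min_activation:.0%}")
    if s.spam_mentions_in_reviews:
        flags.append("'spam' mentions in reviews or social media")
    if s.complaints_rising_with_volume:
        flags.append("cut invite frequency now: complaints rising with volume")
    return flags
```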
---
### Step 8 — Emit Deliverables
Write two files:
**viral-loop-design.md** — Contains:
- Mechanism type (word-of-mouth amplification / instrumented virality / embedded referral)
- Matched case-study pattern and structural logic
- Loop diagram (steps with friction labels)
- K-factor model with estimates for payload, conversion rate, frequency
- Cycle time estimate
- Viral capability assessment verdict (GREEN / YELLOW / RED)
**referral-test-plan.md** — Contains:
- First experiment hypothesis and success/failure threshold
- Testability hierarchy decision (why this experiment before the full build)
- Measurement plan (metrics, sample size, timeline)
- Spam trap checklist (pre-flight check before scaling any invite volume)
**Why:** Separating the design document from the test plan allows the design to be
reviewed and revised independently of the experiment setup — and allows the experiment
to be handed to an engineer or growth PM who doesn't need to re-read the full design
reasoning to set up the test.
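If it helps to start both files from a fixed skeleton, the section lists above can be rendered as headings. This is a hypothetical helper, not part of the skill's tooling; the heading names simply mirror the bullets, and any section not yet drafted is marked TODO.

```python
from __future__ import annotations

from pathlib import Path

DESIGN_SECTIONS = [
    "Mechanism type",
    "Matched case-study pattern and structural logic",
    "Loop diagram",
    "K-factor model",
    "Cycle time estimate",
    "Viral capability verdict",
]

TEST_PLAN_SECTIONS = [
    "First experiment hypothesis and success/failure threshold",
    "Testability hierarchy decision",
    "Measurement plan",
    "Spam trap checklist",
]


def render(title: str, sections: list[str], content: dict[str, str]) -> str:
    """Render one deliverable; sections missing from `content` are marked TODO."""
    parts = [f"# {title}", ""]
    for name in sections:
        parts += [f"## {name}", content.get(name, "TODO"), ""]
    return "\n".join(parts)


def emit(out_dir: Path, design: dict[str, str], plan: dict[str, str]) -> None:
    """Write both deliverables into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "viral-loop-design.md").write_text(render("Viral Loop Design", DESIGN_SECTIONS, design))
    (out_dir / "referral-test-plan.md").write_text(render("Referral Test Plan", TEST_PLAN_SECTIONS, plan))
```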
---
## Key Principles
**Virality requires product-level fit — not every product goes viral.**
Network-effect and sharing-native products have structural virality advantage. Products
without these characteristics can still grow through word-of-mouth, but embedded
referral loops will underperform unless the product is genuinely must-have and the
incentive is deeply product-native.
**Bilateral incentives consistently outperform unilateral.**
If only the referrer benefits, the referral is a transaction the recipient did not agree
to. If both parties benefit, the referrer is doing their contact a favor. The social
dynamic changes from extraction to generosity, and conversion improves accordingly.
Dropbox's bilateral storage offer worked because both sides received something they
genuinely wanted — not a promotional discount on something they hadn't asked for.
**Cycle time compounds — shorter cycles beat bigger incentives.**
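With a steady flow of new users from other channels, a K of 0.6 with a 3-day cycle time
produces more users over 90 days than a K of 0.8 with a 25-day cycle time, because each
cohort completes up to 30 loop passes instead of at most 3. Invest in reducing the time from new-user-signup to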
first-invite-sent. Friction in the share flow (too many steps, unclear incentive,
buried UI) extends cycle time. Visibility and integration (Uber's referral prompt on
the active ride screen, LinkedIn's connect prompt at sign-up) compresses it.
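The 90-day comparison can be checked with a small day-by-day simulation. This sketch assumes one non-viral signup per day from other channels and that each cohort brings in K times its size exactly one cycle later; under those assumptions the 3-day loop ends at about 216 users and the 25-day loop at about 178. With a single one-off seed cohort and no other inflow the ranking flips (about 2.5 versus 2.95 users per seed user), because a fast loop with K below 1 exhausts its referral chain early.

```python
def total_users(days: int, k: float, cycle_days: int, daily_inflow: float = 1.0) -> float:
    """Users acquired by day `days` (inclusive) when `daily_inflow` non-viral
    signups arrive each day and every cohort brings in k times its size
    exactly `cycle_days` later."""
    if cycle_days < 1:
        raise ValueError("cycle_days must be at least 1")
    arrivals = [0.0] * (days + 1)
    for t in range(days + 1):
        arrivals[t] += daily_inflow           # signups from other channels
        nxt = t + cycle_days
        if nxt <= days:
            arrivals[nxt] += k * arrivals[t]  # this day's cohort refers its successors
    return sum(arrivals)
```

Because referred users arrive `cycle_days` later, every day's total is final before it is propagated, so one forward pass is enough.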
**K > 1 means compounding growth; K < 1 is an acquisition supplement, not an engine.**
K > 1.0 is rare and typically short-lived. A sustained K of 0.5 is excellent: in steady
state it means about 50% of your new users come from the loop. Communicate this clearly with leadership.
"Referral" does not mean "free growth that replaces paid acquisition" — it means one
cost-efficient acquisition channel that should be part of a diversified mix.
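The 50% figure follows from a geometric series. Assuming a steady inflow of B non-viral signups per period, a constant K below 1, and enough time for each cohort's referral chain to play out, the total N and the viral share are:

```latex
\begin{aligned}
N &= B\,(1 + K + K^2 + \cdots) = \frac{B}{1 - K}, \qquad 0 \le K < 1 \\
\text{viral share} &= \frac{N - B}{N} = 1 - \frac{B}{N} = 1 - (1 - K) = K
\end{aligned}
```

So K = 0.5 gives a viral share of 0.5; for K of 1 or more the series diverges and there is no finite steady state.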
**The viral spam trap destroys brand faster than it acquires users.**
Every spammed contact is a burned conversion and a negative brand impression. Dark
patterns (pre-checked contact lists, deceptive invite copy, forced bulk sharing) produce
short-term volume spikes and long-term brand damage. BranchOut is the named cautionary
case. Measure recipient activation rate before scaling invite volume; if it is below 5%,
improve the incentive quality, not the send frequency.
**Extract the pattern, not the tactic — then validate the fit.**
Dropbox's bilateral storage incentive worked because of four specific structural
conditions: collaborative product, network effect, product-native incentive, near-zero
marginal cost. Copying the bilateral incentive structure without those conditions
produces an expensive referral program with low conversion. Map the structural logic to
your product before committing to any mechanism.
---
## Examples
### Example A: Embedded Referral for a B2B SaaS Project Management Tool
**Product:** A post-PMF project management SaaS. Teams use it together — inviting
teammates is a natural part of onboarding. Network effect is strong: more team members
using it makes it more valuable for the person who invited them.
**Viral capability assessment:** GREEN. Strong network effect (collaborative by design).
Natural sharing occasion (teammate invitation is required for core use). Incentive
alignment possible (free seats / extra storage / premium features as reward).
**Mechanism type:** Embedded Referral (bilateral incentive).
**Pattern match:** Dropbox. The product is collaborative; getting others on the platform
improves the referrer's experience; giving away additional seats has near-zero marginal
cost at volume.
**Loop diagram:**
```
[User A joins and activates] → [Prompted to invite teammates during onboarding]
→ [Teammate receives email with credit offer for both A and teammate]
→ [Teammate signs up, activates, experiences aha moment]
→ [Teammate is prompted to invite their own team members]
→ [Loop repeats]
```
**K-factor model:**
- Payload: 3.2 (average teammates invited per active inviter)
- Conversion rate: 22% (teammates who accept and activate)
- K = 3.2 × 0.22 = 0.70
- Frequency: once per new user at onboarding (low frequency, but high conversion)
- Virality = 3.2 × 0.22 × 1.0 = 0.70
**Cycle time:** ~7 days (invite sent day 1, teammate activates by day 7 on average).
**First experiment:** Test the incentive without building the full system. Email the
top 20% most active users. Offer one extra seat free (bilateral: both users get the
seat) if they invite a teammate this week. Track invite send rate and acceptance rate.
Success threshold: ≥ 30% invite rate from active users contacted, ≥ 15% teammate
acceptance. If below threshold, test a premium feature unlock instead.
**Spam trap check:** PASS. Invite is addressed to specific teammates by name. Recipient
receives a clear benefit. No dark patterns. Activation rate from teammates is expected
to be high because they are already working with the referrer.
---
### Example B: Instrumented Virality for a Content Publishing Platform
**Product:** A platform where professionals publish articles and case studies. Each
article is a public page. Non-users can read articles without signing up. The platform
has no social graph or network effect — reading an article is not improved by more
users joining.
**Viral capability assessment:** YELLOW. No network effect. Strong natural sharing
occasion (articles are meant to be shared). Incentive alignment is weak (bilateral
storage/credits feel disconnected from publishing). Instrumented virality is a better
fit than embedded referral.
**Mechanism type:** Instrumented Virality (LinkedIn pattern + Hotmail pattern hybrid).
**Pattern match:** LinkedIn public profiles. Every published article is already a public
page. The growth action is to make those pages fully searchable and to embed a conversion
call-to-action for non-authenticated readers.
**Loop diagram:**
```
[User A publishes article] → [Article is indexed by search engines (SEO surface)]
→ [Non-user B finds article via Google search]
→ [Non-user B reads article → sees "Write your own case study" CTA with free account]
→ [Non-user B signs up and publishes their first article]
→ [Article is indexed → new loop pass begins]
```
**K-factor model (adapted — this is SEO-driven, not invite-driven):**
- Payload: 0 (no explicit invite; distribution is passive)
- Conversion rate: 2.8% (non-authenticated readers who create an account)
- Frequency: articles are published ~2 times/month per active user; each article
receives on average 340 unique readers per month
- Estimated new signups per article per month: 340 × 0.028 ≈ 9.5; at ~2 new articles per
active user per month, that is roughly 19 new signups per active user per month
- K (monthly, per active user): ~19 (a very different K calculation for SEO loops: the
payload is replaced by organic traffic volume)
**Cycle time:** Long (3–6 months for new articles to rank meaningfully). Invest in
on-page SEO optimization of article templates to compress this.
**First experiment:** Enable public indexing for the top 100 most-viewed articles.
Check Google Search Console in 30 days for impressions and clicks. Success threshold:
≥ 500 new organic sessions to those articles within 30 days. If yes, make all articles
public by default and instrument the "sign up to publish" CTA prominently on article pages.
**Spam trap check:** NOT APPLICABLE. This loop has no invite mechanism and generates
no unsolicited contact. Monitor for content quality degradation as sign-ups increase —
a different quality trap, not a spam trap.
---
## References
- `references/viral-loop-mechanics.md` — Detailed K-factor formula derivations, Sean
Parker's virality equation (Payload × Conversion Rate × Frequency), and cycle time
compounding models.
- `references/case-study-patterns.md` — Full structural analysis of the Dropbox,
Hotmail, Airbnb, and LinkedIn patterns with applicability criteria.
- `references/viral-spam-trap.md` — Detailed BranchOut case study, dark pattern
taxonomy, and detection checklist with metric thresholds.
---
## License
This skill is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source book content is referenced under fair use for educational purposes. Attribution:
Ellis, Sean and Brown, Morgan. *Hacking Growth*. Crown Business, 2017.
---
## Related BookForge Skills
- `clawhub install bookforge-acquisition-channel-selection-scorer` — viral loops are
one acquisition channel category; score them against organic and paid alternatives
before committing engineering resources.
- `clawhub install bookforge-north-star-metric-selector` — viral loops must compound
the North Star Metric, not a vanity metric like invites sent or accounts created.
- `clawhub install bookforge-product-market-fit-readiness-gate` — virality on a
product that people do not consider must-have accelerates churn, not growth. Confirm PMF
before designing any viral loop.
- `clawhub install bookforge-growth-experiment-prioritization-scorer` — use the
experiment scorer to rank the viral loop experiment against other growth bets in
the team's backlog before committing build resources.
---
name: retention-phase-intervention-selector
description: "Use this skill to diagnose which retention phase (initial / medium / long-term) is broken for a user cohort and select the RIGHT type of intervention for that phase — because retention tactics that work for week-1 users fail completely for month-6 users. Reads cohort retention data, classifies the phase using product-type benchmarks (mobile ~1 day, SaaS ~1 month, e-commerce ~90 days), identifies where the curve breaks, and prescribes phase-appropriate hacks: aha-moment optimization for initial, habit formation + variable rewards for medium, feature velocity + ongoing onboarding for long-term. Includes a resurrection branch for dormant users and flags the churn-masking-by-acquisition anti-pattern. Triggers when a growth PM asks 'users churn after the first week', 'our month 2 retention drops off a cliff', 'retention is bad but I don't know why', 'cohort analysis help', 'how do I improve retention', 'retention curve diagnosis', 'initial vs long-term retention', 'Balfour three-phase retention', 'habit formation', 'variable rewards Hooked', 'zombie users resurrection', 'win-back campaign', 'feature bloat', or 'churn hidden by acquisition'. Also activates for 'my DAU looks good but retention is bad', 'we're growing but churning', or 'retention-first growth'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/retention-phase-intervention-selector
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [7]
tags:
- growth
- retention
- cohort-analysis
- habit-formation
- startup-ops
depends-on:
- product-market-fit-readiness-gate
- north-star-metric-selector
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: >
Retention cohorts CSV (retention-cohorts.csv) with weekly or monthly
cohort retention data. Product brief (product-brief.md) to classify
product type (mobile / SaaS / e-commerce / consumer).
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set + CSV. Produces retention-phase-diagnosis.md and
retention-intervention-plan.md.
discovery:
goal: >
Produce a diagnosis that names which retention phase is broken, explains
why, and prescribes phase-appropriate interventions — avoiding the common
mistake of applying initial-retention tactics to a medium-retention problem.
tasks:
- "Read cohort data and product brief"
- "Classify product type and corresponding phase boundaries"
- "Plot the cohort retention curve"
- "Identify which phase the curve breaks in"
- "Check for churn-masked-by-acquisition anti-pattern"
- "Prescribe phase-appropriate interventions"
- "Check feature velocity for long-term cohorts (bloat risk)"
- "Emit diagnosis and intervention plan"
---
# Retention Phase Intervention Selector
Diagnose which retention phase is failing for a cohort, then prescribe the interventions appropriate to that phase. The core insight: a week-1 drop-off and a month-6 decline have completely different root causes and require completely different treatments. Mixing them up wastes quarters of experiment cycles.
Research by Frederick Reichheld of Bain & Company found that a 5% increase in customer retention rates increases profits by 25 to 95 percent. Retention is not a vanity metric — it is a leverage point. But only if you know which lever to pull.
---
## When to Use
Use this skill when:
- A cohort analysis shows users are churning but you are not sure when or why
- Retention looks stable in aggregate but early cohorts are thinning out
- You are about to invest in retention experiments but have not diagnosed the phase
- A PM asks "should we improve onboarding, add features, or run win-back campaigns?" — the answer depends entirely on which phase is broken
- You suspect churn is being hidden by strong acquisition growth
- Long-term cohorts are declining despite high feature velocity
**Prerequisite signals:** Your product has cleared product-market fit (stable retention curve exists for at least one cohort segment). If the retention curve never stabilizes, run `product-market-fit-readiness-gate` first — this skill cannot fix a pre-PMF product.
---
## Context and Input Gathering
Before starting, collect:
1. **retention-cohorts.csv** — required. Weekly or monthly cohort retention table. Rows = cohort (e.g., "Jan 2024"), columns = periods after signup (Week 1, Week 2 ... or Month 1, Month 2 ...), values = percentage still active.
2. **product-brief.md** — required. Must state product type (mobile app / SaaS / e-commerce / consumer marketplace / other) and the North Star metric for retention. If missing, ask before proceeding — phase boundaries differ materially by product type.
3. **Optional:** churn survey data, exit interviews, user behavior event logs. These speed up root cause identification but are not blockers.
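The CSV layout described above can be parsed with the standard library. This is a minimal sketch. It assumes the first column holds the cohort label, the header row holds period labels, cells are written as percentages such as `68%`, and a blank cell means the cohort has not reached that period yet. The function name `load_cohorts` is hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_cohorts(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a cohort retention table into period labels and per-cohort fractions.

    Assumed (hypothetical) layout: header "cohort,W1,W2,..." and rows such as
    "Jan 2024,68%,52%,...". A blank cell ends the row, because the cohort has
    not reached that period yet.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    periods = [h.strip() for h in header[1:]]
    cohorts: Dict[str, List[float]] = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        values: List[float] = []
        for cell in row[1:]:
            cell = cell.strip().rstrip("%")
            if not cell:
                break  # cohort has not reached this period yet
            values.append(float(cell) / 100.0)
        cohorts[row[0].strip()] = values
    return periods, cohorts


if __name__ == "__main__":
    sample = "cohort,W1,W2,W4\nJan 2024,68%,52%,41%\nFeb 2024,71%,55%,\n"
    periods, cohorts = load_cohorts(sample)
    print(periods)  # ['W1', 'W2', 'W4']
    print(cohorts)  # {'Jan 2024': [0.68, 0.52, 0.41], 'Feb 2024': [0.71, 0.55]}
```

To analyze the real export, pass the contents of `retention-cohorts.csv` instead of the sample string. Storing retention as fractions keeps later calculations free of `%` handling.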
---
## Process
### Step 1 — Read cohort data and product brief
Read both input files. Extract:
- Product type (mobile / SaaS / e-commerce / other)
- Cohort time grain (weekly vs monthly)
- The raw retention percentages by cohort and period
**Why:** Phase boundaries are product-type-specific. Without knowing the product type, period labels ("Month 1") map to different phases. A Month 1 drop in a mobile app is already past the initial phase (at the medium/long-term boundary in the Step 2 table); a Month 1 drop in a SaaS product is still initial. Getting this wrong leads to the wrong intervention type.
---
### Step 2 — Classify product type and phase boundaries
Map the product type to phase boundary definitions:
| Product Type | Initial Phase Ends | Medium Phase Ends | Long-Term Begins |
|---|---|---|---|
| Mobile app | ~Day 1 | ~Week 2–4 | Month 1+ |
| Social network | ~Week 1–2 | ~Month 1–2 | Month 3+ |
| SaaS (subscription) | ~Month 1 or first quarter | ~Month 3–6 | Month 6 / Quarter 3+ |
| E-commerce | ~Day 90 (first 90 days) | ~Month 4–9 | Month 10+ / Year 2 |
| Consumer marketplace | ~Week 2 | ~Month 1–3 | Month 4+ |
These are reference benchmarks (sourced from Lean Analytics sector data and the Balfour three-phase model). Calibrate to your own cohort behavior: the initial phase ends where your churn rate begins to stabilize, not at a fixed calendar point.
**Why:** Applying a NUX optimization experiment to a 6-month cohort decline is a common and expensive mistake. The boundaries enforce discipline about what type of problem is being solved.
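The calibration rule (the initial phase ends where churn begins to stabilize) can be approximated by finding the first period where the period-over-period relative drop falls below a threshold. This is a sketch under stated assumptions: the 10% threshold and the function name `initial_phase_end` are illustrative, not values from the source.

```python
from typing import List, Optional


def initial_phase_end(retention: List[float], threshold: float = 0.10) -> Optional[int]:
    """Return the index of the first period whose relative drop from the
    previous period is below `threshold`, i.e. where churn starts to stabilize.

    `retention` is one cohort's curve as fractions, e.g. [0.68, 0.52, 0.41].
    Returns None if the curve never stabilizes within the measured periods.
    The default threshold is a hypothetical starting point; tune it per product.
    """
    for i in range(1, len(retention)):
        prev, cur = retention[i - 1], retention[i]
        if prev <= 0:
            return None  # cohort fully churned; nothing left to stabilize
        relative_drop = (prev - cur) / prev
        if relative_drop < threshold:
            return i
    return None
```

Applied to the Jan cohort in the Step 3 sample table, the first sub-10% drop is W4 to W8, so the function returns index 3. Those sample columns are unevenly spaced, so in a real analysis normalize the drop per unit of time before comparing periods.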
---
### Step 3 — Plot and read the retention curve
Construct the retention curve from the CSV. If no charting tool is available, build a text table:
```
Period:       W1   W2   W4   W8  W12  W16
Jan cohort:  68%  52%  41%  39%  38%  37%
Feb cohort:  71%  55%  44%  38%  35%  32%
Mar cohort:  65%  41%  29%  22%  19%  17%
```
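If no charting tool is available, the text table can be generated from the cohort data. This is a small sketch. It assumes retention is held as fractions keyed by cohort label, and it rounds to whole percentages; the function name `render_text_table` is hypothetical.

```python
from typing import Dict, List


def render_text_table(periods: List[str], cohorts: Dict[str, List[float]],
                      width: int = 5) -> str:
    """Render cohort retention curves as a fixed-width text table.

    Values are fractions (0.68 -> "68%"). Each cell is right-aligned in
    `width` characters so the columns line up in a monospace font.
    """
    labels = ["Period:"] + [f"{name}:" for name in cohorts]
    label_width = max(len(label) for label in labels)
    lines = ["Period:".ljust(label_width) + "".join(p.rjust(width) for p in periods)]
    for name, values in cohorts.items():
        cells = "".join(f"{round(v * 100)}%".rjust(width) for v in values)
        lines.append(f"{name}:".ljust(label_width) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_text_table(["W1", "W2"], {"Jan cohort": [0.68, 0.52]}))
```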
Identify:
- **Steep early slope:** Large drop between Period 1 and Period 2–3 (relative to the phase boundaries established in Step 2)
- **Mid-curve inflection:** Curve was stabilizing then resumed decline after 4–8 periods
- **Slow long-term erosion:** Curve never fully stabilizes; continues declining across all measured periods
- **Cohort-to-cohort degradation:** Later cohorts retain worse than earlier cohorts at the same period — signals a product or onboarding regression, not a phase problem
**Why:** The shape of the curve tells you which phase is responsible. Reading curve shape is more diagnostic than reading a single retention number. Cohort-to-cohort degradation is a separate failure mode that must be flagged before prescribing interventions.
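Cohort-to-cohort degradation can be checked by comparing each cohort with the one before it at every period both have reached. This is a sketch under stated assumptions: cohorts are ordered oldest-first, and the 3-percentage-point tolerance and the function name `degrading_cohorts` are illustrative choices.

```python
from typing import Dict, List


def degrading_cohorts(cohorts: Dict[str, List[float]], tolerance: float = 0.03) -> List[str]:
    """Return cohorts that retain worse than the cohort immediately before them.

    `cohorts` maps cohort label -> retention fractions and must be ordered
    oldest-first (dicts preserve insertion order). A cohort is flagged when it
    sits more than `tolerance` below its predecessor at any shared period.
    """
    names = list(cohorts)
    flagged: List[str] = []
    for prev_name, name in zip(names, names[1:]):
        prev, cur = cohorts[prev_name], cohorts[name]
        shared = min(len(prev), len(cur))
        if any(prev[i] - cur[i] > tolerance for i in range(shared)):
            flagged.append(name)
    return flagged
```

On the sample table in this step, both the Feb cohort (32% vs 37% at W16) and the Mar cohort are flagged.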
---
### Step 4 — Diagnose which phase is broken
Apply the following decision rules:
**Initial phase broken:**
- Steep drop in the first 1–3 periods (relative to product-type phase boundary)
- Majority of churn occurring within the initial window
- Curve never reaches a meaningful plateau
- *Interpretation:* Users are not experiencing core value. This is an activation-extension problem.
**Medium phase broken:**
- Reasonable initial retention (40%+) followed by a drop-off inside the medium window from Step 2 (roughly Day 2 to Week 4 for mobile, months 2–5 for SaaS)
- Curve plateaus briefly, then resumes declining
- *Interpretation:* Users experienced value initially but did not form a habit. The product has not become a default choice.
**Long-term phase broken:**
- Good initial and medium retention but slow, sustained decline over months or quarters
- Early cohorts declining while newer cohorts still look healthy (cross-cohort comparison)
- Feature velocity is high but retention is not improving
- *Interpretation:* Existing power users are losing engagement. Feature bloat or lack of discovery of advanced capabilities is a likely cause.
**Resurrection candidate:**
- A meaningful fraction of a cohort is dormant (not active for N+ periods) but not formally churned
- These users represent a lower-cost re-acquisition opportunity than new user acquisition.
Document the diagnosis in one clear sentence: "The [cohort name] cohort shows [phase] phase failure, characterized by [specific curve observation], which indicates [root cause hypothesis]."
**Why:** Naming the phase locks in the intervention type before generating experiment ideas. Without this gate, teams generate a mix of initial- and long-term tactics and run them simultaneously, making it impossible to isolate what worked.
---
### Step 5 — Anti-pattern check: churn masked by acquisition
Before prescribing interventions, check whether aggregate metrics are hiding the cohort-level problem.
Compute: for the three oldest cohorts in the dataset, compare their retention at each period with newer cohorts' retention at the same period (using only periods both have reached). If the older cohorts are lower, early adopters are churning faster, and if overall user counts are stable or growing, new user volume is masking the defections.
Signal: total users are flat or growing, but the Jan and Feb cohorts are at 15% retention by Month 6 while the May cohort is at 38% at Month 2.
**If churn masking is detected:** Flag it explicitly in the diagnosis. It changes the urgency framing — the company's retention health is worse than aggregate metrics suggest, and the gap will widen as acquisition slows.
**Why:** Teams that skip this check continue to interpret stable DAU as a retention success. The cohort-level picture reveals compounding LTV damage that will not appear in top-line metrics until acquisition slows or stops.
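A minimal sketch of the check, assuming you also have total active users per calendar period and a cohort table ordered oldest-first. It flags masking when the total is flat or growing while any of the three oldest cohorts sits below the average of newer cohorts at a period both have reached. This comparison rule is one illustrative reading of the step, not a formula from the source, and `churn_masked` is a hypothetical name.

```python
from statistics import mean
from typing import Dict, List


def churn_masked(total_active: List[int], cohorts: Dict[str, List[float]],
                 oldest_n: int = 3) -> bool:
    """Flag churn masked by acquisition.

    total_active: total active users per calendar period, oldest first.
    cohorts: cohort label -> retention fractions by period, ordered oldest-first.
    Returns True when the total is flat or growing while at least one of the
    `oldest_n` oldest cohorts retains below the average of newer cohorts at a
    period both have reached.
    """
    if len(total_active) < 2:
        return False
    aggregate_healthy = total_active[-1] >= total_active[0]  # flat or growing
    names = list(cohorts)
    oldest, newer = names[:oldest_n], names[oldest_n:]
    for old in oldest:
        for period, value in enumerate(cohorts[old]):
            # Newer cohorts that have already reached this period.
            peers = [cohorts[n][period] for n in newer if len(cohorts[n]) > period]
            if peers and value < mean(peers):
                return aggregate_healthy
    return False
```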
---
### Step 6 — Prescribe phase-appropriate interventions
Select experiments from the appropriate intervention set:
#### If initial phase is broken → Aha-moment acceleration
The initial phase is an extension of the activation experience, so interventions mirror activation tactics:
- **NUX audit:** Map every step from signup to first value experience. Identify friction points and steps where users drop without completing. Reduce or reorder steps that are not prerequisite to the aha moment.
- **Time-to-value compression:** Remove anything that delays the core experience. If account creation is mandatory before the user sees value, move it after.
- **Trigger calibration:** Deploy push notifications or emails only when behavioral signals indicate a user is returning but has not yet activated. Calibrate to Fogg's motivation × ability model — do not send triggers when motivation is low.
- **At-risk flagging:** Implement a heuristic threshold (e.g., did not return within first 3 days for a mobile app) to trigger a targeted re-engagement message within the initial window.
- **Cross-reference:** Run `activation-funnel-diagnostic` in parallel — initial retention and activation share root causes and experiment types.
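The at-risk heuristic above can be sketched as a filter over signup and return timestamps. The 3-day window is the example from the text; the record shape and the function name `at_risk_users` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def at_risk_users(signups: Dict[str, datetime],
                  returns: Dict[str, Optional[datetime]],
                  now: datetime,
                  window: timedelta = timedelta(days=3)) -> List[str]:
    """Flag users who did not return within `window` of signup, once that
    window has elapsed.

    signups: user id -> signup time.
    returns: user id -> time of the first session after the signup session
             (None or missing if the user never came back).
    """
    flagged = []
    for user, signed_up in signups.items():
        deadline = signed_up + window
        if now < deadline:
            continue  # still inside the window; too early to judge
        first_return = returns.get(user)
        if first_return is None or first_return > deadline:
            flagged.append(user)
    return flagged
```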
#### If medium phase is broken → Habit formation and reward engineering
The goal is to make the product the default choice for the need it serves.
- **Engagement loop audit:** Map the current trigger → action → reward → investment cycle. Identify whether the reward is predictable (low engagement potential) or variable (higher engagement potential). Variable reward schedules sustain engagement by introducing unpredictability — the core mechanism described in Nir Eyal's Hook Model.
- **Tangible rewards:** Discounts, credits, savings programs. Effective for e-commerce and marketplaces. Test multiple reward types — cash equivalents are not always the strongest motivator.
- **Experiential and social rewards:** Status markers, exclusive access, social proof (showing what peers have done). Frequent-flier programs demonstrate that status and access retain better than fare discounts over the long term.
- **Commitment devices:** Features that make switching more costly — saved preferences, accumulated history, social connections, progress streaks. The more a user has invested, the higher the switching barrier.
- **Promise of future value:** Communicate upcoming improvements. For SaaS and content platforms, scheduled feature releases or content drops give users a reason to maintain the relationship. Netflix-style episodic release pacing is an extreme form of this.
- **Notification frequency experimentation:** Test minimum effective notification frequency. Over-triggering during the medium phase accelerates churn rather than preventing it.
#### If long-term phase is broken → Two-pronged feature strategy
Long-term retention requires both maintaining the existing experience and expanding it over time.
**Prong 1 — Optimize existing features:**
- Identify which features correlate with highest long-term retention (behavioral cohort analysis)
- Run experiments to increase discovery and adoption of those features among long-term cohorts
- Address UX friction that accumulates as users attempt advanced use cases
**Prong 2 — Introduce new features at a staged cadence:**
- Release new features to 5–10% of users first; collect behavioral and satisfaction data before broad rollout
- Avoid rapid sequential feature releases that overwhelm users
- Use ongoing onboarding to guide users to new capabilities
- Feature releases tied to a predictable schedule (annual or semi-annual events) create anticipated value moments
**Why two prongs:** Long-term churn has two distinct drivers — diminishing returns from the current experience, and failure to discover new value. Optimizing existing features alone does not attract users back once they have mentally "finished" the product. New features alone create confusion without proper onboarding.
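A staged 5–10% release needs a stable way to pick which users see the new feature. One common approach, not described in the source, is deterministic hash bucketing: hash the feature name plus the user ID and compare the resulting bucket with the rollout percentage. The function below is a generic sketch, and `in_rollout` is a hypothetical name.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically decide whether a user is in a staged rollout.

    Hashing "feature:user_id" gives each feature an independent, stable split:
    the same user always gets the same answer for the same feature.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=5 -> buckets 0..499
```

Raising `percent` from 5 to 10 keeps the original 5% in the rollout, because a user's bucket never changes. That keeps the behavioral and satisfaction data from the first group comparable as exposure widens.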
---
### Step 7 — Feature velocity check (long-term plans only)
If the diagnosis is long-term phase failure and the team has been shipping features at high velocity, run the feature bloat check:
**Detection criteria:**
- Feature release rate is high (>2 significant features per quarter) but long-term retention is not improving
- User interviews or survey data show confusion about product scope or difficulty finding relevant features
- Support volume or "how do I..." queries are rising
**If feature bloat is detected:** Recommend a feature audit. The Marketing Science Institute research (Thompson, Hamilton, Rust 2005) found that maximizing feature count for initial appeal decreases customer lifetime value — users are overwhelmed and the core value is obscured. The intervention is not more features but clearer progressive disclosure of existing ones.
**Why:** Feature velocity feels like progress. The feature bloat anti-pattern makes it possible to invest significant engineering effort in long-term retention experiments that actually make retention worse.
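The detection criteria can be combined into a simple flag. This is a sketch with labeled assumptions. The more-than-2-features-per-quarter threshold comes from the text. Two choices are assumptions: treating retention as "not improving" when the latest long-term cohort is no better than the previous one, and requiring at least one qualitative signal alongside high velocity. `BloatSignals` and `feature_bloat_risk` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BloatSignals:
    features_per_quarter: int          # significant features shipped last quarter
    long_term_retention: List[float]   # same-period retention for successive long-term cohorts
    scope_confusion_reported: bool     # interviews or surveys mention confusion about scope
    how_do_i_tickets_rising: bool      # "how do I..." support volume trending up


def feature_bloat_risk(s: BloatSignals) -> bool:
    """True when velocity is high, long-term retention is flat or falling,
    and at least one qualitative confusion signal is present."""
    high_velocity = s.features_per_quarter > 2
    r = s.long_term_retention
    not_improving = len(r) >= 2 and r[-1] <= r[-2]
    qualitative = s.scope_confusion_reported or s.how_do_i_tickets_rising
    return high_velocity and not_improving and qualitative
```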
---
### Step 8 — Resurrection branch (dormant user segment)
If a meaningful percentage of a cohort is dormant (users who were active in an early period but have not been active for several periods, without having formally canceled):
1. **Do not lump dormant users into win-back campaigns immediately.** First, interview a sample (target: 5–10 interviews) to understand why they left.
2. Classify reasons as: controllable (e.g., device migration, forgot about the product, lost access) vs. uncontrollable (changed job, life event, moved away from the use case).
3. Design win-back experiments only around controllable reasons. Campaigns targeting uncontrollable churn waste budget and may generate negative brand signal.
4. Experiment formats: re-engagement email sequences, reactivation offers, cross-device installation prompts, milestone-triggered outreach ("It's been 60 days — here's what's new").
5. Track resurrection rate separately from new user acquisition — do not blend the two in retention dashboards.
**Why:** Dormant users already know the product, which makes them cheaper to reactivate than acquiring a new user from scratch. But win-back campaigns without root cause diagnosis have low conversion rates and can create a "desperate" brand perception if poorly calibrated.
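Dormancy can be computed from per-user activity by period. This is a minimal sketch. It assumes activity is recorded as a set of period indexes per user and that 3 consecutive inactive periods mark a user as dormant; the source leaves that number open. It also reports the resurrection rate on its own, as item 5 above recommends. Both function names are hypothetical.

```python
from typing import Dict, List, Set


def dormant_users(activity: Dict[str, Set[int]], current: int,
                  n_inactive: int = 3) -> List[str]:
    """Users with some past activity but none in the last `n_inactive`
    periods (up to and including `current`).

    activity: user id -> set of period indexes in which the user was active.
    Users who formally canceled should be removed before calling this.
    """
    recent = set(range(current - n_inactive + 1, current + 1))
    return [user for user, periods in activity.items()
            if periods and not (periods & recent)]


def resurrection_rate(previously_dormant: List[str],
                      activity: Dict[str, Set[int]], period: int) -> float:
    """Share of previously dormant users active again in `period`.
    Track this separately from new-user acquisition."""
    if not previously_dormant:
        return 0.0
    back = sum(1 for u in previously_dormant if period in activity.get(u, set()))
    return back / len(previously_dormant)
```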
---
### Step 9 — Emit outputs
Write two documents:
**retention-phase-diagnosis.md**
```
## Cohort Analyzed
[Cohort identifier, time grain, date range]
## Product Type and Phase Boundaries
[Product type → phase boundary table as applied to this product]
## Retention Curve Summary
[Text or table representation of curve shape]
## Phase Diagnosis
[Named phase: initial / medium / long-term / multiple]
[One-sentence characterization of curve behavior]
[Root cause hypothesis]
## Churn Masking Check
[Result: detected / not detected]
[Evidence if detected]
## Feature Bloat Check (long-term only)
[Result: risk present / not present]
[Evidence]
## Dormant User Population
[Percentage of cohort dormant, if applicable]
```
**retention-intervention-plan.md**
```
## Phase-Appropriate Interventions
[Ordered list of 5–8 specific experiments, each with:]
- Experiment name
- Hypothesis
- Target metric
- Implementation notes
## Resurrection Plan (if applicable)
[Interview protocol, win-back experiment candidates]
## Anti-Patterns to Avoid
[Churn masking / feature bloat warnings as applicable]
## Dependency Skills
[Links to activation-funnel-diagnostic, monetization-experiment-planner as appropriate]
```
---
## Key Principles
1. **Retention is phased — tactics do not transfer across phases.** NUX optimization applied to a month-6 cohort does not address why long-term users disengage. Diagnosis before prescription is non-negotiable.
2. **Cohort analysis is mandatory — point-in-time retention hides the pattern.** Aggregate DAU or monthly active users can be stable or growing while early cohorts are in decline. Always diagnose from the cohort curve, not the headline number.
3. **Acquisition hides churn — compute churn on a fixed cohort.** Strong new user growth masks early-adopter defections. A company can be losing its core users while total counts look healthy. Cohort-level churn must be tracked independently.
4. **Habit formation is not a trick — variable rewards work only for products that deserve habit.** The Hook Model works when the product genuinely solves a recurring need. Applying engagement loop mechanics to a product that does not merit repeat use accelerates churn by irritating users, not retaining them.
5. **Feature bloat is the long-term retention killer.** Adding features at high velocity maximizes initial appeal but decreases customer lifetime value. More features do not equal more retention unless they are discoverable and relevant to existing use cases.
6. **Resurrection is cheaper than new acquisition — but requires root cause first.** Win-back campaigns without understanding why users left have low conversion rates and poor brand optics. Interview before you campaign.
---
## Examples
### Example 1: Mobile app with initial phase broken
**Situation:** A fitness tracking app shows that 65% of new users are active on Day 1, but only 22% return on Day 7, and only 14% return on Day 14. The curve never plateaus.
**Product type:** Mobile app. Initial phase = Day 1. Medium phase = Days 2–14. Long-term = Week 3+.
**Diagnosis:** Initial phase failure. 78% churn within the first week indicates users are not reaching a meaningful experience of core value before disengaging. The curve's failure to plateau confirms there is no retained user segment.
**Churn masking check:** Detected. Total downloads were growing 20% month-over-month, which masked the cohort-level pattern in dashboard reviews.
**Interventions prescribed:**
- NUX audit: map from app install to first completed workout. Identify all steps that precede the first success moment. Remove mandatory account creation from before first workout.
- Day 3 trigger: if the user has not completed a second workout by Day 3, send a contextual push notification tied to their first workout type.
- At-risk threshold: flag any user who has not returned by Day 2 for a targeted re-engagement sequence.
- Run `activation-funnel-diagnostic` to identify the specific funnel step with highest drop-off.
**Not prescribed:** Habit formation experiments, new feature development, or win-back campaigns — all medium- and long-term phase tactics.
---
### Example 2: SaaS product with medium phase broken
**Situation:** A project management SaaS shows 60% of Month 1 users still active in Month 2, but only 28% in Month 4 and 18% in Month 6. The initial drop is acceptable for a SaaS product; the Month 2–4 drop is steep.
**Product type:** SaaS. Initial phase = Month 1. Medium phase = Months 2–5. Long-term = Month 6+.
**Diagnosis:** Medium phase failure. Users are successfully onboarded (strong Month 1 retention) but are not forming a habit of returning. The tool is not yet their default project management environment.
**Churn masking check:** Not detected — new sales are flat, so the cohort-level pattern is visible in aggregate.
**Interventions prescribed:**
- Engagement loop audit: map what power users (top 10% by session frequency) do in Months 2–4 that new users do not. Identify the behavioral correlate of medium-term retention.
- Weekly digest email: summarize team activity from the prior week and send it Monday morning, when motivation to plan is high, but only to users who have been inactive for 5+ days. The trigger is behavioral, not purely calendar-based.
- Collaboration prompt: prompt users to invite a second team member. Social investment (teammates' data is in the tool) raises switching costs.
- Promise of future value: send a "what's coming in Q3" product update to Month 2 cohorts. Creating anticipated value gives users a reason to maintain the subscription through a low-engagement period.
**Not prescribed:** NUX redesign (initial phase is healthy), feature bloat audit (feature velocity is not the issue at Month 2–4).
---
## References
- Balfour, Brian. "Growth Is Good, but Retention Is 4+Ever." Presentation, May 10, 2015.
- Eyal, Nir. *Hooked: How to Build Habit-Forming Products.* Portfolio/Penguin, 2014.
- Thompson, Debora Viana, Rebecca Hamilton, and Roland Rust. "Feature Fatigue: When Product Capabilities Become Too Much of a Good Thing." Marketing Science Institute, 2005.
- Reichheld, Frederick. "Prescription for Cutting Costs." Bain & Company report. Cited in: Ellis & Brown, *Hacking Growth*, Chapter 7.
- Croll, Alistair, and Benjamin Yoskovitz. *Lean Analytics.* O'Reilly Media, 2013. (Phase boundary benchmarks.)
- Ellis, Sean, and Morgan Brown. *Hacking Growth.* Crown Business, 2017. Chapters 6–7.
---
## License
This skill is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/). Distilled from *Hacking Growth* by Sean Ellis and Morgan Brown. The source book is copyright of its respective authors. This skill contains no verbatim excerpts, only synthesized frameworks, process adaptations, and original analysis.
---
## Related BookForge Skills
- **product-market-fit-readiness-gate** — Run this first. Retention curve stability is the second PMF signal; this skill assumes PMF has been achieved.
`clawhub install bookforge-product-market-fit-readiness-gate`
- **north-star-metric-selector** — Retention improvements need a single metric to optimize toward. Select the retention-appropriate North Star before designing experiments.
`clawhub install bookforge-north-star-metric-selector`
- **activation-funnel-diagnostic** — Initial retention and activation share root causes. Run in parallel when initial phase failure is diagnosed.
`clawhub install bookforge-activation-funnel-diagnostic`
- **monetization-experiment-planner** — The natural next step once retention is stable. Increasing lifetime value requires a retained user base first.
`clawhub install bookforge-monetization-experiment-planner`
---
name: product-market-fit-readiness-gate
description: "Use this skill to determine whether a product is ready for scaled growth experimentation BEFORE investing in acquisition, activation, or retention hacks. Runs the Sean Ellis must-have survey (40% 'very disappointed' threshold), analyzes retention curve stability, and outputs a binary go/no-go verdict with remediation protocol for failing products. Triggers when a growth PM asks 'are we ready to scale?', 'should we invest in acquisition yet?', 'do we have product/market fit?', 'is my product must-have?', 'should we hire a growth team?', or 'why are our experiments not working?'. Also activates for 'how do I run the Sean Ellis test', 'must-have survey', 'product/market fit check', 'retention curve analysis', 'PMF gate', or 'premature scaling check'. Use BEFORE any other growth skill — this is the gate."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/product-market-fit-readiness-gate
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [2]
tags:
- growth
- product-market-fit
- experimentation
- startup-ops
depends-on: []
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: >
Product brief (product-brief.md) describing the product, ICP, core value
proposition, and current stage. Optional: survey-responses.csv with raw
must-have survey responses. Optional: retention-cohorts.csv with weekly
or monthly cohort retention data.
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set — product brief + optional CSV data. No code execution required.
discovery:
goal: >
Prevent premature scaling by producing a defensible go/no-go verdict on
whether a product has reached must-have status before the team invests
in scaled growth experimentation.
tasks:
- "Read or collect product brief"
- "Check if must-have survey data exists"
- "If no survey, generate survey template with exact Sean Ellis wording"
- "If survey exists, score it against 40% threshold"
- "Analyze retention curve shape"
- "Emit go/no-go verdict with rationale"
- "For no-go: produce remediation protocol"
---
# Product Market Fit Readiness Gate
A go/no-go gate that determines whether a product has achieved must-have status before the team invests in scaled growth experimentation. Produces a structured verdict with evidence and, for failing products, a concrete remediation protocol.
## When to Use
Use this skill at the moment a growth team is deciding whether to shift from product development into high-tempo experimentation. Typical triggers:
- The team is about to hire growth engineers, spin up an ads budget, or launch a referral program
- Experiments are running but results feel inconclusive or disappointing
- Leadership is asking "are we ready to scale?" or "why isn't acquisition converting?"
- A new growth lead has joined and needs a baseline readiness assessment
Do not use as an ongoing monitoring tool once growth is already at scale — at that stage, retention dashboards and North Star metrics replace this gate.
## Context & Input Gathering
Begin by reading the product brief. Then check whether survey data and cohort retention data are available.
**Branch A — Survey data exists (`survey-responses.csv`):**
Proceed directly to Step 3 (scoring). Skip survey template generation.
**Branch B — No survey data:**
Ask the user: "Have you already sent a must-have survey to active users? If yes, share the response file. If not, I'll generate the survey template now so you can send it before scoring."
Generate `must-have-survey-template.md` (Step 2), then pause and wait for results before completing the verdict.
**Branch C — No retention data (`retention-cohorts.csv`):**
Proceed with survey scoring only. Flag in the verdict that retention analysis is incomplete and recommend pulling cohort data from your analytics tool (Mixpanel, Amplitude, Looker, etc.) before treating the verdict as final.
## Process
### Step 1: Gather product context
Read `product-brief.md`. Extract:
- Product name and category
- Ideal customer profile (ICP)
- Core value proposition / aha moment hypothesis
- Current user volume and stage (private beta / public / post-launch)
**Why:** Retention benchmarks and remediation paths differ by product type (high-frequency consumer app vs. annual SaaS vs. marketplace). Without context, the verdict will be generic and the remediation protocol will miss the most likely failure modes.
**Output:** A one-paragraph context summary to embed in the verdict document.
---
### Step 2: Generate the must-have survey (if no data exists)
Write `must-have-survey-template.md` containing the exact survey question and answer options:
```
Subject: Quick question about [Product Name]
How would you feel if you could no longer use [Product Name]?
a) Very disappointed
b) Somewhat disappointed
c) Not disappointed (it really isn't that useful)
d) N/A — I no longer use it
```
Include the following diagnostic questions (send only if initial score is below threshold, or include upfront for efficiency):
1. What would you likely use as an alternative to [Product Name] if it were no longer available?
- I probably wouldn't use an alternative
- I would use: ___________
2. What is the primary benefit you have received from [Product Name]?
3. Have you recommended [Product Name] to anyone?
- No
- Yes — please explain how you described it: ___________
4. What type of person do you think would benefit most from [Product Name]?
5. How can we improve [Product Name] to better meet your needs?
6. Would it be okay if we followed up by email to clarify one or more of your responses?
**Targeting note:** Send to active users only (those who have used the product in the past 30 days). Dormant users produce uninformative responses and low completion rates. Aim for at least 200–300 responses before treating the score as reliable.
**Why:** Survey wording matters. Substituting "love" or "miss" for "very disappointed" changes the emotional register and invalidates comparison to the published threshold. The exact wording is the validated signal.
**Output:** `must-have-survey-template.md`
---
### Step 3: Score the must-have survey
Count responses by category. Calculate:
```
Very Disappointed % = (count of "Very disappointed" responses) / (total responses excluding "N/A") × 100
```
Apply the three-band decision rule:
| Score | Verdict | Meaning |
|---|---|---|
| >= 40% | GO | Product has achieved must-have status. Green light for growth experimentation. |
| 25% to < 40% | CONDITIONAL | Product or messaging needs targeted improvement. Limited experiments may run in parallel, but broad acquisition investment is premature. |
| < 25% | NO-GO | Either the wrong audience is being surveyed, or the product requires substantial development. Halt acquisition spend immediately. |
**Why:** The 40% floor is not arbitrary — it reflects the minimum density of passionate users needed to sustain word-of-mouth and withstand the churn that follows broad acquisition. Scaling below this threshold accelerates user exodus (the BranchOut pattern): viral mechanics bring users who find no compelling reason to stay, CAC permanently exceeds LTV, and the company burns capital on a leaking bucket. The 25–40% band is actionable because the product has real fans — they just need either a better product or better communication of the existing value.
**Output:** Score summary section in `pmf-readiness-verdict.md`
---
### Step 4: Analyze the retention curve
If `retention-cohorts.csv` is available, read it. Evaluate:
1. **Shape:** Does the curve flatten and stabilize, or does it continue declining toward zero? A curve that flattens — even at a low absolute rate — indicates a core segment of retained users. A curve continuously declining toward zero means no durable value for any cohort.
2. **Level vs. benchmarks:**
- Consumer mobile apps: average 1-month retention ~10%; best-in-class 60%+
- SaaS (annual): retention north of 90% is the competitive baseline
- Marketplaces and social products: compare to closest public comps
3. **Masking signal:** Check whether strong new-user acquisition is hiding early-cohort churn. Isolate cohorts by signup month and track each independently.
**Verdict mapping:**
- Stable curve at competitive level → confirms GO or softens CONDITIONAL
- Stable curve but below benchmarks → retention work needed even if survey passes
- Continuously declining curve → reinforces NO-GO regardless of survey score
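One way to operationalize the shape check and verdict mapping, as a hedged sketch: each cohort's curve is a list of retention percentages by period, and "flattening" is approximated as the last few period-over-period drops staying within a small tolerance. The `window` and `tolerance` defaults, and the benchmark you pass in, are assumptions to tune per product category, not thresholds from the source.

```python
def is_flattening(curve: list[float], window: int = 2, tolerance: float = 1.0) -> bool:
    """True if each of the last `window` period-over-period drops is at most `tolerance` points.

    `curve` holds one signup cohort's retention percentages, period 0 first.
    The defaults are illustrative assumptions, not values from the source.
    """
    if len(curve) < window + 1:
        return False  # too few periods to judge the shape
    tail = curve[-(window + 1):]
    drops = [earlier - later for earlier, later in zip(tail, tail[1:])]
    return all(drop <= tolerance for drop in drops)


def retention_signal(curve: list[float], benchmark: float) -> str:
    """Map one cohort's curve onto the Step 4 verdict mapping."""
    if not is_flattening(curve):
        return "declining: reinforces NO-GO regardless of survey score"
    if curve[-1] >= benchmark:
        return "stable at competitive level: confirms GO or softens CONDITIONAL"
    return "stable below benchmark: retention work needed even if survey passes"


def signals_by_cohort(cohorts: dict[str, list[float]], benchmark: float) -> dict[str, str]:
    """Evaluate each signup-month cohort independently so new-user volume cannot mask churn."""
    return {month: retention_signal(curve, benchmark) for month, curve in cohorts.items()}
```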
**Why:** The survey captures users' stated preference; the retention curve captures their revealed behavior. Both signals need to align. A product can pass the survey (enthusiastic fans who said they'd be very disappointed) but still show declining retention if the aha moment is delivered inconsistently. Retention without survey data is also incomplete: it cannot tell you whether satisfaction is high enough among the users who do stay.
**Output:** Retention analysis section in `pmf-readiness-verdict.md`
---
### Step 5: Emit the verdict
Write `pmf-readiness-verdict.md` with the following structure:
```markdown
# PMF Readiness Verdict: [Product Name]
**Date:** [date]
**Verdict:** GO / CONDITIONAL / NO-GO
## Evidence Summary
- Must-have survey score: X% very disappointed (N=total responses)
- Retention curve: [stable at Y% / continuously declining / data unavailable]
- Product category: [consumer mobile / SaaS / marketplace / other]
## Rationale
[2–3 sentences explaining the primary reason for the verdict, using the evidence above]
## Recommended Next Actions
[Specific to verdict — see Step 6 for NO-GO protocol]
```
**Why:** A written verdict forces the decision to be explicit and shared. Growth teams that skip this step frequently re-debate readiness at every planning meeting, wasting cycles. The written artifact is the commitment device.
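If the verdict file is generated rather than hand-written, a small renderer keeps it consistent with the template. This is a sketch; `render_verdict` is a hypothetical helper, and every value it prints is supplied by the analyst.

```python
VERDICTS = {"GO", "CONDITIONAL", "NO-GO"}


def render_verdict(product: str, run_date: str, verdict: str, score_pct: float,
                   n_responses: int, retention: str, category: str,
                   rationale: str, actions: list[str]) -> str:
    """Fill the Step 5 template with analyst-supplied values and return markdown."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    lines = [
        f"# PMF Readiness Verdict: {product}",
        f"**Date:** {run_date}",
        f"**Verdict:** {verdict}",
        "## Evidence Summary",
        f"- Must-have survey score: {score_pct:.0f}% very disappointed (N={n_responses})",
        f"- Retention curve: {retention}",
        f"- Product category: {category}",
        "## Rationale",
        rationale,
        "## Recommended Next Actions",
        # Numbered list of next actions, starting at 1.
        *[f"{i}. {action}" for i, action in enumerate(actions, start=1)],
    ]
    return "\n".join(lines) + "\n"
```

Writing it out is then a single call such as `Path("pmf-readiness-verdict.md").write_text(render_verdict(...))` using `pathlib.Path`.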
**Output:** `pmf-readiness-verdict.md`
---
### Step 6: Remediation protocol (NO-GO and CONDITIONAL only)
For products scoring below 40%, do not guess at solutions in internal planning sessions. Instead, run three methods concurrently:
**A. Customer interviews (marketing/product design lead)**
Talk to active users — especially the "very disappointed" sub-segment, however small. Observe them using the product in context, not just in a formal interview. What job are they hiring the product to do? What friction prevents others from reaching that same outcome?
**B. Diagnostic analysis of "very disappointed" cohort (data analyst lead)**
Segment survey respondents by score. Identify behavioral differences: what did the "very disappointed" users do in the product that "somewhat disappointed" or "not disappointed" users did not? That difference often reveals the aha moment path that needs to be broadened.
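The behavioral comparison in method B can be approximated by contrasting how often each group performed each product action. A minimal sketch, assuming you can join each survey answer to a per-user set of action names from your event log (the action names are placeholders, and `behavior_gaps` is a hypothetical helper):

```python
def behavior_gaps(respondents: list[tuple[str, set[str]]]) -> list[tuple[str, float]]:
    """Rank actions by how much more often 'Very disappointed' users performed them.

    Each respondent is (survey answer, set of action names that user performed).
    'N/A' answers are skipped; every other answer counts toward the comparison group.
    """
    very, others = [], []
    for answer, actions in respondents:
        if answer == "N/A":
            continue
        (very if answer == "Very disappointed" else others).append(actions)
    if not very or not others:
        raise ValueError("need at least one respondent in each group")

    def share(group: list[set[str]], action: str) -> float:
        # Fraction of users in `group` who performed `action` at least once.
        return sum(action in acts for acts in group) / len(group)

    all_actions = set().union(*very, *others)
    gaps = [(a, share(very, a) - share(others, a)) for a in all_actions]
    # Largest positive gap first; ties ordered by action name for stable output.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))
```

Actions at the top of the list are candidates for the aha moment path; they still need confirming through the interviews in method A.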
**C. Minimum viable tests on product and messaging (engineering + growth lead)**
Run the smallest possible experiments that could close the gap — starting with messaging changes (cheapest, fastest) before committing to feature builds. Do not add features without evidence from methods A and B that the feature addresses a confirmed gap.
**Caution on feature creep:** Adding features is the intuitive response to a failing score, but it is often wrong. The remediation frequently requires removing friction or improving delivery of existing value, not adding new capabilities.
**Why:** Teams that skip structured remediation and instead hold whiteboard brainstorms burn time and money on unvalidated bets. The three-method approach distributes diagnostic work across specializations and produces convergent evidence before any engineering investment.
**Output:** Remediation plan appended to `pmf-readiness-verdict.md`
## Key Principles
**1. 40% is a floor, not a target.**
Products that just clear 40% are marginal. Teams should continue working to raise the score even after passing the gate. The threshold is the minimum for viable growth, not evidence of a breakout product.
**2. Retention confirms what the survey suggests.**
The survey captures stated preference under a hypothetical. The retention curve reveals actual behavior. Both signals must align before committing to scaled acquisition. A strong survey score with a declining retention curve means the aha moment is being promised but not reliably delivered.
**3. Premature scaling is the number one growth failure mode.**
Viral and paid acquisition mechanics can temporarily inflate user counts regardless of product quality. The BranchOut case demonstrates the collapse pattern: rapid acquisition before must-have status creates a user pool that churns faster than new users arrive, so customer acquisition cost permanently exceeds lifetime value. The growth investment is consumed before the product is ready to convert it.
**4. Survey active users only.**
Targeting dormant users produces response bias and low completion rates. The signal you need comes from people who have experienced the product — even if their verdict is disappointment. Dormant users have already voted with their absence.
**5. The 25–40% band is actionable, not a failure.**
A score in this range means real fans exist. The job is to understand what those fans experience that others do not — and redesign the product or messaging to deliver that experience more consistently or to a better-matched audience.
**6. This gate applies once per growth phase, not perpetually.**
Re-run this gate when the product undergoes significant changes (new ICP, major feature pivot, new market entry). Do not use it as a routine measurement tool once growth is established.
## Examples
### Example 1: NO-GO — SaaS Team Collaboration Tool
**Scenario:** A B2B SaaS team has 800 paying users and is considering a $200K acquisition campaign. The Head of Growth asks, "Are we ready to scale paid acquisition?"
**Trigger:** "Are we ready to scale?" + existing user base large enough to survey.
**Process summary:**
- Survey sent to 600 active users; 280 responses received
- Very disappointed: 18% | Somewhat disappointed: 44% | Not disappointed: 31% | N/A: 7%
- Retention curve: month-3 cohorts retaining at 38%, month-6 at 22%, still declining
- Verdict: NO-GO
**Output excerpt from `pmf-readiness-verdict.md`:**
```
Verdict: NO-GO
Score: 18% very disappointed (well below 40% threshold)
Retention: Continuously declining — no stable cohort identified
Rationale: Both signals confirm the product has not achieved must-have status.
Scaling acquisition now would accelerate churn rather than compound growth.
Recommended next actions:
1. Segment the 18% "very disappointed" users — identify their behavioral patterns
2. Conduct 8–10 qualitative interviews with this cohort to identify the aha moment path
3. Run messaging experiments to test whether the core value is being communicated
to the right audience before building any new features
```
---
### Example 2: GO — Consumer Productivity App
**Scenario:** A productivity app has been in public beta for 4 months. The founding team is debating whether to hire a growth engineer and launch a referral program.
**Trigger:** "Should we hire a growth team?" + "do we have product/market fit?"
**Process summary:**
- Survey sent to 400 active users; 310 responses
- Very disappointed: 52% | Somewhat disappointed: 31% | Not disappointed: 12% | N/A: 5%
- Retention curve: flattens at 41% at month 3, stable through month 6; compares favorably to consumer app benchmarks
- Verdict: GO
**Output excerpt from `pmf-readiness-verdict.md`:**
```
Verdict: GO
Score: 52% very disappointed (above 40% threshold)
Retention: Stable at 41% at month 3 — well above consumer mobile average of ~10%
Rationale: Both signals confirm must-have status. A majority of active users would
be significantly impacted by loss of the product. Retention curve has stabilized at
a competitive level. The product is ready for high-tempo growth experimentation.
Recommended next actions:
1. Proceed with referral program design (see: growth-referral-loop-designer)
2. Identify North Star metric before beginning growth experiments (see: north-star-metric-selector)
3. Re-run this gate if the product undergoes a significant pivot or new market entry
```
## References
Methodology sourced from:
- *Hacking Growth*, Ch. 2: "Determining If Your Product Is Must-Have" — Must-Have Survey (primary)
- *Hacking Growth*, Ch. 2: "Measuring Retention" — retention curve as second signal
- *Hacking Growth*, Ch. 2: "The Flameout of BranchOut" — premature scaling anti-pattern
- *Hacking Growth*, Ch. 2: "Getting to Must-Have" — remediation protocol
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — *Hacking Growth* by Sean Ellis and Morgan Brown.
## Related BookForge Skills
This skill is the foundation of the Hacking Growth skill set. Run it before these dependents:
- `clawhub install bookforge-north-star-metric-selector` — pick the growth equation variable that matters after PMF is confirmed
- `clawhub install bookforge-retention-phase-intervention-selector` — use the retention curve from this skill as the diagnostic starting point
Browse more: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
---
name: north-star-metric-selector
description: "Use this skill to construct a growth equation for a product and select a defensible North Star Metric that actually reflects core value delivery — rejecting vanity metrics (DAU, total signups, pageviews) that feel like growth but don't compound. Produces the full multiplicative growth equation (acquisition × activation × retention + monetization + referral) and a scored shortlist of NSM candidates with rationale, then recommends one. Triggers when a growth PM or head of growth asks 'what should our north star metric be?', 'help me pick a north star', 'is DAU the right metric?', 'my team is chasing vanity metrics', 'we're orienting around the wrong metric', 'how do I build a growth equation', 'what's our key growth metric', 'OMTM one metric that matters', 'north star framework', or 'growth equation for [product type]'. Also activates for 'WhatsApp messages sent as north star', 'Airbnb nights booked', 'north star vs input metrics', or 'why DAU is misleading'. Use AFTER product/market fit is confirmed — this metric orients every downstream growth experiment."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/north-star-metric-selector
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [3]
tags:
- growth
- metrics
- north-star
- product-marketing
- startup-ops
depends-on: []
execution:
tier: 1
mode: plan-only
inputs:
- type: document
description: >
Product brief (product-brief.md) describing product, ICP, core value prop,
current stage, and the user's hypothesis of the aha moment.
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set. Plan-only — produces markdown deliverables (north-star-recommendation.md
and growth-equation.md). No code execution.
discovery:
goal: >
Produce a defensible north-star-recommendation.md and growth-equation.md that
the team can orient around for the next 6-12 months of experimentation.
tasks:
- "Read product brief and confirm aha moment"
- "Construct the multiplicative growth equation with stages"
- "Enumerate candidate NSM variables"
- "Apply rejection criteria (vanity, proxy, short-term)"
- "Score survivors against must-have experience criterion"
- "Recommend primary NSM + 2-3 input metrics"
- "Emit deliverables"
---
# North Star Metric Selector
## When to Use
You are a growth PM, head of growth, or founder at a post-PMF startup (Series A–B) and need to answer the question: *what is the one metric that should orient all growth experimentation for the next 6–12 months?*
This skill applies when:
- Your team is debating between several metrics and can't align on which to track
- You suspect the current orienting metric (DAU, total signups, GMV, pageviews) is a vanity metric that can rise while the business stagnates
- You are setting up your growth team and need a North Star before running the first experiment cycle
- You have PMF confirmed (40%+ "very disappointed" on the must-have survey, stable retention curve) and are ready to scale
- Someone on your team challenges whether the current metric captures real value delivery, and you need a structured argument
Do not use this skill before confirming product/market fit. Selecting a North Star before PMF creates false precision — you will orient a team around the wrong signal.
---
## Context and Input Gathering
Before running the process, confirm the following from the product brief:
1. **Product description** — what does the product do, for whom, in what context?
2. **Core value proposition** — what problem is definitively solved for the ideal customer?
3. **Aha moment** — the specific moment a new user first experiences the core value. If the brief does not name it, ask: "At what point does a new user stop wondering whether this is worth it?" Note: the aha moment was identified during PMF validation (see `product-market-fit-readiness-gate`).
4. **Business model** — subscription, marketplace, transactional, freemium, ad-supported? This shapes which equation variables matter.
5. **Current orienting metric** — what metric is the team currently chasing? Why? This is the candidate for rejection.
If the product brief is missing any of these, ask before proceeding. A growth equation built on an assumed aha moment is fragile.
---
## Process
### Step 1: Confirm the Must-Have Experience
Identify and state the aha moment in a single sentence: *"The aha moment is [action] that delivers [core value] to [ICP] at [trigger point]."*
**Why this step:** The North Star must reflect whether users are experiencing this moment — not whether they are merely present. Every downstream selection decision anchors on this sentence. If you skip this confirmation and proceed to equation construction, you risk selecting a metric that measures activity rather than value delivery.
Examples of correctly stated aha moments:
- "Sending a message to a friend in another country for free, immediately" (messaging app)
- "Finding and booking accommodation in a new city within 10 minutes" (marketplace)
- "Seeing my pipeline update automatically without entering data manually" (B2B SaaS)
### Step 2: Construct the Growth Equation
Build the product's fundamental growth equation — a multiplicative formula that expresses all core growth levers.
**Template structure:**
```
[Acquisition input] × [Activation conversion] × [Engagement depth/frequency]
+ Retained [users/subscribers/buyers]
+ Resurrected [lapsed users who returned]
= [Growth output metric]
```
The exact variables depend on the business model. The additive structure (+Retained +Resurrected) is not optional — making retention and resurrection explicit forces the team to treat them as levers, not background assumptions.
**Worked Example A — B2B SaaS (project management tool):**
```
New Trial Signups
× Trial-to-Seat Activation Rate
× Seats per Account
× Weekly Active Seats (depth-of-use)
+ Retained Paying Accounts
+ Resurrected Churned Accounts
= ARR Growth
```
**Worked Example B — Consumer Marketplace (short-term rentals):**
```
New Host Listings
× Listing Quality Rate (photos, description completeness)
× Guest Search Sessions
× Booking Conversion Rate
+ Retained Repeat Guests
+ Resurrected Dormant Guests
= Nights Booked
```
**Worked Example C — Subscription Media:**
```
Monthly Website Traffic
× Email Capture Rate
× Active Reader Rate (opens content ≥ 2x/week)
× Paid Subscriber Conversion Rate
+ Retained Paid Subscribers
+ Resurrected Lapsed Subscribers
= Subscriber Revenue Growth
```
**Why this step:** The growth equation makes growth levers explicit and countable. Without it, "grow the business" is the strategy. With it, the team can see exactly which stage is the weakest link and focus experimentation there. The equation also produces the candidate pool for NSM selection in Step 3.
### Step 3: Enumerate NSM Candidates
Extract the key variables from the equation. These are the candidate North Star metrics. Every variable in the equation is a candidate — including the output metric itself, the conversion rates, the depth-of-use variables, and the retained/resurrected terms.
List each candidate explicitly. For the B2B SaaS example above, candidates include: new trial signups, trial-to-seat activation rate, seats per account, weekly active seats, retained paying accounts, resurrected churned accounts, and ARR.
**Why this step:** Teams often skip straight to the output metric (ARR, GMV) and miss that an intermediate variable — one that more precisely captures the aha moment — is the better North Star. You cannot make that judgment without seeing all candidates in one view.
### Step 4: Apply Rejection Criteria
Run each candidate through four rejection filters. A metric that fails any filter is disqualified from North Star selection. It may survive as an input metric.
**Filter 1 — Vanity metric check:**
*Can this metric rise while core value delivery to users stays flat or declines?*
If yes, reject. Total signups can rise via an aggressive acquisition campaign even if zero users reach the aha moment. Pageviews can increase via SEO while user engagement craters. Daily active users can be inflated by notification spam.
**Filter 2 — Proxy check:**
*Is this metric measuring the activity that produces value, or the value itself?*
If it measures activity only, reject as a standalone NSM (it becomes an input metric). New trial signups measures acquisition activity, not value delivery. The variable that captures value delivery is the one closest to the aha moment in the equation.
**Filter 3 — Frequency mismatch check:**
*Does the natural frequency of the metric match how often users actually experience the core value?*
If not, reject. A short-term rental platform cannot use daily active users as its North Star — even loyal users book stays only a few times per year. A review platform cannot use daily visits — genuine users search weekly at most. Forcing a daily metric onto a low-frequency product makes the metric impossible to move and misaligned with actual usage patterns.
**Filter 4 — Short-term inflation check:**
*Can the metric be moved quickly through tactics that don't improve the underlying product?*
If yes, reject as the primary NSM. A metric vulnerable to gaming (login streaks, notification-driven opens, discount-triggered purchases) should be monitored as an input metric, not the orientation point for the team.
**Named failure modes:**
- DAU for WhatsApp: fails Filter 2. A user can be daily-active but send only one message, so DAU measures presence rather than value and doesn't capture whether WhatsApp is actually the user's primary messaging channel.
- Total registrations for a marketplace: fails Filter 1. Registrations rise whenever an acquisition campaign runs, regardless of whether any transaction occurs.
- GMV without repeat purchases: fails Filter 2 if the aha moment is repeat value (a user who buys once and never returns didn't fully experience the core value).
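The filters are judgment calls, but recording the answers explicitly makes each disqualification auditable. A minimal sketch, assuming the team answers each filter question yes or no per candidate; the `Candidate` class and `split_candidates` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One NSM candidate plus the team's yes/no answer to each rejection filter."""
    name: str
    rises_without_value: bool     # Filter 1: vanity metric check
    measures_activity_only: bool  # Filter 2: proxy check
    frequency_mismatch: bool      # Filter 3: frequency mismatch check
    easily_inflated: bool         # Filter 4: short-term inflation check

    def failed_filters(self) -> list[int]:
        """Numbers of the filters this candidate fails (empty list means it survives)."""
        flags = (self.rises_without_value, self.measures_activity_only,
                 self.frequency_mismatch, self.easily_inflated)
        return [number for number, failed in enumerate(flags, start=1) if failed]


def split_candidates(candidates: list[Candidate]) -> tuple[list[str], list[str]]:
    """Return (survivors eligible for scoring, rejects that may still serve as input metrics)."""
    survivors, possible_inputs = [], []
    for c in candidates:
        (possible_inputs if c.failed_filters() else survivors).append(c.name)
    return survivors, possible_inputs
```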
### Step 5: Score Surviving Candidates
For candidates that pass all four rejection filters, score each on a 1–5 scale across four criteria:
| Criterion | Score 5 | Score 1 |
|-----------|---------|---------|
| Reflects must-have experience | Directly measures the aha moment | Measures a downstream proxy |
| Actionable by the team | Team has clear levers to move it | Driven by factors outside team control |
| Survives 6-12 months | Relevant at 2x current scale | Likely to become obsolete within 3 months |
| Honest signal | Hard to inflate without real value delivery | Easy to game with tactics |
Sum the scores. The highest scorer is the primary NSM recommendation.
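The scoring step reduces to validating four 1–5 scores per survivor, summing them, and sorting. In this sketch the criterion keys are shorthand for the four table rows, and ties are broken alphabetically only to keep the output deterministic; the process above does not specify a tie-break, so a tie is better treated as a prompt for discussion.

```python
CRITERIA = ("reflects_must_have", "actionable", "durable", "honest_signal")


def rank_survivors(scores: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Sum each candidate's four 1-5 scores and rank highest total first."""
    for name, per_criterion in scores.items():
        if set(per_criterion) != set(CRITERIA):
            raise ValueError(f"{name}: expected scores for {CRITERIA}")
        if any(not 1 <= v <= 5 for v in per_criterion.values()):
            raise ValueError(f"{name}: every score must be between 1 and 5")
    totals = [(name, sum(s.values())) for name, s in scores.items()]
    # Alphabetical tie-break keeps output deterministic; the source specifies none.
    return sorted(totals, key=lambda t: (-t[1], t[0]))
```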
**Why this step:** When two candidates both pass the rejection filters, subjective debate replaces rigor. A scored comparison makes the selection defensible to the executive team and creates a record of why candidates were accepted or rejected.
### Step 6: Recommend Primary NSM and Input Metrics
Produce:
1. **Primary NSM:** the highest-scoring survivor from Step 5, stated as a specific, measurable quantity with a time dimension. Example: "Messages sent per active user per week" rather than "messages sent."
2. **2-3 Input metrics:** the upstream variables in the growth equation that the team's experiments will directly manipulate to move the NSM. Input metrics are the levers; the NSM is the outcome. They are not the same. Experiments target input metrics; success is confirmed by movement in the NSM.
3. **Explicit rejects:** list the 1-2 most tempting metrics that were rejected and why. This is as important as the recommendation — it prevents the team from reverting to vanity metrics when the NSM is hard to move.
**Why this step:** Teams that select only the NSM without naming input metrics have no operational plan. Teams that confuse input metrics for the NSM lose the north-star property — the metric becomes gameable. The distinction is critical.
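To make the time dimension concrete, here is a hedged sketch of computing "messages sent per active user per week" from a raw event log. It assumes each event is a `(user_id, day, kind)` tuple, that "active" means any event in the ISO calendar week, and that message events are tagged `"message_sent"`; all three are illustrative choices, not definitions from the source.

```python
from collections import defaultdict
from datetime import date


def messages_per_active_user_per_week(events: list[tuple[str, date, str]]) -> dict[tuple[int, int], float]:
    """Return {(iso_year, iso_week): messages sent / distinct active users} for each week seen."""
    active: dict[tuple[int, int], set[str]] = defaultdict(set)
    messages: dict[tuple[int, int], int] = defaultdict(int)
    for user_id, day, kind in events:
        year, week, _ = day.isocalendar()
        active[(year, week)].add(user_id)  # any event counts as activity (assumption)
        if kind == "message_sent":
            messages[(year, week)] += 1
    return {wk: messages[wk] / len(users) for wk, users in sorted(active.items())}
```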
### Step 7: Emit Deliverables
Write two output files:
**`north-star-recommendation.md`**
- Product name and aha moment (confirmed in Step 1)
- Primary NSM: name, definition, measurement method, current baseline
- Rationale: why this metric, why not the top 2 rejected candidates
- Input metrics: 2-3 with measurement method and team ownership
- Review cadence: recommend quarterly NSM review at minimum (the right metric may change as the company matures)
**`growth-equation.md`**
- Full equation in the multiplicative + additive template format
- Variable definitions: one sentence per variable explaining what it measures and how to instrument it
- Current values where known, gaps where not yet instrumented
- Stage diagnosis: which stage of the equation currently has the weakest conversion? This is the starting point for the first experiment cycle.
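The stage diagnosis can be sketched as a scan for the lowest stage-to-stage conversion, assuming the multiplicative part of the equation can be expressed as ordered user or account counts per stage. The additive retained and resurrected terms are not funnel stages, so they need their own review; `weakest_stage` is a hypothetical helper.

```python
def weakest_stage(funnel: list[tuple[str, int]]) -> tuple[str, float]:
    """Return the stage transition with the lowest conversion rate.

    `funnel` lists (stage_name, count) in equation order, e.g. signups first.
    """
    if len(funnel) < 2:
        raise ValueError("need at least two stages to compute a conversion")
    transitions = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        if prev_count == 0:
            raise ValueError(f"stage {prev_name!r} has zero users")
        transitions.append((f"{prev_name} -> {name}", count / prev_count))
    return min(transitions, key=lambda t: t[1])
```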
---
## Key Principles
1. **The guiding question is singular:** "Which variable in the growth equation most accurately represents the delivery of the must-have experience?" Everything else is a supporting question.
2. **Input metrics are levers; the NSM is the outcome.** The growth team pulls input metrics (trial conversion rate, onboarding completion, feature adoption). The NSM tells them whether pulling those levers is working. Conflating the two — treating an input metric as the North Star — makes the metric gameable and disconnects it from real value delivery.
3. **The right NSM changes at different company stages.** A metric that captures core value delivery for a 10,000-user product may become irrelevant at 10 million users. Early-stage Facebook correctly oriented around MAU to prove reach; later-stage Facebook correctly shifted to DAU as engagement became the constraint. Build a quarterly review into the process.
4. **If your NSM can go up while your business goes down, it's wrong.** This is the litmus test for vanity metrics. Run the test on your current metric before starting the process; the result confirms whether the problem is worth solving.
5. **The growth equation forces retention and resurrection into view.** The additive structure (+ Retained + Resurrected) is not decorative. Without it, teams treat retention as a passive background condition and allocate all experiments to acquisition. Seeing retained and resurrected users as explicit equation variables changes the allocation decision.
6. **The North Star creates the right disagreements.** A well-chosen NSM generates productive debates: "Why didn't this experiment move the NSM even though the input metric improved?" That disagreement surfaces disconnects between levers and outcomes. A vanity metric generates false confidence: "The number went up" — end of discussion.
---
## Examples
### Consumer Marketplace — Accommodation Platform
A short-term rental platform with strong early adopter retention asked: "Should we orient around DAU, total listings, or something else?"
The aha moment: a guest successfully books unique accommodation in their destination city within a single session. The growth equation revealed the core stages: listings (supply) × search sessions (demand) × booking conversion × guest satisfaction → repeat bookings. Nights booked was the only variable that captured both supply and demand satisfaction simultaneously: a night booked means a host listed a quality property AND a guest found and committed to it. Orienting around nights booked also surfaced a non-obvious lever: professional photography increased listing quality, which increased booking conversion in lagging markets. DAU would never have revealed this lever, because it couldn't distinguish users who browsed and abandoned from users who completed bookings.
Input metrics: listing completion rate, search-to-booking conversion rate, average listings per active market.
### B2B SaaS — Team Collaboration Tool
A project management SaaS with a freemium model was orienting around monthly active users (MAU). The growth team suspected this was the wrong metric because MAU was rising while paid conversion was flat.
The aha moment: a team lead sees their entire team's tasks update in real time without anyone logging data manually. The growth equation exposed the gap: trial signups × team activation rate (≥3 members completing setup) × weekly collaborative sessions × conversion to paid. MAU failed the vanity metric check — a single user opening the app weekly counted as "active" even if they never invited their team, which meant they never experienced the aha moment. The better North Star: weekly collaborative sessions per team (defined as ≥2 members using a shared project in a 7-day window). This metric could not rise without teams actually collaborating, which is precisely the must-have experience.
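The B2B North Star definition is precise enough to compute directly. A minimal sketch, assuming events arrive as `(team_id, project_id, member_id, day)` tuples, that the "7-day window" is an ISO calendar week (a rolling window is an equally valid choice), and that each shared project used by two or more distinct members counts as one collaborative session:

```python
from collections import defaultdict
from datetime import date


def collaborative_sessions(events: list[tuple[str, str, str, date]],
                           week: tuple[int, int]) -> dict[str, int]:
    """Return {team_id: shared projects used by >= 2 distinct members} for one ISO week."""
    members: dict[tuple[str, str], set[str]] = defaultdict(set)
    teams_seen: set[str] = set()
    for team, project, member, day in events:
        year, wk, _ = day.isocalendar()
        if (year, wk) != week:
            continue  # outside the 7-day window being measured
        teams_seen.add(team)
        members[(team, project)].add(member)
    sessions = {team: 0 for team in teams_seen}  # solo-only teams score 0
    for (team, _project), who in members.items():
        if len(who) >= 2:
            sessions[team] += 1
    return sessions
```

A team whose only activity is one member working alone scores zero, which is the property the paragraph above relies on: the metric cannot rise without actual collaboration.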
Input metrics: team invite completion rate, shared project creation rate, integration setup completion.
---
## References
- `research/north-star-metric-selector.md` — source passages and company-specific examples from Chapter 3
- `references/growth-equation-examples.md` — additional worked growth equations by business model
- `orchestration/specs/skill-spec.md` — BookForge skill authoring standards
---
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — *Hacking Growth* by Sean Ellis and Morgan Brown.
---
## Related BookForge Skills
This skill is the foundation of the Hacking Growth operating system — most downstream skills consume the North Star metric it produces:
- `clawhub install bookforge-growth-experiment-prioritization-scorer` — score experiments by predicted impact on this NSM using the ICE framework before committing any team time
- `clawhub install bookforge-activation-funnel-diagnostic` — diagnose which stage of the growth equation is leaking most, starting from the NSM signal
- `clawhub install bookforge-retention-phase-intervention-selector` — select the right retention intervention for each retention phase, with success measured against the NSM
- `clawhub install bookforge-growth-stall-prevention` — audit the NSM trend for plateau signals and select the right counter-measure before momentum is lost
Browse more: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
---
name: monetization-experiment-planner
description: "Use this skill to plan monetization experiments for a post-PMF product with stable retention — classify the monetization archetype (subscription / e-commerce / ad-revenue), run cohort revenue analysis to find highest-value customer segments, and propose pricing experiments using pricing relativity (3-tier anchoring), cohort upsell, and penny-gap handling. Produces a monetization-experiment-backlog.md with ordered tests and a revenue-cohort-analysis.md showing where the revenue actually comes from. Triggers when a growth PM asks 'how do I increase revenue per user', 'pricing experiment ideas', 'should I raise my prices', 'how do I structure pricing tiers', 'pricing anchoring', 'cohort revenue analysis', 'freemium to paid conversion', 'penny gap problem', 'our LTV is too low', 'CAC LTV ratio', 'monetization funnel', 'upsell experiments', or 'how do I monetize my free users'. Also activates for 'should we drop the price', 'three tier pricing', 'Qualaroo pricing case', or 'personalization recommendations revenue'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/monetization-experiment-planner
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [8]
tags:
- growth
- monetization
- pricing
- revenue-optimization
- startup-ops
depends-on:
- retention-phase-intervention-selector
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: >
Revenue cohorts CSV (revenue-cohorts.csv) segmenting customers by tier
or spend. Pricing tiers doc (pricing-tiers.md) with current pricing
structure. Optional: customer-surveys.md for willingness-to-pay signals.
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set + CSV. Produces a monetization experiment backlog and
revenue cohort analysis as markdown.
discovery:
goal: >
Produce an ordered monetization experiment backlog prioritized by impact
on revenue per retained user, plus a revenue cohort analysis showing
where the team should focus pricing experiments.
tasks:
- "Confirm retention is stable (prerequisite)"
- "Classify monetization archetype"
- "Run cohort revenue segmentation"
- "Identify highest-value customer segments"
- "Propose pricing experiments appropriate to archetype"
- "Flag penny-gap and reactive-cut pitfalls"
- "Emit experiment backlog and cohort analysis"
---
# Monetization Experiment Planner
A structured process for growth teams at post-PMF, Series A–B companies whose
retention is stable but revenue per user is flat or growing too slowly. Applies
the monetization framework from *Hacking Growth* (Ellis & Brown, Chapter 8) to
classify your business model, surface where revenue is actually coming from,
and generate a prioritized backlog of pricing and upsell experiments grounded
in cohort data rather than intuition.
---
## When to Use
Use this skill when:
- Retention is stable (flat or rising retention curve confirmed — see
`retention-phase-intervention-selector` before proceeding)
- Revenue per user is flat despite user growth
- You need to restructure pricing, add a paid tier, or convert free users
- Leadership is asking "where does our revenue actually come from?"
- You are considering cutting prices and want to test the assumption first
- You want to identify which customer segments to target with upsell experiments
Do not use this skill when retention is still declining. Monetizing users who
are churning produces short-term revenue at the cost of long-term LTV and
compounds CAC-LTV inversion. Stabilize retention first.
---
## Context and Input Gathering
Before beginning, collect:
1. **Revenue cohorts CSV** — at minimum: customer ID, plan/tier or spend
bracket, acquisition date, acquisition source, revenue in the last 90 days.
More fields (device, geography, feature usage) improve segmentation quality.
2. **Pricing tiers doc** — current pricing structure including all plan names,
prices, and feature gates. If pricing is informal or undocumented, write it
down now before analysis.
3. **Customer surveys (optional)** — any willingness-to-pay or NPS data.
Useful for generating pricing hypotheses but not required for cohort
segmentation.
4. **Retention confirmation** — a retention curve showing stabilization (flat
baseline after the initial drop period). If this does not exist, run
`retention-phase-intervention-selector` first.
---
## Process
### Step 1 — Confirm Retention Is Stable
**What:** Verify that the retention curve has reached a stable floor before
beginning monetization work.
**Why:** Pricing experiments on a churning user base produce misleading
signals. If users are leaving because they don't find the product valuable,
raising prices will accelerate churn. If the curve is still declining, any
revenue gain from a pricing test will be offset by accelerated loss of the
user base that generates it. A flat retention curve is evidence of genuine
product-market fit — the only soil in which monetization experiments grow.
**Check:** Look at retention by weekly or monthly cohort. A healthy signal is
a curve that drops steeply in the first phase (expected) and then levels off
to a stable floor. If the curve is still declining at month three or later,
halt and address retention first.
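A minimal Python sketch of this check. It assumes retention is available as a list holding the fraction of a cohort still active in each month, with month 0 equal to 1.0. The one-percentage-point `tolerance` for what counts as "flat" is a hypothetical threshold, not a number from *Hacking Growth*. Tune it to your product's noise level.

```python
def retention_is_stable(retention, start_month=3, tolerance=0.01):
    """Return True if the curve has flattened from start_month onward.

    retention: fraction of the cohort still active each month (retention[0] == 1.0).
    tolerance: largest month-over-month drop still treated as "flat"
               (hypothetical default of one percentage point).
    """
    tail = retention[start_month:]
    if len(tail) < 2:
        raise ValueError("need at least two months of data from start_month onward")
    # Absolute drop between each consecutive pair of months in the tail.
    drops = [earlier - later for earlier, later in zip(tail, tail[1:])]
    return all(drop <= tolerance for drop in drops)


# Hypothetical monthly cohort curves.
flattening = [1.00, 0.55, 0.42, 0.38, 0.375, 0.372, 0.371]
declining = [1.00, 0.55, 0.42, 0.35, 0.29, 0.24, 0.20]
print(retention_is_stable(flattening))  # True: proceed to Step 2
print(retention_is_stable(declining))   # False: address retention first
```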
---
### Step 2 — Classify the Monetization Archetype
**What:** Assign the product to one of three archetypes:
| Archetype | Revenue mechanism | Primary diagnostic metric |
|-----------|------------------|--------------------------|
| Subscription | Recurring fees; upsell to higher tiers | LTV by plan tier; upgrade rate |
| E-commerce | Transaction fees; repeat purchase | Annual spend bracket; repeat purchase rate |
| Ad-revenue | Impression inventory × CPM | ARPU; engagement depth per session |
Classify mixed models (e.g., freemium SaaS with an ad-supported tier) by their
dominant revenue stream for this analysis: pick the archetype that accounts
for more than 50% of current revenue.
**Why:** Each archetype has different pinch points in the monetization funnel,
different cohort segmentation logic, and different experiment types. Running
subscription experiments on an ad-revenue product wastes cycles. The
archetype classification determines everything downstream.
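The 50% rule above can be written as a small helper. This sketch assumes revenue is already split by stream. Returning `None` when no stream clears 50% is our own convention for "needs a judgment call". The source does not say what to do in that case.

```python
def classify_archetype(revenue_by_stream):
    """Return the stream with more than 50% of revenue, or None if none qualifies.

    revenue_by_stream: e.g. {"subscription": 80_000, "ad-revenue": 20_000}.
    None is a hypothetical convention meaning "no dominant stream; decide manually".
    """
    total = sum(revenue_by_stream.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    stream, revenue = max(revenue_by_stream.items(), key=lambda item: item[1])
    return stream if revenue / total > 0.5 else None

print(classify_archetype({"subscription": 80_000, "ad-revenue": 20_000}))  # subscription
print(classify_archetype({"subscription": 40_000, "e-commerce": 35_000, "ad-revenue": 25_000}))  # None
```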
---
### Step 3 — Map the Monetization Funnel
**What:** Overlay revenue touchpoints on the customer journey map (built
during activation work). Mark every page, screen, or event where revenue is
earned or is being lost.
- **Subscription:** pricing/plan comparison page, upgrade modal, annual
discount offer, add-on upsell surfaces.
- **E-commerce:** item display pages, shopping cart, payment flow,
post-purchase upsell.
- **Ad-revenue:** every page or screen with potential inventory; pages where
inventory exists but fill rate is low; engagement entry points that increase
session depth.
Identify "pinch points" — junctures where conversion to revenue drops sharply.
These become primary experiment targets.
**Why:** Funnel mapping makes the monetization problem geometric rather than
abstract. A team that knows "our pricing page has a 3% upgrade conversion and
our add-on modal has a 0.4% click rate" can prioritize experiments on impact
data. A team guessing in the dark runs experiments on the wrong surfaces.
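The sketch below makes pinch points concrete. It computes step-to-step conversion along an ordered funnel and flags every transition that falls below a cut-off. The funnel counts and the 50% `threshold` are hypothetical, because the source gives no numeric definition of "drops sharply".

```python
def find_pinch_points(funnel, threshold=0.5):
    """Flag funnel transitions whose step conversion falls below threshold.

    funnel: ordered (stage_name, users_reaching_stage) pairs.
    threshold: hypothetical cut-off for a "sharp" drop; tune per product.
    Returns (transition, conversion) pairs, worst conversion first.
    """
    pinch_points = []
    for (stage, users), (next_stage, next_users) in zip(funnel, funnel[1:]):
        rate = next_users / users if users else 0.0
        if rate < threshold:
            pinch_points.append((f"{stage} -> {next_stage}", round(rate, 3)))
    return sorted(pinch_points, key=lambda item: item[1])

# Hypothetical subscription funnel counts.
funnel = [
    ("visited pricing page", 10_000),
    ("opened upgrade modal", 2_400),
    ("started checkout", 1_900),
    ("paid", 300),
]
for transition, rate in find_pinch_points(funnel):
    print(f"{transition}: {rate:.1%}")
```

Sorting worst-first puts the biggest leak at the top of the experiment list.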
---
### Step 4 — Run Cohort Revenue Segmentation
**What:** Segment customers by revenue contribution using archetype-appropriate
buckets, then compute revenue per cohort and identify the highest-value
segments.
**Subscription segmentation:**
```
Cohort A: Free tier (if freemium) → $0/month
Cohort B: Starter plan → $X/month
Cohort C: Pro plan → $Y/month
Cohort D: Enterprise plan → $Z/month
```
**E-commerce segmentation (by annual spend):**
```
Cohort 1: Low spenders < $100/year
Cohort 2: Mid spenders $100–$500/year
Cohort 3: High spenders > $500/year
```
**Ad-revenue segmentation (by engagement depth):**
```
Cohort I: Light users < 5 min/session, 1–2 pages
Cohort II: Medium users 5–15 min/session, 3–8 pages
Cohort III: Power users > 15 min/session, 9+ pages
```
Cross-segment by acquisition source, geography, and device to identify which
channels produce high-value cohorts. An acquisition source that delivers 10%
of users but 40% of revenue should get more budget; an acquisition source that
delivers 30% of users but 5% of revenue warrants investigation or reduction.
**Why:** Aggregate revenue statistics hide the structure of who is actually
paying. A product with $50K MRR from 10,000 users has a very different
experiment strategy depending on whether the revenue comes from 50 power users
($1,000 each), from 5,000 mid-tier users ($10 each), or from a flat
distribution. Cohort segmentation makes the distribution visible and forces
experiment strategy to match reality.
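Here is a minimal Python sketch of the e-commerce variant, using the annual-spend brackets above. The column names (`source`, `annual_spend`) and the five sample rows are hypothetical stand-ins for `revenue-cohorts.csv`. To cover the other archetypes, swap `spend_cohort` for plan tier or engagement depth.

```python
from collections import defaultdict

def spend_cohort(annual_spend):
    """E-commerce brackets from Step 4: <$100, $100-$500, >$500 per year."""
    if annual_spend < 100:
        return "low"
    if annual_spend <= 500:
        return "mid"
    return "high"

def share_table(customers, group_of):
    """Map each group to (share of customers, share of revenue)."""
    users = defaultdict(int)
    revenue = defaultdict(float)
    for customer in customers:
        group = group_of(customer)
        users[group] += 1
        revenue[group] += customer["annual_spend"]
    total_users = len(customers)
    total_revenue = sum(revenue.values())
    return {g: (users[g] / total_users, revenue[g] / total_revenue) for g in users}

customers = [  # hypothetical rows from revenue-cohorts.csv
    {"source": "paid social", "annual_spend": 40},
    {"source": "paid social", "annual_spend": 60},
    {"source": "paid social", "annual_spend": 150},
    {"source": "content", "annual_spend": 900},
    {"source": "content", "annual_spend": 300},
]
by_cohort = share_table(customers, lambda c: spend_cohort(c["annual_spend"]))
by_source = share_table(customers, lambda c: c["source"])
for source, (user_share, revenue_share) in by_source.items():
    print(f"{source}: {user_share:.0%} of customers, {revenue_share:.0%} of revenue")
```

In this sample, `content` brings 40% of customers but 83% of revenue. That is the over-indexing pattern that Step 4 says deserves more budget.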
---
### Step 5 — Identify Highest-Value Segment Characteristics
**What:** Within the highest-value cohorts identified in Step 4, identify
shared characteristics:
- Acquisition source (which channel brought them?)
- Features used (which product capabilities do they rely on?)
- Time-to-first-revenue (how quickly did they convert after signup?)
- Onboarding path (did they complete specific activation steps?)
- Company size or user role (for B2B products)
Build a short profile of the "ideal revenue customer." This profile drives
both the upsell experiments (Step 6) and acquisition channel decisions.
**Why:** The highest-value cohort is the empirical definition of your best
customer. Experiments designed to move lower cohorts toward the behavioral
patterns of the highest cohort are more likely to succeed than experiments
designed from first principles. The profile also surfaces which acquisition
channels to invest in to improve the revenue mix of new users.
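One way to find shared characteristics is to compare how common each trait is inside the highest-value cohort versus everyone else. The sketch below assumes hypothetical per-customer fields (`annual_spend`, `uses_meal_plan`, `days_to_first_purchase`). It also uses the >$500/year high-spender bracket from Step 4.

```python
def trait_rates(customers, is_top, has_trait):
    """Return (share of top-cohort customers with the trait, share of everyone else)."""
    top = [c for c in customers if is_top(c)]
    rest = [c for c in customers if not is_top(c)]
    if not top or not rest:
        raise ValueError("need customers both inside and outside the top cohort")
    return (
        sum(bool(has_trait(c)) for c in top) / len(top),
        sum(bool(has_trait(c)) for c in rest) / len(rest),
    )

def is_high_spender(customer):
    return customer["annual_spend"] > 500  # Step 4 high-spender bracket

customers = [  # hypothetical rows
    {"annual_spend": 900, "uses_meal_plan": True, "days_to_first_purchase": 3},
    {"annual_spend": 700, "uses_meal_plan": True, "days_to_first_purchase": 5},
    {"annual_spend": 120, "uses_meal_plan": False, "days_to_first_purchase": 20},
    {"annual_spend": 80, "uses_meal_plan": True, "days_to_first_purchase": 30},
]
traits = {
    "uses meal plan": lambda c: c["uses_meal_plan"],
    "bought within 7 days": lambda c: c["days_to_first_purchase"] <= 7,
}
for label, has_trait in traits.items():
    top_rate, rest_rate = trait_rates(customers, is_high_spender, has_trait)
    print(f"{label}: {top_rate:.0%} of top cohort vs. {rest_rate:.0%} of others")
```

The traits with the widest gap go into the "ideal revenue customer" profile.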
---
### Step 6 — Propose Pricing Experiments by Archetype
Generate experiments appropriate to the classified archetype. For each
experiment, state the hypothesis, the primary metric being moved, and whether
it tests the pricing surface or the value delivery.
**Subscription experiments:**
1. **Three-tier anchor restructure** — Introduce or restructure to three named
tiers where the middle tier's primary function is to make the top tier
appear to be excellent value (pricing relativity / decoy effect). Dan
Ariely's Economist experiment: when a middle option at the same price as
the top option was present, 84% chose the top tier. When it was removed,
only 32% did. The middle tier does not need to be popular — it needs to
reframe the top tier.
2. **Annual discount upsell** — Offer a 15–20% annual prepay discount to
monthly subscribers. Tests whether a meaningful saving converts month-to-month
users to higher-LTV annual contracts. Before offering the discount, confirm
that the LTV of a discounted annual subscriber exceeds the LTV of a monthly
subscriber (roughly monthly price ÷ monthly churn rate).
3. **Usage-based add-on** — Introduce a metered component (API calls, seats,
storage) that allows high-usage customers to expand spend without a plan
change. Tests whether there is latent willingness to pay above the current
plan ceiling in the highest-value cohort.
4. **Freemium penny-gap bridge** — If freemium users exist, do not ask them
to pay the full upgrade price. Instead, offer a time-limited trial of paid
features at no cost, then convert at the end of the trial. Collecting the
first dollar after a free trial is dramatically easier than asking for the
first dollar from a free user who has never seen paid features. Reference: the
7 Minute Workout app, where switching to a free app with an in-app pro upgrade
produced a 300% revenue increase even though 97% of users paid nothing.
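The annual-discount check can be sketched with a common simplification: LTV ≈ revenue per period ÷ churn per period. This assumes constant churn, ignores gross margin, and does not discount future cash. The 5% monthly and 30% annual churn figures are hypothetical inputs, not benchmarks.

```python
def simple_ltv(price_per_period, churn_per_period):
    """Expected revenue per customer under constant per-period churn.

    With churn c, expected lifetime is 1 / c periods (geometric distribution),
    so LTV = price / c. Ignores gross margin and discounting of future cash.
    """
    if not 0 < churn_per_period <= 1:
        raise ValueError("churn must be in (0, 1]")
    return price_per_period / churn_per_period

monthly_price = 49.0
monthly_churn = 0.05     # hypothetical: 5% of monthly subscribers leave each month
annual_discount = 0.20   # top of the 15-20% range above
annual_churn = 0.30      # hypothetical: 30% of annual contracts are not renewed

monthly_ltv = simple_ltv(monthly_price, monthly_churn)
annual_ltv = simple_ltv(12 * monthly_price * (1 - annual_discount), annual_churn)
print(f"monthly LTV ${monthly_ltv:,.0f}, annual LTV ${annual_ltv:,.0f}")
print("offer the discount" if annual_ltv > monthly_ltv else "hold the discount")
```

With these inputs the annual plan wins ($1,568 vs. $980). If annual churn were much higher, the comparison would flip, which is why the check comes before the offer.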
**E-commerce experiments:**
1. **Bundle pricing** — Package two or three complementary items at a price
lower than the sum of parts. Tests whether perceived value increase drives
basket size without eroding margin. Start with item combinations most
frequently purchased together (Jaccard similarity from purchase data).
2. **Free-shipping threshold** — Set a free-shipping minimum just above the
current average order value. Tests whether customers will add one more item
to cross the threshold, raising average order value.
3. **Recommendation-driven upsell** — Surface "frequently bought with this"
recommendations at cart stage. Start with items where co-purchase rate in
data exceeds 15%. Track revenue-per-session for the recommendation variant
vs. control.
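For bundle candidates, the Jaccard similarity of two items is the number of orders containing both, divided by the number of orders containing either. The sketch below runs over hypothetical baskets and uses the 20% cut-off from Example 2. The 15% co-purchase rate for recommendations is a different, conditional measure and is not computed here.

```python
from itertools import combinations

def jaccard_pairs(orders):
    """Jaccard similarity for every item pair that shares at least one order.

    J(a, b) = |orders containing both a and b| / |orders containing a or b|
    """
    orders_with = {}  # item -> set of order indices containing it
    for order_id, order in enumerate(orders):
        for item in set(order):
            orders_with.setdefault(item, set()).add(order_id)
    scores = {}
    for a, b in combinations(sorted(orders_with), 2):
        both = orders_with[a] & orders_with[b]
        if both:
            scores[(a, b)] = len(both) / len(orders_with[a] | orders_with[b])
    return scores

orders = [  # hypothetical baskets
    {"rice", "tofu"},
    {"rice", "tofu"},
    {"rice", "tofu", "soy sauce"},
    {"rice", "chicken"},
    {"rice"},
    {"rice"},
    {"pasta", "tomato sauce"},
]
scores = jaccard_pairs(orders)
bundle_candidates = sorted(
    (pair for pair, score in scores.items() if score > 0.20),
    key=scores.get,
    reverse=True,
)
print(bundle_candidates)
# [('pasta', 'tomato sauce'), ('rice', 'tofu'), ('soy sauce', 'tofu')]
```

Jaccard penalizes items like `rice` that appear in most baskets. That is why `rice`/`chicken` scores below the cut-off even though the two items do co-occur.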
**Ad-revenue experiments:**
1. **Engagement-driven inventory expansion** — Identify product surfaces with
high engagement but no current ad inventory. Add inventory and measure
CPM and user retention impact together (not inventory alone).
2. **Direct-sold vs. programmatic mix shift** — Test whether moving a
percentage of inventory from programmatic to direct-sold increases effective
CPM. Start with the highest-engagement pages, where advertiser willingness
to pay direct premium is highest.
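Effective CPM is revenue per 1,000 impressions. The sketch below shows how a direct-sold mix shift changes it. It assumes (our assumption, not the source's) that direct inventory which fails to sell falls back to programmatic. All CPMs, shares, and impression counts are hypothetical.

```python
def blended_ecpm(direct_share, direct_cpm, programmatic_cpm, direct_sell_through=1.0):
    """Effective CPM (dollars per 1,000 impressions) for a page or section.

    direct_share: fraction of impressions offered to direct-sold campaigns.
    direct_sell_through: fraction of that offered inventory actually sold;
        the unsold remainder is assumed to fall back to programmatic.
    """
    sold_direct = direct_share * direct_sell_through
    return sold_direct * direct_cpm + (1 - sold_direct) * programmatic_cpm

impressions = 5_000_000  # hypothetical monthly impressions on high-engagement pages
control = blended_ecpm(0.10, direct_cpm=6.00, programmatic_cpm=2.00)
variant = blended_ecpm(0.30, direct_cpm=6.00, programmatic_cpm=2.00, direct_sell_through=0.70)
for label, cpm in (("control", control), ("variant", variant)):
    print(f"{label}: eCPM ${cpm:.2f}, revenue ${cpm * impressions / 1000:,.0f}")
```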
---
### Step 7 — Apply Pricing Relativity and Flag Pitfalls
Before finalizing any pricing experiment, apply three checks:
**Pricing relativity check (three-tier anchor):**
Does the current pricing structure present options in a way that makes the
target tier appear to be good value? If there are only two tiers (free and
paid), add a middle or high tier to anchor perception before running the
conversion experiment. The middle tier creates contrast; contrast creates
perceived value.
**Penny gap check:**
Is the experiment asking free users to pay a first dollar? If yes, the
primary experiment lever should be a time-limited paid feature trial, not a
direct price offer. The resistance between $0 and $0.01 is disproportionately
high relative to any subsequent price increase. Convert free users to paid
trials; convert trial users to subscribers.
**Reactive price cut warning:**
If the instinct is to lower prices to increase volume, require a test first.
The Qualaroo case is the canonical counter-example: the hypothesis that lower
prices would drive more upgrades failed. The hypothesis that higher prices
would attract better customers succeeded three times in sequence. In B2B and
professional services markets, price functions as a quality signal. Lowering
price can actively suppress demand from the target segment. Always test; never
assume elasticity.
**Personalization backfire check:**
If any experiment uses behavioral data to deliver personalized pricing or
recommendations, apply the Target test: would a reasonable user find this
recommendation intrusive if they discovered how it was generated? Personalized
recommendations that feel natural increase revenue; recommendations that feel
like surveillance can permanently damage trust and revenue. Test personalization
experiments with explicit user consent framing where the data source is
sensitive.
---
### Step 8 — Rank and Emit Outputs
**What:** Rank all proposed experiments by expected impact on revenue per
retained user. Emit two output documents.
**Ranking criteria (in order):**
1. Size of the cohort affected (larger cohorts = higher potential impact)
2. Proximity to a confirmed pinch point in the funnel (known leakage = higher
confidence)
3. Speed of signal (experiments that produce a clean revenue signal in 2–4
weeks rank above experiments requiring 8+ weeks of accumulation)
4. Reversibility (experiments where the control can be fully restored rank
above permanent pricing changes)
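If "in order" is read as a lexicographic sort, each criterion only breaks ties left by the ones before it. The ranking can then be sketched as below. The `Experiment` fields and the sample backlog are hypothetical. In practice you may want to bucket cohort sizes first, so that small size differences do not override every other criterion.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    cohort_size: int       # customers in the affected cohort
    at_pinch_point: bool   # targets a confirmed funnel leak
    signal_weeks: int      # weeks until a clean revenue signal
    reversible: bool       # control can be fully restored

def rank(experiments):
    """Sort by the four Step 8 criteria, applied in order (lexicographically)."""
    return sorted(
        experiments,
        key=lambda e: (
            -e.cohort_size,         # 1. larger cohorts first
            not e.at_pinch_point,   # 2. confirmed pinch points first (False sorts first)
            e.signal_weeks,         # 3. faster signal first
            not e.reversible,       # 4. reversible experiments first
        ),
    )

backlog = [
    Experiment("Usage-based add-on", 400, False, 8, False),
    Experiment("Annual discount upsell", 9_000, False, 4, True),
    Experiment("Three-tier anchor restructure", 9_000, True, 3, True),
]
for position, experiment in enumerate(rank(backlog), start=1):
    print(position, experiment.name)
# 1 Three-tier anchor restructure
# 2 Annual discount upsell
# 3 Usage-based add-on
```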
**Output 1: `monetization-experiment-backlog.md`**
One row per experiment with: experiment name, hypothesis, primary metric,
cohort affected, estimated signal timeline, ranking, and pitfall flags (penny
gap / reactive cut / personalization).
**Output 2: `revenue-cohort-analysis.md`**
One section per archetype cohort with: cohort definition, current revenue
contribution, characteristics of the segment (acquisition source, feature
usage, time-to-revenue), and the specific experiments from the backlog that
target this cohort.
---
## Key Principles
1. **Monetize retained users, not churning ones.** Pricing experiments on
a churning base produce short-term revenue at the cost of permanent LTV
damage. Confirm the retention curve is stable before any pricing work.
2. **Cohort revenue beats aggregate revenue for diagnosis.** A single ARPU
number hides the distribution. The distribution reveals where experiments
should focus.
3. **Pricing relativity — three tiers make the middle look reasonable and the
top look like a bargain.** The decoy tier's job is not to sell; its job is
to reframe. Never run a freemium-to-paid conversion experiment without a
three-tier structure already in place.
4. **Lower price does not always mean more volume.** In technology and
professional services, price functions as a quality signal. Test the
elasticity assumption before cutting. The Qualaroo pattern (raise prices,
get better customers) is more common than teams expect.
5. **Penny gap is real — free-to-paid friction is disproportionate.** The
distance between $0 and $0.01 is not $0.01 psychologically. Bridge it with
a paid feature trial, not a direct price ask.
6. **Personalization drives revenue until it feels invasive.** Amazon's
recommendation engine increases revenue. Target's pregnancy model destroyed
trust. The line is whether the user perceives the recommendation as helpful
or surveillance. Test personalization experiments on segments where the
data source is non-sensitive first.
---
## Examples
### Example 1 — SaaS Three-Tier Restructure
**Situation:** A B2B SaaS product has two plans: Free ($0) and Pro ($49/month).
Free-to-Pro upgrade rate is 2.1%. The team's instinct is to lower Pro to $29.
**This skill's process:**
- Cohort segmentation shows that 80% of Pro revenue comes from teams using 3+
seats. Single-seat users churn at 2× the rate of multi-seat users.
- Pricing relativity check: two tiers provide no anchor. The jump from $0 to
$49 feels uncalibrated.
- Proposed restructure: Starter ($29/mo, 1 seat, limited features), Pro
($49/mo, 5 seats, full features), Team ($99/mo, 15 seats + admin + API).
The Team tier anchors Pro as the obvious middle choice.
- Penny gap bridge: add a 14-day full-feature trial before asking for payment.
- Experiment 1: A/B test three-tier page vs. two-tier page. Primary metric:
upgrade rate from free. Expected signal: 3 weeks.
- Reactive cut warning applied: Do not lower Pro to $29 until the restructure
test runs. Qualaroo's experience suggests price sensitivity may be lower
than assumed.
**Expected outcome:** The three-tier structure increases Pro upgrade rate. The
Team tier also opens enterprise conversations that the two-tier structure was
not generating.
---
### Example 2 — E-commerce Bundle Optimization
**Situation:** An e-commerce app sells meal-prep ingredients. Average order
value is $38. Free shipping is offered at $50. Cohort analysis shows that
mid-spenders ($100–$300/year) make up 60% of customers but only 28% of revenue.
High-spenders (>$300/year) make up 12% of customers and 54% of revenue.
**This skill's process:**
- High-spender profile: acquired via recipe content channel, purchase 3+
items per order, use the "meal plan" feature, first purchase within 7 days
of signup.
- Funnel pinch point: 45% of carts are abandoned at payment. Average
abandoned cart value is $33, which is $17 short of the $50 free-shipping threshold.
- Experiment 1: Surface an "add $X more for free shipping" prompt in the cart
(e.g., "add $12 more" on a $38 cart) for carts between $30 and $48. Tests
whether the threshold drives upsell without requiring a price change. Primary
metric: average order value for carts in that range.
- Experiment 2: Bundle the top three co-purchased items (Jaccard similarity
above 20%) as a "starter kit" at $42, above the current $38 average order
value. Tests whether the bundle reduces decision friction and lifts
first-purchase value toward the $50 free-shipping threshold.
- Personalization check: recommendations based on purchase history are
low-risk. Recommendations based on demographic inference (age, household
composition) require explicit data disclosure.
**Expected outcome:** Free-shipping threshold prompt increases average order
value for the $30–$48 cart bracket by 15–25%. Bundle reduces decision time
and improves conversion for new users.
---
## References
- Ellis, Sean and Brown, Morgan. *Hacking Growth.* Chapter 8: Monetization.
Crown Business, 2017.
- Ariely, Dan. *Predictably Irrational.* The Economist subscription pricing
experiment (decoy/anchor effect).
- Kopelman, Josh. "The Penny Gap." (Venture capital blog post; referenced in
Ellis & Brown Chapter 8.)
- Cialdini, Robert. *Influence.* Price-as-quality-signal principle (referenced
in Ellis & Brown Chapter 8).
- Reichheld, Fred. "Prescription for Cutting Costs." Bain & Company. (A 5%
increase in customer retention yields a 25–95% increase in profits; Chapter 7
context.)
---
## License
Content derived from *Hacking Growth* (Ellis & Brown) under fair use for
educational commentary. Skill text licensed CC-BY-SA 4.0. Pipeline code MIT.
---
## Related BookForge Skills
Install the prerequisite and companion skills:
```
clawhub install bookforge-retention-phase-intervention-selector
```
Prerequisite — retention must be stable before monetization experiments begin.
```
clawhub install bookforge-growth-experiment-prioritization-scorer
```
Score the monetization experiment backlog using ICE (Impact, Confidence, Ease)
to sequence the backlog across sprint cycles.
```
clawhub install bookforge-north-star-metric-selector
```
The monetization North Star (revenue per retained user) may differ from the
growth North Star (new user acquisition). Confirm alignment before running
experiments.
---
name: high-tempo-experiment-cycle
description: "Use this skill to install a disciplined weekly experimentation cadence for a growth team — the 4-stage high-tempo cycle (Analyze, Ideate, Prioritize, Test) with a timeboxed growth review meeting agenda, idea capture template, and cadence benchmarks. Produces an experiment-cycle-runbook the team can follow from Monday morning, a weekly growth review agenda with named roles and materials, and an idea capture template that feeds the prioritization queue. Triggers when a growth PM asks 'how do I run a weekly growth meeting?', 'our experiments are ad-hoc, how do we systematize?', 'growth meeting agenda', 'high-tempo testing', 'experiment cadence', 'how often should we test?', 'how many experiments per week?', 'weekly growth review', 'Sean Ellis growth meeting', 'growth cycle', 'growth rituals', 'how do I install a test rhythm', or 'our growth team has no rhythm'. Also activates for 'we keep running tests but nothing compounds', 'scattershot experiments', 'growth team operating system', or 'idea bank template'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/high-tempo-experiment-cycle
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [4]
tags:
- growth
- experimentation
- team-operations
- rituals
- startup-ops
depends-on: []
execution:
tier: 1
mode: plan-only
inputs:
- type: document
description: >
Team context (team-context.md) describing team size, current experiment
cadence (if any), available tools (analytics, A/B platform), and known
constraints (meeting time, review authority).
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set. Plan-only — produces a runbook, meeting agenda, and idea
capture template the team can start using the following Monday.
discovery:
goal: >
Install a disciplined weekly experimentation operating rhythm so experiments
compound into learning rather than dissipating as ad-hoc one-offs.
tasks:
- "Gather team context"
- "Tailor the 4-stage cycle (Analyze → Ideate → Prioritize → Test) to the team"
- "Draft the weekly growth review meeting agenda with timeboxes"
- "Produce an idea capture template"
- "Set a cadence target based on team size and stage"
- "Emit runbook, agenda, and idea capture template"
---
# High-Tempo Experiment Cycle
Install a disciplined weekly experimentation operating rhythm for a growth team. The cycle converts ad-hoc, isolated tests into a compounding learning machine — each week's results directly feed the next week's experiments, producing gains that grow exponentially rather than evaporating.
The core insight: companies that grow fastest are the ones that learn fastest. A 5% monthly improvement in a key metric compounds to an 80% annual gain. But compounding only happens when experiments build on each other, and that requires a repeatable process.
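The quick Python check below shows the arithmetic behind the 80% figure: twelve 5% monthly improvements multiply rather than add.

```python
def compounded_gain(periodic_gain, periods):
    """Total relative gain when the same improvement compounds every period."""
    return (1 + periodic_gain) ** periods - 1

print(f"{compounded_gain(0.05, 12):.0%}")  # 80%: twelve compounding 5% monthly gains
print(f"{0.05 * 12:.0%}")                  # 60%: the same gains added, not compounded
```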
## When to Use
Use this skill when:
- A growth team is running experiments without a disciplined weekly rhythm
- Tests are being run ad hoc, results are reviewed inconsistently, and learning doesn't carry forward
- A growth PM or head of growth wants a concrete operating cadence to install from day one
- A team is debating how many experiments to run per week
- Leadership wants to know what a "growth meeting" should look like
- A team is hitting the scattershot anti-pattern: lots of effort, no compounding insight
Do not use this skill if the product has not yet established must-have status with a meaningful user segment; run the `product-market-fit-readiness-gate` skill first.
## Context and Input Gathering
Before producing deliverables, read `team-context.md` and extract:
1. **Team size and roles** — who is on the team (growth lead, data analyst, engineer, marketer, designer), which roles are missing or shared
2. **Current cadence** — how many experiments are running per week, whether there is a regular review meeting, how experiment ideas are submitted today
3. **Tool stack** — what analytics platform is in use (Mixpanel, Amplitude, GA4, etc.), whether an A/B testing platform exists, how ideas are tracked (spreadsheet, Notion, Jira, etc.)
4. **Meeting constraints** — which day and time the team can meet weekly for 60 minutes
5. **Blocking throughput today** — is the bottleneck ideation (too few ideas), prioritization (no scoring), test velocity (engineering backlog), or review (no structured debrief)?
If `team-context.md` is not available, ask the user for these five data points before proceeding.
## Process
### Step 1: Gather Team Context
Read the input document and extract the five data points above. Map the current state to one of three maturity stages:
- **No cadence:** No regular meeting, tests run when someone has time, results reviewed informally
- **Partial cadence:** A meeting exists but lacks structure, or experiments are tracked but not scored
- **Broken cadence:** A process was installed but fell apart — identify which stage broke down
**Why this step cannot be skipped:** The deliverables produced in Steps 3–6 must be calibrated to the team's actual constraints. A 4-person team at 2 tests/week needs a different runbook than a 12-person team targeting 15/week. Generic outputs written without reading the context are outputs the team will not use.
### Step 2: Install the 4-Stage Cycle
Explain the four stages to the team and define what happens in each. Every stage is required — skipping any one breaks the loop.
**Stage 1 — Analyze (Data Analysis and Insight Gathering)**
Before ideating, the data analyst builds cohort reports and funnel drop-off reports, and the team identifies the most significant gaps or opportunities in the current data. Marketing or research members run any needed user surveys or interviews. All findings are compiled and distributed to the team before the meeting.
*Why this stage cannot be skipped:* Without a structured analysis, the Ideate stage produces guesses rather than data-driven ideas. The cycle's compounding power comes from each round of experiments informing the next — that only happens if results are systematically reviewed before new ideas are generated.
**Stage 2 — Ideate (Idea Generation)**
All team members submit experiment ideas to a shared idea bank using the standardized template (see Step 4). Self-censorship is discouraged. Volume is the goal — most experiments will not produce large wins; finding the few that do is a numbers game. Ideas should eventually flow from colleagues outside the core team and from customers, not only from team members.
*Why this stage cannot be skipped:* Without a formal idea bank that anyone can contribute to at any time, idea generation becomes bottlenecked to whoever shouts loudest in the meeting. The bank decouples contribution from discussion and ensures the team always has a prioritized backlog to draw from.
**Stage 3 — Prioritize (Experiment Prioritization)**
Each idea is scored by its submitter before it enters the pipeline. The ICE framework (Impact, Confidence, Ease — each 1–10, averaged) provides a single comparable score across all ideas. The growth lead reviews scores before the meeting and may suggest modifications. The ranked list is the starting agenda for the selection segment of the growth meeting. ICE score guides but does not dictate — the team can override after discussion.
*Why this stage cannot be skipped:* Without a pre-meeting scoring step, the weekly meeting becomes a debate about which ideas to try rather than a focused decision about the top-ranked options. Prioritization done in the meeting burns the entire meeting's time. Done beforehand, it makes the selection segment crisp and fast.
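Below is a minimal sketch of the ICE calculation and the ranked list the growth lead reviews before the meeting. The idea names and scores are hypothetical. As the text notes, this ranking guides the selection discussion rather than deciding it.

```python
from statistics import mean

def ice_score(impact, confidence, ease):
    """ICE score: the average of three 1-10 ratings."""
    for rating in (impact, confidence, ease):
        if not 1 <= rating <= 10:
            raise ValueError("ICE ratings must be between 1 and 10")
    return mean((impact, confidence, ease))

ideas = {  # hypothetical submissions: (impact, confidence, ease)
    "Shorter sign-up form": (7, 6, 9),
    "Referral credit": (8, 4, 5),
    "Pricing page FAQ": (4, 7, 8),
}
ranked = sorted(ideas, key=lambda name: ice_score(*ideas[name]), reverse=True)
for name in ranked:
    print(f"{ice_score(*ideas[name]):.1f}  {name}")
```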
**Stage 4 — Test (Running Experiments)**
Selected experiments are moved to an "Up Next" queue. Each experiment is assigned an owner responsible for getting it launched. Tests are designed to reach statistical validity: a 99% confidence level is recommended, because at 95% roughly one in twenty tests of a change with no real effect will still look like a winner by chance. When results are inconclusive, the default is to stay with the control. Completed test results feed directly back into the next Analyze stage.
*Why this stage cannot be skipped:* The test stage is where the cycle closes. Without it, the analyze and ideate stages produce plans that never ship. Without the 99%-confidence rule and the "control wins ties" rule, the team accumulates false positives that send it down wrong paths.
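The sketch below applies the 99% rule with a two-sided, two-proportion z-test. This is a standard choice for conversion-rate A/B tests, but the source does not prescribe a specific test. Anything short of a significant improvement keeps the control. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ab_decision(control_conv, control_n, variant_conv, variant_n, confidence=0.99):
    """Return ("ship variant" | "keep control", two-sided p-value).

    Uses a pooled two-proportion z-test. Inconclusive or negative results
    keep the control, per the "control wins ties" rule.
    """
    p_control = control_conv / control_n
    p_variant = variant_conv / variant_n
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    if se == 0:
        return "keep control", 1.0
    z = (p_variant - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value < 1 - confidence and p_variant > p_control:
        return "ship variant", p_value
    return "keep control", p_value

# 4.0% vs. 4.6% conversion on 10,000 users each: significant at 95%, not at 99%.
print(ab_decision(400, 10_000, 460, 10_000))
print(ab_decision(400, 10_000, 460, 10_000, confidence=0.95))
```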
### Step 3: Draft the Weekly Growth Review Meeting Agenda
The growth meeting runs 60 minutes on a fixed day each week. The book's recommended day is Tuesday, which gives the team Monday to finish prep work. Adapt the day to the team's constraints, but fix it: a floating meeting day breaks the rhythm.
**Monday prep (growth lead + data analyst):**
- Growth lead reviews experiment velocity from prior week against the team's weekly target
- Data analyst updates the North Star metric and all key metrics being tracked
- Growth lead compiles concluded test data and writes a summary of findings (positive, negative, focus area)
- Combined into a meeting agenda document shared with the team before Tuesday
**60-Minute Tuesday Meeting Agenda:**
| Segment | Duration | Owner | Purpose |
|---|---|---|---|
| Metrics review and update focus area | 15 min | Growth lead | North Star metric status, key positives, key negatives, current focus area (confirm or change) |
| Review last week's testing activity | 10 min | Growth lead | Velocity vs. goal, which tests did not launch and why |
| Key lessons learned from analyzed experiments | 15 min | Growth lead + data analyst + experiment owners | Conclusive results, preliminary results, implications for next steps |
| Select growth tests for current cycle | 15 min | Full team | Discuss nominated experiments, assign owners, set target launch dates |
| Check growth of idea pipeline | 5 min | Growth lead | Pipeline health — number of ideas in queue, call for more ideation if volume is low |
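Some teams generate `weekly-growth-review-agenda.md` from a script. For them, a small sketch like this keeps the five timeboxes from the table in one place and refuses to render an agenda that runs past 60 minutes. The Markdown layout it emits is an assumption, not a format from the book.

```python
# Segments, minutes, and owners copied from the agenda table above.
AGENDA = [
    ("Metrics review and update focus area", 15, "Growth lead"),
    ("Review last week's testing activity", 10, "Growth lead"),
    ("Key lessons learned from analyzed experiments", 15,
     "Growth lead + data analyst + experiment owners"),
    ("Select growth tests for current cycle", 15, "Full team"),
    ("Check growth of idea pipeline", 5, "Growth lead"),
]

def render_agenda(meeting_date, agenda=AGENDA, limit_minutes=60):
    """Render a reusable Markdown agenda; refuse one that overruns the timebox."""
    total = sum(minutes for _, minutes, _ in agenda)
    if total > limit_minutes:
        raise ValueError(f"agenda runs {total} min, over the {limit_minutes}-min limit")
    lines = [f"# Weekly Growth Review: {meeting_date}", ""]
    for segment, minutes, owner in agenda:
        lines += [f"## {segment} ({minutes} min, {owner})", "", "Notes:", ""]
    return "\n".join(lines)

print(render_agenda("YYYY-MM-DD"))
```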
**Critical rule:** The meeting is not for brainstorming. All ideas must be submitted before the meeting via the idea capture template. If brainstorming sessions are needed (e.g., when entering a new focus area), run them separately — monthly is a reasonable cadence.
**Attendees:** Growth lead (facilitator), data analyst, engineer(s), marketer, designer. Meeting notes and the agenda document live in shared cloud storage (Google Docs, Notion, Confluence) as a living document updated each week.
### Step 4: Create the Idea Capture Template
The template must be filled out by the idea submitter before the idea enters the pipeline. Standardizing the format eliminates ambiguity and makes the prioritization meeting fast.
**Required fields:**
```
IDEA NAME:
(Max 50 characters — brief and specific)
DESCRIPTION:
(Cover: Who is targeted? What will be built or changed? Where in the product or funnel
does it appear? When does it trigger for users? Why should it improve the metric?
How will it be tested — A/B test, feature flag, new channel, copy change?)
HYPOTHESIS:
(Simple cause-and-effect: "By [doing X], [metric Y] will improve by [estimated amount].")
METRICS TO MEASURE:
(Primary metric. List downstream metrics that may be affected — improvements in one
metric sometimes come at the expense of others.)
ICE SCORE:
Impact (1–10):
Confidence (1–10):
Ease (1–10):
Average:
```
*Why standardization matters:* Vague submissions ("our sign-up form is too hard, let's simplify it") cannot be prioritized or evaluated. The template forces clarity on what will be tested, how success will be measured, and how much confidence the submitter has. This is what keeps the idea bank usable at scale.
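The sketch below models the template as a data structure with a completeness check, so vague submissions are bounced before they reach prioritization. Field names mirror the template. The specific validation rules (for example, requiring at least one metric) are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Idea:
    name: str
    description: str
    hypothesis: str
    impact: int
    confidence: int
    ease: int
    metrics: list = field(default_factory=list)

    def problems(self):
        """List template violations; an empty list means the idea can enter the pipeline."""
        issues = []
        if not 1 <= len(self.name) <= 50:
            issues.append("IDEA NAME must be 1-50 characters")
        if not self.description.strip():
            issues.append("DESCRIPTION is empty")
        if not self.hypothesis.strip():
            issues.append("HYPOTHESIS is empty")
        if not self.metrics:
            issues.append("METRICS TO MEASURE needs at least one metric")
        for label, score in (("Impact", self.impact), ("Confidence", self.confidence), ("Ease", self.ease)):
            if not 1 <= score <= 10:
                issues.append(f"{label} must be between 1 and 10")
        return issues

    def ice(self):
        return (self.impact + self.confidence + self.ease) / 3


vague = Idea(
    name="Simplify sign-up",
    description="Our sign-up form is too hard, let's simplify it",
    hypothesis="",
    impact=5, confidence=5, ease=5,
)
print(vague.problems())
# ['HYPOTHESIS is empty', 'METRICS TO MEASURE needs at least one metric']
```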
### Step 5: Set the Cadence Benchmark
Match the team's weekly experiment target to its actual size and stage. Do not set an aspirational target that the team cannot deliver — missed targets demoralize faster than any failed experiment.
| Team Stage | Weekly Target | Notes |
|---|---|---|
| Early (2–4 people, first cycle) | 1–2 experiments/week | Build process discipline before building volume |
| Growing (5–8 people, process established) | 3–8 experiments/week | Increase velocity only after the cycle is running cleanly |
| Mature (10+ people, dedicated tooling) | 10–20 experiments/week | Leading teams run 20–30/week at full maturity |
The ramp from 1–2 to 20+ is a multi-month or multi-year journey. Starting at high volume before the process is solid leads to poor test design, invalid results, and team burnout. Set the initial target based on what the team can implement cleanly, not what they aspire to eventually reach.
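The table can be encoded as a lookup that deliberately defaults to the lower band. The table leaves gaps (for example, a 9-person team). Treating those teams as Growing once the process is established is our assumption, not the source's. The function returns the initial target range, not the aspiration.

```python
def weekly_target(team_size, process_established, dedicated_tooling):
    """Map team size and maturity to a (min, max) weekly experiment range.

    Falls back to the Early band unless the stricter conditions are met, so a
    large team on its first cycle still starts at 1-2 experiments per week.
    """
    if team_size >= 10 and process_established and dedicated_tooling:
        return (10, 20)  # Mature
    if team_size >= 5 and process_established:
        return (3, 8)    # Growing (also covers the table's 9-person gap)
    return (1, 2)        # Early

print(weekly_target(4, process_established=False, dedicated_tooling=False))  # (1, 2)
print(weekly_target(12, process_established=True, dedicated_tooling=True))   # (10, 20)
```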
### Step 6: Define Sprint Length
Default sprint length is one week. Two-week sprints are permitted if meeting time is genuinely constrained, but they carry a cost: the Analyze stage relies on having fresh results from the previous cycle. With two-week sprints, results are four weeks old by the time they influence the next wave of ideas — the compounding effect weakens.
*Why one-week sprints are strongly preferred:* Each week, the team learns something it can act on the following week. Over a 12-week quarter, that is 12 learning cycles. With two-week sprints, it is 6. The asymmetry compounds over time.
If the team cannot run a full weekly cycle, the minimum viable version is: a fixed weekly meeting with at least one experiment launched each week, even if the Ideate and Analyze stages are lighter.
### Step 7: Emit Deliverables
Write three output files tailored to the team's context:
1. **`experiment-cycle-runbook.md`** — The team's operating manual: the 4-stage cycle with stage-by-stage activities, who does what, how prep works, and the sprint schedule. Should be short enough to read in 5 minutes.
2. **`weekly-growth-review-agenda.md`** — The meeting agenda as a reusable template: date field, the five agenda segments with timeboxes, named owners, materials required beforehand, and space for notes per segment.
3. **`idea-capture-template.md`** — The blank template with all required fields and brief guidance for each. Teams copy this for each new idea submission.
## Key Principles
**Rhythm over volume.** A team running 3 disciplined experiments per week, each properly designed and reviewed, generates more compounding insight than a team running 10 chaotic ones that are never properly analyzed. Install the rhythm first; velocity follows. Volume is a byproduct of a well-oiled process, not a starting condition.
**Analyze first, or experiments don't compound.** The Analyze stage is not optional prep. It is what makes the cycle a learning machine rather than a test-running machine. Skipping it means each experiment starts from zero instead of building on the last. The compounding effect that drives exponential growth (a 5% monthly improvement compounds to roughly 80% annually) only activates when each cycle learns from the previous one.
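The compounding figure can be checked directly, because twelve consecutive 5% improvements multiply rather than add. The sketch below compares that with adding twelve 5% gains to the same baseline. This comparison is for illustration only and is not a figure from the source. Both function names are hypothetical:

```python
def compounded_growth(monthly_rate: float, months: int = 12) -> float:
    """Total growth when each month's improvement builds on the last."""
    return (1 + monthly_rate) ** months - 1


def non_compounded_growth(monthly_rate: float, months: int = 12) -> float:
    """Total growth if every month's gain is measured against the same baseline."""
    return monthly_rate * months


print(f"{compounded_growth(0.05):.1%}")      # 79.6%
print(f"{non_compounded_growth(0.05):.0%}")  # 60%
```

The gap between the two results shows what is gained when each cycle's results feed into the next.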
**The meeting is a forcing function, not a status update.** The weekly growth meeting has one purpose: agree on what to test next. Metrics are reviewed so the team knows what to optimize. Results are reviewed so the team learns what worked. The selection segment is the meeting's payoff. If the meeting drifts into status reporting, refocus it. Keep the meeting to 60 minutes — if it regularly runs over, segments are not being respected.
**One-week sprints keep errors from ossifying.** A two-week cycle means a bad experiment runs for two weeks before the team recalibrates. A one-week cycle means the team corrects course every seven days. The faster the feedback loop, the smaller the mistakes. Over a quarter, a one-week cadence produces 12 learning cycles; a two-week cadence produces 6.
**Anti-pattern — scattershot experimentation.** Running tests without the cycle — ad hoc, without scoring, without structured review — burns team effort and produces no compounding insight. Each test is an island. Teams in this pattern often conclude that systematic testing does not work for their company when the actual problem is the absence of the cycle, not the absence of good ideas. The cycle is the antidote.
**The idea bank is the team's most valuable asset.** A deep, well-scored pipeline means the team never wastes meeting time debating what to try. It also means that a failed experiment is never a dead end, because the next ranked idea is already waiting. Protect the pipeline: if ideation volume drops, the whole cycle slows.
**Statistical discipline protects the learning.** Each experiment run comes at the cost of another candidate not being tested. A poorly designed test — one that reaches false-positive results due to insufficient confidence thresholds — sends the team down a wrong path and wastes future cycles. Set a 99% confidence threshold on A/B tests. When results are genuinely inconclusive, the control version wins.
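A common way to apply the 99% threshold together with the control-wins-ties rule is a two-sided two-proportion z-test on conversion counts. The sketch below makes several assumptions: there is one binary conversion metric, sample sizes are fixed before launch, and nobody peeks at interim results. Testing platforms may use different statistics. `ab_decision` is a hypothetical helper:

```python
import math


def ab_decision(control_conversions: int, control_n: int,
                variant_conversions: int, variant_n: int,
                confidence: float = 0.99) -> str:
    """Return "variant" only if it wins at the required confidence; otherwise "control".

    Two-sided two-proportion z-test with a pooled standard error.
    """
    p_control = control_conversions / control_n
    p_variant = variant_conversions / variant_n
    pooled = (control_conversions + variant_conversions) / (control_n + variant_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    if se == 0:
        return "control"  # all-or-nothing outcomes in both arms: nothing to conclude
    z = (p_variant - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    if p_value < 1 - confidence and p_variant > p_control:
        return "variant"
    # Inconclusive, or the variant is significantly worse: the control wins.
    return "control"


print(ab_decision(500, 10_000, 580, 10_000))                   # control
print(ab_decision(500, 10_000, 580, 10_000, confidence=0.95))  # variant
print(ab_decision(500, 10_000, 650, 10_000))                   # variant
```

In these numbers, a lift from 5.0% to 5.8% passes a 95% bar (p is about 0.012) but fails the 99% bar, so under this skill's rule the control stays.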
## Examples
### Example 1: Series A Team of 4, Starting from Zero
**Scenario:** A SaaS startup with 4 people on the growth team — a growth PM (also the growth lead), a data analyst, a full-stack engineer, and a marketer. No current cadence. Experiments have been run ad hoc when one of the founders had an idea. Results are rarely reviewed. The team has Mixpanel for analytics and uses Google Sheets for experiment tracking.
**Trigger:** The growth PM says: "We're running tests but nothing compounds. We need a system."
**Process:**
1. Growth PM reads `team-context.md` and identifies: no meeting cadence, no idea scoring, analytics in place but no formal funnel reporting.
2. The 4-stage cycle is installed with a one-week sprint. The data analyst is tasked with building a weekly funnel report to distribute every Monday.
3. Meeting scheduled for Tuesdays at 10am. Monday prep: growth lead + analyst prepare the metrics brief and compile any concluded test results.
4. A Google Sheet is set up as the idea pipeline with the idea capture template as each row's structure.
5. Cadence target: 2 experiments per week. This is achievable given the engineer's bandwidth.
6. First deliverable: a 2-page runbook the team reads before the first meeting.
**Output:** `experiment-cycle-runbook.md` (tailored to a 4-person team with Google Sheets pipeline), `weekly-growth-review-agenda.md` (Tuesday, 10am, 60 min, 5 segments), `idea-capture-template.md` (Google Sheet row template).
**Expected outcome at 4 weeks:** Team has run 6–8 experiments. At least one conclusive result is in the bank. The meeting runs to time consistently. Ideas are flowing into the pipeline between meetings rather than only surfacing in the meeting itself.
### Example 2: Series B Team of 12, Shifting from Chaos to Discipline
**Scenario:** A marketplace startup with 12 people across two growth squads — one focused on acquisition, one on retention. Each squad runs 5–6 experiments per week but without a shared scoring system or a consistent review meeting. Experiments from the acquisition squad often conflict with retention squad tests because there is no shared pipeline. The team uses Amplitude and Optimizely.
**Trigger:** The head of growth says: "We're running 10 tests a week but I have no idea which ones are actually working or why. We need a shared operating system."
**Process:**
1. Context review reveals: dual squad structure, no shared idea pipeline, no unified meeting, high test volume but low review discipline.
2. The skill recommends a unified weekly meeting with both squad leads presenting their results, plus a shared idea pipeline in Optimizely's project management feature.
3. ICE scoring is introduced as a required pre-submission step. Squads' ideas compete for prioritization in a shared queue filtered by focus area.
4. Cadence target raised from 10/week to 15/week — but with a new quality gate: each experiment must have a pre-written hypothesis and a designated metrics owner before it launches.
5. Meeting agenda adapted: the "lessons learned" segment expands to 20 minutes to cover both squads, and the "select tests" segment includes a focus-area filter so that squads don't run conflicting tests on the same funnel stage.
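One way to sketch the focus-area filter is as a greedy selection over the shared queue. Ideas are taken in ICE order, and any idea is skipped if its funnel stage already belongs to the other squad's test that week. The conflict rule, the record shape, and `select_tests` are all assumptions made for illustration:

```python
from typing import NamedTuple


class QueuedIdea(NamedTuple):
    name: str
    squad: str         # e.g. "acquisition" or "retention"
    funnel_stage: str  # the shared focus-area tag
    ice: float


def select_tests(queue: list[QueuedIdea], slots: int) -> list[QueuedIdea]:
    """Fill this week's slots by ICE score without cross-squad stage conflicts.

    The squad whose idea first claims a funnel stage keeps that stage for the
    week; the other squad's ideas for the same stage stay in the queue.
    """
    chosen: list[QueuedIdea] = []
    stage_owner: dict[str, str] = {}
    for idea in sorted(queue, key=lambda i: i.ice, reverse=True):
        if len(chosen) == slots:
            break
        owner = stage_owner.setdefault(idea.funnel_stage, idea.squad)
        if owner != idea.squad:
            continue  # would collide with the other squad's test on this stage
        chosen.append(idea)
    return chosen


queue = [
    QueuedIdea("Onboarding tooltip", "acquisition", "activation", 7.3),
    QueuedIdea("Day-2 welcome email", "retention", "activation", 7.0),
    QueuedIdea("Landing page headline", "acquisition", "acquisition", 6.7),
    QueuedIdea("Win-back push", "retention", "retention", 6.0),
]
print([idea.name for idea in select_tests(queue, 3)])
```

Here the retention squad's welcome email waits a week, because the acquisition squad's higher-scored idea has already claimed the activation stage.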
**Output:** `experiment-cycle-runbook.md` (dual-squad variant with shared pipeline protocol), `weekly-growth-review-agenda.md` (adapted 60-min agenda with dual squad coverage), `idea-capture-template.md` (with squad field and focus-area tag added).
**Expected outcome at 4 weeks:** Both squads have a shared idea bank with 30+ ideas scored and ranked. Experiment conflicts across squads have been eliminated via the shared focus-area tag. Velocity has increased from ~10/week to 13–15/week while average test quality has improved.
## Common Failure Modes to Diagnose
When the cycle is installed but is not producing results, check these failure modes first:
| Symptom | Root cause | Fix |
|---|---|---|
| Meeting runs over 60 minutes | Brainstorming happening in the meeting | Move ideation to async; enforce the "no brainstorming in the meeting" rule |
| Idea pipeline is always empty before the meeting | Team is submitting ideas only in the meeting | Make idea submission a daily habit; growth lead primes the pipeline between meetings |
| ICE scores are inflated (everything is 8+) | Submitters are not calibrating against past experiments | Growth lead reviews and challenges scores before the meeting |
| Tests are never concluded | Insufficient sample size, or no one owns the conclusion | Assign each test a target conclusion date and a metrics owner at launch |
| Results are never acted on | Analyze stage skipped; conclusions not fed back into ideation | Make Monday prep mandatory; growth lead writes a summary of implications, not just results |
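Two of these checks can be run automatically against the experiment log. This minimal sketch assumes every log entry records an ICE score, a metrics owner, and a target conclusion date. The 8.0 threshold comes from the "everything is 8+" symptom. The record and function names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LoggedExperiment:
    name: str
    ice: float                              # ICE average from the idea template
    metrics_owner: Optional[str] = None     # who will call the result
    conclusion_date: Optional[date] = None  # target date to conclude the test


def ice_inflation_share(log: list[LoggedExperiment], threshold: float = 8.0) -> float:
    """Fraction of entries scored at or above threshold; near 1.0 suggests inflation."""
    if not log:
        return 0.0
    return sum(entry.ice >= threshold for entry in log) / len(log)


def launch_blockers(entry: LoggedExperiment) -> list[str]:
    """Fields that must be filled in before launch so the test actually gets concluded."""
    blockers = []
    if not entry.metrics_owner:
        blockers.append("metrics_owner")
    if entry.conclusion_date is None:
        blockers.append("conclusion_date")
    return blockers


log = [
    LoggedExperiment("Pricing page FAQ", 8.3, "Analyst", date(2025, 3, 14)),
    LoggedExperiment("Referral banner", 9.0),
    LoggedExperiment("Shorter sign-up form", 6.0, "Growth lead"),
]
print(f"{ice_inflation_share(log):.0%} of ideas scored 8+")  # 67% of ideas scored 8+
for entry in log:
    print(entry.name, launch_blockers(entry))
```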
## Calibration Questions
Before finalizing the runbook, ask the growth lead these calibration questions to ensure the deliverables fit the team's reality:
1. **Who has authority to launch an experiment without additional approval?** If every experiment requires sign-off from a VP or CTO, the cycle will slow to a crawl. Identify what can be shipped autonomously and what requires a fast-track approval path.
2. **What is the team's current statistical tooling?** Teams without an A/B testing platform (Optimizely, VWO, split.io, LaunchDarkly, etc.) will need to route experiments through engineering feature flags or run sequential rather than concurrent tests. Adjust the cadence target accordingly.
3. **Is there an existing experiment backlog?** If ideas already exist (in Slack threads, docs, or a PM's notebook), start by formalizing them into the idea capture template before the first meeting. A seeded pipeline makes the first meeting's selection segment immediately productive.
4. **Who will own the Monday prep?** If the growth lead and data analyst are the same person, or if the analyst is part-time, Monday prep will be the bottleneck. Plan for it explicitly in the runbook.
## References
- ICE scoring details and worked examples: `../../references/ice-scoring-guide.md` *(if built)*
- Meeting facilitation best practices: `../../references/meeting-facilitation.md` *(if built)*
- Statistical validity for A/B tests: 99% confidence level rule and "control wins ties" rule come from Chapter 4.
- Source chapter: *Hacking Growth* Chapter Four, "Testing at High Tempo" — full cycle, meeting agenda, and cadence benchmarks.
## License
CC-BY-SA-4.0. Derived from *Hacking Growth* by Sean Ellis and Morgan Brown. You are free to share and adapt this skill with attribution and under the same license.
## Related BookForge Skills
This skill works within a system. Install these companion skills for the full operating foundation:
```
# The metric the cycle orients around
clawhub install bookforge-north-star-metric-selector
# The ICE scoring step of the cycle, as a standalone tool
clawhub install bookforge-growth-experiment-prioritization-scorer
# Install a team before installing a cadence
clawhub install bookforge-growth-team-structure-planner
# Verify PMF before starting the cycle
clawhub install bookforge-product-market-fit-readiness-gate
```
---
name: growth-team-structure-planner
description: "Use this skill to design a cross-functional growth team for a Series A–B scaling startup and produce a concrete proposal the growth lead can bring to their executive team. Recommends product-led (growth team embedded in product) vs independent (standalone growth team reporting to CEO or VP Growth) based on org context, assigns named roles (growth lead, PM, engineer, designer, data analyst, marketer), defines executive sponsorship requirements, and drafts a kickoff meeting agenda. Triggers when user asks 'how do I structure a growth team?', 'should growth be under product or standalone?', 'who should be on my growth team?', 'what roles does a growth team need?', 'how do I pitch a growth team to my CEO?', 'growth team kickoff', 'first growth hire', 'growth team model', or 'building a growth team from scratch'. Also activates for 'our marketing and product aren't aligned on growth', 'we need a growth function but don't know how to structure it', 'growth team charter', or 'growth team proposal'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/growth-team-structure-planner
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
  - id: hacking-growth
    title: "Hacking Growth"
    authors: ["Sean Ellis", "Morgan Brown"]
    chapters: [1]
tags:
  - growth
  - team-structure
  - startup-ops
  - organizational-design
depends-on: []
execution:
  tier: 1
  mode: plan-only
  inputs:
    - type: document
      description: >
        Org context describing company stage (Series A–B), existing functions
        (product, engineering, marketing, data), reporting lines, known political
        constraints, and the CEO/exec team's appetite for growth investment.
  tools-required: [Read, Write]
  tools-optional: []
  mcps-required: []
  environment: >
    Document set. Plan-only: produces a proposal document and kickoff agenda
    for human review and exec presentation. No code execution.
discovery:
  goal: >
    Produce a defensible growth team proposal (team model, roles, sponsorship,
    kickoff agenda) that a growth lead can take to their CEO.
  tasks:
    - "Gather org context from user"
    - "Recommend product-led vs independent model with rationale"
    - "List named roles with job description stubs"
    - "Define executive sponsorship requirements"
    - "Draft kickoff meeting agenda"
    - "Produce growth-team-proposal.md and kickoff-agenda.md"
---
# Growth Team Structure Planner
Design a cross-functional growth team proposal that a Growth PM or Head of Growth
can bring to their executive team. Produces two ready-to-present documents:
`growth-team-proposal.md` (model rationale, roles, sponsorship plan) and
`kickoff-agenda.md` (first team meeting, structured to align on North Star,
growth levers, and velocity commitment).
## When to Use
Use this skill when you are tasked with building or proposing a growth team and
need to make an org design decision (product-led vs independent), staff it with
the right cross-functional roles, secure executive buy-in, and run a first meeting
that sets the team up to experiment immediately.
This skill is appropriate before running any growth experiments. It precedes
`north-star-metric-selector` (metric alignment) and `high-tempo-experiment-cycle`
(weekly experimentation cadence). Use it when you have product-market fit signals
but no structured growth function yet.
Not appropriate for: teams already operating with a defined growth process, or
companies pre-product-market fit (growth investment at that stage is premature).
---
## Context and Input Gathering
Before producing the proposal, collect this information from the user. Ask for
it directly if `org-context.md` has not been provided.
**7 questions to ask:**
1. **Company stage and headcount:** Series A or B? How many people total, and
roughly how many in product, engineering, marketing, and data?
2. **Existing growth ownership:** Who currently owns metrics like activation rate,
retention, and revenue? Is it distributed across departments or does someone
own it end-to-end?
3. **Reporting lines:** Does the CEO have direct involvement in growth decisions?
Is there a VP Product, VP Marketing, or VP Engineering who would be the natural
exec sponsor?
4. **Political constraints:** Are there known turf tensions between product and
marketing? Has a previous cross-functional initiative failed? Who would resist
a growth team and why?
5. **Budget and hiring authority:** Is this team assembled from existing staff,
new hires, or a mix? Does the growth lead have a headcount budget?
6. **Product complexity:** Single product or multiple? One audience or several?
This determines whether scope should be narrow (one product area) or broad
(all growth levers).
7. **Exec alignment:** Has the CEO or a C-level executive explicitly endorsed
a growth investment? Or does this proposal need to make the case from scratch?
---
## Process
### Step 1 — Classify the org (WHY: the two structural models have opposite tradeoffs; choosing wrong costs months of political friction)
Evaluate the org context against two models identified from Silicon Valley growth
team research (McInnes and Miyoshi):
**Product-Led (Functional) Model:**
- Growth team reports to a product management executive
- Scope limited to one product or one area of the product
- Works within existing hierarchy — minimal reorganization required
- Appropriate when: company has an established product org, multiple products with
distinct PMs, or when political resistance to a standalone function is high
- Examples: Pinterest (four growth subteams under product), LinkedIn, Twitter, Dropbox
**Independent (Stand-Alone) Model:**
- Growth team is separate from product; growth lead reports to VP Growth or CEO
- Authority spans all products and functional areas
- Requires reorganization or new reporting lines
- Appropriate when: single product company, CEO is the sponsor, growth must cut
across all silos without permission overhead, or company is early enough that
org structure is still fluid
- Examples: Facebook, Uber
**Decision matrix:**
| Signal | Product-Led | Independent |
|--------|-------------|-------------|
| CEO actively involved in growth decisions | Either | Prefer Independent |
| Multiple products, established PM org | Prefer Product-Led | — |
| Single product, one core metric | — | Prefer Independent |
| High cross-functional friction expected | Prefer Product-Led (less disruption) | — |
| Growth must span marketing + product + data | — | Prefer Independent |
| Series A, <50 people | — | Prefer Independent |
| Series B, 50–200 people | Prefer Product-Led | Either |
Document the recommended model and the two or three org-context signals that drove
the recommendation. This becomes Section 1 of `growth-team-proposal.md`.
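As a quick first pass, the decision matrix can be scored as a tally. The point values used below are an assumption and are not part of the source model: 2 for "Prefer", 1 for "Either", and 0 where the matrix has no entry. Treat the tally as a starting point for discussion. It does not replace picking the two or three signals that actually drive the recommendation. `tally_model` is a hypothetical helper:

```python
# Points per cell of the decision matrix: "Prefer" = 2, "Either" = 1,
# no entry = 0. These weights are an assumption, not part of the source model.
MATRIX = {
    "ceo_involved_in_growth":      {"product_led": 1, "independent": 2},
    "multiple_products_pm_org":    {"product_led": 2, "independent": 0},
    "single_product_one_metric":   {"product_led": 0, "independent": 2},
    "high_friction_expected":      {"product_led": 2, "independent": 0},
    "must_span_mktg_product_data": {"product_led": 0, "independent": 2},
    "series_a_under_50":           {"product_led": 0, "independent": 2},
    "series_b_50_to_200":          {"product_led": 2, "independent": 1},
}


def tally_model(signals: set[str]) -> tuple[str, dict[str, int]]:
    """Sum matrix points for the observed signals; return the leader and the totals."""
    totals = {"product_led": 0, "independent": 0}
    for signal in signals:
        for model, points in MATRIX[signal].items():
            totals[model] += points
    if totals["product_led"] == totals["independent"]:
        return "tie", totals  # fall back to judgment about political friction
    return max(totals, key=totals.get), totals


model, totals = tally_model(
    {"ceo_involved_in_growth", "must_span_mktg_product_data", "series_b_50_to_200"}
)
print(model, totals)  # independent {'product_led': 3, 'independent': 5}
```

Using the signals from Example 2 below, the tally favors the independent model by 5 to 3. A plain tally can still overlook one overriding constraint, for example Example 1's need for board approval before reporting lines can change.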
### Step 2 — Define the team roster (WHY: under-specifying roles is how growth teams become everyone's side project and no one's responsibility)
List each role with name, function source (which existing team they come from or
whether they are a new hire), time commitment, and key responsibilities.
**Core six roles:**
| Role | Source | Minimum commitment | Core responsibility |
|------|--------|--------------------|---------------------|
| Growth Lead | New hire or internal promotion | Full-time | Sets focus area and objectives, runs weekly growth meeting, monitors experiment velocity, owns North Star metric |
| Data Analyst | Data/BI team | Full-time or 80% | Builds cohort reports and funnel reports, compiles experiment results, identifies metric drop-off points, prepares weekly analytics brief |
| Engineer | Engineering team | Full-time (dedicated, not on loan) | Implements experiment code, builds A/B test infrastructure, runs technical variants |
| Marketer | Marketing team | Full-time or 60% | Runs promotional channel experiments (paid, email, content), implements channel-level tests |
| Designer | Design/UX team | 50% minimum | Designs experiment variants, collects qualitative user feedback, evaluates feature usability |
| Product Manager | Product team | 50% minimum (optional at Series A) | Coordinates experiment dependencies with product roadmap, manages stakeholder alignment |
**Minimum viable team (Series A, <30 people):** Growth Lead + Data Analyst +
Engineer + Marketer. Designer and PM fold into existing roles.
**Growth team sizing examples:**
- IBM Bluemix growth team: 5 engineers + 5 operations/marketing staff
- Inman News: data scientist + 3 marketers + web developer + COO (growth lead)
- Series A typical: 4–6 people, mostly reallocated from existing departments
Document each role as a one-paragraph job description stub. This becomes
Section 2 of `growth-team-proposal.md`.
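The minimum commitments in the roster table can also serve as a checklist. This minimal sketch assumes allocations are recorded as fractions of a full-time week. It uses the lower bound of each range in the table, for example 0.8 for the data analyst. `Member`, `roster_gaps`, and `total_fte` are hypothetical names:

```python
from dataclasses import dataclass

# Lower bound of each role's minimum commitment from the roster table
# (1.0 = full-time). Roles not listed have no minimum.
MIN_ALLOCATION = {
    "growth_lead": 1.0,
    "data_analyst": 0.8,
    "engineer": 1.0,
    "marketer": 0.6,
    "designer": 0.5,
    "product_manager": 0.5,
}
# The minimum viable team for a Series A company under 30 people.
MINIMUM_VIABLE_ROLES = {"growth_lead", "data_analyst", "engineer", "marketer"}


@dataclass
class Member:
    role: str
    source: str             # existing team, or "new hire"
    allocation: float       # fraction of a full-time week
    dedicated: bool = True  # False if on loan from another team


def roster_gaps(team: list[Member]) -> list[str]:
    """List roster problems: missing core roles, under-allocation, loaned engineers."""
    present = {m.role for m in team}
    issues = [f"missing role: {role}" for role in sorted(MINIMUM_VIABLE_ROLES - present)]
    for m in team:
        minimum = MIN_ALLOCATION.get(m.role, 0.0)
        if m.allocation < minimum:
            issues.append(f"{m.role}: {m.allocation:.0%} is below the {minimum:.0%} minimum")
        if m.role == "engineer" and not m.dedicated:
            issues.append("engineer: must be dedicated, not on loan")
    return issues


def total_fte(team: list[Member]) -> float:
    return round(sum(m.allocation for m in team), 2)


example_1 = [
    Member("growth_lead", "new hire", 1.0),
    Member("engineer", "product", 1.0),
    Member("marketer", "marketing", 0.8),
    Member("data_analyst", "data", 1.0),
    Member("designer", "design", 0.5),
]
print(roster_gaps(example_1), total_fte(example_1))  # [] 4.3
```

Example 1's roster has no gaps and totals 4.3 FTE. It consists of a full-time lead, engineer, and analyst, an 80% marketer, and a 50% designer.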
### Step 3 — Specify executive sponsorship (WHY: growth without executive sponsorship dies within six months — teams hit departmental resistance on their first cross-functional experiment and have no one to clear the path)
Growth teams need authority to cross established departmental boundaries. Without
a named exec sponsor who can resolve turf conflicts at the C-suite level, teams
get blocked by brand guidelines, product roadmap locks, and budget gatekeeping.
**Sponsorship requirements by company stage:**
- **Series A / founder-led:** CEO or founder is the sponsor. Non-negotiable.
If the CEO won't sponsor it, growth investment is premature. The sponsor
attends the kickoff meeting and the first four weekly growth meetings to
signal organizational commitment.
- **Series B / professional management team:** VP Growth, VP Product, or COO
is the sponsor. The sponsor has cross-suite authority — meaning they can
authorize the growth team to run experiments that touch marketing assets,
product features, and pricing without needing separate approvals from each
department head.
**Sponsorship operating model:** The exec sponsor is not a day-to-day manager.
Their role is: (a) clear political blockers when the growth team's experiments
conflict with other departments, (b) attend the monthly North Star review and the
quarterly scope reassessment (see Step 4) to assess progress, and (c) protect
growth team headcount and budget from reorganizations.
Document sponsor name (or title if not yet identified), their authority scope,
their attendance commitment for the first 90 days, and the escalation path when
experiments hit departmental resistance. This becomes Section 3 of
`growth-team-proposal.md`.
### Step 4 — Define scope and operating parameters (WHY: an unbounded growth mandate paralyzes new teams — a narrow first scope builds credibility before expanding)
For the first 90 days:
1. **Scope:** Choose one area — one product, one funnel stage (e.g., activation),
or one metric (e.g., D7 retention). Do not attempt to cover all growth levers
in the first quarter.
2. **Permanence:** Propose the team as permanent, not project-based. Project-based
growth teams lose institutional knowledge when disbanded and restart from zero.
Document the intended permanence.
3. **Experiment velocity target:** Start with 1–2 experiments per week. Scale to
10–20 as the team builds confidence. Document the week-1 velocity commitment.
4. **Reporting cadence:** Weekly growth meeting (all team members). Monthly
North Star review with exec sponsor. Quarterly scope reassessment.
This becomes Section 4 of `growth-team-proposal.md`.
### Step 5 — Draft the kickoff meeting agenda (WHY: the kickoff is the team's first shared contract — it aligns everyone on process, metric, and velocity before the first experiment, preventing the chaos of undirected early experiments)
The kickoff is the first growth team meeting. It should run 90–120 minutes and
produce four shared commitments.
**Kickoff agenda structure:**
**Opening (15 min)**
- Growth lead explains the growth hacking methodology: continuous cycle of
Analyze → Ideate → Prioritize → Test
- Growth lead clarifies each team member's role and what they own
**Charter review (20 min)**
- Review the recommended team model and why it was chosen
- Review scope boundaries: what is in scope for the first 90 days, what is not
- Review exec sponsorship: who the sponsor is, how escalation works
**North Star commitment (20 min)**
- Data analyst presents initial analysis: current state of the primary metric,
known drop-off points, baseline cohort data
- Team discusses and commits to the North Star metric
(if not yet selected, flag that `north-star-metric-selector` runs first)
- Growth lead documents the agreed North Star in writing during the meeting
**First experiment discussion (30 min)**
- Data analyst presents two or three high-priority areas surfaced by initial analysis
- Team generates ideas for first experiments using brainstorm format
(no filtering yet — quantity over quality)
- Team agrees on a velocity goal: how many experiments to run in weeks 1–2
**Cadence agreement (15 min)**
- Confirm weekly growth meeting day and time
- Confirm monthly exec sponsor review date
- Assign owner for first experiment submission
This becomes `kickoff-agenda.md`, formatted as a shareable meeting doc with
blank sections for participants to fill in during the meeting.
### Step 6 — Produce output documents (WHY: a verbal proposal evaporates — a written document survives the exec review cycle and serves as the team's founding charter)
Write two files:
**growth-team-proposal.md** containing:
- Section 1: Recommended team model with decision rationale (2–3 signals)
- Section 2: Role roster with one-paragraph stubs per role
- Section 3: Executive sponsorship — named sponsor, authority scope, 90-day attendance
- Section 4: Scope, permanence, velocity target, reporting cadence
- Section 5: What success looks like in 30 / 60 / 90 days
**kickoff-agenda.md** containing:
- Meeting details: date, attendees, facilitator
- All five agenda blocks with time allocations
- Blank fields for team to complete during the meeting:
  - Agreed North Star metric: ___
  - Week-1 velocity goal: ___ experiments
  - First experiment owner: ___
  - Weekly meeting cadence: ___
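Generating `kickoff-agenda.md` from the agenda blocks keeps the timeboxes and the blank commitment fields together in one place. This minimal sketch assumes plain Markdown output. The block titles and minutes come from Step 5, while `render_kickoff_agenda` and the document layout are hypothetical. It also checks that the blocks add up to a total within the 90–120 minute window:

```python
# Agenda blocks and timeboxes from Step 5 (title, minutes).
AGENDA = [
    ("Opening", 15),
    ("Charter review", 20),
    ("North Star commitment", 20),
    ("First experiment discussion", 30),
    ("Cadence agreement", 15),
]

BLANK_FIELDS = [
    "Agreed North Star metric: ___",
    "Week-1 velocity goal: ___ experiments",
    "First experiment owner: ___",
    "Weekly meeting cadence: ___",
]


def render_kickoff_agenda(date: str, attendees: list[str], facilitator: str) -> str:
    """Build the kickoff agenda as Markdown, refusing agendas outside 90-120 minutes."""
    total = sum(minutes for _, minutes in AGENDA)
    if not 90 <= total <= 120:
        raise ValueError(f"agenda runs {total} min; target is 90-120")
    lines = [
        "# Growth Team Kickoff",
        f"Date: {date}",
        f"Attendees: {', '.join(attendees)}",
        f"Facilitator: {facilitator}",
        f"Total time: {total} min",
        "",
    ]
    for title, minutes in AGENDA:
        lines += [f"## {title} ({minutes} min)", "Notes:", ""]
    lines.append("## Commitments")
    lines += [f"- {field}" for field in BLANK_FIELDS]
    return "\n".join(lines) + "\n"


print(render_kickoff_agenda("TBD", ["Growth lead", "Data analyst", "Engineer", "Marketer"], "Growth lead"))
```

Using the Step 5 timeboxes, the agenda comes to 100 minutes in total. To save it as `kickoff-agenda.md`, write the returned string to a file.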
---
## Key Principles
1. **Growth without executive sponsorship dies in six months.** The first
   cross-functional experiment will hit departmental resistance. Without a named sponsor
   who can clear the path at the C-suite level, the team burns its credibility
   on internal politics instead of experiments.
2. **Product-led vs independent is not about quality — it's about org physics.**
Neither model is better. The right model is the one that generates the least
political friction given the company's existing reporting lines and the CEO's
appetite for disruption.
3. **The engineer must be dedicated, not on loan.** Growth teams that share
engineers with product squads will lose those engineers to roadmap emergencies
within weeks. Experiment velocity requires a committed engineer who can push
code on the growth team's schedule, not product's.
4. **Start narrow, expand after wins.** The first scope should be one product
area or one funnel stage. A single, visible win in a narrow area converts
skeptical department heads into advocates faster than a broad mandate with
diffuse results.
5. **The kickoff meeting is the team's founding contract.** Every team member
must leave the kickoff with a shared understanding of: the North Star metric,
the velocity target, and who owns what. An unstructured kickoff produces
an unstructured team.
6. **Resist the urge to outsource the core.** External consultants can add
specialist expertise (e.g., a paid acquisition specialist), but internal
product knowledge cannot be hired in. The growth lead must be deeply familiar
with the product and the customer.
---
## Examples
### Example 1 — Series A SaaS, Product-Led Model
**Scenario:** 35-person Series A SaaS company. Product team (5 engineers, 2 PMs),
marketing team (3 people), one data analyst. CEO is focused on fundraising.
VP Product has strong exec presence and wants to own growth.
**Trigger:** Head of Growth (newly hired) asked to propose a team structure before
their first 30-day review.
**Process:** Org context signals point to product-led model — established PM org,
VP Product as natural sponsor, CEO bandwidth is limited, and realigning reporting
lines would require board approval. Growth lead reports to VP Product. Team roster:
growth lead (new hire) + one dedicated engineer (from product team) + marketer
(from marketing, 80% allocation) + data analyst (full-time, from existing team).
Designer is shared at 50%. Scope: activation funnel only (D7 retention is the
North Star). Velocity target: 2 experiments per week.
**Output:** `growth-team-proposal.md` with VP Product named as sponsor and
authority to approve experiments touching onboarding flow and email without
separate product team sign-off. `kickoff-agenda.md` with first experiment
discussion focused on onboarding drop-off points surfaced by data analyst.
---
### Example 2 — Series B B2C, Independent Model
**Scenario:** 90-person Series B consumer app. Four product squads (acquisition,
activation, retention, monetization), large marketing team, CEO actively engaged
in growth decisions. Marketing and product have friction over campaign attribution.
**Trigger:** CEO asks Head of Growth to propose a standalone growth function that
can run cross-funnel experiments without requiring approvals from each squad PM.
**Process:** Independent model is the clear fit — growth must span four existing
squads, CEO is the sponsor and has expressed willingness to create a new reporting
line, and the cross-departmental friction is exactly what the independent model
is designed to break through. Growth lead reports to CEO directly. Team: growth
lead + two dedicated engineers + data analyst + marketer + UX designer, all
full-time. Scope: North Star metric is weekly active engagements (WAE); first
90-day focus is top-of-funnel acquisition conversion. Velocity target: 5
experiments per week by end of month 1.
**Output:** `growth-team-proposal.md` establishing the growth team as a permanent
standalone unit with CEO sponsorship letter attached. `kickoff-agenda.md` including
all four squad PMs as observers for the first meeting to signal that the growth
team has cross-functional authority, not just advisory status.
---
## References
- `references/growth-team-model-comparison.md` — Side-by-side comparison of
product-led vs independent models with full criteria table
- `references/role-definitions.md` — Expanded job description templates for all
six growth team roles
- `references/kickoff-facilitation-guide.md` — Facilitator notes for running
the kickoff meeting including common objections and how to handle them
---
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — *Hacking Growth* by Sean Ellis and Morgan Brown.
---
## Related BookForge Skills
After structuring your team, install the operating system:
- `clawhub install bookforge-north-star-metric-selector` — pick the metric the team will orient around
- `clawhub install bookforge-high-tempo-experiment-cycle` — install the weekly experimentation cadence
- `clawhub install bookforge-product-market-fit-readiness-gate` — confirm PMF before investing in team
Browse more: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
---
name: growth-stall-prevention
description: "Use this skill to run a quarterly audit on a growth program to detect and prevent growth stalls — the 18-month plateau that affects 87% of companies (HBR) and destroys 74% of their market cap. Reviews the North Star metric trend, channel concentration, experiment volume/cadence decay, and names the stall pattern (complacency / premature lever exit / channel dependency) with specific recovery actions. Consumes metrics history + experiment log from an existing growth operating system. Triggers when a growth lead or Head of Growth asks 'our growth has plateaued', 'we were growing then stopped', 'growth stall', 'why did our growth slow down', 'growth stall audit', 'are we at risk of stalling', 'our channels are saturating', 'experiments are slowing down', 'we keep running the same tests', 'virtuous growth cycle', 'how do I sustain growth', 'Skype growth stall', 'HBR growth stall research', or 'quarterly growth check-up'. Also activates for 'we're 18 months in and growth is flat', 'channel dependency', 'growth engine breakdown', or 'how do we reinvigorate growth'. Run this skill quarterly — stalls are preventable, but only if detected early."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/growth-stall-prevention
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
  - id: hacking-growth
    title: "Hacking Growth"
    authors: ["Sean Ellis", "Morgan Brown"]
    chapters: [9]
tags:
  - growth
  - growth-stalls
  - quarterly-audit
  - virtuous-cycle
  - startup-ops
depends-on:
  - north-star-metric-selector
  - acquisition-channel-selection-scorer
  - growth-experiment-prioritization-scorer
execution:
  tier: 1
  mode: hybrid
  inputs:
    - type: document
      description: >
        Growth metrics history (growth-metrics-history.csv) with 3+ months of
        North Star metric, funnel conversion, and channel performance data.
        Optional: experiment-log.md with experiment history for cadence analysis.
  tools-required: [Read, Write]
  tools-optional: []
  mcps-required: []
  environment: >
    Document set + CSV history. Produces a quarterly growth-stall-risk-audit.md
    with findings and prescribed recovery actions.
discovery:
  goal: >
    Produce a growth-stall-risk-audit.md naming the stall risk pattern (if any),
    the warning signals, and concrete recovery actions (run quarterly).
  tasks:
    - "Read growth metrics history"
    - "Compute NSM trend (slope, volatility)"
    - "Compute channel concentration (% growth from top channel)"
    - "Compute experiment cadence trend (tests/week over time)"
    - "Diagnose stall pattern (complacency / channel dependency / cadence decay)"
    - "Check virtuous cycle integrity (are loops feeding each other?)"
    - "Prescribe recovery actions"
    - "Emit growth-stall-risk-audit.md"
---
# Growth Stall Prevention
## When to Use
You are a growth lead or Head of Growth at a Series B–C company. Your growth operating system has been running for 12–18 months. Things were working, then they weren't — or things still look fine but you want to know if a stall is approaching before it arrives.
This skill runs as a **quarterly check-up**. It is not reactive firefighting; it is structured early detection. HBR's growth-stall research found that 87% of the companies studied experienced a stall, and that stalls almost always sneak up *after* a period of strong growth, not during a weak patch. The team that feels the least urgency is usually the team most at risk.
Run this skill when any of the following apply:
- Month-over-month North Star metric growth has visibly flattened over the past 6–8 weeks
- Someone on the leadership team has asked "why did growth slow down?" without a clear answer
- The growth team is running fewer experiments this quarter than last quarter, without a deliberate reason
- More than 60% of new acquisition is coming from a single channel
- The growth operating system is 12+ months old and has never been audited for stall risk
- You want to run a scheduled quarterly review (the highest-value use of this skill)
Do not use this skill as a substitute for `north-star-metric-selector` (which selects the metric) or `acquisition-channel-selection-scorer` (which scores and selects channels). This skill assumes those are already in place and audits their health.
---
## Context and Input Gathering
Before running the audit, collect the following:
**Required:**
- `growth-metrics-history.csv` — at minimum 3 months of weekly or monthly data. Preferred: 6+ months. Must include: North Star metric value by period, funnel conversion rates (acquisition → activation → retention → monetization → referral), and channel breakdown (what % of new acquisition came from each channel per period).
**Optional but strongly recommended:**
- `experiment-log.md` — a record of experiments run by week or month, ideally including experiment name, hypothesis, result, and status (running / concluded / doubling down). Even a rough count of tests per week is sufficient for cadence analysis.
If experiment-log data is not available, ask the growth lead to estimate: "How many experiments did the team run in the last full quarter? How does that compare to the quarter before?" A single estimate is sufficient to flag cadence decay.
If metrics history covers fewer than 3 months, note this as a data gap in the audit output — insufficient history limits slope confidence — and proceed with what is available.
---
## Process
### Step 1: Read the Metrics History
Read `growth-metrics-history.csv` in full. Parse out:
- North Star metric (NSM) value per period
- Week-over-week or month-over-month growth rate per period
- Channel split per period (what % of acquisition came from each channel)
- Funnel conversion rates per stage: acquisition → activation → retention → monetization → referral
**Why:** The raw data must be read before any diagnosis. A stall can be invisible in a single snapshot but clear in a trend. Reading the full history surfaces the shape of growth: accelerating, flat, or declining, and where in the funnel the weakness lives.
---
### Step 2: Compute the NSM Trend
Calculate two rolling windows:
- **Short window:** average NSM growth rate over the most recent 6 weeks (or 2 months)
- **Long window:** average NSM growth rate over the full history (12+ weeks)
Compare the two. Flag the trend as:
- **Healthy:** short window growth rate ≥ long window rate (growth is at least holding)
- **Decelerating:** short window rate is 20–40% below long window rate (early warning)
- **Stalled:** short window rate is >40% below long window rate, or flat/negative (active stall)
Also compute volatility: if NSM swings more than ±15% week-to-week without a corresponding experiment explanation, flag instability.
**Why:** The NSM is the single number that reflects whether the virtuous cycle is compounding. A declining NSM slope is almost always a lagging indicator — something upstream (cadence, channel health, or a broken funnel link) degraded first. Catching the slope early gives recovery time.
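The Step 2 rules can be sketched in Python as follows. This is a minimal illustration, not part of the skill's tooling. It assumes one positive NSM value per week (oldest first), uses the last 6 growth rates as the short window, and returns "Insufficient history" below 12 growth rates to mirror the 12+ week long window. The function name `classify_nsm_trend` is hypothetical. The rubric leaves a 0–20% shortfall unassigned, and this sketch treats that gap as Healthy (an assumption).

```python
from statistics import mean


def classify_nsm_trend(nsm, short_window=6, min_rates=12, swing_limit=0.15):
    """Classify a weekly NSM series (oldest first, positive values).

    Returns (label, short_rate, long_rate, volatile_periods).
    """
    # Period-over-period growth rates: 0.05 means +5% week over week.
    rates = [(cur - prev) / prev for prev, cur in zip(nsm, nsm[1:])]
    if len(rates) < min_rates:
        return "Insufficient history", None, None, []

    long_rate = mean(rates)                    # full-history average
    short_rate = mean(rates[-short_window:])   # most recent 6 weeks

    if short_rate <= 0:
        label = "Stalled"                      # flat or negative growth
    elif short_rate >= long_rate:
        label = "Healthy"                      # growth is at least holding
    else:
        shortfall = (long_rate - short_rate) / long_rate
        if shortfall > 0.40:
            label = "Stalled"
        elif shortfall >= 0.20:
            label = "Decelerating"
        else:
            # Assumption: a 0-20% shortfall is not assigned by the rubric;
            # this sketch treats it as Healthy (below the early-warning line).
            label = "Healthy"

    # Periods whose swing exceeds +/-15%; each needs an experiment explanation.
    volatile = [i + 1 for i, r in enumerate(rates) if abs(r) > swing_limit]
    return label, short_rate, long_rate, volatile
```

The volatility list only identifies candidate periods; deciding whether an experiment explains each swing remains a manual step.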
---
### Step 3: Compute Channel Concentration
For each channel, calculate its share of total new acquisition averaged across the most recent full quarter.
Flag if:
- **Any single channel exceeds 60% of acquisition:** high concentration risk
- **Top two channels together exceed 80%:** moderate concentration risk
- **Channel share has shifted >15 percentage points** in either direction quarter-over-quarter: trend change worth investigating
The 60% threshold is the decision criterion from the source research: a single channel carrying more than 60% of acquisition means one platform rule change, one algorithm update, or one cost spike can functionally shut down growth.
**Why:** Channel dependency is the most dangerous stall pattern because it feels like success. A company growing fast on a single paid channel has no warning signal until the channel degrades. Viddy reached a $370M valuation with massive Facebook dependency, then a single News Feed algorithm change collapsed its monthly active users from 50 million to under 500,000. The dependency was structural: it showed up in the channel mix but not in the growth metrics until it was too late.
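A minimal Python sketch of the Step 3 checks is below. It assumes per-channel new-user counts pooled over each quarter, so a channel's share is its fraction of the quarter's total; pooling counts stands in for "averaged across the quarter" and is an assumption. The function name and flag strings are hypothetical.

```python
def channel_concentration(current, previous=None):
    """Return (shares, flags) for one quarter of acquisition counts.

    current / previous: {channel_name: new users acquired in that quarter}.
    """
    total = sum(current.values())
    # Pooled quarterly share per channel (assumption, see lead-in).
    shares = {ch: n / total for ch, n in current.items()}
    ranked = sorted(shares.values(), reverse=True)

    flags = []
    if ranked[0] > 0.60:
        flags.append("single_channel_over_60")   # high concentration risk
    if len(ranked) > 1 and ranked[0] + ranked[1] > 0.80:
        flags.append("top_two_over_80")          # moderate concentration risk
    if previous:
        prev_total = sum(previous.values())
        for ch in sorted(set(current) | set(previous)):
            shift = shares.get(ch, 0.0) - previous.get(ch, 0) / prev_total
            if abs(shift) > 0.15:                # >15 percentage points QoQ
                flags.append(f"shift:{ch}")
    return shares, flags
```

Applied to the Example 1 mix (72% Facebook, 15% App Store organic, 13% other), this raises both the single-channel and the top-two flags.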
---
### Step 4: Compute Experiment Cadence Trend
From `experiment-log.md` (or the growth lead's estimate), compute:
- Average experiments per week in the most recent full quarter
- Average experiments per week in the prior quarter
- Direction of change: up, flat, or down
Flag if:
- Experiment volume declined more than 30% quarter-over-quarter without a documented strategic reason (e.g., deliberate sprint pause)
- Fewer than 1 experiment per week on average in the most recent quarter for a team of 3+ people
- The backlog of untested ideas has fewer than 20 items
**Why:** Cadence decay is the leading indicator that precedes NSM decline. In the GrowthHackers.com case, the team ran fewer than 10 experiments in an entire quarter while traffic plateaued — the drop in experiment volume preceded and caused the growth stall. By the time the NSM shows a problem, cadence has usually been declining for weeks or months. Tracking volume directly short-circuits the lag.
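The three cadence flags can be sketched in Python as shown here. This is an illustration under stated assumptions: a quarter is taken as 13 weeks, inputs are raw experiment counts per quarter, and `deliberate_pause` is a hypothetical parameter standing in for a documented strategic reason. Only the volume-decline flag is waived by a pause, because that is the only flag the rule above qualifies.

```python
def cadence_flags(current_q, prior_q, team_size, backlog_size=None,
                  weeks_per_quarter=13, deliberate_pause=False):
    """Return (tests_per_week_now, tests_per_week_before, flags)."""
    now = current_q / weeks_per_quarter
    before = prior_q / weeks_per_quarter
    flags = []
    # A documented strategic pause is a trade-off, not decay.
    if prior_q > 0 and not deliberate_pause:
        if (prior_q - current_q) / prior_q > 0.30:
            flags.append("volume_down_over_30pct")
    # Fewer than 1 experiment/week is only flagged for teams of 3+.
    if team_size >= 3 and now < 1:
        flags.append("under_1_per_week")
    if backlog_size is not None and backlog_size < 20:
        flags.append("backlog_under_20")
    return now, before, flags
```

With the Example 2 figures (Q2: 11 experiments, Q3: 6, a team of four, a 15-idea backlog), all three flags fire.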
---
### Step 5: Diagnose the Stall Pattern
Using the outputs of Steps 2–4, assign a diagnosis. A program can carry multiple patterns simultaneously.
**Pattern A — Complacency (premature deceleration)**
Indicators: NSM trend decelerating or stalled; experiment cadence declining; no channel concentration issue. Root cause: the team achieved strong growth and allowed administrative routine to crowd out growth work. The false confidence that "growth is assured" replaced the urgency to keep experimenting. Recovery: reset the minimum cadence floor, rebuild the idea backlog, and explicitly re-prioritize growth work over administrative tasks in the weekly meeting.
**Pattern B — Premature Lever Exit (failure to double down)**
Indicators: NSM trend decelerating; experiment volume may be normal, but experiments are exploring new territory instead of fully exploiting proven wins. Diagnostic question: "When did the team last run a follow-on experiment on the highest-performing lever from last quarter?" If the answer is "we moved on," this is Pattern B. Recovery: apply the Battleship rule — when you get a hit, pursue it until the ship sinks. Identify the top 3 proven levers and assign explicit follow-on experiments before exploring new ground.
**Pattern C — Channel Dependency**
Indicators: channel concentration flag triggered (>60% from one source); NSM may still be healthy but the risk is structural. Recovery: immediately begin parallel channel development. Re-invoke `acquisition-channel-selection-scorer` to score and prioritize 2–3 new channel candidates. Begin at Discovery phase with small budget; do not wait until the dominant channel degrades.
**If no pattern is flagged** (NSM trend healthy, no concentration risk, cadence flat or growing), output a clean bill of health with the recommendation to re-run in one quarter.
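A rough Python sketch of the pattern mapping follows. `moved_on_from_top_lever` is a hypothetical boolean holding the answer to Pattern B's diagnostic question. Two readings are assumptions: a Stalled trend counts as "decelerating" for Patterns A and B, and Pattern A's "no channel concentration issue" is treated as distinguishing A from C rather than excluding A when C also fires, since a program can carry several patterns at once.

```python
def diagnose(nsm_label, cadence_declining, single_channel_over_60,
             moved_on_from_top_lever):
    """Map Step 2-4 outputs to stall patterns. Returns (patterns, clean_bill)."""
    slowing = nsm_label in ("Decelerating", "Stalled")
    patterns = []
    if slowing and cadence_declining:
        patterns.append("A: Complacency")
    if slowing and moved_on_from_top_lever:
        patterns.append("B: Premature Lever Exit")
    if single_channel_over_60:
        patterns.append("C: Channel Dependency")
    # A clean bill needs every condition, not just the absence of a pattern.
    clean_bill = (not patterns and nsm_label == "Healthy"
                  and not cadence_declining and not single_channel_over_60)
    return patterns, clean_bill
```

Note that a healthy NSM with declining cadence yields no pattern but also no clean bill, which matches the leading-indicator role of cadence.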
---
### Step 6: Virtuous Cycle Integrity Check
Review the funnel conversion rates collected in Step 1. Map them to the four links:
- **Acquisition → Activation:** is the aha moment conversion rate stable or improving?
- **Activation → Retention:** are retained users returning at the expected frequency?
- **Retention → Monetization:** are retained users converting to paid at the expected rate?
- **Monetization → Referral:** are paying users generating referrals or organic word of mouth?
Flag any link where the conversion rate has declined more than 10% over the past quarter.
**Why:** The virtuous cycle is the structural protection against long-term stalls. Facebook's growth team sustained compounding growth for a decade by continuously reinforcing every link in the loop. A stall in acquisition can be caused by a broken referral link — the feedback loop depends on every stage functioning. If a single link is broken, the compounding effect degrades even if the team is running experiments at high tempo. The virtuous cycle check catches funnel decay that channel metrics and NSM trend may not surface immediately.
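A minimal Python sketch of the link check is shown below. It assumes "declined more than 10%" means a relative decline (0.40 to 0.35 is a 12.5% drop), not a change in percentage points. Links missing from either quarter, such as a referral loop that has not been built yet, are skipped. The function name is hypothetical.

```python
def degraded_links(previous_q, current_q, max_drop=0.10):
    """Return funnel links whose conversion rate fell more than max_drop.

    previous_q / current_q: {link_name: conversion rate as a fraction}.
    Links missing from either quarter are skipped.
    """
    degraded = []
    for link, before in previous_q.items():
        after = current_q.get(link)
        if after is None or before <= 0:
            continue
        # Relative decline (assumption): (before - after) / before.
        if (before - after) / before > max_drop:
            degraded.append(link)
    return degraded
```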
---
### Step 7: Prescribe Recovery Actions
Based on the diagnosis, prescribe concrete actions matched to the pattern(s). Each action must include who owns it and when it should show measurable progress.
**For Pattern A (Complacency):**
- Set a minimum weekly experiment floor (e.g., 2–3 experiments/week for a team of 3–5; adjust for team size)
- Schedule a backlog rebuild session within the next two weeks; target 50+ ideas in the backlog
- Add "current experiment count vs. floor" as a standing agenda item in the weekly growth meeting
- Expected signal: cadence should recover within 2 weeks; NSM slope should show improvement within 6–8 weeks
**For Pattern B (Premature Lever Exit):**
- Identify the top 3 performing levers from the prior two quarters
- For each, generate at least 3 follow-on experiment ideas (deeper optimization, new contexts, larger scope)
- Slot these into the experiment backlog before any new-territory explorations
- Add a "doubling down check" to experiment prioritization: before testing a new idea, confirm the best prior wins have been fully exploited
- Expected signal: winning lever experiments should show continued lift within 4–6 weeks
**For Pattern C (Channel Dependency):**
- Immediately begin parallel channel experimentation (do not wait for the dominant channel to degrade)
- Re-invoke `acquisition-channel-selection-scorer` with the current channel mix as the baseline
- Allocate a defined discovery budget (10–15% of acquisition spend) to 2 new channel candidates
- Set a 90-day milestone: reduce single-channel concentration below 60%
- Expected signal: new channel experiments running within 2 weeks; concentration ratio improving by next quarterly audit
**For broken virtuous cycle links:**
- Flag the degraded link to the relevant functional owner (product for activation and retention, the monetization lead for monetization, the referral loop owner for referral)
- Schedule a focused experiment sprint on the degraded link within 30 days
- Reference `retention-phase-intervention-selector` for retention link failures; `monetization-experiment-planner` for monetization link failures
---
### Step 8: Emit the Audit Report
Write `growth-stall-risk-audit.md` with the following structure:
```
# Growth Stall Risk Audit — [Quarter] [Year]
## Risk Score
[Low / Medium / High] — [one sentence summary]
## NSM Trend
[Healthy / Decelerating / Stalled] — [slope data]
## Channel Concentration
[Flag: Yes/No] — [top channel %, quarter-over-quarter shift]
## Experiment Cadence
[Healthy / Declining / Critical] — [tests/week, quarter-over-quarter change]
## Virtuous Cycle Integrity
[All links healthy / Degraded links: X, Y] — [conversion rates]
## Diagnosed Pattern(s)
[None / Pattern A / Pattern B / Pattern C / Combination]
## Evidence
[Specific data points supporting the diagnosis]
## Recovery Actions
[Numbered list, owner, timeline, expected signal]
## Next Audit
[Date — one quarter out]
```
---
## Key Principles
**Stalls are preventable but hard to reverse once established.** The HBR study found that companies lost 74% of their market capitalization over the decade surrounding a stall, not just in the year it occurred. The asymmetry between early detection and late-stage recovery is extreme. A quarterly audit costs hours; a full stall recovery costs months and organizational trust.
**87% of companies experience a stall — yours is statistically likely.** The question is not whether a stall will happen but whether it will be caught early. Companies that treat stalls as an "other companies" problem are the most vulnerable.
**Channel dependency feels like success until it doesn't.** Viddy's collapse from 50 million to under 500,000 users in a single quarter was not a surprise in hindsight — the dependency was visible in the metrics. The team just wasn't auditing for it.
**Cadence decay is a leading indicator.** Experiment volume drops before the NSM drops. Tracking cadence directly is the earliest available signal. If the team ran fewer than 10 experiments last quarter without a deliberate reason, the stall has already begun structurally even if growth still looks healthy on the dashboard.
**The virtuous cycle is the protection — feed every loop.** Acquisition-only focus creates a leaky bucket. The companies that sustain growth for 10+ years (Facebook is the book's primary case) reinforce every link: activation, retention, monetization, and referral, continuously and in parallel. A broken referral link is a broken acquisition multiplier.
**Run this skill quarterly even when growth looks healthy.** The most dangerous stalls follow periods of strong growth. The team with the least urgency is usually closest to a plateau. Schedule the audit before it feels necessary.
---
## Examples
### Example 1: Channel Dependency Stall (Viddy Pattern)
A mobile productivity startup grew from 100K to 2M monthly actives over 18 months, primarily through Facebook paid acquisition. NSM trend was healthy — the team was proud of their growth rate. Quarterly audit revealed: 72% of new acquisition from Facebook ads, 15% from App Store organic, 13% from all other sources combined. The team had never run experiments on organic channels because paid was working so well.
The audit diagnosed Pattern C (Channel Dependency). Recovery actions: (1) allocated 15% of paid budget to Apple Search Ads experiments; (2) re-ran `acquisition-channel-selection-scorer` to score SEO, content partnerships, and referral; (3) set a 90-day target to reduce Facebook concentration below 60%. Six months later, Facebook share was 51%, App Store organic was 28%, and referral had grown to 12% — a structurally healthier distribution, and the startup survived a subsequent Facebook algorithm change that dropped similar competitors' acquisition by 40%.
### Example 2: Cadence Decay Stall (GrowthHackers Pattern)
A B2B SaaS company had strong product-market fit and had grown ARR 3x in its first year. The growth team of four people slowed their experiment pace after the initial push — Q1 had 24 experiments; Q2 had 11; Q3 had 6. NSM (weekly active teams) was still growing slightly, but the slope had decelerated from 12% month-over-month to 4%. No one on the team had noticed the cadence drop.
The quarterly audit flagged Pattern A (Complacency) and Pattern B (Premature Lever Exit). The email onboarding sequence had been the highest-performing lever in Q1 but had received no follow-on experiments in six months. Recovery: (1) set a cadence floor of 2 experiments/week; (2) rebuilt the backlog from 15 to 80 ideas in a single team session; (3) ran 4 follow-on experiments on the email onboarding sequence in the next 6 weeks, producing a 31% lift in 30-day retention. NSM growth rate returned to 9% month-over-month by end of quarter.
---
## Audit Limitations and Edge Cases
**Fewer than 3 months of data:** The NSM slope calculation requires at least 3 months (12 weeks) to be meaningful. With less data, report the NSM trend as "insufficient history" and focus the audit on channel concentration and cadence, which are assessable with a single quarter of data.
**Deliberate cadence pauses:** A team that intentionally paused experiments for a product rebuild or major launch is not experiencing cadence decay — it is making a strategic trade-off. Audit output should note the pause reason and confirm a resumption date rather than diagnosing Pattern A.
**Pre-revenue or pre-retention programs:** This audit assumes all four virtuous cycle links are active. If monetization or referral loops have not yet been built, limit the virtuous cycle check to the available links and note the gap as a planned future audit scope.
**Multiple patterns simultaneously:** The most common finding at 18+ months is a combination of Patterns A and B — the team slowed down *and* failed to double down on proven levers. Prescribe recovery actions for both; do not force a single-pattern diagnosis.
**No experiment log available:** If the team has no experiment log, the cadence analysis relies on self-report. Accept the estimate but recommend that the team begin maintaining a simple experiment log (experiment name, date, result) as a prerequisite for the next quarterly audit. Without it, the cadence analysis is flying blind.
---
## References
- `references/growth-stall-statistics.md` — HBR study: Olson, Van Bever, Verry (CEB); 87% stall rate; 74% market cap loss; Levi Strauss case ($7B → $4.6B); additional named brands (3M, Apple, Caterpillar, Toys "R" Us, Volvo)
- `references/virtuous-growth-cycle.md` — The acquisition → activation → retention → monetization → referral compounding loop; Facebook 10-year case as the primary sustained-growth exemplar
- `references/channel-concentration-thresholds.md` — The >60% single-channel concentration decision rule; Viddy case (50M → <500K after Facebook algorithm change); Upworthy/BuzzFeed News Feed dependency; Google ranking algorithm as analogous SEO risk
- `references/experiment-cadence-baselines.md` — GrowthHackers.com recovery case: <10 tests/quarter → 3/week floor → 76% traffic increase in one quarter; cadence floor calibration by team size
---
## License
[CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) — Skills from the BookForge library. Share alike with attribution.
---
## Related BookForge Skills
This skill sits on top of the growth operating system and audits its health. For recovery actions, reinvoke the relevant operating skills:
```bash
# NSM trend analysis — provides the north star metric this skill audits
clawhub install bookforge-north-star-metric-selector
# Channel recovery — re-score and prioritize new acquisition channels
clawhub install bookforge-acquisition-channel-selection-scorer
# Cadence recovery — re-prioritize experiments and rebuild the backlog
clawhub install bookforge-growth-experiment-prioritization-scorer
# Cadence floor — reset the experiment cycle to a minimum weekly tempo
clawhub install bookforge-high-tempo-experiment-cycle
# Retention link failures — diagnose and intervene on broken retention
clawhub install bookforge-retention-phase-intervention-selector
# Monetization link failures — experiment on the monetization funnel link
clawhub install bookforge-monetization-experiment-planner
```
---
name: growth-experiment-prioritization-scorer
description: "Use this skill to score and rank a growth experiment backlog using the ICE framework (Impact, Confidence, Ease — each rated 1–10 and averaged) and select the top experiments for the next sprint. Reads an experiment-backlog.md file, applies the ICE rubric with explicit definitions for each dimension, ranks all experiments, separates 'launch now' from 'pipeline with target date' from 'drop', and emits a scored backlog the team can bring to their weekly growth review. Triggers when a growth PM asks 'help me prioritize my experiment ideas', 'which test should I run next', 'ICE score my backlog', 'rank these growth experiments', 'how do I pick experiments for this sprint', 'impact confidence ease', 'ICE framework', 'growth experiment prioritization', or 'I have 40 test ideas, which ones first'. Also activates for 'score this backlog', 'my experiment queue is a mess', 'we keep running low-impact tests', 'growth experiment ranking', or 'how to rank A/B test ideas'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/growth-experiment-prioritization-scorer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [4]
tags:
- growth
- experimentation
- prioritization
- ICE-scoring
- startup-ops
depends-on:
- north-star-metric-selector
- high-tempo-experiment-cycle
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: >
Experiment backlog (experiment-backlog.md) as a markdown list of ideas
with hypothesis, metric, and optional effort estimate. North Star metric
from prior invocation of north-star-metric-selector, or asked from user
if not available.
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set. Reads a backlog file and emits a scored/ranked version.
discovery:
goal: >
Produce a ranked experiment-scored-backlog.md with ICE scores and a clear
recommendation for the next sprint.
tasks:
- "Read experiment backlog file"
- "Confirm or gather the team's North Star metric"
- "Score each experiment on Impact, Confidence, Ease (1-10)"
- "Compute average and rank"
- "Split into 'launch now', 'pipeline with date', and 'drop'"
- "Emit experiment-scored-backlog.md"
---
# Growth Experiment Prioritization Scorer
Score and rank a growth experiment backlog using the ICE framework, then select the top experiments for the next sprint. The output is a structured scored backlog the team can bring directly to their weekly growth review meeting and begin acting on.
ICE — Impact, Confidence, Ease — was developed at GrowthHackers.com as a way to convert a pile of unordered experiment ideas into a ranked, actionable queue. The scoring is deliberately simple: three dimensions, a 1–10 scale each, averaged to one number. The value is not precision — it is shared calibration. A team that scores ideas against a common rubric before the meeting arrives at a defensible ranking in minutes rather than spending the full meeting debating which idea to try first.
## When to Use
Use this skill when:
- You have a backlog of growth experiment ideas (5–100+) and need to decide which to run in the next 1–2 week sprint
- Your team keeps gravitating toward familiar or easy tests rather than the highest-impact options
- Experiment selection is happening in the weekly meeting by whoever argues most persuasively, rather than by a pre-meeting scoring process
- You are entering a new growth focus area (acquisition, activation, retention, monetization) and need to groom the idea bank for that area
- You have just used `high-tempo-experiment-cycle` to install a weekly cadence and need to score the initial backlog before the first growth meeting
Prerequisites before using this skill:
- You have a confirmed North Star metric (from `north-star-metric-selector` or from the team's existing documentation). Impact scoring is meaningless without a shared reference metric.
- The backlog exists as a document with at least a name and hypothesis per idea. Vague ideas ("improve onboarding") cannot be scored — they must be made specific first.
---
## Context and Input Gathering
Before scoring, collect two inputs.
**Input 1 — Experiment backlog file**
Ask the user: "Please share the path to your `experiment-backlog.md` file, or paste the backlog content directly." Read the file. Each idea in the backlog should have:
- A name (the book recommends a maximum of 50 characters to force brevity)
- A hypothesis: "By doing X, metric Y will change by Z"
- A target metric (which metric this experiment directly affects)
- An optional effort estimate (days / team members required)
If ideas are missing a hypothesis, flag them. You can score ideas without an effort estimate (Ease will be estimated), but you cannot score ideas without a hypothesis — without knowing what the experiment is supposed to change, Impact and Confidence scores are fabricated.
**Input 2 — North Star metric**
Ask: "What is your team's current North Star metric and focus area (e.g., 'weekly revenue per active user, focused on activation this sprint')?" If the user ran `north-star-metric-selector`, pull the NSM from `north-star-recommendation.md`. If unavailable, ask directly. Do not proceed to scoring without this — Impact scores are defined relative to the NSM.
---
## Process
### Step 1: Read the Backlog
Read `experiment-backlog.md` in full. Count the ideas. Note:
- Which ideas have a complete hypothesis (name + cause-and-effect hypothesis + target metric)
- Which ideas are vague or missing a hypothesis — flag these as "needs clarification before scoring"
- Which ideas target the current focus area vs. other growth levers (e.g., retention ideas in an activation sprint)
**Why this step:** Scoring a vague idea produces a fake ICE score. An idea named "improve checkout" with no hypothesis could mean anything — a copy tweak (days of work) or a full redesign (weeks). Flagging incomplete ideas before scoring forces the submitter to sharpen the idea, which is itself valuable and usually surfaces the key assumption the experiment is testing.
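The Step 1 checks can be sketched in Python as follows, assuming the backlog has already been parsed into records; the `Idea` fields and `review_backlog` are hypothetical names, since the skill does not prescribe a file schema. An idea counts as scorable only when it has both a hypothesis and a target metric, matching the "complete hypothesis" definition above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Idea:
    name: str
    hypothesis: str = ""                 # "By doing X, metric Y will change by Z"
    target_metric: str = ""
    focus_area: str = ""                 # e.g. "activation"
    effort_days: Optional[float] = None  # optional; Ease can be estimated


def review_backlog(ideas: List[Idea], focus_area: str, max_name_len: int = 50):
    """Sort ideas into buckets before any ICE scoring happens."""
    report = {"scorable": [], "needs_clarification": [],
              "off_focus": [], "name_too_long": []}
    for idea in ideas:
        if len(idea.name) > max_name_len:
            report["name_too_long"].append(idea.name)
        # Without a hypothesis and metric, Impact/Confidence would be guesses.
        if not idea.hypothesis.strip() or not idea.target_metric.strip():
            report["needs_clarification"].append(idea.name)
            continue
        report["scorable"].append(idea.name)
        if idea.focus_area and idea.focus_area != focus_area:
            report["off_focus"].append(idea.name)
    return report
```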
### Step 2: Confirm the North Star Metric
State the confirmed NSM explicitly: "NSM for this sprint: [metric name], defined as [precise definition with time dimension]."
**Why this step:** Impact scores are always relative to the NSM. "10" means "expected to move the NSM meaningfully in the next sprint." Without this anchor, a "9" from one team member means something completely different from a "9" from another — the scores are not comparable, and the ranking is not useful.
### Step 3: Score Impact (1–10)
For each idea, assign an Impact score using this rubric:
| Score | Meaning |
|-------|---------|
| 9–10 | Expected to produce a large, direct movement in the NSM. The mechanism is clear and well-grounded. |
| 7–8 | Expected to produce a meaningful, measurable improvement in an NSM input metric. Direct NSM effect is plausible but not certain. |
| 5–6 | Expected to improve a metric adjacent to the NSM. Effect may be indirect or conditional. |
| 3–4 | Narrow improvement — affects a small user segment or a metric weakly connected to the NSM. |
| 1–2 | No credible mechanism to move the NSM or any meaningful input metric. May produce a local improvement that does not compound. |
Key calibration rule: a low-impact score is not a reason to discard an idea — it is a reason to pair it with high Ease or reconsider what it is actually testing. A quick, low-cost idea with modest impact still belongs in the queue; it just runs after the high-impact work.
**Why this step:** Impact calibration is the most common ICE failure mode. Teams inflate impact scores ("everything is an 8!") because they want their ideas to win, not because they have a clear mechanism for NSM movement. Anchoring to the NSM rubric forces the question: "What specifically will change in the NSM if this test wins?"
### Step 4: Score Confidence (1–10)
For each idea, assign a Confidence score using this evidence ladder:
| Score | Evidence Base |
|-------|--------------|
| 9–10 | Strong prior evidence: internal data showing user behavior consistent with the hypothesis, plus a published case study or industry benchmark from a comparable company. Direct iteration of a past winning experiment at this company ("doubling down"). |
| 7–8 | Solid evidence: internal behavioral data supports the hypothesis, or a well-documented case study from a comparable product type. Team has run a related (not identical) experiment with positive results. |
| 5–6 | Moderate evidence: user survey data or interview data suggests the problem exists. No direct experimental evidence, but logic is sound and the mechanism is plausible. |
| 3–4 | Weak evidence: hypothesis is based primarily on team intuition or general industry patterns. No internal data support. May be correct, but the submitter is guessing. |
| 1–2 | Pure conjecture: no data, no precedent, no user evidence. The idea is speculative. May still be worth testing if Ease is high. |
**Why this step:** Confidence rewards epistemic honesty. A submitter who scores their own idea at 9 on confidence is claiming they have strong evidence — that is a testable claim the growth lead can check. Teams that cannot differentiate a data-backed hypothesis from a gut feeling are systematically over-investing in weak ideas. The evidence ladder gives submitters a shared vocabulary for calibrating uncertainty.
### Step 5: Score Ease (1–10)
For each idea, assign an Ease score using this rubric:
| Score | Effort Level |
|-------|-------------|
| 9–10 | Less than 1 day, single person, no dependencies. Copy change, image swap, minor UI adjustment. Can be shipped and measured within the current sprint. |
| 7–8 | 1–3 days, one or two people, minimal coordination. Small feature flag, minor backend change, email copy variant. |
| 5–6 | 3–7 days, involves a second function (e.g., engineering + design). Can probably be launched in the current sprint if started immediately. |
| 3–4 | 1–2 weeks, cross-team coordination required (e.g., engineering + product + legal review). Target the sprint after this one. |
| 1–2 | More than 2 weeks, significant product investment or cross-functional coordination. Requires a scheduled target date, not a sprint slot. |
Penalty rule for cross-team dependencies: if an experiment requires sign-off or execution from a team that is not on the growth team (e.g., a feature that requires a full product sprint, or an email campaign that requires legal review), reduce the Ease score by at least 2 points. Cross-team dependencies predictably slow experiments beyond the initial estimate.
**Why this step:** Ease provides the cycle's pacing mechanism. Without it, teams fill the sprint with high-impact, low-ease experiments that never finish and crowd out the learning the cycle needs to compound. Ease also surfaces low-hanging fruit — quick tests that might surprise the team, the way a newsletter form relocation at GrowthHackers produced a 700% lift despite a low initial impact estimate.
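As a starting point for discussion, the Ease rubric and dependency penalty can be approximated in Python. This sketch makes several assumptions: it uses only elapsed days (ignoring headcount), treats "1–2 weeks" as 7–14 days, assigns the top score of each band, and applies the minimum 2-point penalty by default. The team still adjusts the result by judgment.

```python
def ease_score(effort_days: float, cross_team: bool = False, penalty: int = 2) -> int:
    """Map an effort estimate to the top of its Ease band, minus a dependency penalty."""
    if effort_days < 1:
        score = 10      # under a day, single person
    elif effort_days <= 3:
        score = 8       # 1-3 days
    elif effort_days <= 7:
        score = 6       # 3-7 days
    elif effort_days <= 14:
        score = 4       # 1-2 weeks
    else:
        score = 2       # more than 2 weeks
    if cross_team:
        score -= penalty  # the rubric says "at least 2"; raise it if needed
    return max(score, 1)  # keep the score on the 1-10 scale
```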
### Step 6: Compute Average and Rank
For each idea: ICE Score = (Impact + Confidence + Ease) / 3. Round to one decimal place.
Sort all ideas by ICE score, highest to lowest. This is the ranked backlog.
**Why this step:** Averaging keeps the ICE score on the same 1–10 scale as its three inputs, and the three dimensions carry equal weight (summing would give the same ranking on a 3–30 scale), so a very high score on one dimension cannot carry an idea on its own. A 10/10/1 idea scores 7.0; a 7/7/7 idea also scores 7.0. The parity is intentional: an extremely high-impact idea that is nearly impossible to ship should not outrank a solid, executable idea just because its impact score is higher.
Produce a table in this format:
```
| Rank | Experiment Name | Impact | Confidence | Ease | ICE Avg | Status |
|------|-----------------|--------|------------|------|---------|--------|
| 1 | [Name] | 8 | 7 | 9 | 8.0 | Launch now |
| 2 | [Name] | 9 | 8 | 4 | 7.0 | Pipeline: [date] |
| ... | ... | ... | ... | ... | ... | ... |
```
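The scoring and ranking steps can be sketched in a few lines of Python. The `Idea` class and function names are hypothetical; the Status column is left out because it comes from the triage step that follows.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int
    confidence: int
    ease: int

    @property
    def ice(self) -> float:
        # Average (not sum) keeps ICE on the same 1-10 scale as its inputs.
        return round((self.impact + self.confidence + self.ease) / 3, 1)


def rank(ideas: list[Idea]) -> list[Idea]:
    # Highest ICE first; sorted() is stable, so ties keep submission order.
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


def ranked_table(ideas: list[Idea]) -> str:
    lines = [
        "| Rank | Experiment Name | Impact | Confidence | Ease | ICE Avg |",
        "|------|-----------------|--------|------------|------|---------|",
    ]
    for position, idea in enumerate(rank(ideas), start=1):
        lines.append(
            f"| {position} | {idea.name} | {idea.impact} | {idea.confidence} "
            f"| {idea.ease} | {idea.ice:.1f} |"
        )
    return "\n".join(lines)


print(ranked_table([
    Idea("Pipeline candidate", 9, 8, 4),
    Idea("Quick win", 8, 7, 9),
]))
```

Because every ICE value is a sum of integers divided by 3, rounding to one decimal never lands on an exact tie at .x5, so Python's round-half-to-even behavior does not affect the result.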
### Step 7: Split into Launch / Pipeline / Drop
After ranking, apply the triage:
**Launch now** — Ideas the team should select for the current sprint. Criteria: ICE average high enough to justify the team's time relative to other ideas in the queue, and Ease score high enough to be completable within one sprint (Ease ≥ 5 as a default threshold, adjustable by the growth lead). The number of "launch now" ideas should match team capacity — a 4-person team running 2 experiments per week selects 2–4 ideas; a 12-person team running 8 per week selects 8–12.
**Pipeline with target date** — Ideas that are worth running but cannot launch in the current sprint. This includes:
- High-ICE ideas with low Ease (≤ 4) that require engineering scope estimation before a sprint slot can be assigned
- Ideas that are solid but ranked below the sprint capacity cutoff
- Ideas that target a different focus area than this sprint's priority
For each pipeline idea, assign a target sprint date using input from the relevant team (engineers estimate product work; marketers estimate channel experiments). Record the date in the scored backlog.
**Drop** — Ideas with ICE average below a threshold (default: ≤ 3.5) that lack a clear mechanism for NSM movement and have no strong evidence base. Before dropping, verify: is the low score due to a poorly written hypothesis that could be improved? If so, return it to the submitter with feedback. If the idea is genuinely low-value, mark it as dropped with a one-sentence reason.
**Why this step:** The pipeline is the team's most important asset. An empty pipeline means the weekly meeting runs out of candidates. But a pipeline filled with vague, undifferentiated ideas is equally useless — it creates the illusion of a full queue while the real work is selecting from noise. The three-bucket split forces explicit decisions rather than allowing ideas to accumulate indefinitely without a commitment.
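The mechanical part of the triage can be sketched as a first pass over the ranked list, assuming the default thresholds above (Ease ≥ 5 to launch, ICE ≤ 3.5 to drop) and a capacity figure supplied by the growth lead. The function is hypothetical. It only proposes buckets: the judgment calls the text describes (hypothesis quality, a clear NSM mechanism, focus-area fit) stay with the growth lead.

```python
def triage(ranked, capacity, min_launch_ease=5, drop_threshold=3.5):
    """Split an ICE-ranked backlog into launch / pipeline / drop candidates.

    ranked: (name, ice, ease) tuples, already sorted by ICE descending.
    capacity: number of experiments the team can run this sprint.
    """
    launch, pipeline, drop = [], [], []
    for name, ice, ease in ranked:
        if ice <= drop_threshold:
            # Only a candidate: a human still checks whether the hypothesis
            # is merely badly written before dropping it.
            drop.append(name)
        elif ease >= min_launch_ease and len(launch) < capacity:
            launch.append(name)
        else:
            # Low Ease, or below the capacity cutoff: needs a target date.
            pipeline.append(name)
    return launch, pipeline, drop


launch, pipeline, drop = triage(
    [("Progress bar", 8.0, 9), ("Invite prompt", 7.7, 7),
     ("Email sequence", 7.0, 6), ("Checklist rebuild", 6.0, 3),
     ("Slack integration", 5.0, 2), ("Hero redesign", 3.0, 2)],
    capacity=3,
)
print(launch, pipeline, drop)
```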
### Step 8: Emit experiment-scored-backlog.md
Write `experiment-scored-backlog.md` with the following structure:
1. **Header** — sprint date, North Star metric, focus area, total ideas scored
2. **Launch now** — ranked table for the current sprint, with owner field (blank, to be assigned in the growth meeting)
3. **Pipeline** — ranked table with target sprint dates
4. **Dropped** — list with reason per idea
5. **Flagged for clarification** — ideas that could not be scored because they lack a hypothesis, with specific question to ask the submitter
---
## Key Principles
1. **Impact is always relative to the North Star metric.** There is no such thing as an impact score without a reference metric. An idea that scores 9 on impact for a retention-focused sprint would score 4 in an acquisition-focused sprint. Before scoring, confirm the NSM and the current focus area — every Impact score is anchored there.
2. **Confidence rewards evidence over enthusiasm.** Submitters naturally overestimate confidence in their own ideas. The evidence ladder exists to make the claim testable: "You scored this a 9 — show me the internal data and the case study you're referencing." If the submitter cannot produce evidence matching the ladder, the score should come down.
3. **Ease penalizes cross-team dependencies because they predictably slow the cycle.** An experiment that requires sign-off from a team outside the growth team almost always takes longer than estimated. The 2-point penalty is a calibration correction for the team's natural optimism about coordination costs.
4. **ICE is deliberately blunt — the goal is a relative ranking, not an absolute forecast.** A score of 7.3 vs. 7.1 is not meaningful. A score of 8.5 vs. 5.2 is. The system's value is in separating the top tier from the middle tier from the bottom tier, not in creating a precise numerical ranking. When two ideas are close, the growth lead uses judgment — the score is the starting point for the selection conversation, not the final word.
5. **Run the scoring before the meeting, not in it.** ICE scoring done synchronously in the weekly growth meeting burns the entire meeting on a task that can be done asynchronously. The scored backlog is prep work. The meeting uses the ranked list to make the selection decision in 15 minutes.
6. **The lowest-scoring ideas can still be the biggest winners.** The 700% newsletter lift came from a form relocation that scored 4 on Impact. Score informs selection order, not selection cutoff. If an idea is quick to run and the downside of being wrong is minimal, run it — the cost of not learning is real.
---
## Examples
### Example 1: Series A — 20-Idea Backlog, Activation Sprint
A B2B SaaS team of 5 (growth PM, engineer, designer, marketer, data analyst) has 20 experiment ideas targeting activation — getting new trial users to the "aha moment" (≥3 team members using a shared project within 7 days). North Star: weekly collaborative sessions per team.
The scored backlog (abbreviated) looks like:
| Rank | Experiment | Impact | Confidence | Ease | ICE | Status |
|------|------------|--------|------------|------|-----|--------|
| 1 | Add progress bar to team setup wizard | 8 | 7 | 9 | 8.0 | Launch now |
| 2 | In-app prompt to invite 2nd team member at step 3 | 8 | 8 | 7 | 7.7 | Launch now |
| 3 | Email sequence: days 1, 3, 7 post-signup | 7 | 8 | 6 | 7.0 | Launch now |
| 4 | Rebuild onboarding checklist UI | 9 | 6 | 3 | 6.0 | Pipeline: next sprint |
| 5 | Integration with Slack for task notifications | 8 | 5 | 2 | 5.0 | Pipeline: 3 weeks (eng scoping) |
| ... | ... | ... | ... | ... | ... | ... |
| 19 | Add mascot to empty state screens | 3 | 4 | 8 | 5.0 | Drop |
| 20 | Redesign marketing site hero | 4 | 3 | 2 | 3.0 | Drop |
**Sprint selection:** The growth PM selects experiments 1, 2, and 3 for the sprint (matching the team's capacity of 2–3 tests per week). Experiment 4 is slotted for the following sprint pending design mockups. Experiment 5 goes to engineering for a scope estimate. Experiments 19 and 20 are dropped because neither has a clear mechanism to increase collaborative sessions, the current NSM. Dropping experiment 19 is a judgment call by the growth lead, since its ICE of 5.0 sits above the default 3.5 drop threshold.
### Example 2: Series B — 50-Idea Backlog, Retention Sprint
A marketplace team of 12 (two growth squads: acquisition + retention) needs to groom 50 ideas for a retention sprint targeting 90-day repeat purchase rate. North Star: orders per buyer per quarter.
**Scoring challenge:** The retention squad notices that 15 of the 50 ideas are acquisition ideas submitted by the other squad — they score well on Impact but target a different focus area. The growth lead moves all 15 to a separate acquisition backlog without scoring them in the retention sprint.
The remaining 35 retention ideas are scored. The top 8 (ICE ≥ 6.5) are selected for the sprint — 4 per squad per week. Ideas ranked 9–20 (ICE 4.5–6.4) are pipelined with dates. Ideas 21–35 (ICE ≤ 4.4) are reviewed for hypothesis quality: 5 are returned to submitters for clarification; 10 are dropped.
**Key finding from the scoring:** Three ideas that the retention squad assumed were quick (scoring themselves 8 on Ease) were recalibrated to Ease 4 by the growth lead after consulting the engineering team — all three required A/B test infrastructure that was not yet in place. Moving them to the pipeline freed up sprint slots for two mid-ranked ideas (ICE 6.2) that were genuinely fast to ship. The sprint launched on schedule.
---
## References
- `research/growth-experiment-prioritization-scorer.md` — source passages from Chapter 4 on ICE scoring, evidence definitions, and pipeline management
- `references/ice-scoring-guide.md` — worked scoring examples by experiment type
- `orchestration/specs/skill-spec.md` — BookForge skill authoring standards
---
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — *Hacking Growth* by Sean Ellis and Morgan Brown.
---
## Related BookForge Skills
This skill is the Prioritize stage of the high-tempo experiment cycle and consumes the North Star metric produced upstream:
```
# Required upstream — NSM must be confirmed before scoring
clawhub install bookforge-north-star-metric-selector
# The cycle this skill plugs into — Prioritize is Stage 3
clawhub install bookforge-high-tempo-experiment-cycle
# Feeds activation experiment ideas into the backlog
clawhub install bookforge-activation-funnel-diagnostic
# Feeds retention experiment ideas into the backlog
clawhub install bookforge-retention-phase-intervention-selector
```
Browse more: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
---
name: activation-funnel-diagnostic
description: "Use this skill to diagnose where in an activation funnel users drop off and decide between removing friction or adding 'positive friction' (guided steps) to fix it. Maps the route from signup to the aha moment (first core-value experience), builds a channel-segmented funnel conversion report from metrics data, identifies the highest-drop-off step, interprets user survey data at drop-off points, and emits an activation-funnel-diagnosis.md plus a ranked list of activation experiment candidates. Triggers when a growth PM asks 'why are users signing up but not coming back?', 'our activation rate is terrible', 'where are users dropping off in onboarding?', 'activation funnel audit', 'users don't reach aha moment', 'onboarding diagnosis', 'NUX problems', 'first-run experience broken', 'how do I find friction', 'should I simplify or guide my onboarding?', or 'help me diagnose my activation'. Also activates for 'funnel analysis', 'drop-off diagnosis', 'desire friction conversion', 'magic moment audit', 'Twitter 30 follows pattern', or 'Facebook 7 friends rule'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/activation-funnel-diagnostic
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
  - id: hacking-growth
    title: "Hacking Growth"
    authors: ["Sean Ellis", "Morgan Brown"]
    chapters: [6]
tags:
  - growth
  - activation
  - onboarding
  - funnel-analysis
  - startup-ops
depends-on:
  - north-star-metric-selector
execution:
  tier: 1
  mode: hybrid
  inputs:
    - type: document
      description: >
        Funnel metrics (funnel-metrics.csv) with step-by-step conversion counts.
        Activation flow doc (activation-flow.md) describing the current onboarding
        step-by-step. Optional: survey-responses.md with user feedback from
        drop-off points.
  tools-required: [Read, Write]
  tools-optional: []
  mcps-required: []
  environment: >
    Document set + CSV data. Reads metrics and flow docs, produces diagnosis
    and experiment candidates as markdown.
discovery:
  goal: >
    Produce activation-funnel-diagnosis.md that names the highest-drop-off step,
    explains WHY users drop there, and recommends remove-friction vs add-positive-friction
    interventions with a ranked experiment list.
  tasks:
    - "Confirm the aha moment (first core-value experience)"
    - "Read funnel metrics CSV"
    - "Read activation flow doc"
    - "Build channel-segmented funnel conversion table"
    - "Identify highest drop-off step"
    - "Interpret why users drop (from survey data or infer)"
    - "Decide remove-friction vs add-positive-friction"
    - "Generate ranked experiment candidates"
    - "Emit diagnosis and experiment backlog"
---
# Activation Funnel Diagnostic
## When to Use
Use this skill when users are signing up but not coming back — the classic activation gap. Specifically run it when:
- Activation rate is unknown or known to be poor (industry baseline: 98% of website traffic never activates; up to 80% of mobile users churn within three days of install)
- Users reach signup but do not complete the first meaningful action
- You have step-level funnel metrics and want to know which step is bleeding users
- You are unsure whether to simplify onboarding (remove friction) or add guided steps (positive friction)
- Retention is suffering because users never experienced core value in the first session
**Prerequisite:** The aha moment must be defined. If it is not, run `north-star-metric-selector` first — the aha moment is the activation target, and diagnosing a funnel without knowing its destination produces useless results.
---
## Context and Input Gathering
Before starting, confirm you have or can locate:
| Input | Required | Expected Format |
|---|---|---|
| `funnel-metrics.csv` | Required | Columns: step_name, users_entered, users_completed, channel (optional) |
| `activation-flow.md` | Required | Prose or numbered list describing each onboarding step |
| `survey-responses.md` | Optional | User verbatim responses at drop-off points, or email/interview notes |
| Aha moment definition | Required | One sentence: the moment users first experience core product value |
If the aha moment is not confirmed, ask: "What is the single action or outcome that makes this product feel indispensable to a new user?" Do not proceed until you have an answer.
---
## Process
### Step 1: Confirm the aha moment
Ask the growth PM to state the aha moment in one sentence. If they cannot, surface a working hypothesis from the activation flow doc ("completing first X" or "seeing first Y") and ask them to confirm or correct it.
**Why:** The aha moment is the activation target — every funnel step is evaluated by how well it moves users toward that moment. Optimizing a funnel without a defined endpoint means you may be improving steps that lead nowhere near core value. The aha moment is defined through research; it is never assumed.
Typical aha moment patterns:
- SaaS tools: running a first meaningful task (first survey sent, first report generated, first deployment)
- Social products: connecting with enough people to see a relevant feed (Twitter: follow accounts across topics; Facebook: find and connect with friends)
- Marketplaces: completing first transaction on both sides
- Consumer apps: receiving a tangible result (order delivered, recommendation acted on)
---
### Step 2: Read the funnel metrics CSV
Open `funnel-metrics.csv`. Confirm that the `step_name`, `users_entered`, and `users_completed` columns are present. The `channel` column is optional, but if it is present, use it: the channel segmentation in Step 4 depends on it.
Compute for each step:
```
conversion_rate = users_completed / users_entered   # fraction 0–1; multiply by 100 to report a percentage
drop_off_count  = users_entered - users_completed
drop_off_rate   = 1 - conversion_rate               # same as drop_off_count / users_entered
```
Flag any step where `drop_off_rate > 0.40` (over 40% of entering users do not complete the step) as a high-priority investigation point.
**Why:** Raw user counts obscure the conversion shape. Computing rates per step reveals where the funnel narrows most sharply. The highest drop-off step — not the first step, not the last — is the highest-leverage point for experimentation. Treating all steps equally wastes experiment budget on low-impact changes.
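A minimal sketch of this step using Python's standard `csv` module, assuming one row per step (no `channel` column, or rows already summed across channels). The function name, the sample rows, and the choice to skip steps with zero entrants are illustrative assumptions.

```python
import csv
import io


def step_metrics(csv_text: str, flag_threshold: float = 0.40) -> list[dict]:
    """Per-step conversion and drop-off from funnel-metrics.csv content.

    Assumes one row per step. Rates are fractions between 0 and 1.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entered = int(row["users_entered"])
        completed = int(row["users_completed"])
        if entered == 0:
            continue  # no users reached this step; nothing to measure
        conversion_rate = completed / entered
        drop_off_rate = 1 - conversion_rate
        results.append({
            "step": row["step_name"],
            "conversion_rate": conversion_rate,
            "drop_off_count": entered - completed,
            "drop_off_rate": drop_off_rate,
            "high_priority": drop_off_rate > flag_threshold,
        })
    return results


# Hypothetical sample data, not taken from any real funnel.
sample = """step_name,users_entered,users_completed
Signup form,1000,700
Connect data,700,280
"""
for m in step_metrics(sample):
    print(f"{m['step']}: {m['conversion_rate']:.0%} converted, "
          f"{m['drop_off_count']} lost, flagged={m['high_priority']}")
```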
---
### Step 3: Read the activation flow doc
Read `activation-flow.md`. For each step in the funnel metrics, map it to the corresponding description in the flow doc. Note:
- Steps requiring user input (forms, uploads, searches)
- Steps requiring understanding of an unfamiliar concept
- Steps where the product's value is not yet visible to the user
- Steps that could be deferred or reordered
**Why:** Funnel data shows where users drop off; the flow doc shows what they are being asked to do at that point. The combination reveals the gap between what the product asks and what users are willing to do. A high drop-off rate on a "create account" step means something different than a high drop-off on "configure your first workflow" — the flow doc supplies the context that the CSV cannot.
---
### Step 4: Build the channel-segmented funnel conversion table
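Construct a markdown table with steps as rows. If the `channel` column exists in the CSV, add a column for each channel and compute per-channel conversion rates for each step. In the example below, each cell is the cumulative share of that channel's users who reached the step (App download = 100%).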
```
| Step | Overall Conv% | Organic | Paid | Referral | Social |
|-------------------|---------------|---------|------|----------|--------|
| App download | 100% | 100% | 100% | 100% | 100% |
| Account created | 68% | 74% | 51% | 81% | 62% |
| First action | 41% | 48% | 29% | 57% | 38% |
| Aha moment | 23% | 31% | 14% | 38% | 21% |
```
Flag any channel-step combination whose conversion is half or less of the all-channel (overall) conversion for that step. These are broken-channel signals.
**Why:** Averaging across channels hides broken acquisition paths. A paid channel that converts at half the rate of organic at the "first action" step indicates a language or expectation mismatch — the ad promised something the onboarding does not deliver. Fixing the onboarding for everyone does not solve a channel-specific mismatch; it dilutes the fix. Channel segmentation before diagnosis is not optional.
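The segmentation and the broken-channel flag can be sketched with the standard library. This assumes rows shaped like `csv.DictReader` output with a `channel` column, uses per-step conversion (completed / entered) rather than the cumulative shares in the example table, and reads "2× worse" as "at most half the all-channel rate". Function and parameter names are hypothetical.

```python
from collections import defaultdict


def channel_breakdown(rows, ratio=0.5):
    """Per-step, per-channel conversion plus broken-channel flags.

    rows: dicts with step_name, channel, users_entered, users_completed.
    A channel is flagged when its conversion at a step is at most
    `ratio` times the all-channel conversion at that step.
    """
    entered = defaultdict(int)
    completed = defaultdict(int)
    steps, channels = [], []
    for r in rows:
        step, channel = r["step_name"], r["channel"]
        entered[(step, channel)] += int(r["users_entered"])
        completed[(step, channel)] += int(r["users_completed"])
        if step not in steps:
            steps.append(step)  # keep funnel order as it appears in the CSV
        if channel not in channels:
            channels.append(channel)

    table, flags = {}, []
    for step in steps:
        total_in = sum(entered[(step, c)] for c in channels)
        total_out = sum(completed[(step, c)] for c in channels)
        overall = total_out / total_in if total_in else 0.0
        table[step] = {"overall": overall}
        for c in channels:
            if entered[(step, c)] == 0:
                continue  # this channel never reached the step
            rate = completed[(step, c)] / entered[(step, c)]
            table[step][c] = rate
            if rate <= ratio * overall:
                flags.append((step, c))
    return table, flags


rows = [
    {"step_name": "First action", "channel": "organic", "users_entered": "100", "users_completed": "50"},
    {"step_name": "First action", "channel": "paid", "users_entered": "100", "users_completed": "10"},
]
table, flags = channel_breakdown(rows)
print(flags)
```

Comparing against the all-channel rate rather than the best channel keeps a single strong channel from making every other channel look broken.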
---
### Step 5: Identify the highest drop-off step
Name the single step with the highest absolute `drop_off_count`. This is the primary intervention target. If two steps are close, pick the one earlier in the funnel — fixing it compounds downstream.
State it explicitly:
- Step name
- Users entering vs. users completing
- Drop-off count and rate
- Position relative to aha moment (how many steps before core value?)
**Why:** The highest drop-off step represents the most users who gave up before experiencing the product's core value. Every user who drops here is a user the acquisition spend paid to reach but failed to convert. This step is where the diagnosis concentrates.
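The selection rule, including the tie-break toward the earlier step, can be sketched as below. "Close" is not defined in the text; the 10% tolerance here is a hypothetical parameter, and the function name is illustrative.

```python
def primary_target(steps, closeness=0.10):
    """Pick the step to fix first.

    steps: (step_name, users_entered, users_completed) in funnel order.
    closeness: hypothetical tolerance; a step whose drop-off count is within
    10% of the largest counts as "close", and the earliest such step wins.
    """
    drops = [(name, entered - completed) for name, entered, completed in steps]
    # Drop-off counts are non-negative in valid funnel data, so the step with
    # the largest count always satisfies the test below.
    largest = max(count for _, count in drops)
    for name, count in drops:
        if count >= largest * (1 - closeness):
            return name, count


print(primary_target([("Signup", 1000, 800), ("Connect data", 800, 350), ("First report", 350, 0)]))
```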
---
### Step 6: Interpret why users drop
Use two sources in priority order:
**Source A — Survey data (if available).** Read `survey-responses.md`. Look for recurring themes: confusion about what to do next, missing information, unexpected requirements, unclear value, distrust, technical problems. Cluster responses by theme. Do not project your own assumptions onto them.
High-signal question patterns to look for in the data:
- "What's the one thing that nearly stopped you from completing?" (asked of completers — they know what almost stopped them)
- "Is there anything preventing you from signing up at this point?"
- "What were you hoping to find on this page?"
**Source B — Structural inference (if no survey data).** Examine the flow doc description of the high-drop-off step. Ask:
- Does this step require users to provide significant information before seeing any value?
- Does this step introduce a concept users may not understand (product-specific jargon, unfamiliar workflow)?
- Is it unclear what happens after this step?
- Does this step involve trust or commitment (payment info, contact info, integrations)?
- Is the product's value visible before this step, or has the user been asked to invest effort with no payoff yet?
**Why:** Funnel data is behavioral; it shows that users drop, not why. Survey data is the only direct source of the reasoning behind behavior. Inferring from flow structure is second-best but necessary when survey data does not exist. The book's clearest lesson from the HubSpot Sidekick case: teams that assumed they understood drop-off causes (poor product education) ran 11 failed experiments. The real cause (users needed a trigger to act, not more explanation) only emerged from deeper data analysis and user feedback.
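Once a person has read the verbatims and assigned a theme to each response, the clustering reduces to a tally. The sketch below deliberately does no automatic classification, in keeping with the warning against projecting assumptions onto the data; the labels and counts in the usage lines are hypothetical.

```python
from collections import Counter


def theme_shares(coded_responses):
    """Share of survey responses per theme, largest first.

    coded_responses: one theme label per response, assigned by a person
    reading the verbatims (this sketch only tallies; it does not classify).
    """
    counts = Counter(coded_responses)
    total = sum(counts.values())
    return [(theme, count / total) for theme, count in counts.most_common()]


responses = (["unclear delivery fee"] * 41
             + ["forgot discount code"] * 28
             + ["other"] * 31)
for theme, share in theme_shares(responses):
    print(f"{theme}: {share:.0%}")
```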
---
### Step 7: Apply the friction decision rule
Apply this formula to evaluate the drop-off step:
```
DESIRE – FRICTION = CONVERSION RATE
```
- **DESIRE** = the strength of the user's want for the product at this step. Proxied by: channel quality, landing page messaging match, user segment fit.
- **FRICTION** = the sum of impediments between the user and completing the step. Includes: form length, required information the user may not have at hand, unclear instructions, technical barriers, unfamiliar concepts, trust gaps.
- **CONVERSION RATE** = the observed output.
**Diagnosis routes:**
**Route A — Remove friction.** Apply when:
- Survey data or structural analysis shows confusion, overwhelm, or unexpected requirements
- Users are asked for information before experiencing any value
- The step involves a standard action (login, form completion) that competes with simpler alternatives
- There is high desire but users are blocked
Remove-friction tactics: single sign-on (Facebook/Google/LinkedIn login); fewer required fields; deferred account creation (let users start using the product before signing up); pre-filling known information; clearer copy and error messages.
**Route B — Add positive friction.** Apply when:
- Users can technically proceed but would not understand the product's value on arrival
- The product requires users to adopt an unfamiliar concept or behavior
- Users arrive with low context about how to use the product
- Structured guidance would create psychological commitment (once users take small actions, they are more inclined to continue)
Positive friction tactics: a learn flow — guided steps that show users what the product does while getting them to take small actions (interest selection, profile setup, first content creation); progress indicators; questionnaires that both collect data and create commitment; gamification (missions, milestones, earned rewards) where the rewards have clear relevance to core value.
**The counterintuitive rule:** More steps in onboarding is not always worse. Pinterest's addition of a topic-selection screen increased activation 20%. Twitter's learn flow — which required new users to follow accounts and set up a profile before arriving at a feed — produced users with a live feed on first visit instead of an empty one. The question is never "how many steps?" but "does each step help users arrive at the aha moment with greater confidence and context?"
**Why:** DESIRE and FRICTION are independent variables. A product with strong desire (early adopters, strong referrals) can tolerate high friction — users push through. A product reaching mainstream users or users who came through a lower-intent channel needs low friction at the exact same steps. The formula makes the diagnostic explicit: if desire is high and conversion is still low, friction is the problem. If desire is low, adding guided steps to help users understand value is the fix — removing friction alone will not help users who do not yet see why they should complete the step.
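The routing logic in the paragraph above reduces to a two-input decision. The sketch is a deliberately coarse, hypothetical encoding: deciding whether desire is "high" still requires the channel, messaging, and segment evidence listed earlier, and the function does not replace that diagnosis.

```python
def friction_route(desire_high: bool, conversion_low: bool) -> str:
    """Coarse reading of DESIRE - FRICTION = CONVERSION RATE for one step."""
    if not conversion_low:
        return "not a target"           # desire already outweighs friction here
    if desire_high:
        return "remove-friction"        # users want it, but something blocks them
    return "add-positive-friction"      # users do not yet see why to finish the step


print(friction_route(desire_high=True, conversion_low=True))
```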
---
### Step 8: Generate ranked experiment candidates
Produce a ranked list of 3–6 experiment candidates targeting the highest-drop-off step. Each entry includes:
- **Experiment name:** short, descriptive
- **Hypothesis:** "If we [change], then [users_completing_step] will increase because [reason]"
- **Intervention type:** remove-friction or add-positive-friction
- **Implementation effort:** low / medium / high
- **Expected signal speed:** how quickly the experiment will produce measurable results
Prioritize low-effort, fast-signal experiments first. A simple copy change or form-field removal can be tested in days; a full learn flow redesign cannot. Start small — the HubSpot Sidekick team ran 11 failed experiments before finding the trigger message that moved the needle.
**Why:** An experiment list without prioritization creates a queue that teams work through in arbitrary order. Low-effort experiments run faster, generate learnings sooner, and compound. If a low-effort fix solves the problem, the high-effort rebuild was never needed. Ranking by effort and signal speed is the minimum viable prioritization for activation experiments.
For full experiment scoring (ICE: the average of Impact, Confidence, and Ease), pass the candidates list to `growth-experiment-prioritization-scorer`.
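A candidate record and the effort-then-speed ranking can be sketched as follows. The `Candidate` fields mirror the entry list above; the "fast / medium / slow" signal-speed labels and all names are hypothetical, since the text describes signal speed only in prose.

```python
from dataclasses import dataclass

EFFORT_ORDER = {"low": 0, "medium": 1, "high": 2}
SIGNAL_ORDER = {"fast": 0, "medium": 1, "slow": 2}  # hypothetical speed labels


@dataclass
class Candidate:
    name: str
    change: str
    metric: str
    reason: str
    intervention: str   # "remove-friction" or "add-positive-friction"
    effort: str         # "low" | "medium" | "high"
    signal_speed: str   # "fast" | "medium" | "slow"

    @property
    def hypothesis(self) -> str:
        # Fills the hypothesis template from the entry list.
        return f"If we {self.change}, then {self.metric} will increase because {self.reason}"


def rank_candidates(candidates):
    # Lowest effort first; within the same effort, fastest signal first.
    return sorted(candidates, key=lambda c: (EFFORT_ORDER[c.effort], SIGNAL_ORDER[c.signal_speed]))


demo = Candidate("Sample dataset", "add a sample dataset", "first reports generated",
                 "users see value before connecting data", "add-positive-friction",
                 "low", "fast")
print(demo.hypothesis)
```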
---
### Step 9: Emit deliverables
Write two files:
**`activation-funnel-diagnosis.md`** — contains:
1. Aha moment (confirmed)
2. Channel-segmented funnel conversion table
3. Highest drop-off step: name, users lost, rate
4. Why users drop (evidence from surveys + structural analysis)
5. Friction decision: remove-friction or add-positive-friction, with reasoning
6. DESIRE–FRICTION diagnosis for the target step
**`activation-experiment-candidates.md`** — contains:
- Ranked list of 3–6 experiment candidates with hypothesis, type, effort, signal speed
- Link to `growth-experiment-prioritization-scorer` for ICE scoring
**Why:** Two separate files keep the diagnosis (what is wrong and why) distinct from the experiment backlog (what to try). The diagnosis is a durable artifact that explains the current state; the experiment list is a working backlog that will change as experiments run. Keeping them separate prevents the team from treating hypotheses as diagnoses before they are tested.
---
## Key Principles
1. **The aha moment is defined, not assumed.** Diagnosing an activation funnel without a clear aha moment is optimizing toward an undefined goal. The aha moment comes from product research (must-have surveys, qualitative interviews) — not from guessing the most impressive-looking step in the onboarding flow.
2. **Segment before optimizing — channel averages hide broken channels.** A 30% average activation rate across channels may be a 50% rate in organic and 15% in paid. Fixing the onboarding for everyone does not fix the paid channel. Segmentation is not a nice-to-have; it determines whether your interventions are targeted or scatter-shot.
3. **Remove vs. add friction is a diagnostic decision, not a preference.** "Simplify everything" is a default, not a diagnosis. Sometimes more steps improve activation by ensuring users arrive at the aha moment with context and commitment. The question is always: why is this step causing drop-off — confusion/blocking (remove friction) or lack of context/commitment (add positive friction)?
4. **Positive friction is counterintuitive and often correct for new-concept products.** If your product asks users to adopt a new behavior or understand a novel concept, stripping all onboarding steps will produce users who arrive at core functionality with no idea what to do. Guided steps that teach and commit simultaneously — as Twitter's learn flow demonstrated — can generate higher activation than minimal-friction raw product access.
5. **Survey completers, not just abandoners.** People who passed a difficult step know what nearly stopped them. "What's the one thing that nearly stopped you from completing?" asked at the order confirmation or activation screen consistently produces higher response rates and more actionable qualitative data than exit surveys of people who left.
6. **Triggers must be tested, not assumed helpful.** Push notifications and email reactivation messages are among the most powerful and most abused activation tools. Deploy them only when the rationale is clear value to the user (a sale on a saved item, a relevant feature alert) — not to inflate short-term engagement statistics. Ask for notification opt-in only after users have experienced enough value to understand why they would want the messages. Test trigger timing, frequency, and copy as experiments, not as settled design.
---
## Examples
### Example 1: SaaS Tool with Empty-State Problem
**Situation:** A B2B analytics tool has 1,200 users sign up per month. Only 180 (15%) reach the aha moment (generating a first report). The team has funnel metrics but no survey data.
**Process summary:**
1. Aha moment confirmed: "user generates and views first analytics report"
2. Funnel metrics read: account creation (78%), workspace setup (61%), first data connection (42%), first report generated (15%)
3. Activation flow read: "first data connection" requires users to input API credentials or upload a CSV — no sample data available
4. Channel-segmented table built: paid search channel drops from 61% to 28% at "first data connection"; organic drops to 48%
5. Highest drop-off: "first data connection" — 490 users lost, 42% completion rate; paid search 2.2× worse than organic
6. Structural inference (no survey data): users are asked to provide credentials before seeing any product output; the product looks empty until connected; users arriving from paid ads may have lower intent than organic
7. Friction decision: **add positive friction** — the product requires users to set up before experiencing value; offer a sandbox dataset so users can generate a sample report before connecting real data
8. Experiments ranked: (1) add sample dataset for demo report — low effort, fast signal; (2) add progress bar showing "one step away from your first report" — low effort; (3) simplify API credential input form — medium effort; (4) add short video showing a completed report at the connection step — medium effort
**Output:**
- `activation-funnel-diagnosis.md`: confirms empty-state as root cause, paid channel mismatch, positive-friction recommendation
- `activation-experiment-candidates.md`: 4 experiments ranked by effort
---
### Example 2: Consumer App with Mid-Funnel Drop
**Situation:** A recipe and grocery app has 8,000 weekly installs. Funnel: app open (100%), browse items (72%), add to cart (48%), enter payment info (31%), first purchase (19%). Team has exit survey responses from users who reached the cart but did not purchase.
**Process summary:**
1. Aha moment confirmed: "user receives first grocery order as expected"
2. Funnel metrics read: once users have added items to the cart, the steepest absolute drop is "add to cart → payment info": 17% of all installs lost (1,360 users/week)
3. Activation flow read: payment info step requires new credit card entry and delivery address; no saved defaults; no indication of delivery fee until checkout summary
4. Channel-segmented table built: referral channel activates at 38% vs. paid social at 11% — large gap at payment step
5. Survey data analysis: top cluster (41% of responses) — users did not know whether delivery was free; second cluster (28%) — users forgot their first-order discount code
6. Friction decision: **remove friction** — users want the product (browse and cart rates are solid); specific information gaps are causing abandonment, not lack of understanding
7. Experiments ranked: (1) display the delivery fee and first-order discount code automatically on the cart page (low effort; addresses the top two survey clusters); (2) replace manual card entry with one-tap wallet payment via Google Pay/Apple Pay (medium effort); (3) show a delivery fee estimate earlier, on the browse screen (low effort)
**Output:**
- `activation-funnel-diagnosis.md`: payment-step friction identified, two specific causes from survey data, remove-friction recommendation
- `activation-experiment-candidates.md`: 3 experiments ranked, first two directly address surveyed reasons for abandonment
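The absolute counts in this example come from applying the cumulative funnel percentages to the 8,000 weekly installs. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def users_lost_per_step(cohort, cumulative_pct):
    """Convert cumulative funnel percentages into users lost at each transition."""
    reached = [round(cohort * p / 100) for p in cumulative_pct]
    return [earlier - later for earlier, later in zip(reached, reached[1:])]


steps = ["app open", "browse items", "add to cart", "enter payment info", "first purchase"]
losses = users_lost_per_step(8000, [100, 72, 48, 31, 19])
for (start, end), lost in zip(zip(steps, steps[1:]), losses):
    print(f"{start} -> {end}: {lost} users/week")
```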
---
## References
- `references/activation-concepts.md` — aha moment definition, DESIRE–FRICTION=CONVERSION formula, positive friction definition, NUX principles, BJ Fogg behavior model
- `references/case-studies.md` — HubSpot Sidekick segmentation case, Airbnb sign-up prompt experiments, Twitter learn flow, Pinterest topic-selection onboarding, Qualaroo 50-response tipping point
---
## License
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) — BookForge Skills
Source book: *Hacking Growth* by Sean Ellis and Morgan Brown. Skills distilled from book content under fair use for transformative educational purposes. See [BookForge copyright framework](https://github.com/bookforge-ai/bookforge/blob/main/docs/legal/copyright-framework.md).
---
## Related BookForge Skills
- `clawhub install bookforge-north-star-metric-selector` — defines the aha moment this skill uses as its activation target; run first if the aha moment is not confirmed
- `clawhub install bookforge-growth-experiment-prioritization-scorer` — apply ICE scoring (Impact × Confidence × Ease) to the experiment candidates this skill produces
- `clawhub install bookforge-retention-phase-intervention-selector` — initial retention is a continuation of activation; users who activated but do not return are a retention problem that begins at the activation boundary
- `clawhub install bookforge-product-market-fit-readiness-gate` — if activation rates are catastrophically low across all channels and positive-friction experiments fail, the product may not yet be must-have; this gate diagnoses whether product work should precede growth work
---
name: acquisition-channel-selection-scorer
description: "Use this skill to choose acquisition channels for a post-PMF product using the Balfour 6-factor scoring matrix (Cost, Targeting, Control, Input Time, Output Time, Scale — each rated 1–10). First runs two diagnostic prerequisites: language/market fit (does your copy resonate?) and channel/product fit (do your channels match how your audience discovers products?). Then classifies candidate channels into viral/word-of-mouth, organic, and paid categories; scores each on the 6 factors; and recommends 2–3 channels for a Discovery phase with explicit graduation criteria to an Optimization phase. Triggers when a growth PM asks 'which acquisition channels should I test?', 'should I do Facebook ads or SEO?', 'help me pick growth channels', 'acquisition channel prioritization', 'channel/market fit', 'Balfour channel framework', 'our paid ads aren't working', 'which channels for B2B SaaS', 'which channels for e-commerce', 'which channels for consumer app', 'acquisition strategy', or 'how do I pick channels'. Also activates for 'we're spreading too thin across channels', 'single channel focus', 'channel diversification', 'Peter Thiel one channel', 'discovery phase channels', 'channel scoring matrix', or 'six factor channel framework'."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/hacking-growth/skills/acquisition-channel-selection-scorer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: published
source-books:
- id: hacking-growth
title: "Hacking Growth"
authors: ["Sean Ellis", "Morgan Brown"]
chapters: [5]
tags:
- growth
- acquisition
- marketing-channels
- channel-selection
- startup-ops
depends-on:
- north-star-metric-selector
- growth-experiment-prioritization-scorer
execution:
tier: 1
mode: hybrid
inputs:
- type: document
description: >
Product brief (product-brief.md) with business model and target audience.
Optional: channel-candidates.md listing channels the team is considering.
Optional: current-acquisition-data.csv with CAC/conversion by channel.
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: >
Document set + optional CSV. Produces channel-selection-matrix.md and
acquisition-fit-diagnosis.md.
discovery:
goal: >
Produce a defensible channel selection matrix and fit diagnosis so the team
can commit to 2–3 channels for a Discovery phase instead of spreading thin.
tasks:
- "Read product brief"
- "Run language/market fit diagnosis"
- "Run channel/product fit diagnosis"
- "Apply first-cut heuristic by business model"
- "Classify candidate channels (viral / organic / paid)"
- "Score each on 6-factor Balfour matrix"
- "Rank and recommend Discovery phase channels"
- "Define graduation criteria to Optimization phase"
- "Emit deliverables"
---
# Acquisition Channel Selection Scorer
Structured acquisition channel selection using the Balfour 6-factor matrix. Runs two
prerequisite fit diagnostics, applies a business-model first-cut to filter the channel
space, scores each candidate on six dimensions (Cost, Targeting, Control, Input Time,
Output Time, Scale), and recommends 2–3 channels for a time-boxed Discovery phase with
explicit graduation criteria to Optimization.
---
## When to Use
Use this skill when:
- You need to decide which 2–3 acquisition channels to test next
- Your team is debating paid ads vs. SEO vs. referral and needs a defensible rationale
- You are entering or restarting an acquisition phase after confirming product/market fit
- Your current channels feel expensive, unpredictable, or aren't producing results
- The team is spreading effort across too many channels and not optimizing any
**Prerequisites:**
- Product/market fit confirmed (must-have survey ≥ 40% "very disappointed" or stable
retention curve). If this is not confirmed, stop — no channel will fix a product
people don't need. See `north-star-metric-selector` for North Star setup.
- A defined North Star Metric. Channel selection is meaningless without knowing what
"acquisition success" means in terms of downstream value, not just installs or clicks.
---
## Context and Input Gathering
Before scoring, collect:
1. **Business model** — B2B SaaS, e-commerce, consumer app, marketplace, media/ad-revenue?
This determines the first-cut heuristic (Step 4).
2. **Target audience** — Who exactly is the ideal customer? What platforms, communities,
and search behaviors characterize them? Be specific: "developers at Series A startups
who use Slack daily" is actionable. "SMBs" is not.
3. **Budget and team capacity** — How much can be spent per channel experiment? How many
engineers and marketers can work on acquisition? High input-time channels need slack
in the schedule.
4. **Current channels** — What is the team already running? What is the current CAC and
conversion rate per channel? (Provide current-acquisition-data.csv if available.)
5. **Candidate channels** — Does the team already have a list of channels to consider?
(Provide channel-candidates.md if available.) If not, candidates will be proposed in
Step 4 based on business model.
If the product brief is missing any of these, surface the gaps as questions before
proceeding. Scoring a channel without knowing the business model is guessing.
---
## Process
### Step 1 — Read the Product Brief
Read `product-brief.md` (and `current-acquisition-data.csv` and `channel-candidates.md`
if provided).
Extract and confirm:
- Core product value proposition (one sentence)
- Business model type
- Target audience definition
- Current acquisition spend and results (if available)
- Team capacity constraints
**Why:** The 6-factor scores are not universal — they are product-specific. A channel
that scores 9 on Targeting for a B2B developer tool may score 4 for a mass-market
consumer app. Grounding in the brief before scoring prevents generic recommendations
that don't survive contact with the team's real constraints.
---
### Step 2 — Language/Market Fit Diagnosis
**Definition:** Language/market fit is how well the language used to describe and market
the product resonates with potential users and motivates them to try it. The term was
coined by growth practitioner James Currier. It covers all marketing copy: landing page
taglines, ad headlines, email subject lines, in-app feature descriptions, value
propositions — every string of text a prospective user encounters.
**Why diagnose this before scoring channels:** Online attention spans are often cited at
roughly 8 seconds; if your copy doesn't immediately connect with a felt need, every
channel will look broken. Low CTR on ads, low landing page conversion, and high bounce
rates are often misdiagnosed as channel failures — they are language failures. Scoring
channels on top of bad copy wastes the scoring effort.
**Diagnosis questions:**
- Do you have conversion rate data from landing page A/B tests, email subject lines, or
ad copy variants? If yes, which variants won and by what margin?
- Does your current tagline describe the outcome users get, or what the product does?
(Users care about outcomes, not features.)
- Have you asked 5–10 users to describe the product in their own words? Does their
language match your copy, or is there a gap?
- Are your ads generating high impressions but low CTR? (Symptom of language/market
misfit: people see the ad but don't recognize the value.)
**Output:** Flag as GREEN (language fit established — proceed), YELLOW (partial fit —
note gaps that may skew channel test results), or RED (copy untested or clearly
misaligned — recommend a messaging sprint before scaling any channel).
---
### Step 3 — Channel/Product Fit Diagnosis
**Definition:** Channel/product fit is how well the selected distribution channels match
how the target audience actually discovers products like yours. It is distinct from
language/market fit: you can have perfectly resonant copy delivered through channels
your audience never uses, and see zero results. Both fits must hold.
**Why diagnose this separately:** Teams often default to channels they are familiar with
(Facebook ads, Google Search) regardless of whether their audience uses them. A B2B
developer tool team running Instagram ads is structurally misaligned, regardless of copy
quality. Channel/product fit diagnosis prevents structural errors before scoring.
**Diagnosis questions:**
- Where does your target audience currently discover products like yours? (Communities,
newsletters, conferences, search queries, social platforms — be specific.)
- Have existing customers told you how they found you? What are the top 3 sources?
- Are there channels where your product is already getting organic traction without
deliberate effort? (These are strong fit signals worth amplifying.)
- Are there channels you are running that produce impressions/clicks but no retention
after acquisition? (Symptom of channel/product misfit — reaching the wrong people.)
**Output:** Flag as GREEN (clear channel behaviors visible for audience), YELLOW
(partial data — proceed with caution), or RED (no behavioral data — recommend audience
research before committing to channels).
---
### Step 4 — First-Cut Heuristic by Business Model
Before running the full 6-factor scoring, eliminate channels that are structurally
incompatible with the business model. Scoring a structurally wrong channel wastes time
and pollutes the ranked list.
**Why:** Different business models have fundamentally different acquisition economics.
E-commerce runs on volume; B2B enterprise runs on relationship and trust; consumer apps
run on network effects and virality. A channel that is structurally optimal for one
model can be structurally wrong for another — no amount of optimization fixes a
structural mismatch.
**Apply the following first-cut by model type:**
| Business Model | Priority Channel Space | Rationale |
|---|---|---|
| B2B / Enterprise | Outbound sales, trade shows, content marketing (thought leadership), LinkedIn | Buyers require relationship-building, trust signals, and expertise validation before purchase |
| E-commerce | Paid search (SEM), SEO, retargeting, email/loyalty programs | Model depends on high-volume shopper traffic; search intent is the highest-signal entry point |
| Consumer app (viral potential) | Referral programs, social sharing, word-of-mouth, community | Network effects and instrumented virality lower CAC as user base grows |
| Marketplace (two-sided) | Split strategy: supply-side vs. demand-side channels; treat as two separate channel selection problems | Supply and demand have different acquisition needs |
| Media / ad-revenue | SEO, social content distribution, syndication, email newsletters | Revenue depends on attention volume; organic scale compounds better than paid |
After the first cut, you should have 5–8 candidate channels. If the team provided
`channel-candidates.md`, reconcile the list against this heuristic — add any that were
missed, and flag any that are structurally incompatible.
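The reconciliation step can be made repeatable by encoding the first-cut table as a lookup and comparing the team's list against it. This Python sketch makes some simplifying assumptions. It expects exact channel labels, matched after lowercasing (a real list would need more normalization). It also leaves out marketplaces, which the table treats as two separate selection problems. `FIRST_CUT` and `reconcile` are hypothetical names.

```python
# Hypothetical encoding of the first-cut table (marketplaces omitted: split them first).
FIRST_CUT = {
    "b2b": {"outbound sales", "trade shows", "content marketing", "linkedin"},
    "ecommerce": {"paid search", "seo", "retargeting", "email/loyalty programs"},
    "consumer_viral": {"referral programs", "social sharing", "word-of-mouth", "community"},
    "media": {"seo", "social content distribution", "syndication", "email newsletters"},
}


def reconcile(model, team_candidates):
    """Compare the team's candidate list with the first-cut space for one business model."""
    priority = FIRST_CUT[model]
    team = {name.strip().lower() for name in team_candidates}
    kept = sorted(team & priority)      # inside the priority space: carry into scoring
    flagged = sorted(team - priority)   # outside it: review for structural mismatch
    missed = sorted(priority - team)    # in the priority space but absent: consider adding
    return kept, flagged, missed


kept, flagged, missed = reconcile("b2b", ["LinkedIn", "TikTok ads", "Content marketing"])
print(kept)     # ['content marketing', 'linkedin']
print(flagged)  # ['tiktok ads']
print(missed)   # ['outbound sales', 'trade shows']
```

Channels in `flagged` should be reviewed, not removed automatically. Sitting outside the priority space does not by itself prove a structural mismatch.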
---
### Step 5 — Classify Candidate Channels
Classify each candidate into one of three mutually exclusive categories. This matters
because the categories have different cost structures, feedback loops, and time horizons
— you want at least one candidate from each category if possible, to avoid a portfolio
that is monolithic in risk profile.
**Why classify before scoring:** Classification reveals structural imbalances. A list
of seven candidates that are all paid channels will score well on Control and Output
Time but will have correlated CAC risk, because their costs tend to rise and fall together.
Adding even one viral or organic channel diversifies the portfolio instead of concentrating that risk.
**The three categories:**
**Viral / Word-of-Mouth**
Distribution happens through the product itself or through users sharing it without
deliberate paid promotion. Examples: referral programs (Dropbox's free storage exchange,
PayPal's cash referral), product sharing features (Venmo's social feed, Hotmail's email
signature), community building (Slack's viral enterprise spread), instrumented virality
(invite flows, "powered by" attribution links). Key property: marginal cost of an
additional user approaches zero as the base grows.
**Organic**
Distribution happens through earned media, search presence, and content that continues
generating traffic long after creation. Examples: SEO, content marketing (blog posts,
ebooks, case studies, infographics, podcasts, webinars, video), public relations, app
store optimization (ASO), community participation (forums, Reddit, Hacker News). Key
property: high upfront input time, low marginal cost at scale, compounding returns.
**Paid**
Distribution happens through purchased placements. Examples: paid search (Google Ads,
Bing Ads), social ads (Facebook, LinkedIn, Twitter, TikTok), display advertising,
affiliate programs, sponsorships, TV/radio/print. Key property: immediate feedback loop,
precise targeting, linear cost scaling (more spend = more reach, stops when spend stops).
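Below is a minimal Python sketch of the classification check. It assumes each candidate has already been tagged with exactly one category. `Category` and `category_gaps` are illustrative names. The function only reports which categories are missing, so the imbalance is visible before any scoring happens.

```python
from enum import Enum


class Category(Enum):
    VIRAL = "viral/word-of-mouth"
    ORGANIC = "organic"
    PAID = "paid"


def category_gaps(candidates):
    """Return categories with no candidate, sorted by name; candidates maps channel -> Category."""
    present = set(candidates.values())
    return sorted((c for c in Category if c not in present), key=lambda c: c.name)


all_paid = {"Google Ads": Category.PAID, "LinkedIn ads": Category.PAID, "Display": Category.PAID}
print([c.value for c in category_gaps(all_paid)])  # ['organic', 'viral/word-of-mouth']
```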
---
### Step 6 — Score Each Channel on the 6-Factor Balfour Matrix
For each candidate channel, assign a score of 1–10 on each of the six factors. Higher
score = more favorable for your situation. Record scores in a table.
**Factors and scoring summary (1 = unfavorable, 10 = highly favorable):**
| Factor | What it measures | Score 10 | Score 1 |
|---|---|---|---|
| **Cost** | Expected spend to run the experiment | Near-zero (email list, organic SEO) | High (TV, trade show, competitive SEM) |
| **Targeting** | Precision of audience reach | Surgical (named-account list, LinkedIn filters) | Broad (display network, national TV/radio) |
| **Control** | Ability to adjust or stop once live | Full real-time (paid ads, A/B test) | None after launch (print, live events) |
| **Input Time** | Time to launch the experiment | Same day (email to an existing list, paid search ad to an existing landing page) | 4+ weeks (SEO content series, TV production) |
| **Output Time** | Time to get actionable results | 1–3 days (paid search, email click rates) | 2–6 months (SEO ranking, community growth) |
| **Scale** | Maximum reachable audience size | Massive (Google Search, Facebook, viral K>0.5) | Small (outbound sales reps, local community) |
For detailed rubric anchors with scored examples for each factor, see
[`references/balfour-six-factor-rubric.md`](references/balfour-six-factor-rubric.md).
**Scoring table format:**
| Channel | Category | Cost | Targeting | Control | Input Time | Output Time | Scale | Avg | Notes |
|---|---|---|---|---|---|---|---|---|---|
| [Channel A] | [Viral/Organic/Paid] | X | X | X | X | X | X | X.X | |
| [Channel B] | ... | ... | ... | ... | ... | ... | ... | ... | |
Compute the average score across all 6 factors for each channel. Sort descending.
---
### Step 7 — Compute Average and Rank
Sum the 6 factor scores for each channel and divide by 6. Sort the channel list by
average score, descending.
**Why average rather than weighted sum:** The 6 factors are designed to collectively
surface fitness, not to be optimized individually. A channel that scores 10 on Scale but
1 on Cost and 1 on Control is a high-risk bet that should surface as borderline, not
as a winner. Averaging preserves this balance. If the team has specific constraints
(e.g., "we have no engineering bandwidth for 8 weeks"), downweight Input Time manually
and note the adjustment.
**Tie-breaking rule:** When two channels have equal averages, prefer the one with higher
Control — it means you can learn faster and course-correct more cheaply.
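The scoring and ranking rules from Steps 6 and 7 fit in a few lines of Python. This sketch uses the illustrative Example A scores from later in this skill. The `Channel` class and the optional `weights` argument are assumptions, not part of the Balfour framework. `weights` lets the team record a documented manual down-weighting, such as for Input Time. Ties on the average are broken by the higher Control score.

```python
from dataclasses import dataclass

FACTORS = ("cost", "targeting", "control", "input_time", "output_time", "scale")


@dataclass
class Channel:
    name: str
    category: str
    scores: dict  # factor -> score from 1 (unfavorable) to 10 (highly favorable)


def average(channel, weights=None):
    """Mean of the six factor scores; optional weights override the default weight of 1.0."""
    w = {f: 1.0 for f in FACTORS}
    w.update(weights or {})
    total = sum(channel.scores[f] * w[f] for f in FACTORS)
    return total / sum(w[f] for f in FACTORS)


def rank(channels, weights=None):
    """Sort by average, highest first; on equal averages prefer the higher Control score."""
    return sorted(
        channels,
        key=lambda ch: (round(average(ch, weights), 9), ch.scores["control"]),
        reverse=True,
    )


# Illustrative scores from Example A:
# (name, category, cost, targeting, control, input time, output time, scale)
EXAMPLE_A = [
    ("LinkedIn outbound", "Paid", 7, 9, 8, 8, 8, 5),
    ("Hacker News Show HN", "Organic", 9, 6, 4, 8, 5, 6),
    ("Google Search (long-tail)", "Paid", 6, 8, 9, 6, 9, 5),
    ("Twitter/X engineering content", "Organic", 8, 6, 7, 7, 4, 6),
    ("Content blog (SEO)", "Organic", 7, 5, 6, 3, 2, 7),
]
channels = [Channel(name, cat, dict(zip(FACTORS, s))) for name, cat, *s in EXAMPLE_A]

for ch in rank(channels):
    print(f"{average(ch):.1f}  {ch.name}")
```

Twitter/X and Hacker News both average 6.3. The Control tie-break puts Twitter/X (Control 7) ahead of Hacker News (Control 4). Passing `weights={"input_time": 0.5}` would halve Input Time's influence. Record any such adjustment alongside the matrix.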
---
### Step 8 — Recommend 2–3 Channels for Discovery Phase
Select the top 2–3 channels from the ranked list for the Discovery phase.
**Why 2–3, not more:** The channel diversification fallacy is a well-documented startup
trap. Larry Page's framing is instructive: "more wood behind fewer arrows" — concentrated
effort on fewer channels produces deeper learning faster. Peter Thiel's framing is more
stark: most businesses get zero distribution channels to work; if you try several but
don't nail one, you're finished. Attempting 5–7 channels simultaneously means each gets
shallow testing, no channel reaches statistical confidence, and the team learns nothing
actionable.
**Why not just 1:** A single channel creates brittle dependence. Two or three allows
comparison learning: you discover not just whether a channel works, but *why* one works
better than another — which reveals audience insights that compound across future
experiments.
**Composition guidance:** Ideally select one channel from each category (viral, organic,
paid) if the top 3 allow it. If all top 3 are paid, note the risk and consider adding
the highest-scoring organic or viral channel even if its average is slightly lower.
**For each recommended channel, document:**
- Why it ranked highest (which factors drove the score)
- The specific experiment to run in Discovery (what hypothesis, what creative, what
audience segment, what landing page)
- The resource commitment required (budget, engineering hours, content creation time)
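The composition guidance can be sketched as a small selection rule. This Python version assumes the list is already ranked best-first. When the top picks are all paid, it swaps the lowest-ranked pick for the best organic or viral channel. Swapping rather than adding a fourth channel is an interpretation. The channel names and scores in the usage are hypothetical.

```python
def pick_discovery(ranked, k=3):
    """ranked: (name, category, avg) tuples, best first. Returns up to k Discovery picks."""
    picks = list(ranked[:k])
    all_paid = len(picks) == k and all(cat == "Paid" for _, cat, _ in picks)
    if all_paid:
        # Assumption: replace the last pick with the best-ranked organic or viral channel.
        for candidate in ranked[k:]:
            if candidate[1] != "Paid":
                picks[-1] = candidate
                break
    return picks


ranked = [  # hypothetical channels, already sorted by average
    ("Google Ads", "Paid", 7.8),
    ("Meta ads", "Paid", 7.6),
    ("LinkedIn ads", "Paid", 7.1),
    ("Referral program", "Viral", 6.9),
    ("SEO", "Organic", 6.0),
]
print([name for name, _, _ in pick_discovery(ranked)])
# ['Google Ads', 'Meta ads', 'Referral program']
```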
---
### Step 9 — Define Graduation Criteria to Optimization Phase
For each Discovery phase channel, define explicit, time-bounded graduation criteria.
Without these, Discovery never ends — teams keep tweaking without committing to scale.
**Why explicit criteria:** The Discovery-to-Optimization gate is where most teams get
stuck. They run one experiment, see mixed results, run another variant, see better
results, run another variant — and never make the call to scale. Pre-committing to
graduation criteria removes the decision bias.
**For each channel, define:**
- **CAC target:** What maximum cost per acquired user (or lead, or install) is acceptable
given LTV? (e.g., "CAC ≤ $25 for a product with $120 expected 12-month LTV")
- **Volume threshold:** Minimum number of acquisitions in the test window to confirm the
signal is real and not noise (e.g., "100 installs with ≥ 20% Day-7 retention")
- **Confidence window:** Time box for the Discovery experiment (e.g., "3-week test
window, minimum $3,000 spend on paid channels")
- **Segment specificity:** Does the channel only work for a narrow segment, or broadly?
(Narrow is fine for Discovery; confirm breadth in Optimization)
**Graduation call:** If a channel meets CAC target AND volume threshold within the
confidence window → promote to Optimization. If it misses either → redesign the
experiment (different creative, different audience, different landing page) for one
more iteration, or retire the channel and replace it with the next-ranked candidate.
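The graduation call is a two-condition gate, sketched below in Python. The thresholds come from the team's pre-committed criteria. The usage values reuse the CAC and volume examples above. Allowing exactly one redesign before retiring is an assumption based on the "one more iteration" wording. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    max_cac: float   # highest acceptable cost per acquisition, set from LTV
    min_volume: int  # qualifying acquisitions needed inside the confidence window


@dataclass
class WindowResult:
    cac: float         # observed CAC at the end of the confidence window
    acquisitions: int  # acquisitions that met the quality bar (e.g., Day-7 retained)


def graduation_call(result, criteria, redesigns_used=0):
    """Return 'promote', 'redesign', or 'retire' for one Discovery channel."""
    if result.cac <= criteria.max_cac and result.acquisitions >= criteria.min_volume:
        return "promote"
    # Assumption: one redesigned iteration is allowed before the channel is retired.
    return "redesign" if redesigns_used < 1 else "retire"


criteria = Criteria(max_cac=25.0, min_volume=100)
print(graduation_call(WindowResult(cac=22.0, acquisitions=140), criteria))     # promote
print(graduation_call(WindowResult(cac=31.0, acquisitions=140), criteria))     # redesign
print(graduation_call(WindowResult(cac=31.0, acquisitions=140), criteria, 1))  # retire
```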
---
### Step 10 — Emit Deliverables
Write two files:
**`channel-selection-matrix.md`** — Contains:
- Fit diagnosis summary (language/market fit status, channel/product fit status)
- Business model first-cut rationale
- Channel classification table
- Full 6-factor scoring matrix with scores, averages, and ranking
- Recommended Discovery phase channels (2–3) with rationale
- Experiment brief for each recommended channel
- Graduation criteria table
**`acquisition-fit-diagnosis.md`** — Contains:
- Language/market fit diagnosis (GREEN/YELLOW/RED) with specific gaps identified
- Channel/product fit diagnosis (GREEN/YELLOW/RED) with specific gaps identified
- Audience behavior summary (where the audience actually is)
- Recommendations for resolving any RED flags before scaling
---
## Key Principles
**1. Language/market fit is the prerequisite — without it, every channel looks broken.**
If copy doesn't resonate in 8 seconds, ads generate low CTR, landing pages fail to
convert, and the team misattributes the failure to the channel. Fix the message before
diagnosing the distribution.
**2. Channel diversification is a fallacy for startups — depth beats spread.**
Testing two or three channels deeply produces actionable learning. Testing seven channels
shallowly produces noise. Concentrated effort reaches statistical confidence faster and
cheaper.
**3. Discovery phase ≠ Optimization phase — don't optimize what you haven't validated.**
Running A/B tests on ad creative before confirming that the channel can hit CAC target
at any meaningful volume is premature optimization. Discover first. Optimize second.
**4. The 6 factors capture what is commonly mis-weighted — especially Input Time.**
Teams routinely underestimate Input Time, selecting channels that sound promising but
take 6 weeks to launch, delaying learning. Scoring Input Time forces an honest
conversation about team capacity before committing.
**5. Business model dictates the first cut — don't score channels that are structurally
wrong.**
A B2B enterprise team should not be spending scoring cycles on TikTok ads. The
first-cut heuristic eliminates structural mismatches before they pollute the matrix.
**6. Score channels as a team, not in isolation — calibration matters.**
The 6-factor scores are estimates. Different team members may score the same channel
differently based on different assumptions. Run the scoring exercise together, surface
disagreements explicitly, and document assumptions. A calibrated team score is more
reliable than an individual PM's score.
---
## Examples
### Example A — B2B SaaS (developer tool, $49/mo, targeting senior engineers at Series A–B companies)
**Language/market fit:** YELLOW — landing page copy uses generic "improve your workflow"
framing. Interviews reveal users talk about "stopping context switching between tools."
Recommendation: update copy before scaling.
**Channel/product fit:** GREEN — existing customers found the product through Hacker
News posts, Twitter/X engineering threads, and word-of-mouth from colleagues.
**First-cut:** B2B → prioritize content (thought leadership in developer communities),
outbound (senior engineers on LinkedIn), and paid search on specific query terms.
**Scoring snapshot (illustrative):**
| Channel | Category | Cost | Targeting | Control | Input | Output | Scale | Avg |
|---|---|---|---|---|---|---|---|---|
| LinkedIn outbound | Paid | 7 | 9 | 8 | 8 | 8 | 5 | 7.5 |
| Hacker News Show HN | Organic | 9 | 6 | 4 | 8 | 5 | 6 | 6.3 |
| Google Search (long-tail) | Paid | 6 | 8 | 9 | 6 | 9 | 5 | 7.2 |
| Twitter/X engineering content | Organic | 8 | 6 | 7 | 7 | 4 | 6 | 6.3 |
| Content blog (SEO) | Organic | 7 | 5 | 6 | 3 | 2 | 7 | 5.0 |
**Recommended Discovery channels:** LinkedIn outbound (7.5 avg), Google Search long-tail
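(7.2 avg), Hacker News Show HN (6.3 avg; tied with Twitter/X, which has higher Control and
would win the tie-break, but chosen as a deliberate override for its organic brand signal and
proven customer traction).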
**Graduation criteria:**
- LinkedIn: 30 qualified demo requests within 4 weeks at ≤ $150 CAC
- Google Search: 50 free trial signups within 3 weeks at ≤ $40 CAC
- HN Show HN: 200 signups from one post — repeat with 2nd post to confirm repeatability
---
### Example B — Consumer App (meal planning app, freemium, targeting busy parents)
**Language/market fit:** RED — all ads use "meal planning made easy" which tests poorly.
User interviews reveal the felt need is "stop the 5pm dinner panic." Recommend a
messaging sprint before scaling any channel.
**Channel/product fit:** YELLOW — analytics show organic social referrals and App Store
search drive 70% of current installs. Team wants to add paid Instagram. Need to confirm
audience is on Instagram specifically (vs. Facebook or TikTok).
**First-cut:** Consumer viral potential → referral program and viral sharing first.
E-commerce-adjacent → App Store search (ASO) and paid search on recipe queries.
**Scoring snapshot (illustrative):**
| Channel | Category | Cost | Targeting | Control | Input | Output | Scale | Avg |
|---|---|---|---|---|---|---|---|---|
| Referral program | Viral | 8 | 7 | 7 | 5 | 6 | 8 | 6.8 |
| App Store Search (ASO) | Organic | 9 | 7 | 5 | 5 | 4 | 7 | 6.2 |
| Facebook/Instagram ads | Paid | 5 | 7 | 9 | 8 | 9 | 8 | 7.7 |
| Google Search (recipe queries) | Paid | 6 | 6 | 9 | 7 | 9 | 8 | 7.5 |
| TikTok organic content | Organic | 8 | 5 | 5 | 6 | 3 | 8 | 5.8 |
**Recommended Discovery channels:** Facebook/Instagram ads (7.7 — but hold until
language/market RED resolved), Google Search recipe queries (7.5), Referral program
(6.8 — low input cost, compounding upside).
**Graduation criteria:**
- Facebook/Instagram: 500 app installs at ≤ $3.50 CAC with ≥ 25% Day-7 retention within
3 weeks at $5,000 test budget. Only start after messaging sprint resolves RED flag.
- Google Search: 300 installs at ≤ $4 CAC within 2 weeks
- Referral: K-factor ≥ 0.3 within first 6 weeks of launch (meaning each acquired user
brings in 0.3 additional users on average, e.g., 30% of them each refer one user who signs up)
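K-factor is usually computed as invites sent per user multiplied by the share of invites that convert. The result is the number of new users each existing user brings in. The Python sketch below uses hypothetical cohort numbers that land exactly on the 0.3 target. `users_per_seed` sums the referral generations to show what a K below 1 is worth. It assumes K stays constant from one generation to the next.

```python
def k_factor(cohort_size, invites_sent, invited_signups):
    """K = (invites per user) x (invite conversion rate) = new users per existing user."""
    invites_per_user = invites_sent / cohort_size
    conversion_rate = invited_signups / invites_sent
    return invites_per_user * conversion_rate


def users_per_seed(k, generations=50):
    """Users produced by one seed user across referral generations, assuming constant K < 1."""
    return sum(k ** g for g in range(generations + 1))


# Hypothetical cohort: 1,000 new users send 1,500 invites, and 300 invitees sign up.
k = k_factor(cohort_size=1_000, invites_sent=1_500, invited_signups=300)
print(f"K = {k:.2f}")                               # K = 0.30
print(f"users per seed = {users_per_seed(k):.2f}")  # users per seed = 1.43
```

With K = 0.3, each user acquired through another channel yields about 1/(1 − 0.3) ≈ 1.43 users in total. Referral therefore amplifies the other Discovery channels rather than growing on its own.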
---
## References
- Brian Balfour, "5 Steps to Choose Your Customer Acquisition Channel," Coelevate, 2013
(the source of the 6-factor framework, cited directly in Hacking Growth Chapter 5)
- James Currier coined "language/market fit" — discussed in Chapter 5 acquisition intro
- Justin Mares, Gabriel Weinberg, Andrew Chen, James Currier — channel category taxonomy
(viral/organic/paid) as attributed in the book
- Peter Thiel on single-channel depth — referenced in Chapter 5 footnote 12, originally
from Blake Masters' notes on Thiel's Stanford CS183 startup course, 2012
- Larry Page on "more wood behind fewer arrows" — Chapter 5
---
## License
Content derived from *Hacking Growth* by Sean Ellis and Morgan Brown, used under fair
use for educational skill generation. This SKILL.md file is licensed under
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
---
## Related BookForge Skills
**Required dependencies (run these first):**
```
clawhub install bookforge-north-star-metric-selector
clawhub install bookforge-growth-experiment-prioritization-scorer
```
**Useful follow-on skills (run these after channel selection):**
```
clawhub install bookforge-viral-loop-designer
clawhub install bookforge-activation-funnel-diagnostic
```
- `north-star-metric-selector` — Define the metric that determines what "acquisition
success" means before choosing channels
- `growth-experiment-prioritization-scorer` — Score and sequence experiments within the
channels you've selected (ICE scoring, growth meeting protocol)
- `viral-loop-designer` — Design the instrumented virality mechanism once viral is your
top-ranked Discovery channel (K-factor modeling, loop architecture)
- `activation-funnel-diagnostic` — Diagnose and fix activation before scaling acquisition
spend; channels that acquire well but activate poorly burn budget with no compounding
FILE:references/balfour-six-factor-rubric.md
# Balfour 6-Factor Channel Scoring Rubric
Detailed 1–10 scoring anchors for each factor in the acquisition channel selection
matrix. Use these anchors when running Step 6 of the `acquisition-channel-selection-scorer`
skill to calibrate scores across the team.
Higher score = more favorable for your situation. All scores are product- and
business-context-specific — the same channel can score differently for different teams.
---
## Factor 1: Cost
What you expect to spend to run the experiment in question.
- **10 (near-zero cost):** Organic SEO optimization of existing pages, referral program
built on existing product infrastructure, email to an existing opted-in list, writing a
guest post, posting in a community you already participate in
- **5 (moderate cost):** Facebook or LinkedIn ads at $2,000–$5,000 test budget; content
blog post commissioned from a contractor ($300–$800); basic affiliate setup with a
small partner; PR outreach with a freelance publicist
- **1 (high cost):** TV or radio campaign production + media buy, trade show booth
sponsorship ($10,000–$50,000+), large influencer partnership (>$5,000 per post),
highly competitive SEM keywords ($20–$100+ CPC where you must outbid entrenched
competitors)
**Key calibration note:** Cost is relative to your budget and to the LTV of the
customers you're acquiring. A $50,000 trade show investment may score 7 if your average
customer LTV is $100,000 and the show reaches 200 highly qualified prospects.
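The trade-show example can be sanity-checked with break-even arithmetic. This Python sketch treats LTV as the value recovered per customer, which is an assumption: it ignores margin and payback timing. It computes how many customers the spend requires, and what prospect conversion rate that implies.

```python
def breakeven(cost, ltv, prospects):
    """Customers needed to recover `cost`, and the prospect conversion rate that implies."""
    customers_needed = cost / ltv
    return customers_needed, customers_needed / prospects


customers, rate = breakeven(cost=50_000, ltv=100_000, prospects=200)
print(f"{customers:.1f} customers, {rate:.2%} of prospects")  # 0.5 customers, 0.25% of prospects
```

Half a customer covers the cost, which means converting just 0.25% of the 200 prospects. That is why a nominally expensive channel can still earn a favorable Cost score.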
---
## Factor 2: Targeting
How precisely you can reach your intended audience and how specific you can be about
who the experiment reaches.
- **10 (surgical precision):** LinkedIn ads filtered by exact job title + company size +
industry + seniority level; personalized email outreach to a curated named-account
list; a niche community (Slack group, subreddit, Discord server) where your ICP hangs
out and you can post directly to them; retargeting a warm list of known prospects
- **5 (moderate precision):** Facebook or Instagram interest targeting (reaches
approximate audience but with noise); Google Search on moderately specific keywords;
topical newsletter sponsorship (good audience alignment but not individual-level
precision); conference sponsorship in your vertical
- **1 (near-zero precision):** Display ad network programmatic buys, billboards,
national TV, national radio — reaches large audiences, most of whom are not your
target customer; broad keyword SEM on head terms ("project management software")
**Key calibration note:** Targeting score increases with audience specificity. A product
with a very narrow ICP (e.g., "CFOs at healthcare companies with >500 employees") will
score targeting differently than a mass consumer app where almost anyone is a potential
user.
---
## Factor 3: Control
How much control you have over the experiment once it is live. Can you make changes?
Can you pause or stop it? Can you adjust targeting, creative, or budget mid-flight?
- **10 (full real-time control):** Paid social and search ads (Facebook, LinkedIn, Google Ads): budget
cap, creative swap, audience adjustment, and pausing are all possible in real time;
A/B test with a holdout group where you can call the test at any time; email campaign
where you can halt sending
- **5 (partial control):** Email campaign being sent in batches (can pause between
batches); content marketing with republication or update option; referral program where
you can adjust the incentive structure; pre-roll video ads where creative is fixed but
targeting and budget can adjust
- **1 (no control after launch):** Print advertisement in a magazine (cannot change once
printed), TV spot after the buy is placed, trade show sponsorship once the event has
begun, PR pitch once the journalist has the story — all are committed and irreversible
**Key calibration note:** Control is directly correlated with learning speed. High-
control channels let you make mid-experiment corrections, increasing the expected value
of each experiment dollar spent.
---
## Factor 4: Input Time
How much time it will take the team to launch the experiment — from the decision to run
it until it is live and generating data.
- **10 (launch within 1 day):** Email to an existing opted-in list using an existing
template; paid search ad pointing to an existing landing page; organic social post;
LinkedIn direct outreach message using a prepared template; DM campaign to known
prospects
- **5 (launch within 1–2 weeks):** New landing page A/B test requiring design and
development; Facebook ad campaign requiring new creative production; basic referral
program using an off-the-shelf tool (e.g., ReferralHero, Viral Loops); podcast
sponsorship with an available slot
- **1 (launch requires 4+ weeks):** SEO content series requiring research, writing,
editing, and indexing; trade show preparation (booth design, staff scheduling,
materials printing, travel logistics); TV or radio ad production cycle; conference
talk submission reviewed by a committee; partnership negotiation and legal review
**Key calibration note:** Input Time is consistently underestimated. A channel that
requires 6 weeks of engineering to instrument means 6 weeks of delay before you learn
anything. Score Input Time honestly, based on your team's actual bandwidth rather than what is possible in theory.
---
## Factor 5: Output Time
How long it will take to get actionable results out of the experiment once it is live.
- **10 (results within 1–3 days):** Paid search or paid social ads — conversion data
is near-immediate; statistical significance can be reached within a week at modest
budgets ($1,000–$3,000); email campaign open and click-through rates visible within
hours; A/B test on a high-traffic landing page
- **5 (results within 2–4 weeks):** Referral program — need enough cycles to observe
meaningful K-factor signal; landing page A/B test at moderate traffic volume (5,000–
10,000 unique visitors/week); outbound sales sequence (need to complete the sequence
to measure response rate through the full funnel)
- **1 (results take 2–6 months):** SEO content (organic indexing + ranking time before
traffic materializes); PR campaign (earned media impact is diffuse and hard to
attribute); community building (time to build audience density that generates traffic);
brand advertising without direct response tracking
**Key calibration note:** Output Time affects your experiment learning rate. Channels
with short output times let you run 4–6 discovery experiments in a quarter; channels
with long output times may allow only 1–2. When budget is constrained, prefer faster feedback loops.
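A minimal Python sketch of the arithmetic behind this note, assuming experiments on a channel run one after another (each needs its Input Time plus its Output Time before the next can start) and a 91-day quarter. The function name and the example durations are hypothetical.

```python
import math

QUARTER_DAYS = 91  # assumption: a quarter is roughly 13 weeks


def experiments_per_quarter(input_days: float, output_days: float,
                            quarter_days: float = QUARTER_DAYS) -> int:
    """Count sequential experiments that fit in one quarter.

    Assumes each experiment must launch (Input Time) and return results
    (Output Time) before the next one starts on the same channel.
    """
    cycle_days = input_days + output_days
    if cycle_days <= 0:
        raise ValueError("input_days + output_days must be positive")
    return math.floor(quarter_days / cycle_days)


# Hypothetical cycles: one week to launch and one to read results,
# versus two weeks to launch and a month to read results.
print(experiments_per_quarter(7, 7))    # 6
print(experiments_per_quarter(14, 30))  # 2
```

Real programs often run several experiments in parallel, which this sketch ignores; it only shows why the cycle length caps learning per quarter.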
---
## Factor 6: Scale
The maximum size of the audience you can reach with the experiment, assuming the channel
performs well.
- **10 (massive addressable scale):** Google Search ads on high-volume keywords (millions
of queries/day); Facebook or Instagram ads (3+ billion users, many addressable
segments); viral mechanism with K-factor > 0.5 and a growing base (the viral loop at
least doubles the users every other channel brings in; growth becomes self-sustaining
and exponential only once K > 1); TV national broadcast
- **5 (medium scale):** LinkedIn (targeted professional audience — large in total but
targeting reduces effective reach to tens of thousands per campaign); email newsletter
sponsorship in a niche vertical (10,000–100,000 readers); mid-tier influencer with
100,000 engaged followers; regional radio
- **1 (small scale):** Targeted sales outreach (capped by rep capacity at 50–200
high-quality touches per rep per week); local community meetup or Meetup.com group;
niche trade publication with 2,000 subscribers; small subreddit or Slack community
**Key calibration note:** High scale is a double-edged sword. Channels with very high
scale often have lower targeting precision. Do not chase scale before confirming that
the channel can convert at acceptable CAC — scaling a leaky channel amplifies waste.
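A minimal sketch of why the K-factor threshold matters for scale, assuming the common definition of K-factor (new users each existing user brings in per viral cycle) and ignoring churn and cycle time. Cohort sizes form a geometric series: below K = 1 a seed cohort is amplified toward `seed / (1 - K)` and then levels off; above K = 1 it keeps compounding. The function names are hypothetical.

```python
def users_after_cycles(seed_users: float, k_factor: float, cycles: int) -> float:
    """Cumulative users from one seed cohort after `cycles` viral cycles."""
    total = 0.0
    cohort = seed_users
    for _ in range(cycles + 1):
        total += cohort       # count the current cohort
        cohort *= k_factor    # each member invites k_factor new users
    return total


def viral_ceiling(seed_users: float, k_factor: float) -> float:
    """Long-run total for K < 1 (geometric series limit); unbounded otherwise."""
    if k_factor >= 1:
        return float("inf")
    return seed_users / (1 - k_factor)


print(users_after_cycles(1000, 0.5, 2))           # 1750.0
print(viral_ceiling(1000, 0.5))                   # 2000.0
print(round(users_after_cycles(1000, 1.2, 10)))   # 32150, and still growing
```

With K = 0.5, 1,000 seed users top out at 2,000; with K = 1.2 the same seed passes 32,000 after ten cycles and keeps going.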
---
## Summary Table (Rubric at a Glance)
| Factor | Score 10 | Score 5 | Score 1 |
|---|---|---|---|
| **Cost** | Near-zero (email, organic SEO) | Moderate ($2K–$5K paid test) | High (TV, trade show, competitive SEM) |
| **Targeting** | Surgical (named account list, exact LinkedIn filters) | Moderate (interest targeting, keyword SEM) | Broad (display, national TV/radio) |
| **Control** | Full real-time (paid ads, A/B test) | Partial (batched email, adjustable referral) | None (print, TV spot, live events) |
| **Input Time** | Same day (email, paid search with existing page) | 1–2 weeks (new creative, basic referral) | 4+ weeks (SEO series, trade show, TV production) |
| **Output Time** | 1–3 days (paid search, email open rates) | 2–4 weeks (referral K-factor, moderate traffic A/B) | 2–6 months (SEO, community building, PR) |
| **Scale** | Massive (Google Search, Facebook, viral K>0.5) | Medium (LinkedIn campaign, niche newsletter) | Small (outbound sales, local community, niche pub) |
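The rubric can be applied mechanically. A minimal Python sketch, assuming each channel gets one 1–10 score per factor and that channels are ranked by the unweighted sum; the source does not prescribe how to combine scores, so weighting factors by your current priorities is an equally valid choice. The channel names and scores are illustrative guesses drawn loosely from the anchors above, not benchmarks.

```python
from dataclasses import dataclass
from typing import Dict, List

FACTORS = ("cost", "targeting", "control", "input_time", "output_time", "scale")


@dataclass
class Channel:
    name: str
    scores: Dict[str, int]  # factor name -> rubric score from 1 to 10

    def total(self) -> int:
        """Unweighted sum of the six factor scores (an assumed combination rule)."""
        missing = [f for f in FACTORS if f not in self.scores]
        if missing:
            raise ValueError(f"{self.name}: missing scores for {missing}")
        out_of_range = [f for f in FACTORS if not 1 <= self.scores[f] <= 10]
        if out_of_range:
            raise ValueError(f"{self.name}: scores outside 1-10 for {out_of_range}")
        return sum(self.scores[f] for f in FACTORS)


def rank_channels(channels: List[Channel]) -> List[Channel]:
    # Highest total first; sorted() is stable, so ties keep their input order.
    return sorted(channels, key=lambda c: c.total(), reverse=True)


# Illustrative scores only; replace them with your own assessment.
channels = [
    Channel("Trade show booth", {"cost": 1, "targeting": 5, "control": 1,
                                 "input_time": 1, "output_time": 5, "scale": 1}),
    Channel("SEO content series", {"cost": 10, "targeting": 5, "control": 5,
                                   "input_time": 1, "output_time": 1, "scale": 5}),
    Channel("Paid search, existing landing page", {"cost": 5, "targeting": 5, "control": 10,
                                                   "input_time": 10, "output_time": 10, "scale": 10}),
]

for channel in rank_channels(channels):
    print(f"{channel.total():>2}  {channel.name}")
```

The validation step matters more than the arithmetic: a channel with a missing or out-of-range factor raises an error instead of silently ranking low.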
---
*Source: Balfour's original framework from "5 Steps to Choose Your Customer Acquisition
Channel" (Coelevate, 2013), as cited in Hacking Growth Chapter 5. The 1–10 numeric
scale and rubric anchors are BookForge extensions for operationalizing the original
High/Medium/Low scheme.*
---
name: stakeholder-negotiation-planner
description: Prepare architecture negotiation strategies for conversations with business stakeholders, other architects, and developers using proven techniques. Use this skill whenever the user needs to push back on unrealistic requirements, defend an architecture decision to management, convince a skeptical developer or senior engineer, navigate disagreements about technology choices, negotiate trade-offs between features and technical debt, deal with stakeholders who demand conflicting quality attributes, handle situations where someone with more authority or experience disagrees with their technical recommendation, or any situation requiring persuasion around architecture decisions — even if they don't explicitly say "negotiation."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/stakeholder-negotiation-planner
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
source-books:
- id: fundamentals-of-software-architecture
title: "Fundamentals of Software Architecture"
authors: ["Mark Richards", "Neal Ford"]
chapters: [23]
tags: [software-architecture, architecture, negotiation, leadership, stakeholders, communication, soft-skills]
depends-on: []
execution:
tier: 1
mode: hybrid
inputs:
- type: none
description: "Negotiation context from the user — who they're negotiating with, what the disagreement is about, and what outcome they want"
tools-required: [Read, Write]
tools-optional: []
mcps-required: []
environment: "Any agent environment. No codebase required."
---
# Stakeholder Negotiation Planner
## When to Use
You need to prepare a strategy for an architecture negotiation — a conversation where the architect must persuade, push back, or reach consensus with someone who disagrees. Typical triggers:
- A stakeholder demands unrealistic quality attributes (e.g., 99.999% availability for an internal tool)
- Another architect or senior developer disagrees with a technology or pattern choice
- The product team wants features but the architect believes technical debt must be addressed first
- The architect needs to justify infrastructure costs to business leadership
- A team member with more seniority or authority is blocking a technical recommendation
This skill prepares the negotiation strategy. The human executes the actual conversation.
Before starting, verify:
- Who is the negotiation with? (business stakeholder, architect, developer)
- What is the specific disagreement?
## Context
### Required Context (must have before proceeding)
- **Negotiation counterpart:** Who is the architect negotiating with?
-> Check prompt for: titles (CTO, VP, PM, developer, senior engineer), names, relationships
-> If still missing, ask: "Who are you negotiating with — a business stakeholder, another architect, or a developer?"
- **The disagreement:** What is the specific point of contention?
-> Check prompt for: technology choices, quality attributes, requirements, priorities, trade-offs
-> If still missing, ask: "What is the specific disagreement or decision you need to navigate?"
### Observable Context (gather from environment)
- **Power dynamics:** Does the counterpart have more organizational authority?
-> Check prompt for: hierarchy mentions, "my boss," "C-level," seniority references
-> If unavailable: assume peer-level negotiation
- **Relationship history:** Is there an existing relationship or pattern of disagreement?
-> Check prompt for: "always," "keeps saying no," "we've had this fight before"
-> If unavailable: assume first significant disagreement
- **Stakes:** What happens if the negotiation fails?
-> Check prompt for: project impact, cost implications, timeline effects
-> If unavailable: assess from the disagreement itself
### Default Assumptions
- If counterpart type unknown -> assume business stakeholder (most common architecture negotiation)
- If power dynamics unknown -> prepare strategies that work regardless of hierarchy
- If stakes unknown -> assume moderate (worth negotiating but not existential)
### Sufficiency Threshold
```
SUFFICIENT when ALL of these are true:
- The counterpart type is known (business, architect, developer)
- The specific disagreement is understood
- The user's desired outcome is clear or can be inferred
PROCEED WITH DEFAULTS when:
- Counterpart and disagreement are known
- Power dynamics and history can be assumed
MUST ASK when:
- The counterpart type is completely unclear
- The disagreement itself is ambiguous
```
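One possible reading of this threshold as a Python sketch: MUST ASK if either required input is missing, SUFFICIENT when the desired outcome and all observable context are known, and PROCEED WITH DEFAULTS otherwise, filling gaps from the fallbacks listed above. It follows the threshold rather than the "assume business stakeholder" default for a missing counterpart. The class, field, and function names are hypothetical, not part of any tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Readiness(Enum):
    SUFFICIENT = "sufficient"
    PROCEED_WITH_DEFAULTS = "proceed with defaults"
    MUST_ASK = "must ask"


# Fallbacks taken from the Observable Context and Default Assumptions lists.
OBSERVABLE_DEFAULTS = {
    "power_dynamics": "peer-level",
    "history": "first significant disagreement",
    "stakes": "moderate",
}


@dataclass
class NegotiationContext:
    counterpart: Optional[str] = None      # "business", "architect", or "developer"
    disagreement: Optional[str] = None
    desired_outcome: Optional[str] = None
    power_dynamics: Optional[str] = None
    history: Optional[str] = None
    stakes: Optional[str] = None


def check_readiness(ctx: NegotiationContext) -> Readiness:
    # Required context: without it the brief cannot be written.
    if not ctx.counterpart or not ctx.disagreement:
        return Readiness.MUST_ASK
    observable_known = all(getattr(ctx, name) for name in OBSERVABLE_DEFAULTS)
    if ctx.desired_outcome and observable_known:
        return Readiness.SUFFICIENT
    return Readiness.PROCEED_WITH_DEFAULTS


def apply_defaults(ctx: NegotiationContext) -> NegotiationContext:
    # Fill only the gaps; never overwrite what the user actually said.
    for name, value in OBSERVABLE_DEFAULTS.items():
        if not getattr(ctx, name):
            setattr(ctx, name, value)
    return ctx


ctx = NegotiationContext(counterpart="business",
                         disagreement="CTO wants 99.999% availability")
print(check_readiness(ctx).value)   # proceed with defaults
print(apply_defaults(ctx).stakes)   # moderate
```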
## Process
### Step 1: Classify the Negotiation Type
**ACTION:** Determine which of the three negotiation audience types this falls into, as each requires different techniques.
**WHY:** Negotiating with a business stakeholder who doesn't understand technology requires completely different techniques than negotiating with a senior developer who understands the technology deeply. Business stakeholders respond to cost and time framing. Developers respond to technical demonstrations. Other architects respond to evidence-based trade-off analysis. Using the wrong approach wastes credibility.
| Audience | Core Technique | Key Lever |
|----------|---------------|-----------|
| **Business stakeholders** | Leverage their grammar — translate to cost, time, and risk | They care about business outcomes, not technical elegance |
| **Other architects** | Divide and conquer — find areas of agreement, isolate disagreements | They understand trade-offs; focus on evidence and alternatives |
| **Developers** | Demonstration defeats discussion — show, don't tell | They trust working code and concrete examples over authority or argument |
### Step 2: Prepare Audience-Specific Strategy
**AGENT: EXECUTES** — produces the negotiation brief
**ACTION:** Based on the audience type, prepare the negotiation strategy using the appropriate techniques. For detailed technique breakdowns, see [references/negotiation-techniques.md](references/negotiation-techniques.md).
**WHY:** Each technique targets a specific cognitive bias or communication barrier. Leveraging their grammar works because stakeholders literally don't hear technical arguments — they filter for business impact. Demonstration defeats discussion because developers have seen too many "it should work in theory" arguments fail in practice. Divide and conquer works on architects because it prevents ego-driven all-or-nothing positions.
**For Business Stakeholders:**
1. **Leverage their grammar** — Use business terms (cost, time-to-market, risk, competitive advantage) not technical terms. If the CTO says "we need 99.999% availability," don't argue about infrastructure complexity. Instead say: "99.999% means 5 minutes downtime per year and costs $200K in infrastructure. 99.9% means 8.7 hours per year and costs $40K. Given this is an internal tool used by 50 people, which investment matches the business value?"
2. **State impacts in cost and time** — Every technical recommendation must translate to dollars and calendar time. "We should refactor the data layer" is invisible to a business stakeholder. "Refactoring the data layer now costs 3 weeks but saves 2 weeks per feature for the next 12 features" is a clear business case.
3. **Provide justification, not dictation** — Never say "because I'm the architect." Explain the WHY. Architects who dictate without justification create the Ivory Tower anti-pattern — disconnected, distrusted, eventually ignored.
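The downtime figures in the availability example follow from simple arithmetic. A minimal Python sketch, assuming a 365-day year; the dollar figures in the example come from the stakeholder's context and are not derived here. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability target (in percent)."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in (0, 100]")
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for target in (99.9, 99.95, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}%: {hours:.2f} h/year ({hours * 60:.1f} min/year)")
```

For 99.9% this gives about 8.76 hours a year; for 99.999%, about 5.3 minutes. Putting each tier next to its price is what turns an infrastructure debate into a business decision.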
**For Other Architects:**
1. **Divide and conquer** — Before debating the disagreement, establish everything you agree on. "We agree the system needs to be distributed. We agree REST is the right protocol for these 4 services. The only disagreement is whether services A and B should communicate via REST or events." This prevents the conversation from becoming a referendum on either architect's overall competence.
2. **Present trade-off analysis** — Bring a structured comparison of both approaches across multiple dimensions. Let the evidence drive the conclusion rather than either person's preference.
3. **Acknowledge when they're right** — If the other architect has a valid point, say so explicitly. Credibility comes from intellectual honesty, not from winning every point.
**For Developers:**
1. **Demonstration defeats discussion** — If a developer insists REST is better than event-driven, don't argue. Build a small proof-of-concept showing the specific problem (latency under load, coupling during deployments) that event-driven solves. Working code settles technical arguments faster than any slide deck.
2. **Avoid ivory tower behavior** — Stay connected to the codebase. Architects who don't code lose credibility with developers. You can't demand event-driven architecture if you've never implemented an event consumer.
3. **Explain the WHY behind constraints** — Developers comply reluctantly with rules they don't understand. When they understand WHY a constraint exists, they enforce it themselves.
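A minimal sketch of the kind of proof-of-concept this step describes, in Python with `asyncio`. It assumes hypothetical downstream services simulated with a fixed 50 ms delay and uses an in-process `asyncio.Queue` as a stand-in for a real message broker, so it demonstrates the shape of the latency argument (the request-reply caller waits for every downstream call; the event publisher returns immediately while the work still completes) rather than benchmarking any real system.

```python
import asyncio
import time
from typing import Tuple

DOWNSTREAM_LATENCY_S = 0.05  # hypothetical latency of each downstream call
SERVICES = ["inventory", "shipping", "email"]  # hypothetical services that need no reply


async def call_downstream(service: str) -> None:
    # Stand-in for a network call to a downstream service.
    await asyncio.sleep(DOWNSTREAM_LATENCY_S)


async def place_order_request_reply(order_id: int) -> str:
    # Request-reply: the caller waits for every downstream service in turn.
    for service in SERVICES:
        await call_downstream(service)
    return f"order {order_id} accepted"


async def place_order_event(order_id: int, bus: asyncio.Queue) -> str:
    # Event-driven: publish one "order placed" event and return immediately.
    bus.put_nowait(order_id)
    return f"order {order_id} accepted"


async def order_placed_consumer(bus: asyncio.Queue) -> None:
    # Downstream work still happens, just off the caller's request path.
    while True:
        await bus.get()
        for service in SERVICES:
            await call_downstream(service)
        bus.task_done()


async def measure() -> Tuple[float, float]:
    start = time.perf_counter()
    await place_order_request_reply(1)
    request_reply_s = time.perf_counter() - start

    bus: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(order_placed_consumer(bus))
    start = time.perf_counter()
    await place_order_event(2, bus)
    event_s = time.perf_counter() - start
    await bus.join()  # wait until the consumer has finished the downstream work
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return request_reply_s, event_s


if __name__ == "__main__":
    request_reply_s, event_s = asyncio.run(measure())
    print(f"request-reply response time: {request_reply_s * 1000:.0f} ms")
    print(f"event response time: {event_s * 1000:.3f} ms")
```

Handing the developer a script like this to run and modify is the point: they can change the latency or the number of services and see the effect themselves.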
### Step 3: Apply the 4 C's Framework
**AGENT: EXECUTES** — integrates the 4 C's into the strategy
**ACTION:** Evaluate and strengthen the negotiation strategy against the 4 C's of Architecture.
**WHY:** The 4 C's are a meta-framework for all architect communication, not just negotiation. Failing at any one of them undermines the negotiation regardless of how good the technical argument is.
1. **Communication** — Is the message clear to THIS audience? Technical language for business stakeholders = communication failure. Business jargon for developers = condescension.
2. **Collaboration** — Is this framed as a joint problem-solving exercise or a win-lose argument? Negotiations framed as "let's figure this out together" succeed more often than those framed as "I'm right, here's why."
3. **Clarity** — Is the recommendation unambiguous? Vague recommendations ("we should probably consider something more scalable") invite reinterpretation. Clear recommendations ("we should replace the single database with two domain-partitioned databases, one for user profiles and one for transactions") leave no room for misunderstanding.
4. **Conciseness** — Is the argument as short as it can be while remaining complete? Executives check out after 3 minutes. Developers check out when they sense padding. Architects check out when arguments become circular.
### Step 4: Identify BATNA and Compromise Positions
**AGENT: EXECUTES** — defines fallback positions
**ACTION:** Define the Best Alternative to a Negotiated Agreement (BATNA) and identify possible compromise positions.
**WHY:** Entering a negotiation without knowing your walk-away point and your compromise zone is negotiating blind. The architect should know: "If I can't get event-driven architecture, I can accept REST with a message queue for the high-throughput services as a compromise. My BATNA is documenting the risk and revisiting in 6 months when the performance problems materialize."
Include:
1. **Ideal outcome** — what you want if the negotiation goes perfectly
2. **Acceptable compromise** — what you can live with that still addresses the core concern
3. **BATNA** — what you do if the negotiation fails entirely
4. **Red lines** — what you cannot accept under any circumstances, with justification for each
### Step 5: Generate the Negotiation Brief
**AGENT: EXECUTES** — produces the final deliverable
**HANDOFF TO HUMAN** — the user conducts the actual negotiation
**ACTION:** Compile the complete negotiation strategy into a concise brief the user can reference before and during the conversation.
## Inputs
- Who the negotiation is with (role, seniority, relationship)
- What the specific disagreement is about
- What outcome the user wants
- Optionally: organizational context, budget constraints, timeline pressures, past negotiation history
## Outputs
### Negotiation Strategy Brief
```markdown
# Negotiation Brief: {Topic}
## Situation
- **Counterpart:** {who, role, relationship}
- **Disagreement:** {what's contested}
- **Stakes:** {what's at risk if unresolved}
## Audience Classification: {Business / Architect / Developer}
## Strategy
### Key Techniques
1. {primary technique with specific application}
2. {secondary technique}
3. {fallback technique}
### Opening Frame
{How to open the conversation — specific language to use}
### Key Arguments (in order of deployment)
1. {strongest argument, in counterpart's language}
2. {supporting argument}
3. {evidence/demonstration if available}
### Anticipated Objections and Responses
| They might say... | Respond with... |
|-------------------|-----------------|
| {objection 1} | {response using appropriate technique} |
| {objection 2} | {response} |
## Positions
| Position | Description |
|----------|-------------|
| **Ideal outcome** | {best case} |
| **Acceptable compromise** | {what you can live with} |
| **BATNA** | {what you do if negotiation fails} |
| **Red lines** | {what you cannot accept, with WHY} |
## 4 C's Check
- Communication: {language adapted for audience? Y/N}
- Collaboration: {framed as joint problem-solving? Y/N}
- Clarity: {recommendation unambiguous? Y/N}
- Conciseness: {argument as short as possible? Y/N}
```
## Key Principles
- **Leverage the counterpart's language, not yours** — WHY: People literally don't hear arguments framed in unfamiliar vocabulary. A CTO who hears "event sourcing with CQRS" stops listening. A CTO who hears "$50K infrastructure savings per year" leans forward. The same recommendation, different framing, completely different reception.
- **Demonstration defeats discussion** — WHY: Particularly with developers, words are cheap. Everyone has heard "this architecture will be better" before. A 30-minute proof-of-concept that shows the latency improvement under load settles the argument permanently. Code doesn't have an ego.
- **Divide and conquer reduces ego investment** — WHY: When the whole architecture is on the table, defending a position feels like defending your professional identity. When only one isolated decision is being discussed ("REST vs events for this specific service pair"), it's just a technical choice. Isolating the disagreement makes it safe to change your mind.
- **Never dictate without justification** — WHY: Architects who say "because I said so" or "because I'm the architect" create the Ivory Tower anti-pattern. They become disconnected from the team, distrusted by stakeholders, and eventually irrelevant. Every constraint must come with a WHY. When people understand the reasoning, they become allies instead of reluctant compliers.
- **Always have a BATNA** — WHY: An architect without a walk-away plan makes desperate concessions. Knowing "if this negotiation fails, I will document the risk and revisit when the predicted problems occur" provides confidence and prevents over-compromise.
- **The 4 C's are not optional** — WHY: Communication, Collaboration, Clarity, and Conciseness are the minimum standard for all architect interactions. An architect who is right but unclear, or right but adversarial, or right but rambling, fails just as thoroughly as one who is wrong.
## Examples
**Scenario: Pushing back on unrealistic availability requirements**
Trigger: "My CTO wants 99.999% availability for our internal tool used by 50 employees. That would cost $200K in infrastructure."
Process: Classified as business stakeholder negotiation. Primary technique: leverage their grammar by translating availability percentages to cost and downtime impact. Prepared comparison table: 99.9% = 8.7 hours downtime/year at $40K vs 99.999% = 5 minutes downtime/year at $200K. Framed as "what's the cost of each hour of downtime for 50 internal users?" to let the math make the argument. BATNA: implement 99.9% with monitoring, propose revisiting if actual downtime impacts exceed the cost differential. 4 C's check: using cost language (Communication), framing as "let's find the right investment" (Collaboration), specific numbers not vague "it's expensive" (Clarity), two-slide comparison not a 30-page report (Conciseness).
Output: Negotiation brief with cost comparison table, opening frame, anticipated objections (e.g., "but what about that outage last year?"), and compromise position at 99.95%.
**Scenario: Disagreement with senior developer on architecture pattern**
Trigger: "The senior developer insists REST is better for everything. He has 15 years of experience."
Process: Classified as developer negotiation. Primary technique: demonstration defeats discussion. Prepared a plan for a small proof-of-concept showing the specific scenario where event-driven outperforms REST (e.g., order processing where downstream services don't need synchronous response). Also identified the Frozen Caveman anti-pattern — the senior developer may be defaulting to REST because of a bad experience with messaging years ago. Strategy: acknowledge their REST expertise explicitly, agree REST is appropriate for most of the services (divide and conquer), but propose a POC for the 2 services where async communication has clear benefits. BATNA: implement REST everywhere with the understanding that the 2 high-throughput services may need refactoring later, and document this prediction in an ADR.
Output: Negotiation brief with POC proposal, specific service pair for demonstration, acknowledgment language, and ADR template for documenting the decision.
**Scenario: Negotiating tech debt vs features with product leadership**
Trigger: "The product team wants 5 new features. I think we need to address tech debt first."
Process: Classified as business stakeholder negotiation (VP of Product). Primary technique: state impacts in cost and time. Translated tech debt to business language: "Each feature currently takes 3 weeks because of the data layer complexity. After a 4-week refactor, each feature would take 1 week. Five features at 3 weeks = 15 weeks. Four-week refactor + five features at 1 week = 9 weeks. The refactor saves 6 weeks and every future feature is faster." Compromise position: do the refactor in parallel with 2 of the 5 features, deferring 3 features by 2 weeks. BATNA: proceed with all 5 features, document the increasing delivery time trend, and revisit when feature delivery time exceeds stakeholder patience.
Output: Negotiation brief with ROI calculation, delivery timeline comparison chart, compromise proposal, and risk documentation plan.
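The arithmetic in this scenario can be checked, and its break-even point found, with a short Python sketch. It assumes features are built one after another by the same team and that the refactor happens up front; the parallel compromise is not modeled. Function names are hypothetical.

```python
import math


def total_weeks(features: int, weeks_per_feature: float, upfront_weeks: float = 0.0) -> float:
    """Calendar weeks to ship `features` one after another, after optional upfront work."""
    return upfront_weeks + features * weeks_per_feature


def break_even_features(current_weeks: float, refactored_weeks: float, refactor_weeks: float) -> int:
    """Smallest feature count for which refactoring first finishes strictly sooner."""
    saving_per_feature = current_weeks - refactored_weeks
    if saving_per_feature <= 0:
        raise ValueError("the refactor must make each feature faster")
    return math.floor(refactor_weeks / saving_per_feature) + 1


without_refactor = total_weeks(5, 3)                 # 15 weeks
with_refactor = total_weeks(5, 1, upfront_weeks=4)   # 9 weeks
print(without_refactor - with_refactor)              # 6.0 weeks saved
print(break_even_features(3, 1, 4))                  # 3
```

The break-even figure is often the most persuasive number for a VP of Product: here the refactor pays for itself by the third feature, and every feature after that is pure gain.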
## References
- For detailed negotiation techniques by audience type, the 4 C's framework, and additional examples, see [references/negotiation-techniques.md](references/negotiation-techniques.md)
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/negotiation-techniques.md
# Negotiation Techniques Reference
Detailed breakdown of negotiation techniques by audience type, with specific language patterns and examples.
## Audience Type 1: Business Stakeholders
Business stakeholders (CTOs, VPs, PMs, product owners) think in terms of business outcomes: revenue, cost, time-to-market, competitive advantage, risk. Technical arguments are invisible to them — not because they're unintelligent, but because their decision framework operates on different dimensions.
### Technique: Leverage Their Grammar
**What it means:** Translate every technical recommendation into business terms before presenting it. The architect does the translation work, not the stakeholder.
**Language patterns:**
- Instead of: "We need to implement event-driven architecture"
- Say: "We can reduce order processing time from 3 seconds to 200ms, which directly impacts cart abandonment rates"
- Instead of: "The monolith has high coupling"
- Say: "Every feature change takes 3 weeks instead of 1 because changes ripple through the entire system"
- Instead of: "We should use Kubernetes"
- Say: "We can reduce our deployment failures by 80% and deploy 5x more frequently, meaning features reach customers faster"
### Technique: State Impacts in Cost and Time
Every technical recommendation must include:
1. **How much does it cost** (in dollars and developer-weeks)?
2. **How much time does it add or save** (in weeks/months)?
3. **What is the cost of NOT doing it** (in future dollars and time)?
Business stakeholders make ROI calculations constantly. Give them the numbers to make this calculation about your recommendation.
### Technique: Provide Justification, Not Dictation
- Never say "because I'm the architect" or "trust me, this is the right approach"
- Always explain WHY the recommendation serves the business goal
- Frame the architect as a business advisor who happens to have technical expertise
- The Ivory Tower anti-pattern (disconnected architect who dictates from above) destroys credibility permanently
## Audience Type 2: Other Architects
Architect-to-architect disagreements are often the most difficult because both parties understand the technical landscape and both have valid perspectives. These negotiations require intellectual honesty and structured analysis.
### Technique: Divide and Conquer
1. Start by establishing everything you agree on — this is usually 80% or more of the architecture
2. Isolate the specific point of disagreement — make it as narrow as possible
3. Debate only the isolated point, using evidence and trade-off analysis
4. This prevents the negotiation from becoming a referendum on either architect's overall competence
### Technique: Structured Trade-off Comparison
Present both approaches side by side across multiple quality attribute dimensions:
```markdown
| Dimension | Approach A (Event-driven) | Approach B (REST) |
|-----------|---------------------------|-------------------|
| Performance | Better under high load | Sufficient for current load |
| Complexity | Higher operational complexity | Simpler operations |
| Coupling | Loose coupling | Tighter coupling |
| Debugging | Harder to trace | Straightforward |
| Cost | Higher infrastructure cost | Lower infrastructure cost |
```
Let the comparison make the argument. If your approach wins on 4 of 5 dimensions, the evidence speaks.
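A raw win count treats every dimension as equally important. A minimal Python sketch of making the tally explicit, using the winners from the table above; the per-dimension weights are a hypothetical assumption (for example, weighting the system's driving architecture characteristics more heavily), not part of the technique as described.

```python
from typing import Dict, Optional

# Winner per dimension, read off the comparison table.
WINNERS = {
    "performance": "event-driven",
    "complexity": "rest",
    "coupling": "event-driven",
    "debugging": "rest",
    "cost": "rest",
}


def tally(winners: Dict[str, str], weights: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Sum the weight of every dimension each approach wins (weight 1 by default)."""
    totals: Dict[str, int] = {}
    for dimension, approach in winners.items():
        weight = 1 if weights is None else weights.get(dimension, 0)
        totals[approach] = totals.get(approach, 0) + weight
    return totals


print(tally(WINNERS))  # {'event-driven': 2, 'rest': 3}

# Hypothetical weights for a system where performance and coupling are driving characteristics.
driving = {"performance": 3, "coupling": 3, "complexity": 1, "debugging": 1, "cost": 1}
print(tally(WINNERS, driving))  # {'event-driven': 6, 'rest': 3}
```

If the weighted result flips the raw count, agreeing on the weights first is itself a divide-and-conquer step.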
### Technique: Intellectual Honesty
- Acknowledge when the other architect's approach has genuine advantages
- Admitting "your approach is better for debugging and operations — my concern is specifically about coupling during independent deployments" builds more credibility than winning every point
- Architects who never concede anything are exhausting and eventually ignored
## Audience Type 3: Developers
Developers are pragmatists. They've heard many "this will be great in theory" arguments that failed in practice. They trust working code and concrete examples over authority or slide decks.
### Technique: Demonstration Defeats Discussion
- Build a small proof-of-concept that shows the specific problem your recommendation solves
- 30 minutes of working code beats 3 hours of debate
- Let the developer run the POC themselves — self-discovery is more convincing than being told
### Technique: Stay Connected to Code
- Architects who don't write any code lose credibility with developers instantly
- Maintain technical depth through: proof-of-concepts, architecture fitness functions, tooling, code reviews
- You don't need to write production code, but you need to be able to
### Technique: Explain the WHY
- "Use async messaging between these services" (developer complies reluctantly, finds workarounds)
- "Use async messaging because Service A doesn't need to wait for Service B's response, and synchronous calls here create a 2-second latency that blocks the UI" (developer understands, enforces the pattern, and extends it to similar cases independently)
## The 4 C's of Architecture
A meta-framework for all architect communication:
### Communication
- Adapt language to the audience
- Test comprehension: "Does this make sense?" followed by "Can you summarize what we've agreed on?"
- Watch for glazed eyes — they indicate you've lost the audience
### Collaboration
- Frame disagreements as joint problem-solving, not adversarial debate
- Use "we" language: "How do we solve this?" not "Here's what you should do"
- Involve the counterpart in the analysis, don't present conclusions
### Clarity
- Specific recommendations, not vague suggestions
- "We should split the database into two: user profiles and transactions" not "We should probably think about our database strategy"
- Include concrete next steps
### Conciseness
- Executives: 3-minute attention span for technical topics
- Developers: tune out when they sense filler or padding
- Architects: lose patience with circular arguments
- Rule of thumb: say it in half the words you think you need
## Essential vs Accidental Complexity
A useful negotiation tool when discussing technical debt or architecture changes:
- **Essential complexity** — inherent to the problem domain. A tax calculation system will always be complex because tax law is complex. No architecture choice removes this.
- **Accidental complexity** — introduced by poor design choices, workarounds, and technical debt. A tax calculation system that's complex because of spaghetti code and redundant data models has accidental complexity on top of essential complexity.
When negotiating tech debt remediation, frame it as: "We're not adding complexity. We're removing accidental complexity so the team can focus on the essential complexity of the business problem."
---
name: service-based-architecture-designer
description: Design a service-based architecture with 4-12 coarse-grained domain services, including service decomposition, database partitioning strategy (shared vs domain-partitioned vs per-service), API layer design, and ACID vs BASE transaction decisions. Use this skill whenever the user is designing a service-based system, decomposing a monolith into coarse-grained services, deciding how many services to create, choosing a database topology for distributed services, deciding between shared database and per-service databases, evaluating whether to add an API layer, determining ACID vs eventual consistency needs, or comparing service-based architecture against microservices — even if they don't use the exact phrase "service-based architecture."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/service-based-architecture-designer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
depends-on:
- architecture-characteristics-identifier
source-books:
- id: fundamentals-of-software-architecture
title: "Fundamentals of Software Architecture"
authors: ["Mark Richards", "Neal Ford"]
chapters: [13]
tags: [software-architecture, architecture, service-based, distributed, domain-services, database-partitioning, ACID, coarse-grained]
execution:
tier: 1
mode: full
inputs:
- type: none
description: "System description, domain requirements, team context, and data consistency needs — the skill guides the entire service-based architecture design process"
tools-required: [Read, Write]
tools-optional: [Grep, Glob]
mcps-required: []
environment: "Any agent environment. If a codebase exists, can analyze current architecture."
---
# Service-Based Architecture Designer
## When to Use
You need to design a service-based architecture or evaluate whether service-based is the right distributed style for a system. Service-based architecture is a hybrid of the microservices style: it uses coarse-grained domain services (typically 4-12, averaging ~7) rather than fine-grained single-purpose services. It is considered one of the most pragmatic distributed architecture styles. Typical situations:
- New distributed system — "we need something beyond a monolith but microservices feels like overkill"
- Monolith decomposition — "our deployments take hours and we want independent deployability"
- Architecture comparison — "should we use service-based or microservices?"
- Database topology — "should our services share a database or have separate ones?"
- Transaction design — "we need ACID transactions across some of these workflows"
Before starting, verify:
- Has an architecture style already been selected? If the user hasn't decided on service-based yet, consider using `architecture-style-selector` first.
- Are driving architecture characteristics known? If not, use `architecture-characteristics-identifier` — you need to know what quality attributes drive the design.
- If the user explicitly asks for service-based design, proceed directly.
## Context & Input Gathering
### Input Sufficiency Check
This skill designs a complete service-based architecture. You can proceed with partial information and fill gaps during the process, but certain inputs directly determine the quality of the architecture.
### Required Context (must have — ask if missing)
- **System purpose and domain:** What business capabilities does this system provide?
-> Check prompt for: domain description, list of features/modules, business workflows
-> If missing, ask: "What does your system do? What are the main business capabilities or modules?"
- **Business workflows that cross domain boundaries:** Which operations span multiple domains?
-> Check prompt for: transaction descriptions, workflow dependencies, mentions of "X needs to update Y"
-> If missing, ask: "Which business operations need to coordinate across multiple areas? For example, does placing an order also need to update inventory and billing atomically?"
### Important Context (strongly recommended — ask if easy to obtain)
- **Team size and distributed experience:** How many developers? Have they built distributed systems before?
-> Check prompt for: team mentions, experience level, current architecture
-> If missing, ask: "How large is your development team, and have they built distributed systems before?"
- **Current deployment pain:** What deployment problems are you trying to solve?
-> Check prompt for: deployment frequency, deployment duration, risk mentions
-> If missing and relevant, ask: "How long do deployments take and how often do you deploy?"
- **Data consistency requirements:** Which operations require strict transactional guarantees?
-> Check prompt for: ACID mentions, consistency requirements, "must be atomic" language
-> If missing, ask: "Which operations require strict transactional consistency (all-or-nothing), and which can tolerate eventual consistency?"
### Observable Context (gather from environment)
- **Existing architecture:** If refactoring, scan for current structure
-> Look for: package structure, module boundaries, database schemas, existing service definitions
-> Reveals: natural domain boundaries, current coupling patterns
### Default Assumptions
- If team experience unknown -> assume moderate (can handle service-based but not full microservices)
- If database strategy not specified -> default to shared database (the most common and simplest starting point)
- If service count not specified -> target ~7 services (the average for service-based)
- If API layer not discussed -> recommend adding one if external consumers exist
### Sufficiency Threshold
```
SUFFICIENT: system purpose + business capabilities + cross-domain workflows are known
PROCEED WITH DEFAULTS: system purpose + capabilities are known, cross-domain workflows unclear
MUST ASK: system purpose OR business capabilities are missing
```
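
The threshold above can be expressed as a small classification function. This is a minimal Python sketch; the `DesignInputs` container and its field names are hypothetical, chosen only to mirror the three inputs the threshold names.
```python
from dataclasses import dataclass, field


@dataclass
class DesignInputs:
    # Hypothetical container for what the user has provided so far.
    system_purpose: str = ""
    capabilities: list[str] = field(default_factory=list)
    cross_domain_workflows: list[str] = field(default_factory=list)


def sufficiency(inputs: DesignInputs) -> str:
    """Classify gathered inputs against the sufficiency threshold."""
    has_purpose = bool(inputs.system_purpose.strip())
    has_capabilities = bool(inputs.capabilities)
    if not (has_purpose and has_capabilities):
        return "MUST ASK"
    if not inputs.cross_domain_workflows:
        return "PROCEED WITH DEFAULTS"
    return "SUFFICIENT"


print(sufficiency(DesignInputs("claims processing", ["intake", "payment"])))
# PROCEED WITH DEFAULTS
```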
## Process
### Step 1: Identify Domain Services
**ACTION:** Decompose the system into 4-12 coarse-grained domain services based on business capabilities.
**WHY:** Service-based architecture uses "domain services" — coarse-grained portions of an application that encapsulate an entire business domain (like OrderService, PaymentService), NOT fine-grained single-purpose services (like OrderPlacement, OrderValidation). The coarse granularity is the defining characteristic that differentiates service-based from microservices, and it is what preserves ACID transactions and simplifies orchestration. Each domain service internally orchestrates its own sub-operations through class-level calls rather than remote service calls.
**Process:**
1. List all business capabilities the system must support
2. Group related capabilities into cohesive domains (aim for 4-12 groups, ~7 average)
3. Each group becomes a domain service — name it after the domain, not the technical function
4. Verify each service is coarse enough: it should contain multiple related sub-operations, not just one
**Granularity checks:**
- **Too many services (>12):** You are drifting toward microservices. Merge related services. More services = more network calls, more distributed transaction complexity, less ACID safety.
- **Too few services (fewer than 4):** You haven't decomposed enough. With only 2-3 services, the benefits of independent deployability and fault isolation are negligible.
- **Right size (4-12):** Each service represents a complete business domain with multiple internal components.
**IF** a service only does one thing -> merge it with a related service
**IF** a service does unrelated things -> split it into separate domain services
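
The count-based checks above are mechanical enough to script, while deciding whether a service does *unrelated* things remains a judgment call. A minimal sketch, assuming a hypothetical input that maps each proposed domain service to its internal sub-operations, and treating anything outside the 4-12 target range as a finding:
```python
def granularity_findings(services: dict[str, list[str]]) -> list[str]:
    """services maps a proposed domain service name to its internal sub-operations."""
    findings = []
    count = len(services)
    if count > 12:
        findings.append(f"{count} services: drifting toward microservices, merge related services")
    elif count < 4:
        findings.append(f"{count} services: too few to benefit from independent deployability")
    for name, operations in services.items():
        # A coarse-grained domain service should hold several related sub-operations.
        if len(operations) <= 1:
            findings.append(f"{name} does only one thing: merge it with a related service")
    return findings


design = {
    "OrderService": ["place", "cancel", "track"],
    "PaymentService": ["charge", "refund"],
    "InventoryService": ["reserve", "release", "restock"],
    "OrderValidation": ["validate"],
}
print(granularity_findings(design))
# ['OrderValidation does only one thing: merge it with a related service']
```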
### Step 2: Design Internal Service Structure
**ACTION:** Define the internal architecture of each domain service.
**WHY:** Each domain service is itself a mini-application with its own internal structure. Two design approaches exist: layered (technical partitioning with API facade, business logic, persistence layers) and domain-partitioned (API facade with internal sub-domain components, similar to modular monolith). The choice affects how easily the service can evolve. Domain-partitioned internal design is preferred when the service is complex enough to warrant sub-domain separation, because it makes future decomposition easier if a service eventually needs to be split.
**For each service, define:**
1. **API facade layer:** Every domain service must have an API access facade that orchestrates business requests from the UI. This facade is responsible for receiving a single business request and breaking it into internal sub-operations.
2. **Internal structure:** Choose layered (API facade -> business logic -> persistence) for simpler services, or domain-partitioned (API facade -> sub-domain components) for complex services.
3. **Internal components:** List the key components within each service.
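
To make the facade idea concrete, the sketch below shows one domain service whose API facade receives a single "place order" request and orchestrates its internal components through ordinary in-process method calls, with no remote calls involved. All class and method names are hypothetical, and atomicity across the writes (which a real service would get from one database transaction) is not modeled here.
```python
class OrderPlacement:
    def create(self, order_id: str, items: dict[str, int]) -> dict:
        return {"order_id": order_id, "items": dict(items), "status": "created"}


class InventoryAdjuster:
    def __init__(self, stock: dict[str, int]) -> None:
        self.stock = stock

    def reserve(self, items: dict[str, int]) -> None:
        # Check every line first so a shortage leaves stock untouched.
        for sku, qty in items.items():
            if self.stock.get(sku, 0) < qty:
                raise ValueError(f"insufficient stock for {sku}")
        for sku, qty in items.items():
            self.stock[sku] -= qty


class PaymentApplier:
    def apply(self, order: dict, amount: float) -> None:
        order["paid"] = amount


class OrderServiceFacade:
    """API facade of a coarse-grained OrderService (hypothetical)."""

    def __init__(self, stock: dict[str, int]) -> None:
        # Internal components: plain objects inside the same deployment unit.
        self.placement = OrderPlacement()
        self.inventory = InventoryAdjuster(stock)
        self.payment = PaymentApplier()

    def place_order(self, order_id: str, items: dict[str, int], amount: float) -> dict:
        # One business request, broken into internal class-level calls.
        order = self.placement.create(order_id, items)
        self.inventory.reserve(items)
        self.payment.apply(order, amount)
        order["status"] = "placed"
        return order


facade = OrderServiceFacade({"phone": 5})
order = facade.place_order("o-1", {"phone": 2}, 199.0)
print(order["status"], facade.inventory.stock)  # placed {'phone': 3}
```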
### Step 3: Select Database Topology
**ACTION:** Choose the database partitioning strategy for the system.
**WHY:** Database topology is the most consequential infrastructure decision in service-based architecture. A shared database preserves SQL joins and ACID transactions across all services — this is the primary structural advantage of service-based over microservices. However, a shared database creates coupling through schema changes: modifying a table can force redeployment of all services that use it. The database topology directly determines whether you get ACID transactions (shared) or must implement distributed transactions like SAGA (per-service). Choosing per-service databases prematurely eliminates the ACID advantage that makes service-based architecture attractive in the first place.
**Decision table:**
| Strategy | When to use | Trade-offs |
|----------|------------|------------|
| **Single shared database** | Default choice. Multiple services need joins across domains. ACID transactions span service boundaries. Team is small. | Simple. Preserves ACID. But: schema changes can impact all services. Mitigate with logical partitioning. |
| **Logically partitioned (shared DB, domain-scoped schemas)** | Want shared DB benefits but need to control schema change impact. Multiple services exist. | Keeps cross-domain joins and ACID while limiting most schema changes to one partition. Services own their logical partition. Federated shared libraries match partitions. Common tables still need coordination. |
| **Domain-partitioned databases** | 2-3 domain groups have clearly separate data with no cross-domain joins needed. | Partial isolation. Some services share a DB, others are separate. Moderate complexity. |
| **Per-service databases** | Each service's data is truly independent. No cross-service joins needed. Team is ready for eventual consistency. | Maximum isolation. But: lose ACID across services. Need SAGA pattern for distributed transactions. Avoid unless necessary. |
**IF** shared database -> implement logical partitioning through federated shared libraries (one entity library per logical domain + one common library)
**IF** per-service databases -> document which workflows now require distributed transactions and plan SAGA implementation
**Critical rule:** Make the logical partitioning in the database as fine-grained as possible while still maintaining well-defined data domains to better control database changes within a service-based architecture.
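
The decision table can be read as ordered rules. A minimal sketch, assuming the answers have already been reduced to yes/no flags (the parameter names are hypothetical). It folds the single shared and logically partitioned rows into one result, following the IF rule above that a shared database should be logically partitioned, and it only reaches per-service databases when data is independent and the team is ready for eventual consistency.
```python
def choose_database_topology(
    *,
    needs_cross_service_acid: bool,
    needs_cross_domain_joins: bool,
    separate_domain_groups: bool,
    data_independent_per_service: bool,
    ready_for_eventual_consistency: bool,
) -> str:
    # Shared data is the default: joins and ACID across services require one database.
    if needs_cross_service_acid or needs_cross_domain_joins:
        return "logically partitioned shared database"
    # A few clearly separate domain groups can each get their own database.
    if separate_domain_groups:
        return "domain-partitioned databases"
    # Per-service databases only when data is independent AND the team can run SAGAs.
    if data_independent_per_service and ready_for_eventual_consistency:
        return "per-service databases"
    return "logically partitioned shared database"


print(choose_database_topology(
    needs_cross_service_acid=True,
    needs_cross_domain_joins=False,
    separate_domain_groups=False,
    data_independent_per_service=False,
    ready_for_eventual_consistency=False,
))
# logically partitioned shared database
```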
### Step 4: Determine UI Topology
**ACTION:** Select the user interface deployment strategy.
**WHY:** Service-based architecture supports three UI variants, and the choice affects the number of architecture quanta (independently deployable units with distinct characteristics). A single monolithic UI means the entire frontend shares one deployment and one set of architecture characteristics. Domain-based or service-based UIs enable independent frontend deployments, which matters when different parts of the application face different user groups with different availability, scalability, or security needs.
**Options:**
| UI Topology | When to use | Quanta impact |
|------------|------------|---------------|
| **Single monolithic UI** | One user group, simple frontend, single deployment pipeline | All services + UI = 1 quantum (if shared DB) |
| **Domain-based UIs** | Different user groups (e.g., customer-facing vs internal operations) | Multiple quanta possible — each UI + its services can be a separate quantum |
| **Service-based UIs** | Maximum frontend independence, micro-frontend approach | Multiple quanta — each UI is coupled only to its service |
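
Quanta counts can be roughly estimated by treating each shared deployment artifact (a UI or a database) as coupling the services attached to it, then counting connected groups. This is a simplification: the full quantum definition also considers synchronous coupling and distinct architecture characteristics, which the sketch ignores. Node names are hypothetical, loosely following the recycling example later in this skill.
```python
def count_quanta(edges: list[tuple[str, str]]) -> int:
    """Count connected groups of UIs, services, and databases (union-find)."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(node) for node in list(parent)})


# Domain-based UIs with one database per user group: 2 groups.
recycling = [
    ("CustomerUI", "Quoting"), ("CustomerUI", "ItemStatus"),
    ("Quoting", "CustomerDB"), ("ItemStatus", "CustomerDB"),
    ("InternalUI", "Receiving"), ("InternalUI", "Assessment"),
    ("Receiving", "InternalDB"), ("Assessment", "InternalDB"),
]
print(count_quanta(recycling))  # 2

# The same UIs over one shared database collapse into a single group.
shared = [(a, "SharedDB" if b.endswith("DB") else b) for a, b in recycling]
print(count_quanta(shared))  # 1
```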
### Step 5: Decide on API Layer
**ACTION:** Determine whether to add an API layer (reverse proxy or gateway) between the UI and services.
**WHY:** An API layer is optional in service-based architecture but valuable in specific scenarios. Without an API layer, the UI accesses services directly using a service locator pattern, API gateway, or proxy embedded in the UI. Adding a separate API layer creates a centralized place for cross-cutting concerns (security, metrics, auditing, rate limiting, service discovery) and is particularly important when exposing services to external consumers. However, it adds another deployment unit, network hop, and potential single point of failure.
**Add an API layer when:**
- External systems or third parties will consume the services
- You need centralized security, auditing, or rate limiting
- Service discovery is needed (services move across infrastructure)
- Cross-cutting concerns are duplicated across the UI and multiple services
**Skip the API layer when:**
- Only internal UIs consume the services
- The team is small and wants to minimize deployment units
- Cross-cutting concerns are handled within each service
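
The add/skip criteria can be captured as a list of triggers: any trigger that applies is a reason to add the layer, and an empty result suggests skipping it. A minimal sketch; the flag names are hypothetical labels for the criteria listed above.
```python
API_LAYER_TRIGGERS = {
    "external_consumers": "external systems or third parties consume the services",
    "central_security": "centralized security, auditing, or rate limiting is needed",
    "service_discovery": "service discovery is needed",
    "duplicated_concerns": "cross-cutting concerns are duplicated across the UI and services",
}


def api_layer_reasons(**flags: bool) -> list[str]:
    """Return the reasons to add an API layer; an empty list means skip it."""
    unknown = set(flags) - set(API_LAYER_TRIGGERS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return [reason for key, reason in API_LAYER_TRIGGERS.items() if flags.get(key)]


print(api_layer_reasons(external_consumers=True))
# ['external systems or third parties consume the services']
print(api_layer_reasons())  # []
```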
### Step 6: Map Transaction Boundaries
**ACTION:** For each cross-domain workflow, determine whether ACID or BASE transactions are needed, and ensure the database topology supports them.
**WHY:** This is where service-based architecture's core advantage materializes. Because services are coarse-grained and typically share a database, most business operations that span sub-operations (like "place order + apply payment + update inventory") happen WITHIN a single domain service using regular ACID database transactions. In microservices, this same operation would span 3 separate services requiring distributed transactions (SAGA pattern), compensating transactions, and eventual consistency. The moment you split services too fine or split the database too aggressively, you lose this advantage and must deal with all the distributed transaction complexity that service-based architecture was designed to avoid.
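
To see what a regular ACID database transaction buys inside one domain service, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the shared database. The table layout and the `place_order` function are hypothetical. The three writes run in one transaction: when the stock check fails, the order and payment rows are rolled back too, so no compensating logic is needed.
```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, sku TEXT, qty INTEGER);
        CREATE TABLE payments (order_id TEXT, amount REAL);
        CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER);
        INSERT INTO inventory VALUES ('phone', 1);
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: str, sku: str, qty: int, amount: float) -> None:
    # Using the connection as a context manager commits on success, rolls back on error.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, sku, qty))
        conn.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))
        cur = conn.execute(
            "UPDATE inventory SET on_hand = on_hand - ? WHERE sku = ? AND on_hand >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount != 1:
            raise ValueError(f"insufficient stock for {sku}")


conn = make_db()
place_order(conn, "o-1", "phone", 1, 199.0)  # succeeds, stock drops to 0
try:
    place_order(conn, "o-2", "phone", 1, 199.0)  # fails the stock check
except ValueError:
    pass
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```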
**For each cross-domain workflow:**
1. List the domains involved
2. If all domains are within ONE service -> ACID transaction (simple, preferred)
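3. If domains span MULTIPLE services with a SHARED database -> ACID transaction still possible, because all tables live in one database; in practice one service must perform the writes in a single database transaction, since two separate service calls run as two separate transactions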
4. If domains span services with SEPARATE databases -> BASE transaction required (SAGA pattern needed)
**IF** many workflows require cross-service ACID transactions -> reconsider service boundaries. Services that frequently transact together may belong in the same domain service.
**IF** BASE transactions are unavoidable -> document the SAGA choreography/orchestration and compensating actions for each workflow.
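
The rules above can be checked per workflow once each domain is assigned to a service and each service to a database. A minimal sketch with hypothetical names based on the claims-processing example below; it reports what the topology allows, not what a given implementation actually does.
```python
def transaction_type(
    workflow_domains: list[str],
    service_of: dict[str, str],
    database_of: dict[str, str],
) -> str:
    services = {service_of[domain] for domain in workflow_domains}
    if len(services) == 1:
        return "ACID (within one service)"
    databases = {database_of[service] for service in services}
    if len(databases) == 1:
        # One database holds every table, so a single transaction can cover the writes.
        return "ACID (shared database)"
    return "BASE (SAGA required)"


service_of = {"claim": "ClaimsIntake", "fraud": "FraudDetection", "policy": "PolicyVerification"}
shared = {service: "claims_db" for service in service_of.values()}
split = {**shared, "FraudDetection": "fraud_db"}

print(transaction_type(["claim"], service_of, shared))                     # ACID (within one service)
print(transaction_type(["claim", "fraud", "policy"], service_of, shared))  # ACID (shared database)
print(transaction_type(["claim", "fraud"], service_of, split))             # BASE (SAGA required)
```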
### Step 7: Validate and Score
**ACTION:** Validate the design against service-based architecture characteristic ratings and check for anti-patterns.
**WHY:** Every architecture style has known strengths and weaknesses. Service-based architecture has no five-star ratings but achieves four stars in many vital areas. Validating against the ratings ensures you are not expecting the architecture to excel where it structurally cannot (like extreme elasticity at 2 stars), and checking for anti-patterns catches the most common design mistakes before they become entrenched.
**Service-based architecture ratings:**
| Characteristic | Rating | Notes |
|---------------|:------:|-------|
| Deployability | 4 | Independent service deployment without full system release |
| Elasticity | 2 | Coarse services replicate more functionality than needed to scale |
| Evolutionary | 3 | Good domain isolation, moderate coupling through shared DB |
| Fault tolerance | 4 | One service failing does not take down others |
| Modularity | 4 | Domain-partitioned, changes scoped to single service |
| Overall cost | 4 | Much cheaper than microservices, event-driven, or space-based |
| Performance | 3 | Fewer network calls than microservices, but still distributed |
| Reliability | 4 | Less network traffic, fewer distributed transactions |
| Scalability | 3 | Can scale individual services, but coarse granularity limits efficiency |
| Simplicity | 3 | Simpler than other distributed styles, but still distributed |
| Testability | 4 | Smaller test scope per service than monolith |
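
Checking fit is a comparison of the table's ratings against the minimum rating each driving characteristic needs. A minimal sketch: the ratings are copied from the table above, while the required minimums are hypothetical inputs you would derive from the driving characteristics.
```python
SERVICE_BASED_RATINGS = {
    "deployability": 4, "elasticity": 2, "evolutionary": 3, "fault tolerance": 4,
    "modularity": 4, "overall cost": 4, "performance": 3, "reliability": 4,
    "scalability": 3, "simplicity": 3, "testability": 4,
}


def characteristic_gaps(required: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {characteristic: (rating, required)} wherever the rating falls short."""
    return {
        name: (SERVICE_BASED_RATINGS[name], minimum)
        for name, minimum in required.items()
        if SERVICE_BASED_RATINGS[name] < minimum
    }


print(characteristic_gaps({"deployability": 4, "fault tolerance": 3, "elasticity": 4}))
# {'elasticity': (2, 4)}
```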
**Anti-pattern checks:**
- **Too many services (>12):** You have drifted into microservices territory without the operational infrastructure to support it. Merge services.
- **Too few services (fewer than 4):** Not enough decomposition to gain independent deployability benefits. Consider whether service-based is the right style.
- **Premature database splitting:** Splitting databases without implementing SAGA creates data inconsistency risks. Keep shared DB until you have proven you don't need cross-service ACID.
- **Inter-service communication:** Domain services in service-based architecture should NOT call each other. If Service A needs to call Service B, either merge them or route through the UI/API layer. Direct inter-service calls create the coupling that service-based architecture is designed to avoid.
- **Single shared entity library:** Using one monolithic shared library for all database entity objects means a table change forces redeployment of every service. Use federated domain-scoped libraries instead.
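
Two of these checks, direct inter-service calls and a single shared entity library, can be run against a simple description of the design. A minimal sketch; the call list and library names are hypothetical inputs (for example, gathered by reading service client code or build files).
```python
def anti_pattern_findings(
    domain_services: set[str],
    calls: list[tuple[str, str]],
    entity_libraries: list[str],
) -> list[str]:
    findings = []
    for caller, callee in calls:
        # Calls from the UI or API layer are fine; service-to-service calls are not.
        if caller in domain_services and callee in domain_services:
            findings.append(f"direct call {caller} -> {callee}: merge or route through the UI/API layer")
    if len(entity_libraries) == 1:
        findings.append("single shared entity library: split into federated domain-scoped libraries")
    return findings


services = {"OrderService", "PaymentService"}
calls = [("UI", "OrderService"), ("OrderService", "PaymentService")]
for finding in anti_pattern_findings(services, calls, ["shared_entities_lib"]):
    print(finding)
```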
## Inputs
- System description with business capabilities
- Business workflows, especially those crossing domain boundaries
- Team size and distributed systems experience
- Data consistency requirements (which workflows need ACID)
- Current architecture (if migrating from monolith)
- Scalability and availability requirements per domain
## Outputs
### Service-Based Architecture Design
```markdown
# Service-Based Architecture Design: {System Name}
## Design Context
**System:** {what it does}
**Team:** {size and experience}
**Key drivers:** {why service-based was chosen}
## Domain Services ({count} services)
| # | Service | Domain | Key Components | Instances |
|---|---------|--------|---------------|:---------:|
| 1 | {ServiceName} | {domain} | {component list} | {1 or N} |
| ... | ... | ... | ... | ... |
### Service Detail: {ServiceName}
**Domain:** {what business capability this covers}
**Internal design:** {layered or domain-partitioned}
**Components:**
- {Component 1}: {responsibility}
- {Component 2}: {responsibility}
## Database Topology
**Strategy:** {shared / logically partitioned / domain-partitioned / per-service}
**Reasoning:** {why this strategy was chosen}
{If logically partitioned:}
**Logical partitions:**
| Partition | Tables | Used by services |
|-----------|--------|-----------------|
| {domain} | {tables} | {services} |
| common | {shared tables} | all services |
## User Interface Topology
**Strategy:** {single monolithic / domain-based / service-based}
**Reasoning:** {why this topology was chosen}
## API Layer
**Decision:** {include / omit}
**Reasoning:** {why}
## Transaction Boundaries
| Workflow | Domains involved | Services | Transaction type | Notes |
|----------|-----------------|----------|:----------------:|-------|
| {workflow} | {domains} | {services} | ACID / BASE | {notes} |
## Architecture Quanta
**Count:** {number}
**Reasoning:** {what determines the quantum boundaries}
## Characteristic Fit
| Characteristic | Rating | Meets needs? |
|---------------|:------:|:------------:|
| Deployability | 4 | {Yes/No} |
| Fault tolerance | 4 | {Yes/No} |
| ... | ... | ... |
## Anti-Pattern Check
- [ ] Service count in 4-12 range
- [ ] No inter-service direct calls
- [ ] Database topology supports required ACID transactions
- [ ] Federated entity libraries (not single shared library)
- [ ] No premature database splitting
## Getting Started
1. {First step}
2. {Second step}
3. {Third step}
```
## Key Principles
- **Coarse-grained is the point** — Service-based architecture uses 4-12 domain services averaging ~7, NOT dozens of fine-grained microservices. The coarse granularity is what preserves ACID transactions, simplifies orchestration (internal class calls vs remote service calls), and keeps operational cost low. If you find yourself creating more than 12 services, you are building microservices without the infrastructure to support them.
- **Shared database is a feature, not a compromise** — The shared database is what enables SQL joins and ACID transactions across domains. This is the primary structural advantage over microservices. Don't split the database unless you have a proven, specific reason. Premature database splitting eliminates the core benefit of choosing service-based architecture in the first place.
- **Services should NOT call each other** — In service-based architecture, domain services are self-contained and do not communicate with each other directly. All orchestration happens either within a single service (internal) or through the UI/API layer. If two services need to coordinate frequently, they probably belong together as one service.
- **Internal orchestration over external orchestration** — A business request like "place an order" is received by the OrderService's API facade, which internally orchestrates all sub-operations (create order, apply payment, update inventory) through class-level calls within that single service. In microservices, this same operation would require external orchestration across multiple remote services. This internal orchestration is what makes service-based simpler and more reliable.
- **Logical partitioning controls blast radius** — Even with a shared database, use federated domain-scoped entity libraries rather than a single monolithic shared library. When a table in the "invoicing" domain changes, only the invoicing entity library and the services that use it need updating — not every service in the system. This is the pragmatic middle ground between monolithic coupling and full database separation.
- **Start shared, split only when proven necessary** — Begin with a shared database and logical partitioning. Only split into separate databases when you have concrete evidence that shared schema changes are causing deployment coordination problems, AND you have a plan for distributed transactions (SAGA) for any workflows that cross the split boundary.
## Examples
**Scenario: Electronic device recycling system**
Trigger: "We process old electronics (phones, tablets). Customers get quotes online, mail devices in, we assess them, pay the customer, then recycle or resell. We also have internal reporting."
Process: Identified 7 domain services from the business flow: Quoting, Item Status, Receiving, Assessment, Accounting, Recycling, Reporting. Split UI into two quanta: customer-facing (Quoting, Item Status) and internal operations (Receiving, Assessment, Accounting, Recycling, Reporting). Used two separate databases — one for customer-facing operations (higher security, separate network zone) and one for internal operations. Only Quoting and Item Status services need to scale (customer traffic), others run as single instances.
Output: **7 domain services, 2 architecture quanta, domain-partitioned databases** (2 databases split by security boundary, not by service). Customer-facing services are separated from internal services by a firewall. ACID transactions are preserved within each database boundary. The Assessment service changes frequently (new device rules) but is isolated, enabling high deployability.
**Scenario: Insurance claims processing platform**
Trigger: "We need claims intake, adjudication, payment, fraud detection, and policy verification. Team of 20 developers, currently a monolith with 4-hour deployments."
Process: Identified 6 domain services: Claims Intake, Adjudication, Payment, Fraud Detection, Policy Verification, Reporting. Kept shared database because claims workflows require ACID: a claim submission must atomically create the claim record, initiate fraud check, and verify policy status. Logically partitioned the database into 6 domains + common. Added API layer because external partners (repair shops, medical providers) submit claims via API. Single monolithic UI (all internal users share the same portal). Fraud Detection needs higher throughput, so it runs multiple instances with load balancing.
Output: **6 domain services, shared logically-partitioned database, API layer included.** Key win: deployment time drops from 4 hours to ~30 minutes per service. ACID preserved for claims workflows. Federated entity libraries prevent cascading redeployments from schema changes.
**Scenario: E-learning platform**
Trigger: "Building a learning management system with course catalog, enrollment, content delivery, progress tracking, assessments, and certificates. Team of 10, first distributed system."
Process: Identified 6 domain services: Course Catalog, Enrollment, Content Delivery, Progress Tracking, Assessment, Certification. Shared database — enrollment needs ACID with progress tracking (enrolling a student must atomically create progress records). No API layer needed (internal platform only). Single UI. Default single instances per service since traffic is predictable (students access during class hours). Kept Assessment and Certification in separate services despite being related because assessment rules change frequently (high deployability need) while certification is stable.
Output: **6 domain services, single shared database with logical partitioning, no API layer.** Team's first distributed system — service-based is ideal because it's the simplest distributed style (simplicity 3, cost 4) while still gaining independent deployability (4) and fault tolerance (4). If Content Delivery later needs CDN-level scaling, it can be extracted into a separate quantum.
## References
- For topology variant details and decision matrices, see [references/topology-variants.md](references/topology-variants.md)
- For architecture style comparison (service-based vs alternatives), use `architecture-style-selector`
- For identifying driving quality attributes, use `architecture-characteristics-identifier`
- For documenting the architecture decision, use `architecture-decision-record-creator`
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
Install related skills from ClawHub:
- `clawhub install bookforge-architecture-characteristics-identifier`
Or install the full book set from GitHub: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/topology-variants.md
# Topology Variants — Service-Based Architecture
Service-based architecture is one of the most flexible architecture styles because multiple topology dimensions can be independently varied: user interface deployment, database partitioning, API layer inclusion, and service granularity. This reference covers the decision space for each dimension.
## User Interface Variants
### Single Monolithic UI
```
┌─────────────────────────────────┐
│         User Interface          │
└──┬──────┬──────┬──────┬─────────┘
   │      │      │      │
┌──▼──┐┌──▼──┐┌──▼──┐┌──▼──┐
│Svc A││Svc B││Svc C││Svc D│
└──┬──┘└──┬──┘└──┬──┘└──┬──┘
   │      │      │      │
┌──▼──────▼──────▼──────▼──┐
│         Database         │
└──────────────────────────┘
```
**When to use:**
- Single user group with uniform needs
- Small team maintaining one frontend
- Simplest deployment model
**Architecture quanta:** 1 (if shared DB) — all services share the same deployment and characteristic profile.
**Trade-offs:**
- (+) Simplest to build and deploy
- (+) Consistent UX across all domains
- (-) UI deployment blocks all domains
- (-) Cannot scale or secure frontend sections independently
### Domain-Based UIs
```
┌────────────┐  ┌───────────────────┐
│UI: Customer│  │   UI: Internal    │
└──┬──────┬──┘  └──┬──────┬──────┬──┘
   │      │        │      │      │
┌──▼──┐┌──▼──┐  ┌──▼──┐┌──▼──┐┌──▼──┐
│Svc A││Svc B│  │Svc C││Svc D││Svc E│
└──┬──┘└──┬──┘  └──┬──┘└──┬──┘└──┬──┘
   │      │        │      │      │
┌──▼──────▼────────▼──────▼──────▼──┐
│             Database              │
└───────────────────────────────────┘
```
**When to use:**
- Different user groups (customers vs internal staff vs partners)
- Different security requirements per user group
- Different availability/scalability needs per user group
**Architecture quanta:** Can be >1 — each UI + its services can form a separate quantum, especially if databases are also split.
**Trade-offs:**
- (+) Independent deployment per user group
- (+) Different security zones (customer-facing behind DMZ)
- (+) Can scale customer-facing independently
- (-) More deployment units to manage
- (-) Potential code duplication in shared UI components
### Service-Based UIs (Micro-Frontends)
```
┌───────┐┌───────┐┌───────┐┌───────┐
│ UI A  ││ UI B  ││ UI C  ││ UI D  │
└──┬────┘└──┬────┘└──┬────┘└──┬────┘
   │        │        │        │
┌──▼──┐  ┌──▼──┐  ┌──▼──┐  ┌──▼──┐
│Svc A│  │Svc B│  │Svc C│  │Svc D│
└──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘
   │        │        │        │
┌──▼────────▼────────▼────────▼──┐
│            Database            │
└────────────────────────────────┘
```
**When to use:**
- Maximum frontend independence
- Teams organized around domain services
- Each domain needs different frontend technology or release cadence
**Architecture quanta:** Multiple — each UI-service pair is a potential quantum.
**Trade-offs:**
- (+) Maximum team autonomy per domain
- (+) Independent deployment and technology per UI
- (-) Most complex frontend orchestration
- (-) Consistent UX across domains is harder
- (-) Requires micro-frontend framework/approach
### UI Decision Matrix
| Factor | Single UI | Domain-Based | Service-Based |
|--------|:---------:|:------------:|:-------------:|
| Deployment simplicity | Best | Moderate | Most complex |
| Team autonomy | Low | Moderate | High |
| Security zoning | None | Good | Good |
| Independent scaling | No | Per-group | Per-service |
| UX consistency | Easiest | Moderate | Hardest |
| Recommended team size | <15 | 10-30 | 20+ |
## Database Variants
### Single Shared Database (Default)
```
┌──────┐┌──────┐┌──────┐┌──────┐
│Svc A ││Svc B ││Svc C ││Svc D │
└──┬───┘└──┬───┘└──┬───┘└──┬───┘
   │       │       │       │
   │   single_shared_lib   │
   │ (all entity objects)  │
   │       │       │       │
┌──▼───────▼───────▼───────▼───┐
│           Database           │
│   (all tables, one schema)   │
└──────────────────────────────┘
```
**When to use:** Starting point. Multiple services need SQL joins. ACID transactions span services.
**Risk:** A table change forces a shared library update -> redeployment of ALL services, even those that don't use the changed table. This is the **single shared entity library anti-pattern**.
### Logically Partitioned Database (Recommended)
```
┌──────────┐┌──────────┐┌──────────┐┌──────────┐
│  Svc A   ││  Svc B   ││  Svc C   ││  Svc D   │
│          ││          ││          ││          │
│ a_ent_lib││ b_ent_lib││ c_ent_lib││ d_ent_lib│
│ common   ││ common   ││ a_ent_lib││ common   │
│          ││          ││ common   ││          │
└──┬───────┘└──┬───────┘└──┬───────┘└──┬───────┘
   │           │           │           │
┌──▼───────────▼───────────▼───────────▼───────┐
│                   Database                   │
│   ┌───────┐ ┌───────┐ ┌───────┐ ┌────────┐   │
│   │ dom_a │ │ dom_b │ │ dom_c │ │ common │   │
│   └───────┘ └───────┘ └───────┘ └────────┘   │
└──────────────────────────────────────────────┘
```
**When to use:** Shared database benefits needed + want to control change blast radius.
**How it works:**
1. Database tables are logically grouped into domain partitions
2. Each domain partition has its own entity library (e.g., `invoicing_entities_lib`)
3. A `common_entities_lib` contains shared tables used by all services
4. Services include only the entity libraries they need
5. When a table changes, only the corresponding entity library is updated -> only services using that library need redeployment
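
The steps above boil down to a lookup: which entity library owns the changed table, and which services include that library. A minimal sketch using the library names from the diagram; the table names and the table-to-library mapping are hypothetical. With a single shared library, the same lookup would return every service.
```python
# Which entity libraries each service includes (mirrors the diagram above).
SERVICE_LIBS = {
    "Svc A": {"a_ent_lib", "common"},
    "Svc B": {"b_ent_lib", "common"},
    "Svc C": {"c_ent_lib", "a_ent_lib", "common"},
    "Svc D": {"d_ent_lib", "common"},
}

# Hypothetical mapping from table to the entity library that owns it.
TABLE_TO_LIB = {
    "dom_a.invoice": "a_ent_lib",
    "dom_b.shipment": "b_ent_lib",
    "common.customer": "common",
}


def services_to_redeploy(changed_table: str) -> set[str]:
    lib = TABLE_TO_LIB[changed_table]
    return {svc for svc, libs in SERVICE_LIBS.items() if lib in libs}


print(sorted(services_to_redeploy("dom_a.invoice")))    # ['Svc A', 'Svc C']
print(sorted(services_to_redeploy("common.customer")))  # ['Svc A', 'Svc B', 'Svc C', 'Svc D']
```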
**Common table management:** Lock common entity objects in version control. Restrict change access to the database team. Changes to common tables require coordination across all services.
**Best practice:** Make logical partitions as fine-grained as possible while maintaining well-defined data domains.
### Domain-Partitioned Databases
```
┌──────┐┌──────┐   ┌──────┐┌──────┐
│Svc A ││Svc B │   │Svc C ││Svc D │
└──┬───┘└──┬───┘   └──┬───┘└──┬───┘
   │       │          │       │
┌──▼───────▼──┐    ┌──▼───────▼──┐
│ Database 1  │    │ Database 2  │
│ (domain AB) │    │ (domain CD) │
└─────────────┘    └─────────────┘
```
**When to use:**
- Clear domain groups with no cross-group joins needed
- Security boundaries require separate databases (e.g., customer-facing vs internal)
- Different backup/recovery requirements per domain group
**Critical check before splitting:** Verify that NO business workflows require ACID transactions across the database boundary. If they do, you need SAGA pattern for those workflows.
### Per-Service Databases
```
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Svc A │ │Svc B │ │Svc C │ │Svc D │
└──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘
   │        │        │        │
┌──▼──┐  ┌──▼──┐  ┌──▼──┐  ┌──▼──┐
│DB A │  │DB B │  │DB C │  │DB D │
└─────┘  └─────┘  └─────┘  └─────┘
```
**When to use:** Each service's data is truly independent. No cross-service queries. Team is ready for eventual consistency and SAGA pattern.
**Warning:** This approaches microservices territory. If you need per-service databases, evaluate whether you should be doing microservices instead (with the full operational investment that requires).
### Database Decision Matrix
| Factor | Single shared | Logically partitioned | Domain-partitioned | Per-service |
|--------|:------------:|:--------------------:|:------------------:|:-----------:|
| ACID across services | Yes | Yes | Partial | No |
| SQL joins across domains | Yes | Yes | Within group only | No |
| Schema change blast radius | All services | Domain-scoped | Group-scoped | Single service |
| Operational complexity | Lowest | Low | Moderate | Highest |
| Data isolation | None | Logical | Physical (partial) | Physical (full) |
| SAGA required | No | No | For cross-group ops | For cross-service ops |
## API Layer Variant
### Without API Layer (Direct Access)
```
┌─────────────────────────────┐
│ User Interface │
│ (service locator / proxy) │
└──┬──────┬──────┬──────┬─────┘
│ │ │ │
┌──▼──┐┌──▼──┐┌──▼──┐┌──▼──┐
│Svc A││Svc B││Svc C││Svc D│
└─────┘└─────┘└─────┘└─────┘
```
The UI embeds a service locator pattern, API gateway, or proxy to route requests to services.
### With API Layer
```
┌─────────────────────────────┐
│ User Interface │
└─────────────┬───────────────┘
│
┌─────────────▼───────────────┐
│ API Layer (Proxy/Gateway) │
│ (security, metrics, audit) │
└──┬──────┬──────┬──────┬─────┘
│ │ │ │
┌──▼──┐┌──▼──┐┌──▼──┐┌──▼──┐
│Svc A││Svc B││Svc C││Svc D│
└─────┘└─────┘└─────┘└─────┘
```
**Benefits of API layer:**
- Centralized cross-cutting concerns (security, metrics, auditing, rate limiting)
- Service discovery and routing
- External consumer access management
- API versioning and transformation
**Cost of API layer:**
- Additional deployment unit
- Additional network hop (latency)
- Potential single point of failure
- More infrastructure to maintain
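The following in-process sketch shows how an API layer takes on security, metrics, auditing and routing so the services behind it do not have to. It is an illustration only:

- The services are stand-in Python callables.
- The routes, token check and audit format are hypothetical.
- A real deployment would use an actual gateway or reverse proxy. That separate process is also the source of the extra network hop and single point of failure listed above.

```python
"""Sketch of an API layer centralizing cross-cutting concerns (hypothetical)."""
from collections import Counter
from typing import Callable

Handler = Callable[[str], str]


class ApiLayer:
    """In-process stand-in for a gateway in front of the services."""

    def __init__(self, routes: dict[str, Handler], valid_tokens: set[str]) -> None:
        self.routes = routes                # path prefix -> service handler
        self.valid_tokens = valid_tokens    # hypothetical auth scheme
        self.metrics: Counter = Counter()   # request count per route prefix
        self.audit_log: list[str] = []      # one entry per request

    def handle(self, path: str, token: str) -> tuple[int, str]:
        # Security: reject unauthenticated calls before any service sees them.
        if token not in self.valid_tokens:
            self.audit_log.append(f"DENY {path}")
            return 401, "unauthorized"
        # Routing: the first matching prefix wins.
        for prefix, handler in self.routes.items():
            if path.startswith(prefix):
                self.metrics[prefix] += 1
                self.audit_log.append(f"ALLOW {path} -> {prefix}")
                return 200, handler(path)
        self.audit_log.append(f"MISS {path}")
        return 404, "no route"


if __name__ == "__main__":
    api = ApiLayer(
        routes={"/orders": lambda p: f"order service handled {p}",
                "/billing": lambda p: f"billing service handled {p}"},
        valid_tokens={"secret-token"},
    )
    print(api.handle("/orders/42", "secret-token"))
    print(api.handle("/orders/42", "bad-token"))
    print(api.metrics)
```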
## ACID vs BASE Transaction Decision
### ACID (Within Service or Shared Database)
Use ACID when the business operation must be all-or-nothing and the database topology supports it.
**Example — Order Checkout (single service with shared DB):**
1. OrderService receives checkout request
2. Within a single database transaction:
- Create order record
- Apply payment
- Update inventory
- Generate invoice
3. If payment fails -> automatic rollback of all changes
4. Customer sees consistent state immediately
This is the service-based architecture advantage: the coarse-grained OrderService handles the entire workflow internally.
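The checkout below is a minimal sketch of the all-or-nothing behavior. It runs inside one SQLite transaction using Python's standard `sqlite3` module. The schema, and the `card_ok` flag that simulates a declined payment, are hypothetical stand-ins for the OrderService workflow:

```python
"""Sketch: all-or-nothing checkout inside one database transaction."""
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders    (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER);
        CREATE TABLE payments  (order_id INTEGER, amount REAL);
        CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER);
        CREATE TABLE invoices  (order_id INTEGER, amount REAL);
        INSERT INTO inventory VALUES ('widget', 10);
    """)
    return conn


def checkout(conn: sqlite3.Connection, sku: str, qty: int,
             amount: float, card_ok: bool) -> bool:
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            order_id = cur.lastrowid
            if not card_ok:  # hypothetical stand-in for a payment provider call
                raise RuntimeError("payment declined")
            conn.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))
            conn.execute("UPDATE inventory SET on_hand = on_hand - ? WHERE sku = ?",
                         (qty, sku))
            conn.execute("INSERT INTO invoices VALUES (?, ?)", (order_id, amount))
        return True
    except RuntimeError:
        return False  # the order insert above was rolled back


if __name__ == "__main__":
    conn = setup()
    print(checkout(conn, "widget", 2, 19.98, card_ok=False))          # False
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
    print(checkout(conn, "widget", 2, 19.98, card_ok=True))           # True
```

The `with conn:` block rolls back when the exception escapes. A declined payment therefore leaves no order, payment, inventory change or invoice behind.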
### BASE (Across Separate Databases)
Required when business operations cross database boundaries. Uses the SAGA pattern.
**Example — Same Order Checkout if databases were split:**
1. OrderService creates order (commits to Order DB)
2. OrderService sends message to PaymentService
3. PaymentService attempts payment (commits to Payment DB)
4. If payment fails:
- PaymentService sends "payment failed" event
- OrderService must execute a compensating transaction (delete the order)
- InventoryService must execute a compensating transaction (restore any inventory already reserved for the order)
5. Customer may see temporarily inconsistent state
**Key insight:** If you find yourself implementing SAGA for many core business workflows in a service-based architecture, your services may be too fine-grained or your databases may be split prematurely. Consider merging services or consolidating databases to restore ACID.
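The compensation logic can be sketched as a small orchestrator. It records each committed step and, on failure, runs the matching compensations in reverse order. This is a minimal sketch, not the source's design:

- It uses a central orchestrator for brevity. The workflow above is instead described as event messages between services.
- The step names and the in-memory "databases" are hypothetical.

```python
"""Sketch of an orchestrated saga with compensating transactions."""
from typing import Callable

# (step name, action, compensation); each action commits on its own.
Step = tuple[str, Callable[[], object], Callable[[], object]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    completed: list[Step] = []
    log: list[str] = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, action, compensate))
            log.append(f"done {name}")
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            # Undo committed steps newest-first. Until this finishes, other
            # readers can observe the temporarily inconsistent state.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"compensated {done_name}")
            return False, log
    return True, log


if __name__ == "__main__":
    order_db: dict[int, str] = {}
    inventory_db = {"widget": 10}

    def charge_card() -> None:
        raise RuntimeError("card declined")

    steps: list[Step] = [
        ("create order", lambda: order_db.update({1: "pending"}), lambda: order_db.pop(1)),
        ("reserve inventory",
         lambda: inventory_db.update(widget=inventory_db["widget"] - 2),
         lambda: inventory_db.update(widget=inventory_db["widget"] + 2)),
        ("charge payment", charge_card, lambda: None),
    ]
    ok, log = run_saga(steps)
    print(ok, order_db, inventory_db)  # False {} {'widget': 10}
```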
## Service Granularity Guidelines
| Service count | Assessment | Action |
|:------------:|-----------|--------|
| 1-2 | Not service-based | Too few services; consider a modular monolith instead |
| 3 | Borderline | Barely enough decomposition; validate benefits outweigh distribution costs |
| 4-7 | Sweet spot | Good balance of independence and simplicity |
| 8-12 | Acceptable upper range | Verify each service represents a distinct domain; watch for drift toward microservices |
| 13-20 | Warning zone | You may be building microservices without the infrastructure; consider merging related services |
| 21+ | Not service-based | This is microservices; invest in full microservices operational infrastructure |
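For automated architecture reviews, the table can be encoded as a simple lookup. This is a minimal sketch. The band boundaries follow the table, with 20 services treated as the top of the warning zone:

```python
"""Sketch: map a service count to the granularity bands in the table above."""


def assess_service_count(n: int) -> str:
    if n < 1:
        raise ValueError("service count must be positive")
    bands = [  # (inclusive upper bound, assessment)
        (2, "Not service-based (consider a modular monolith)"),
        (3, "Borderline"),
        (7, "Sweet spot"),
        (12, "Acceptable upper range"),
        (20, "Warning zone"),
    ]
    for upper, label in bands:
        if n <= upper:
            return label
    return "Not service-based (this is microservices)"


if __name__ == "__main__":
    for count in (2, 5, 15, 25):
        print(count, assess_service_count(count))
```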
## Communication Protocols
Service-based architecture typically uses synchronous communication from the UI to services:
| Protocol | When to use |
|----------|------------|
| **REST** | Default choice. Simple, well-understood, good tooling |
| **gRPC** | High-throughput internal communication, binary efficiency needed |
| **Messaging** | Asynchronous operations, event notifications between UI and services |
| **SOAP** | Legacy integration, WS-* standards required |
**Important:** Communication is UI-to-service or API-layer-to-service. Services should NOT communicate directly with each other in service-based architecture.
---
name: risk-storming-facilitator
description: Plan and facilitate collaborative risk storming sessions for architecture teams. Use this skill whenever the user wants to run a risk identification workshop, organize a risk storming exercise, plan a collaborative risk assessment session, facilitate architecture risk discovery with a team, prepare a risk workshop agenda, coordinate group risk identification, or run a team-based architecture risk review. Also triggers when the user mentions "risk storming," "collaborative risk session," "team risk workshop," "group risk identification," wants to prepare pre-work materials for a risk meeting, or asks "how should I run a risk session with my team?" — even if they don't use the exact term "risk storming."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/risk-storming-facilitator
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
source-books:
- id: fundamentals-of-software-architecture
title: "Fundamentals of Software Architecture"
authors: ["Mark Richards", "Neal Ford"]
chapters: [20]
tags: [software-architecture, architecture, risk, risk-storming, facilitation, collaboration, governance]
depends-on:
- architecture-risk-assessor
execution:
tier: 1
mode: plan-only
inputs:
- type: none
description: "Architecture context from the user — system description, team composition, and risk concerns. Agent produces the facilitation plan; human runs the actual session."
tools-required: [Read, Write]
tools-optional: [Grep, Glob]
mcps-required: []
environment: "Any agent environment. Agent produces facilitation artifacts; human executes the session."
---
# Risk Storming Facilitator
## When to Use
You need to prepare and facilitate a collaborative risk identification session with your architecture team. This is a PLAN-ONLY skill — the agent creates all facilitation materials and the human runs the actual session.
Typical triggers:
- The user wants to run a risk storming session with their team before a launch or major milestone
- The user adopted a new technology and wants the team to collaboratively identify risks
- The user needs to assess risks across a specific dimension (availability, performance, security, etc.) with multiple stakeholders
- The user wants to structure a risk workshop and needs an agenda, pre-work materials, and discussion guide
- The user is an architect preparing for a collaborative risk review with senior developers and tech leads
Before starting, verify:
- Is there an architecture diagram or system description available? (Risk storming requires a visual representation of the architecture)
- Has the user identified which risk dimension to focus on? (Sessions work best with ONE dimension at a time)
- Who will participate? (Should include architects, senior developers, AND tech leads — not just architects)
## Context
### Required Context (must have before proceeding)
- **Architecture description or diagram:** What system is being assessed? What are the components, services, and their relationships?
-> Check prompt for: service names, architecture diagrams, component descriptions, technology stack
-> Check environment for: architecture docs, C4 diagrams, docker-compose files, README files with system overviews
-> If still missing, ask: "Can you describe the architecture you want to risk-storm? I need at minimum the major components/services and how they connect."
- **Risk dimension to focus on:** Which area of risk should the session address?
-> Check prompt for: mentions of availability, performance, scalability, security, data loss, single points of failure, unproven technology
-> If still missing, ask: "Which risk dimension should this session focus on? Common choices: (a) availability, (b) performance, (c) scalability, (d) security, (e) data loss/integrity, (f) unproven technology, (g) single points of failure. I recommend ONE dimension per session for focused results."
### Observable Context (gather from environment if available)
- **Team composition:** Who will participate?
-> Check prompt for: team size, role descriptions, mentions of developers, tech leads, architects
-> If unavailable: default to "the architecture team" and recommend including senior developers and tech leads
- **Technology stack:** What technologies are in use?
-> Look for: package.json, requirements.txt, Dockerfile, infrastructure configs
-> If unavailable: rely on user description
- **Previous risk assessments:** Has the system been risk-assessed before?
-> Look for: risk reports, incident history, post-mortems
-> If unavailable: treat as first risk storming session
### Default Assumptions
- If no risk dimension specified -> recommend starting with the dimension most relevant to the user's stated concerns
- If no participants listed -> recommend 4-8 participants including at least 1 architect, 2+ senior developers, and 1+ tech lead
- If no architecture diagram exists -> recommend creating one before the session (risk storming requires a visual artifact)
- If session format not specified -> default to in-person with physical Post-it notes; provide virtual alternative
### Sufficiency Threshold
```
SUFFICIENT when ALL of these are true:
- Architecture description with identifiable components is known
- Risk dimension for the session is selected
- Participant roles are known or can be defaulted
PROCEED WITH DEFAULTS when:
- Architecture description is known
- Risk dimension can be inferred from the user's concerns
- Team details can use reasonable defaults
MUST ASK when:
- No architecture description exists (cannot risk-storm without knowing what to assess)
- The user's concern is too vague to select a risk dimension
- The user wants to cover "all risks" in one session (this is an anti-pattern; redirect to single dimension)
```
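The thresholds above reduce to a short decision procedure. This minimal sketch assumes the agent has already condensed the conversation into the fields below; the field names are hypothetical. Participant roles never block progress because they can always be defaulted.

```python
"""Sketch: classify gathered context against the sufficiency thresholds."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionContext:
    architecture_known: bool            # components and connections identified
    selected_dimension: Optional[str]   # dimension the user chose explicitly
    inferred_dimension: Optional[str]   # dimension derivable from stated concerns
    wants_all_risks: bool = False       # user asked to cover "all risks" at once


def sufficiency(ctx: SessionContext) -> str:
    if not ctx.architecture_known:
        return "MUST ASK"               # nothing to assess yet
    if ctx.wants_all_risks:
        return "MUST ASK"               # redirect to a single dimension
    if ctx.selected_dimension:
        return "SUFFICIENT"
    if ctx.inferred_dimension:
        return "PROCEED WITH DEFAULTS"
    return "MUST ASK"                   # concern too vague to pick a dimension


if __name__ == "__main__":
    print(sufficiency(SessionContext(True, None, "availability")))
```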
## Process
### Step 1: Select the Risk Dimension
**ACTION:** Help the user choose ONE risk dimension for the session. Common dimensions:
- **Unproven technology** — new or unfamiliar technologies in the stack
- **Performance** — latency, throughput, response time under load
- **Scalability** — ability to handle increased load (horizontal/vertical)
- **Availability** — uptime, resilience to failures, including transitive dependencies
- **Data loss/integrity** — risk of losing, corrupting, or leaking data
- **Single points of failure** — components whose failure takes down the entire system
- **Security** — unauthorized access, data breaches, compliance violations
**WHY:** Restricting each session to a single dimension produces dramatically better results than trying to assess everything at once. When participants evaluate multiple dimensions simultaneously, they lose focus, conflate different risk types, and produce shallow analysis. A focused session on availability reveals risks that a broad "assess everything" session misses entirely. Run separate sessions for separate dimensions.
**IF** the user already specified a dimension -> confirm and proceed
**IF** the user wants multiple dimensions -> plan separate sessions (one per dimension), prioritize the most urgent first
**IF** the user says "I don't know which" -> ask about recent incidents, upcoming launches, or technology changes to identify the most pressing dimension
### Step 2: Map the Architecture Components
**ACTION:** Enumerate all architecture components that participants will assess. Create a clear component list with brief descriptions.
**WHY:** The component list defines WHAT gets risk-assessed. Missing a component means missing its risks entirely. The list also becomes the basis for the architecture diagram that participants will annotate with Post-it notes during the session.
**IF** an architecture diagram exists -> extract components from it
**IF** a codebase is available -> scan for service boundaries, deployables, infrastructure components
**ELSE** -> ask the user to enumerate components
**HANDOFF TO HUMAN** -- The agent produces the component list; the human validates it is complete and accurate before using it in the session.
### Step 3: Identify Participants
**ACTION:** Build a participant list. The session MUST include people beyond just architects:
- **Architects** (1-2): Provide the high-level structural perspective
- **Senior developers** (2-4): Provide implementation-level risk knowledge that architects miss
- **Tech leads** (1-2): Bridge the gap between architecture vision and implementation reality
**WHY:** No single architect can identify all risks. Senior developers see implementation-level risks that architects overlook — like a developer who rates Redis cache as risk 9 because they've never used it, revealing a critical unknown-technology risk the architect missed. Tech leads understand operational constraints that change risk profiles. The diversity of perspectives IS the point of risk storming.
**IF** the user already has a participant list -> validate it includes developers, not just architects
**IF** the team is small (< 4) -> the session can still work, but note that fewer perspectives means fewer risks discovered
### Step 4: Prepare Pre-Work Materials
**ACTION:** Generate the following materials to send to participants 1-2 days before the collaborative session:
1. **Session invitation** containing:
- Architecture diagram (or link to where it's stored)
- The ONE risk dimension being assessed
- Date, time, and location (physical or virtual)
- Brief explanation of the risk matrix (impact x likelihood, 1-9 scale)
- Instructions for individual risk identification (Phase 1)
2. **Risk matrix reference card:**
- Impact: Low (1), Medium (2), High (3)
- Likelihood: Low (1), Medium (2), High (3)
- Score = Impact x Likelihood
- Color coding: 1-2 green (low), 3-4 yellow (medium), 6-9 red (high)
- Critical rule: unproven/unknown technology = automatic 9
3. **Individual assessment worksheet** (one per participant):
- Architecture component list
- For each component: assess impact and likelihood for the chosen risk dimension
- Record the composite score (impact x likelihood)
- Prepare a Post-it note with the color matching the score (green/yellow/red) and the score number written on it
**WHY:** Sending materials 1-2 days ahead gives participants time to individually analyze the architecture without group influence. The noncollaborative nature of Phase 1 is essential — it prevents anchoring bias where one vocal participant's assessment dominates everyone's thinking. Individual assessment first, THEN collaborative discussion, produces the richest set of risks.
**HANDOFF TO HUMAN** -- The agent generates all pre-work documents; the human sends them to participants.
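The reference card's arithmetic can be expressed as two small functions, which are handy for checking worksheets or generating the card itself. This is a minimal sketch, and the function names are hypothetical:

```python
"""Sketch: risk matrix scoring and Post-it color selection."""


def risk_score(impact: int, likelihood: int, unproven_technology: bool = False) -> int:
    """Impact x likelihood on 1-3 scales; unproven technology is forced to 9."""
    if unproven_technology:
        return 9
    if impact not in (1, 2, 3) or likelihood not in (1, 2, 3):
        raise ValueError("impact and likelihood must each be 1, 2, or 3")
    return impact * likelihood


def postit_color(score: int) -> str:
    # Possible products are 1, 2, 3, 4, 6, 9; scores of 5, 7, and 8 cannot occur.
    if score <= 2:
        return "green"
    if score <= 4:
        return "yellow"
    return "red"


if __name__ == "__main__":
    score = risk_score(impact=3, likelihood=1)
    print(score, postit_color(score))  # 3 yellow
```

For example, a component with high impact (3) but low likelihood (1) scores 3, which calls for a yellow Post-it.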
### Step 5: Create the Session Agenda
**ACTION:** Produce a structured agenda for the collaborative session (Phases 2 and 3):
```
RISK STORMING SESSION AGENDA
Dimension: {selected dimension}
Duration: 60-90 minutes
PHASE 2: CONSENSUS (30-40 minutes)
[00:00-05:00] Opening — restate the risk dimension and ground rules
[05:00-15:00] Post-it placement — each participant places their
color-coded Post-it notes on the architecture diagram
where they identified risk
[15:00-35:00] Disagreement discussion — focus on areas where
participants DIFFER in their ratings:
- Where one person sees high risk and another sees none
- Where scores differ by 2+ points
- Single-person risks (only one person identified it)
[35:00-40:00] Consolidation — agree on final ratings for each
risk area
PHASE 3: MITIGATION (20-40 minutes)
[40:00-55:00] Mitigation brainstorm — for each high-risk area (6-9),
identify architecture changes that would reduce the risk
[55:00-70:00] Cost negotiation — estimate the cost of each mitigation
and determine if the risk reduction justifies the cost
[70:00-80:00] Action items — assign owners and deadlines for
agreed mitigations
[80:00-90:00] Wrap-up — summarize findings, schedule next risk
storming session (different dimension)
```
**WHY:** The agenda front-loads disagreement discussion because that is where the highest-value insights emerge. When two participants rate the same component differently, the ensuing discussion reveals knowledge that neither participant had alone — like the developer who rates Redis as risk 9 because they've never used it, while the architect rated it as risk 1 assuming everyone knew Redis. The disagreement IS the insight.
### Step 6: Create the Discussion Guide
**ACTION:** Produce a facilitation guide with specific questions to drive productive disagreement discussion:
**For areas where ratings differ:**
- "{Participant A}, you rated this {high}; {Participant B}, you rated it {low}. Can each of you explain your reasoning?"
- "What information or experience led you to that score?"
- "Is there something about the implementation that changes the risk assessment?"
**For areas where only one person identified risk:**
- "You're the only one who flagged this. What do you see that the rest of us might be missing?"
- "Have you had direct experience with this type of risk?"
**For unproven technology:**
- "Has anyone on the team used {technology} in a production system before?"
- "If no one has production experience, this is automatically rated 9 per the unknown-technology rule."
**For the mitigation phase:**
- "What architecture change would reduce this risk score from {current} to {target}?"
- "What would that change cost in terms of money, time, or complexity?"
- "Is the risk reduction worth the cost? If not, what is a cheaper alternative that partially mitigates the risk?"
**WHY:** Facilitators often struggle to drive productive discussion. These prepared questions prevent the session from becoming either a silent agreement fest or an unfocused debate. The questions are designed to surface the knowledge gaps between participants — which is exactly where undiscovered risks hide.
**HANDOFF TO HUMAN** -- The agent produces the discussion guide; the human uses it to facilitate the actual session.
### Step 7: Create the Mitigation Template
**ACTION:** Produce a template for documenting mitigations discovered during Phase 3:
```markdown
## Risk Mitigation Record
### Risk Area: {component} - {dimension}
- **Consensus Risk Score:** {score} ({impact} x {likelihood})
- **Identified by:** {participant names}
- **Rationale:** {why this risk level}
### Proposed Mitigation
- **Change:** {specific architecture change}
- **Expected post-mitigation score:** {new score}
- **Estimated cost:** {money, time, complexity}
- **Alternative (if cost rejected):** {cheaper partial mitigation}
- **Alternative cost:** {reduced cost}
- **Owner:** {who will implement}
- **Deadline:** {when}
- **Status:** Proposed / Approved / Implemented
```
**WHY:** Mitigations without documentation become forgotten promises. The template captures not just the mitigation but the cost negotiation — which is critical because stakeholders often reject the first proposed mitigation as too expensive. Having a documented alternative at a lower cost point (like splitting a database without clustering, reducing cost from $20,000 to $8,000) gives the architect negotiating flexibility.
**HANDOFF TO HUMAN** -- The agent produces the template; the human fills it in during the session.
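To support the cost negotiation, a mitigation record can carry enough numbers to compare a proposal with its cheaper fallback. This is a minimal sketch. The dollar figures echo the database example above. The following are hypothetical illustrations rather than a rule from the source:

- the risk scores
- the budgets
- the "largest reduction within budget" selection rule

```python
"""Sketch: compare a mitigation with its cheaper fallback (hypothetical scores)."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mitigation:
    change: str
    current_score: int
    expected_score: int
    cost_usd: int

    @property
    def reduction(self) -> int:
        return self.current_score - self.expected_score

    @property
    def reduction_per_1k(self) -> float:
        """Risk points removed per $1,000 spent (hypothetical heuristic)."""
        return self.reduction / (self.cost_usd / 1000)


def best_within_budget(options: list[Mitigation], budget_usd: int) -> Optional[Mitigation]:
    """Pick the largest risk reduction the business can afford, if any."""
    affordable = [m for m in options if m.cost_usd <= budget_usd]
    return max(affordable, key=lambda m: m.reduction, default=None)


options = [
    Mitigation("cluster the database", 6, 2, 20_000),               # hypothetical scores
    Mitigation("split the database without clustering", 6, 3, 8_000),
]

if __name__ == "__main__":
    choice = best_within_budget(options, budget_usd=10_000)
    print(choice.change if choice else "no affordable mitigation")
```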
### Step 8: Compile the Complete Facilitation Package
**ACTION:** Assemble all artifacts into a single facilitation package:
1. Session invitation with architecture diagram and risk dimension
2. Risk matrix reference card
3. Individual assessment worksheets
4. Session agenda
5. Discussion guide with prepared questions
6. Mitigation record template
7. Recommendation for the NEXT session (different risk dimension)
**WHY:** A complete package lets the facilitator focus on running the session rather than scrambling for materials. The recommendation for the next session reinforces that risk storming is continuous — not a one-time event. Each dimension reveals different risks, and the architecture should be re-stormed after major changes or at regular intervals.
For the detailed facilitation protocol with timing and role assignments, see [references/facilitation-protocol.md](references/facilitation-protocol.md).
## Inputs
- Architecture description or diagram with identifiable components
- Risk dimension to focus on (or user concerns to derive one from)
- Participant list or team description (roles and approximate size)
- Optionally: previous risk assessments, incident history, technology stack details
## Outputs
### Risk Storming Facilitation Package
The agent produces all of the following artifacts; the human uses them to run the actual session:
1. **Pre-work materials** — invitation, risk matrix card, individual worksheets
2. **Session agenda** — timed agenda for the 60-90 minute collaborative session
3. **Discussion guide** — prepared questions for driving productive disagreement discussion
4. **Mitigation template** — structured template for documenting mitigations and cost negotiations
5. **Next steps recommendation** — which dimension to storm next, when to re-storm
## Key Principles
- **One dimension per session** -- Assessing multiple risk dimensions in a single session dilutes focus and produces shallow results. A dedicated availability session reveals risks that a broad "assess all risks" session misses entirely. If the user wants to cover multiple dimensions, plan separate sessions and prioritize the most urgent first.
- **Disagreements are the insight, not the problem** -- When participants rate the same component differently, that disagreement reveals knowledge asymmetry — one person knows something the others don't. The facilitator's job is to mine these disagreements, not to resolve them quickly. Spend 60% of consensus time on areas where ratings differ.
- **Include developers, not just architects** -- Senior developers see implementation-level risks that architects miss. A developer who rates an unfamiliar technology as risk 9 reveals a critical unknown that the architect assumed everyone knew. Tech leads bridge the gap between design intent and operational reality. The diversity of perspectives is the entire point.
- **Unknown technology = automatic risk 9** -- For any technology that the team hasn't used in production, always assign the highest risk score (9). The risk matrix cannot be applied meaningfully because the team cannot assess likelihood for something they've never operated. This rule prevents the systematic underestimation of unfamiliar technology risk.
- **Mitigation requires cost negotiation** -- Every mitigation costs something. When stakeholders reject the first proposal as too expensive, have a cheaper alternative ready that partially mitigates the risk. The goal is risk reduction the business can afford, not perfect risk elimination at any cost.
- **Risk storming is continuous** -- A single session assesses one dimension. The architecture should be re-stormed after major changes, technology adoptions, or at regular intervals (e.g., quarterly). Each session reveals different risks and progressively improves the architecture.
## Examples
**Scenario: Pre-launch availability risk storming for microservices payment system**
Trigger: "We're about to launch a new microservices payment system. I want to do a risk assessment with my team focused on availability."
Process: Selected availability as the risk dimension. Mapped 6 components (API gateway, payment processor, notification service, user service, audit logger, database cluster). Recommended participants: lead architect, 3 senior backend developers, platform tech lead, SRE lead. Generated pre-work package with architecture diagram, risk matrix reference, and individual worksheets. Created 75-minute agenda with 35 minutes for consensus (focusing on where ratings diverge) and 30 minutes for mitigation planning. Prepared discussion questions targeting single points of failure, database availability, and third-party payment provider SLAs. Created mitigation template with cost negotiation section.
Output: Complete facilitation package with 7 artifacts. Recommended follow-up sessions for performance and security dimensions.
**Scenario: Unproven technology risk storming for Kafka adoption**
Trigger: "Our team just adopted Kafka for event-driven architecture. Nobody has used Kafka in production before. I need to facilitate a risk session with 4 senior devs and 2 tech leads."
Process: Selected unproven technology as the risk dimension (automatically the highest priority when the team has zero production experience). Mapped architecture components that interact with Kafka (event producers, consumers, topic management, schema registry, dead letter queues). Noted that per the unknown-technology rule, ALL Kafka-related components receive automatic risk score 9. Generated pre-work emphasizing that participants should identify specific areas where Kafka's behavior is unknown to them. Created agenda with extended discussion time (45 minutes) since disagreements will center on "what we don't know we don't know." Prepared questions focused on production operation gaps: "Who will handle partition rebalancing at 3am?" and "What happens when a consumer falls behind?" Created mitigation template emphasizing de-risking steps: PoC with production load, vendor support contracts, team training.
Output: Complete facilitation package tuned for unproven-technology sessions. Flagged that every Kafka component starts at risk 9 and mitigations focus on de-risking through knowledge and operational readiness.
**Scenario: Performance risk storming for high-throughput API gateway**
Trigger: "We have performance concerns with our API gateway handling 10,000 req/s. I need to run a risk session with the platform team of 8 people."
Process: Selected performance as the risk dimension. Mapped API gateway subcomponents (load balancer, rate limiter, auth middleware, request router, response cache, logging pipeline, upstream service connections). Recommended including all 8 team members in a single session since all are relevant. Generated pre-work with current performance baselines (if available) alongside the architecture diagram. Created 90-minute agenda with 40 minutes for consensus and 40 minutes for mitigation, given the technical depth needed. Prepared discussion questions targeting bottleneck identification: "At 10,000 req/s, which component hits its limit first?" and "What happens to latency when the response cache misses?" Created mitigation template with performance-specific fields: current throughput, target throughput, expected improvement per mitigation.
Output: Complete facilitation package for performance-focused session. Recommended load testing validation after implementing mitigations.
## References
- For the detailed three-phase facilitation protocol with timing, role assignments, and virtual session adaptations, see [references/facilitation-protocol.md](references/facilitation-protocol.md)
- For risk matrix scoring details, invoke the `architecture-risk-assessor` skill
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
Install related skills from ClawHub:
- `clawhub install bookforge-architecture-risk-assessor`
Or install the full book set from GitHub: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/facilitation-protocol.md
# Risk Storming Facilitation Protocol
> Detailed protocol for running the three-phase risk storming exercise.
> Read this when you need timing details, role assignments, or virtual session adaptations.
## Overview
Risk storming has three phases executed in strict order:
| Phase | Name | Mode | Duration | Purpose |
|-------|------|------|----------|---------|
| 1 | Identification | Individual (noncollaborative) | 1-2 days before session | Each participant independently rates risks |
| 2 | Consensus | Collaborative (all together) | 30-40 minutes | Align on risk ratings through disagreement discussion |
| 3 | Mitigation | Collaborative (all together) | 20-40 minutes | Identify architecture changes and negotiate costs |
**Total session time:** 60-90 minutes (Phases 2 and 3 only; Phase 1 is async pre-work)
## Phase 1: Identification (Pre-Session, Noncollaborative)
### Timing
- Send invitation **1-2 days** before the collaborative session
- Participants complete individual assessment **before** arriving at the session
### What the Facilitator Sends
1. **Architecture diagram** — a clear visual of the system's components and connections. Use the most current version. For in-person sessions, print a large version (A1/A0) for wall posting. For virtual sessions, share a digital version that participants can annotate.
2. **Risk dimension** — the ONE dimension being assessed. State it explicitly:
- "This session focuses on **availability** risk"
- NOT "assess all the risks you can think of"
3. **Risk matrix reference:**
```
                   Likelihood of risk occurring
                   Low(1)  Med(2)  High(3)
Impact   Low(1)   |   1   |   2   |   3   |
         Med(2)   |   2   |   4   |   6   |
         High(3)  |   3   |   6   |   9   |
1-2 = Low risk (GREEN Post-it)
3-4 = Medium risk (YELLOW Post-it)
6-9 = High risk (RED Post-it)
RULE: Unknown/unproven technology = automatic 9
```
4. **Individual assessment instructions:**
- Review the architecture diagram
- For each component, assess impact and likelihood for the given risk dimension
- Calculate the composite score (impact x likelihood)
- Prepare Post-it notes: one per risk area identified
- Write the score on the Post-it and use the matching color (green/yellow/red)
- If multiple dimensions are being assessed in one session (not recommended), write the dimension name next to the score
### Why Noncollaborative First
The individual phase MUST be noncollaborative to prevent:
- **Anchoring bias** — one vocal person's assessment dominates everyone's thinking
- **Groupthink** — participants converge on "safe" ratings to avoid conflict
- **Knowledge hiding** — junior developers may not speak up in front of senior architects
When each person arrives with their own independent assessment, the differences between assessments become the most valuable data in the session.
## Phase 2: Consensus (Collaborative, 30-40 Minutes)
### Setup (5 minutes)
- Post the architecture diagram on the wall (or display on a large screen)
- Restate the risk dimension: "Today we're focusing on {dimension}"
- Ground rules:
- Every Post-it goes on the diagram, no matter how "wrong" it might seem
- We discuss disagreements, not agreements
- No rank pulls — a developer's risk assessment is as valid as an architect's
### Post-it Placement (10 minutes)
- Each participant places their Post-it notes on the architecture diagram at the location where they identified risk
- For virtual sessions: the facilitator collects each participant's risks and places them on the shared digital diagram
- Do NOT discuss yet — just place
### Reading the Diagram
After placement, the facilitator identifies areas of interest:
| Pattern | What it means | Action |
|---------|---------------|--------|
| Multiple Post-its, same color, same area | Agreement — everyone sees the same risk | Brief confirmation, move on quickly |
| Multiple Post-its, DIFFERENT colors, same area | Disagreement — someone knows something others don't | **This is where you spend time** |
| Single Post-it on an area | One person sees a risk no one else identified | Explore — this person may have unique knowledge |
| No Post-its on an area | Everyone agrees there's no significant risk | Note and move on |
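When the facilitator collects scores digitally, the pattern table can be applied mechanically to decide where to spend discussion time. The sketch below assumes each area's Post-its are recorded as a list of composite scores; `classify_area` and the sample `placements` are hypothetical, and the facilitator still makes the final call.

```python
def color(score: int) -> str:
    """Post-it color for a composite score (same thresholds as the risk matrix)."""
    if score <= 2:
        return "GREEN"
    if score <= 4:
        return "YELLOW"
    return "RED"


def classify_area(scores: list[int]) -> str:
    """Label one diagram area using the patterns in the table above."""
    if not scores:
        return "no post-its"   # note and move on
    if len(scores) == 1:
        return "single"        # explore: possibly unique knowledge
    if len({color(s) for s in scores}) == 1:
        return "agreement"     # brief confirmation
    return "disagreement"      # spend most of the discussion time here


# Hypothetical placements: area name -> scores written on its Post-its
placements = {
    "Elastic Load Balancer": [3, 3, 6],
    "Order database": [6, 6, 9],
    "Redis cache": [9],
    "Static assets": [],
}

# Stable sort: disagreements first, other areas keep their original order
ordered = sorted(placements, key=lambda area: classify_area(placements[area]) != "disagreement")
for area in ordered:
    print(f"{area}: {classify_area(placements[area])}")
```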
### Disagreement Discussion (15-20 minutes)
This is the highest-value part of the entire exercise. Spend **60% of consensus time** here.
**For each disagreement area, ask:**
1. Start with the outlier: "You rated {component} as {high risk}. The rest of us rated it {low/medium}. What do you see that we're missing?"
2. Probe for experience: "Have you encountered this type of failure before? In what context?"
3. Check for unknown unknowns: "Is there something about {component}'s implementation that changes the risk picture?"
4. Resolve: after discussion, the group agrees on a consolidated rating
**Example from the book (ELB disagreement):**
- Two participants: medium risk (3) for the Elastic Load Balancer
- One participant: high risk (6)
- The outlier explains: "If the ELB goes down, the ENTIRE system is inaccessible"
- Group agrees: yes, impact is high (3), but likelihood is low (1) because ELBs are highly available
- Consolidated rating: medium (3)
- Insight gained: the third participant revealed an availability concern the others hadn't considered
**Example from the book (Redis cache, unknown technology):**
- One participant: high risk (9) for Redis cache
- All other participants: no risk identified
- The outlier explains: "What is Redis? I've never heard of it."
- Per the unknown-technology rule: automatic 9
- Insight: this reveals a team knowledge gap that IS a risk, regardless of Redis's actual reliability
### Consolidation (5 minutes)
- Remove duplicate Post-its
- Replace disagreement clusters with a single Post-it showing the agreed score
- The final diagram should have one Post-it per risk area with the consensus score
## Phase 3: Mitigation (Collaborative, 20-40 Minutes)
### Mitigation Brainstorm (15-20 minutes)
For each high-risk area (score 6-9), the group identifies architecture changes that would reduce the risk.
**Key questions:**
- "What architecture change would reduce this from {current score} to an acceptable level?"
- "Can we reduce the impact, the likelihood, or both?"
- "Are there proven patterns that address this type of risk?" (e.g., circuit breakers for availability, message queues for throughput)
**Common mitigation patterns by dimension:**
| Dimension | Common mitigations |
|-----------|-------------------|
| Availability | Database clustering, service replication, circuit breakers, fallback services, SLA/SLO verification for external dependencies |
| Performance | Caching layers, async processing, queue-based load leveling, CDN, connection pooling |
| Scalability | Horizontal scaling, event-driven decoupling, database sharding, read replicas |
| Security | API gateway splitting by role, network segmentation, encryption at rest/in transit, authentication/authorization layers |
| Data integrity | Database replication, write-ahead logging, idempotent operations, backup strategies |
| Unproven tech | PoC with production load, team training, vendor support, rollback strategy, parallel run with proven alternative |
### Cost Negotiation (10-15 minutes)
Every mitigation has a cost. The facilitator helps the group negotiate:
1. **Present the full mitigation:** "Database clustering plus splitting into separate physical databases would reduce risk from 6 to 2. Estimated cost: $20,000."
2. **If stakeholder rejects:** "What about splitting the database without clustering? Cost drops to $8,000, and we still mitigate most of the risk (from 6 to 3)."
3. **Document the trade-off:** Record both options, their costs, and the risk reduction each provides.
**Why this matters:** The book's database clustering example shows the real-world negotiation: $20,000 for full mitigation was rejected, but $8,000 for partial mitigation was accepted. Having a cheaper alternative ready is a facilitation skill that prevents "all or nothing" outcomes.
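Recording each option as data keeps the trade-off explicit when it goes into the mitigation record. The sketch below uses the two database options from the example; the `MitigationOption` class and the points-per-$1,000 ratio are hypothetical additions for comparison, not something the source prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MitigationOption:
    name: str
    cost_usd: int
    risk_before: int  # composite score before mitigation
    risk_after: int   # composite score the group expects afterward

    @property
    def reduction(self) -> int:
        return self.risk_before - self.risk_after

    @property
    def reduction_per_1k(self) -> float:
        # Hypothetical comparison metric: risk points removed per $1,000 spent
        return self.reduction / (self.cost_usd / 1000)


options = [
    MitigationOption("Clustering + separate physical databases", 20_000, 6, 2),
    MitigationOption("Separate databases, no clustering", 8_000, 6, 3),
]

for opt in options:
    print(f"{opt.name}: {opt.risk_before} -> {opt.risk_after}, "
          f"${opt.cost_usd:,}, {opt.reduction_per_1k:.3f} points per $1k")
```

The cheaper option removes 3 of the 4 points the full mitigation would, at 40% of the cost, which is the kind of fallback the negotiation step depends on.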
### Action Items (5 minutes)
- Each agreed mitigation gets an owner and a deadline
- Document in the mitigation record template
- Schedule the next risk storming session (different dimension)
## Virtual Session Adaptations
When running risk storming remotely:
| In-Person | Virtual Equivalent |
|-----------|-------------------|
| Large printed architecture diagram on wall | Shared digital diagram (Miro, FigJam, Lucidchart) |
| Physical Post-it notes (green/yellow/red) | Digital sticky notes with color coding |
| Participants walk up and place Post-its | Facilitator collects risks from each participant and places them, OR participants annotate directly |
| Verbal discussion around the diagram | Video call with screen sharing |
| Writing on Post-its during discussion | Updating digital stickies in real-time |
**Critical adaptation:** In virtual sessions, the facilitator must be more active in soliciting input from quiet participants. The "walk up and place Post-its" physical action forces participation; digital sessions need the facilitator to call on each person directly.
## Timing Guide by Team Size
| Team Size | Phase 2 (Consensus) | Phase 3 (Mitigation) | Total |
|-----------|---------------------|----------------------|-------|
| 3-4 people | 25 minutes | 20 minutes | 45 minutes |
| 5-6 people | 35 minutes | 25 minutes | 60 minutes |
| 7-8 people | 40 minutes | 30 minutes | 70 minutes |
| 9+ people | 45 minutes | 35 minutes | 80 minutes |
For teams larger than 8, consider splitting into two parallel sessions with different facilitators, then merging findings.
## Common Facilitation Mistakes
| Mistake | Why it fails | Correction |
|---------|-------------|------------|
| Assessing multiple dimensions in one session | Participants lose focus, conflate risk types | One dimension per session |
| Skipping Phase 1 (individual assessment) | Anchoring bias dominates, fewer risks discovered | Always do individual assessment BEFORE the collaborative session |
| Spending equal time on agreements and disagreements | Agreements don't reveal new information | Spend 60% of time on disagreements |
| Only inviting architects | Misses implementation-level risks | Include senior developers and tech leads |
| Treating it as a one-time event | Architecture risk changes continuously | Schedule recurring sessions, especially after major changes |
| Proposing only one mitigation option | Stakeholders reject expensive options and nothing happens | Always have a cheaper alternative ready |
---
name: modularity-health-evaluator
description: Assess code modularity health using quantitative metrics — cohesion (LCOM), coupling (afferent/efferent), abstractness, instability, distance from main sequence, and connascence taxonomy. Use this skill whenever the user asks about module quality, code coupling analysis, cohesion measurement, class decomposition, package dependency analysis, LCOM scores, afferent/efferent coupling, connascence, zone of pain, zone of uselessness, extracting microservices from a monolith, evaluating module boundaries, dependency analysis, or wants to know if a class or package is well-structured — even if they don't use the term "modularity."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/modularity-health-evaluator
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
source-books:
  - id: fundamentals-of-software-architecture
    title: "Fundamentals of Software Architecture"
    authors: ["Mark Richards", "Neal Ford"]
    chapters: [3]
tags: [software-architecture, modularity, cohesion, coupling, connascence, LCOM, metrics, refactoring]
depends-on: []
execution:
  tier: 2
  mode: hybrid
  inputs:
    - type: codebase
      description: "A software project or module descriptions to analyze for modularity health"
    - type: none
      description: "Alternatively, a textual description of classes, packages, and their dependencies"
  tools-required: [Read, Write]
  tools-optional: [Grep, Glob, Bash]
  mcps-required: []
  environment: "Best results inside a codebase directory. Can also work from user-provided descriptions."
---
# Modularity Health Evaluator
## When to Use
You need to assess the structural health of code modules — classes, packages, components, or services — using quantitative modularity metrics. Typical triggers:
- The user has a class with too many methods or responsibilities and wants to evaluate it
- The user has a utility package that everything depends on and wants to understand the risk
- The user is planning to extract microservices and needs to evaluate which modules are cleanly bounded
- The user mentions coupling, cohesion, or dependency problems
- The user sees cascading breakage when changing one module and wants to diagnose why
- The user wants to evaluate whether a codebase is ready for architectural migration
Before starting, verify:
- Is there a specific module, class, or package to evaluate? (At minimum, a description of the component and its dependencies)
- Does the user have access to dependency analysis tooling, or will this be a manual/descriptive assessment?
## Context & Input Gathering
### Required Context (must have before proceeding)
- **Target module(s):** What class, package, or component needs evaluation?
-> Check prompt for: class names, package names, module descriptions, dependency complaints
-> Check environment for: src/ directories, package structures, build files
-> If still missing, ask: "Which specific class, package, or module would you like me to evaluate for modularity health?"
- **Dependency information:** What depends on this module, and what does it depend on?
-> Check prompt for: import lists, dependency descriptions, "everything depends on X" statements
-> Check environment for: import statements, package.json, pom.xml, go.mod, build.gradle
-> If still missing, ask: "Can you describe what other modules depend on this one (incoming), and what this module depends on (outgoing)?"
### Observable Context (gather from environment if available)
- **Codebase structure:** How is the code organized?
-> Look for: directory structure, package naming conventions, module boundaries
-> If unavailable: rely on user description
- **Class/method details:** Method counts, field usage, abstract vs concrete elements
-> Look for: source files, interface definitions, abstract classes
-> If unavailable: ask the user for approximate counts
- **Existing metrics:** Any existing static analysis reports (SonarQube, JDepend, etc.)
-> Look for: build reports, CI artifacts, quality gate configs
-> If unavailable: compute metrics manually from code or descriptions
### Default Assumptions
- If no specific metrics tools available -> perform manual analysis from code structure and user descriptions
- If method-field mapping is unknown -> estimate LCOM from class description and responsibility count
- If exact dependency counts unknown -> use the user's qualitative description to estimate Ca/Ce ranges
- If abstractness ratio unknown -> assume low abstractness (A ~ 0.1) for typical application code unless interfaces are explicitly described
### Sufficiency Threshold
```
SUFFICIENT when ALL of these are true:
- At least one target module is identified with enough detail to assess
- Dependency direction (incoming/outgoing) is known or estimable
- Module responsibilities are described or observable from code
PROCEED WITH DEFAULTS when:
- Target module is identified
- Dependencies are partially known
- Exact counts can be estimated from qualitative descriptions
MUST ASK when:
- No target module is identified (cannot assess modularity without a subject)
- The user's description is too vague to estimate any metrics
- It's ambiguous whether the problem is cohesion, coupling, or both
```
## Process
### Step 1: Identify and Catalog Target Modules
**ACTION:** List all modules (classes, packages, components) to be evaluated. For each, document: name, primary responsibility, approximate method/function count, and any known dependencies.
**WHY:** Modularity assessment requires concrete targets. Vague discussions about "coupling" without naming specific modules produce generic advice. By naming each module and its responsibility, you establish the scope for all subsequent metrics. A module's responsibility statement also serves as the baseline for evaluating cohesion — if the responsibility can't be stated in one sentence, the module likely has poor cohesion.
**IF** a codebase is available -> **AGENT: EXECUTES** — scan directory structure, count files per package, identify key classes
**ELSE** -> use the user's description to build the module catalog
### Step 2: Evaluate Cohesion
**ACTION:** For each target module, assess cohesion type and estimate LCOM:
**Cohesion type** (ranked best to worst):
1. **Functional** — every part relates to a single, well-defined function. All elements are essential.
2. **Sequential** — one part's output feeds the next part's input (pipeline within the module).
3. **Communicational** — parts operate on the same data to produce different outputs.
4. **Procedural** — parts must execute in a specific order but aren't otherwise related.
5. **Temporal** — parts are grouped because they run at the same time (e.g., initialization code).
6. **Logical** — parts are logically related but functionally different (e.g., StringUtils).
7. **Coincidental** — parts have no relationship; they're in the same module by accident.
**LCOM estimation:**
- Count the instance variables (fields) in the class
- For each method, note which fields it accesses
- LCOM = the sum of sets of methods NOT sharing fields. Higher LCOM = worse cohesion.
- Practical shortcut: if most methods use only 2-3 of 12 fields, LCOM is high (bad). If most methods share most fields, LCOM is low (good).
**WHY:** Cohesion determines whether a module is a natural grouping or an artificial one. Low cohesion (logical or coincidental) means the module is really multiple modules crammed together — any change risks unintended side effects on unrelated functionality. LCOM gives a structural measure: a class where methods don't share fields is structurally disconnected, meaning each field/method group could be its own class. This is the single most important indicator for "should I split this class?"
**IF** source code is available -> **AGENT: EXECUTES** — analyze field-method relationships, calculate approximate LCOM
**ELSE** -> estimate from the user's description of what the class/module does
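The LCOM definition above is informal, and static-analysis tools implement several LCOM variants. As one concrete, assumed operationalization, the sketch below uses the Chidamber-Kemerer formulation: count method pairs that share no fields (P) and pairs that share at least one field (Q), and report max(P - Q, 0). The input format, a map from method name to the set of fields it touches, is hypothetical.

```python
from itertools import combinations


def lcom_ck(method_fields: dict[str, set[str]]) -> int:
    """Chidamber-Kemerer LCOM: max(P - Q, 0).

    P = method pairs sharing no fields, Q = pairs sharing at least one field.
    """
    p = q = 0
    for (_, fields_a), (_, fields_b) in combinations(method_fields.items(), 2):
        if fields_a & fields_b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)


# Field usage from the Class X/Y/Z illustration in the modularity metrics reference
class_x = {"m1": {"A", "B", "C"}, "m2": {"A", "B"}, "m3": {"B", "C"}}
class_y = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"}}
class_z = {"m1": {"A", "B"}, "m2": {"A", "B"}, "m3": {"C"}}
print(lcom_ck(class_x), lcom_ck(class_y), lcom_ck(class_z))  # 0 3 1
```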
### Step 3: Measure Coupling (Afferent and Efferent)
**ACTION:** For each target module, count:
- **Afferent coupling (Ca):** Number of modules that depend on THIS module (incoming connections — "who uses me?")
- **Efferent coupling (Ce):** Number of modules THIS module depends on (outgoing connections — "who do I use?")
Mnemonic: **a**fferent = **a**pproaching (incoming), **e**fferent = **e**xiting (outgoing).
**WHY:** Coupling metrics reveal risk exposure. High Ca means many modules break if you change this one — it's a high-responsibility position. High Ce means this module is fragile because it depends on many others — any upstream change can break it. Neither is inherently bad, but the combination determines the module's stability profile (Step 4). A utility package with Ca=50 and Ce=0 is stable but painful to modify. A service with Ca=0 and Ce=15 is volatile but safe to change.
**IF** codebase is available -> **AGENT: EXECUTES** — analyze imports, build dependency graph, count Ca and Ce per module
**ELSE** -> estimate from user's description of dependency directions
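For a Python codebase, a rough version of this import analysis fits in one small module. This is a sketch under stated assumptions: it uses the standard-library `ast` module, reduces each import to its top-level package name, ignores relative imports, and counts only modules inside the analyzed set; `imported_modules` and `coupling` are hypothetical helper names.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Top-level package names imported by one Python source file."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 means a relative import; skipped in this sketch
            names.add(node.module.split(".")[0])
    return names


def coupling(deps: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Return {module: (Ca, Ce)} given each module's outgoing dependencies."""
    modules = set(deps)
    ca = {m: 0 for m in modules}
    ce: dict[str, int] = {}
    for m, targets in deps.items():
        internal = (targets & modules) - {m}  # drop stdlib, third-party, and self
        ce[m] = len(internal)                 # efferent: who do I use?
        for t in internal:
            ca[t] += 1                        # afferent: who uses me?
    return {m: (ca[m], ce[m]) for m in sorted(modules)}


# Hypothetical three-module project
sources = {
    "billing": "import utils\nfrom orders import Order\n",
    "orders": "import json\nimport utils\n",
    "utils": "import datetime\n",
}
graph = {name: imported_modules(src) for name, src in sources.items()}
print(coupling(graph))  # {'billing': (0, 2), 'orders': (1, 1), 'utils': (2, 0)}
```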
### Step 4: Calculate Derived Metrics
**ACTION:** From the coupling measurements, calculate three derived metrics:
1. **Instability: I = Ce / (Ca + Ce)**
- Range: 0 to 1
- I = 0: maximally stable (only incoming dependencies, no outgoing — e.g., a foundational library)
- I = 1: maximally unstable (only outgoing dependencies, no incoming — e.g., a leaf application)
2. **Abstractness: A = abstract_elements / total_elements**
- Range: 0 to 1
- Count interfaces, abstract classes as abstract elements
- Count all classes/modules as total elements
- A = 0: fully concrete (no abstractions)
- A = 1: fully abstract (all interfaces, no implementations)
3. **Distance from Main Sequence: D = |A + I - 1|**
- Range: 0 to 1
- D = 0: perfectly balanced (on the ideal main sequence line)
- D = 1: maximally imbalanced
**WHY:** These derived metrics reveal architectural health that raw coupling counts miss. The main sequence represents the ideal balance: highly stable modules should be abstract (so dependents rely on interfaces, not implementations), and highly unstable modules should be concrete (they change freely since nobody depends on them). Distance from the main sequence quantifies how far a module deviates from this ideal, identifying the two danger zones.
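The three formulas translate directly into code. The sketch below reproduces the god-class numbers from the Examples section (Ca=15, Ce=8, no abstract elements); the function names are illustrative, and the guard for Ca + Ce = 0 is an added assumption, since I is undefined for a module with no dependencies in either direction.

```python
def instability(ca: int, ce: int) -> float:
    """I = Ce / (Ca + Ce)."""
    if ca + ce == 0:
        raise ValueError("I is undefined for a module with no incoming or outgoing dependencies")
    return ce / (ca + ce)


def abstractness(abstract_elements: int, total_elements: int) -> float:
    """A = abstract elements (interfaces, abstract classes) / total elements."""
    if total_elements == 0:
        raise ValueError("module has no elements")
    return abstract_elements / total_elements


def distance(a: float, i: float) -> float:
    """D = |A + I - 1|, the distance from the main sequence."""
    return abs(a + i - 1)


# God-class example: Ca=15, Ce=8, a single concrete class (0 of 1 elements abstract)
i = instability(15, 8)
a = abstractness(0, 1)
d = distance(a, i)
print(f"I={i:.2f} A={a:.2f} D={d:.2f}")  # I=0.35 A=0.00 D=0.65
```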
### Step 5: Identify Zone Placement
**ACTION:** Plot each module on the Abstractness-Instability graph and classify:
- **Zone of Pain** (lower-left: I near 0, A near 0): Highly stable AND highly concrete. Many modules depend on it, but it has no abstractions. Painful to change because changes ripple to all dependents, and there are no interfaces to provide flexibility.
-> Examples: utility libraries, core data models, shared database schemas
-> Signal: "We're afraid to touch this because everything breaks"
- **Zone of Uselessness** (upper-right: I near 1, A near 1): Highly unstable AND highly abstract. Few modules depend on it, and it's all interfaces with no concrete use.
-> Examples: over-engineered frameworks nobody uses, abandoned abstraction layers
-> Signal: "Nobody actually uses this interface hierarchy"
- **Main Sequence** (diagonal from upper-left to lower-right): The healthy zone. Stable modules are abstract (changeable via interfaces). Unstable modules are concrete (free to change).
**WHY:** Zone placement converts abstract metrics into actionable diagnosis. A module in the zone of pain needs abstraction (introduce interfaces so dependents don't couple to concrete implementation). A module in the zone of uselessness needs evaluation for removal or consolidation. Modules near the main sequence are healthy. This visualization makes the metrics tangible for stakeholders who can't interpret raw Ca/Ce numbers.
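The zones are described qualitatively ("near 0", "near 1"), so any numeric cutoff is a judgment call. The sketch below assumes a hypothetical threshold of D > 0.5 for leaving the main sequence, then uses which side of the A + I = 1 line a module falls on to tell the two zones apart.

```python
def zone(a: float, i: float, threshold: float = 0.5) -> str:
    """Classify a module on the Abstractness-Instability graph.

    threshold is a hypothetical cutoff on D; tune it for your codebase.
    """
    d = abs(a + i - 1)
    if d <= threshold:
        return "main sequence"
    # Below the line (A + I < 1): stable and concrete -> zone of pain
    # Above the line (A + I > 1): unstable and abstract -> zone of uselessness
    return "zone of pain" if a + i < 1 else "zone of uselessness"


print(zone(a=0.05, i=0.0))  # utils package example -> zone of pain
print(zone(a=0.0, i=0.35))  # god class example -> zone of pain
print(zone(a=0.9, i=0.9))   # abstraction layer nobody uses -> zone of uselessness
print(zone(a=0.5, i=0.5))   # balanced -> main sequence
```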
### Step 6: Analyze Connascence
**ACTION:** For each significant coupling relationship between modules, classify the connascence type:
**Static connascence** (source-level, weaker, easier to fix):
- **Name (CoN):** Components agree on names. Weakest, most desirable.
- **Type (CoT):** Components agree on types. Standard in typed languages.
- **Meaning/Convention (CoM):** Components agree on value meanings (e.g., magic numbers, status codes).
- **Position (CoP):** Components agree on parameter order.
- **Algorithm (CoA):** Components must use the same algorithm (e.g., hashing on both client and server).
**Dynamic connascence** (runtime, stronger, harder to fix):
- **Execution (CoE):** Order of execution matters between components.
- **Timing (CoT, not to be confused with Type above):** The timing of execution matters (e.g., race conditions).
- **Values (CoV):** Multiple values must change together (distributed transactions).
- **Identity (CoI):** Components must reference the same entity instance.
**Three guidelines for improvement:**
1. Minimize overall connascence by encapsulating
2. Minimize connascence that crosses module boundaries
3. Maximize connascence within module boundaries
**Rules from Jim Weirich:**
- **Rule of Degree:** Convert strong connascence to weaker forms
- **Rule of Locality:** As distance between modules increases, use weaker connascence forms
**WHY:** Connascence provides a vocabulary for discussing coupling quality, not just quantity. Two modules with the same Ca/Ce scores can have very different coupling health. Connascence of Name is trivially refactorable (rename a method). Connascence of Values across a distributed system (distributed transactions) is architecturally significant and may require redesigning service boundaries. Identifying connascence type tells you HOW hard the coupling is to address, while Ca/Ce tell you HOW MUCH coupling exists.
**HANDOFF TO HUMAN** for runtime connascence analysis — dynamic connascence (execution order, timing, values, identity) requires production observation, distributed tracing, or load testing to fully assess.
### Step 7: Synthesize Assessment and Recommend Actions
**ACTION:** Produce the Modularity Health Report combining all metrics, zone placements, and connascence findings. For each module, provide:
- Overall health rating (Healthy / At Risk / Unhealthy)
- Primary concern (cohesion, coupling, zone placement, or connascence)
- Specific refactoring recommendation with expected improvement
**WHY:** Individual metrics are useful but can be misleading in isolation. A high LCOM is meaningless if the class has low coupling and few dependents. The synthesis step weighs all factors together and prioritizes what to fix first. The recommendation must be specific enough to act on — "improve cohesion" is useless; "extract the notification methods (notifyCustomer, sendEmail, formatNotification) into a NotificationService" is actionable.
**IF** the user's goal is microservice extraction -> prioritize recommendations that create clean boundaries (low Ce, functional cohesion, weak cross-boundary connascence)
**IF** the user's goal is code quality improvement -> prioritize LCOM reduction and connascence simplification
## Inputs
- Target module(s) to evaluate: class names, package names, or component descriptions
- Dependency information: what depends on the module and what it depends on
- Optionally: source code access, existing static analysis reports, module responsibility descriptions
## Outputs
### Modularity Health Report
```markdown
# Modularity Health Report: {System/Component Name}
## Assessment Scope
- **Date:** {date}
- **Modules assessed:** {count}
- **Assessment method:** {codebase analysis / description-based / hybrid}
- **Tools used:** {JDepend, SonarQube, manual analysis, etc.}
## Module Catalog
| Module | Responsibility | Methods | Fields | Ca | Ce |
|--------|---------------|:-------:|:------:|:--:|:--:|
| {name} | {one-line responsibility} | {count} | {count} | {count} | {count} |
## Cohesion Assessment
| Module | Cohesion Type | LCOM Estimate | Rating |
|--------|-------------|:-------------:|:------:|
| {name} | {functional/sequential/.../coincidental} | {low/medium/high} | {good/warning/poor} |
### Cohesion Details
**{Module name}:** {detailed cohesion analysis with field-method groupings}
## Coupling & Derived Metrics
| Module | Ca | Ce | Instability (I) | Abstractness (A) | Distance (D) | Zone |
|--------|:--:|:--:|:---------------:|:-----------------:|:-------------:|------|
| {name} | {n} | {n} | {0-1} | {0-1} | {0-1} | {pain/uselessness/healthy} |
## Zone Placement Map
~~~
Abstractness (A)
 1 +\
   |  \            Zone of Uselessness
   |    \              [Module C]
   |      \
   |        \  <-- Main Sequence (A + I = 1)
   |     [Module B]
   |            \
   |              \
   | Zone of Pain   \
   | [Module A]       \
 0 +--------------------+
   0                    1
          Instability (I)
~~~
## Connascence Analysis
| From -> To | Type | Strength | Across Boundary? | Concern Level |
|-----------|------|----------|:----------------:|:-------------:|
| {module A -> module B} | {CoN/CoT/CoM/...} | {weak/moderate/strong} | {yes/no} | {low/medium/high} |
## Health Summary
| Module | Health | Primary Concern | Recommended Action |
|--------|:------:|-----------------|-------------------|
| {name} | {Healthy/At Risk/Unhealthy} | {concern} | {specific action} |
## Prioritized Refactoring Recommendations
1. **{Highest priority}** — {specific action} -> Expected improvement: {metric change}
2. **{Second priority}** — {specific action} -> Expected improvement: {metric change}
3. **{Third priority}** — {specific action} -> Expected improvement: {metric change}
```
## Key Principles
- **Cohesion is subjective, coupling is structural** — Cohesion type requires judgment about whether groupings are "functional" or "logical." LCOM provides a structural proxy but can't distinguish essential complexity from accidental grouping. Use LCOM to flag suspects, then apply cohesion type analysis to confirm. Never rely on a single metric.
- **High coupling is not inherently bad — directionality matters** — A stable foundation library with Ca=100 and Ce=0 is healthy architecture. The problem is when high Ca combines with low abstractness (zone of pain) or when high Ce makes a module fragile. Always interpret coupling in context of instability and abstractness.
- **Connascence strength increases with distance** — Connascence of Name within a module is perfectly fine. Connascence of Values across distributed services is an architectural crisis. The same form of coupling becomes more dangerous as the distance between modules increases. Always evaluate connascence relative to module boundaries.
- **Metrics require interpretation, not just calculation** — All code-level metrics have limitations. LCOM detects structural lack of cohesion but can't distinguish essential complexity from poor design. Cyclomatic complexity can't distinguish inherent problem complexity from accidental code complexity. Calculate the metrics, then apply architectural judgment to interpret what they mean in context.
- **Zone of pain is more dangerous than zone of uselessness** — Code in the zone of uselessness wastes effort but doesn't block progress. Code in the zone of pain actively resists change and creates cascading failures. Prioritize moving modules out of the zone of pain (by introducing abstractions) over cleaning up the zone of uselessness.
- **Prefer weaker forms of connascence** — When you can't eliminate coupling, downgrade it. Convert connascence of meaning (magic numbers) to connascence of name (named constants). Convert connascence of algorithm (shared hashing) to connascence of type (shared library). Each downgrade makes the coupling cheaper to maintain and less likely to cause bugs.
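A small Python illustration of downgrading connascence, using an order-status check and an invoice factory invented for the example: a magic number (meaning) becomes an `Enum` member (name), and positional arguments (position) become keyword-only parameters (name). All names here are hypothetical.

```python
from enum import Enum


# Before: connascence of meaning (CoM). Every caller must know that 2 means "shipped".
def is_shipped_cm(status: int) -> bool:
    return status == 2


# After: connascence of name (CoN). Callers agree only on the name OrderStatus.SHIPPED.
class OrderStatus(Enum):
    PENDING = 1
    SHIPPED = 2
    CANCELLED = 3


def is_shipped(status: OrderStatus) -> bool:
    return status is OrderStatus.SHIPPED


# Before: connascence of position (CoP). Swapping two arguments is not caught at call time.
def create_invoice_cp(customer_id: str, amount_cents: int, currency: str) -> dict:
    return {"customer": customer_id, "amount_cents": amount_cents, "currency": currency}


# After: keyword-only parameters (the bare *) downgrade CoP to CoN.
def create_invoice(*, customer_id: str, amount_cents: int, currency: str) -> dict:
    return {"customer": customer_id, "amount_cents": amount_cents, "currency": currency}


print(is_shipped(OrderStatus.SHIPPED))  # True
print(create_invoice(customer_id="c-42", currency="EUR", amount_cents=1999))
# {'customer': 'c-42', 'amount_cents': 1999, 'currency': 'EUR'}
```

Argument order no longer matters in the keyword-only version; a caller that tries to pass arguments positionally gets a `TypeError` instead of a silently wrong invoice.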
## Examples
**Scenario: God class evaluation**
Trigger: "I have a CustomerService class with 35 methods, 12 instance variables, and methods spanning registration, billing, notifications, and reporting. Is this well-designed?"
Process: Cataloged the class. Identified 4 distinct responsibility groups from the method descriptions. Assessed cohesion as logical (methods related by entity, not by function). Estimated LCOM as high — registration methods use fields A,B,C; billing methods use fields D,E,F; notification methods use fields G,H; reporting methods use fields I,J,K,L. Few fields shared across groups. Measured Ca=15 (many dependents), Ce=8. Calculated I=0.35, A=0.0, D=0.65 — deep in zone of pain. Recommended extracting into 4 focused services: CustomerRegistrationService, BillingService, NotificationService, CustomerReportingService. Expected improvement: each new class achieves functional cohesion, LCOM drops to low, and D approaches 0.
Output: Modularity health report showing the class is Unhealthy with a specific 4-way decomposition plan.
**Scenario: Utility package dependency analysis**
Trigger: "Our utils package has 200 classes and every other package depends on it. We're afraid to change anything."
Process: Identified the utils package as having Ca=high (every package depends on it), Ce=0 (it depends on nothing), I=0.0, A~0.05 (almost no interfaces). Plotted directly in zone of pain. Cohesion type: coincidental — date formatters, database helpers, and email templates have no functional relationship. Analyzed connascence: mostly CoN (name-based) from the rest of the codebase, but some CoM (magic numbers shared via utility constants). Recommended: (1) Extract date/time utilities into a DateTimeUtils package with an interface, (2) Move database helpers into a persistence package close to where they're used, (3) Extract email templates into a notification package. Each extraction reduces Ca on the remaining utils and moves code to functionally cohesive homes.
Output: Modularity health report with 3-phase decomposition plan and expected zone migration from pain to main sequence.
**Scenario: Microservice extraction readiness assessment**
Trigger: "We have 15 top-level packages in our monolith. Which are ready to extract as microservices?"
Process: For each package, measured Ca, Ce, calculated I and A. Assessed cohesion type. Analyzed cross-package connascence. Found 4 packages with functional cohesion, low Ce, and weak cross-boundary connascence (good candidates). Found 3 packages in zone of pain with high Ca (extract later — need abstraction first). Found 2 packages with high CoV (shared transactions — cannot extract without addressing data consistency). Ranked all 15 by extraction readiness score. Recommended extraction order: start with the 4 clean candidates, then refactor the zone-of-pain packages by introducing interfaces, then address shared-transaction packages using saga pattern or shared database strategy.
Output: Ranked extraction readiness report with specific blockers and prerequisites for each package.
## References
- For the complete modularity metrics reference with formulas, cohesion taxonomy, connascence types, and zone definitions, see [references/modularity-metrics-reference.md](references/modularity-metrics-reference.md)
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/modularity-metrics-reference.md
# Modularity Metrics Reference
This reference provides the complete metrics framework for evaluating code modularity. Read this when you need exact formulas, threshold values, or the full connascence taxonomy.
## Cohesion Types (Ranked Best to Worst)
| Rank | Type | Definition | Example | Signal |
|:----:|------|-----------|---------|--------|
| 1 | **Functional** | Every part relates to a single function; all parts are essential | A `PaymentProcessor` class that only handles payment authorization, capture, and refund | Methods share most fields; removing any method breaks the class |
| 2 | **Sequential** | One part's output becomes the next part's input | A data pipeline class where `parse()` feeds `validate()` feeds `transform()` | Clear input-output chain between methods |
| 3 | **Communicational** | Parts operate on the same data for different purposes | A class that both saves a customer record to the database and sends a confirmation email using the same customer data | Methods share input data but produce different outputs |
| 4 | **Procedural** | Parts must execute in a specific order | A deployment class where `stopServer()` must run before `deployArtifact()` before `startServer()` | Order matters but methods don't share data |
| 5 | **Temporal** | Parts are grouped by timing, not function | A `SystemStartup` class that initializes logging, opens DB connections, and starts background threads | Methods run at the same time but are functionally unrelated |
| 6 | **Logical** | Parts are logically related but functionally different | `StringUtils` with `toUpperCase()`, `trim()`, `parseDate()`, `formatCurrency()` | Methods operate on similar types but serve different purposes |
| 7 | **Coincidental** | Parts have no meaningful relationship | A `Helpers` class with random methods thrown in for convenience | No discernible reason why these methods are together |
### Cohesion Assessment Heuristic
Ask: "If I described this module's responsibility, how many sentences would it take?"
- 1 sentence -> likely functional cohesion
- 2-3 sentences with "and" -> likely communicational or sequential
- 4+ sentences or uses "and/or" -> likely logical or coincidental
## LCOM (Lack of Cohesion in Methods)
### Definition
LCOM measures the structural cohesion of a class by analyzing how methods share instance variables (fields).
**LCOM = The sum of sets of methods NOT shared via fields**
Practical interpretation:
- **LCOM = 0:** Perfect structural cohesion. All methods access the same fields.
- **LCOM low:** Good cohesion. Most methods share fields.
- **LCOM high:** Poor cohesion. Methods form separate groups that don't share state. The class is likely multiple classes combined.
### How to Estimate LCOM Without Tooling
1. List all instance variables (fields) in the class
2. For each method, note which fields it accesses
3. Group methods by shared field access
4. If you get multiple disconnected groups -> LCOM is high
**Quick check:** If a class has N fields but each method only uses 2-3 of them, and different methods use different subsets, LCOM is high. The class should probably be split along the field-access boundaries.
### Visual Interpretation
```
Class X (Low LCOM - Good)    Class Y (High LCOM - Bad)    Class Z (Mixed)
Fields: A, B, C              Fields: A, B, C              Fields: A, B, C
m1() uses A, B, C            m1() uses A                  m1() uses A, B
m2() uses A, B               m2() uses B                  m2() uses A, B
m3() uses B, C               m3() uses C                  m3() uses C
-> All connected             -> All disconnected          -> Two groups: {m1,m2} and {m3}
-> Keep as one class         -> Split into 3 classes      -> Consider extracting m3+C
```
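Once you know which fields each method touches, the grouping procedure can be automated. The sketch below is a minimal Python illustration, assuming you have already extracted a `method -> fields` mapping by some other means (the input shape and the function names are hypothetical, not any tool's API). It counts connected groups of methods, in the spirit of the connected-component LCOM variants (Hitz and Montazeri's LCOM3/LCOM4, the latter also linking methods that call each other). Pair-counting variants such as Chidamber and Kemerer's produce different absolute numbers, so compare values only within one definition.

```python
from collections import defaultdict


def method_groups(method_fields: dict[str, set[str]]) -> list[set[str]]:
    """Group methods that are connected, directly or transitively, by shared fields."""
    parent = {m: m for m in method_fields}

    def find(m: str) -> str:
        # Union-find root lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Two methods that touch the same field are joined into one group.
    first_user: dict[str, str] = {}
    for method, fields in method_fields.items():
        for f in fields:
            if f in first_user:
                parent[find(method)] = find(first_user[f])
            else:
                first_user[f] = method

    groups: dict[str, set[str]] = defaultdict(set)
    for method in method_fields:
        groups[find(method)].add(method)
    return list(groups.values())


def lcom_groups(method_fields: dict[str, set[str]]) -> int:
    """1 = structurally cohesive; more than 1 suggests splitting along the groups."""
    return len(method_groups(method_fields))


# The three classes from the diagram above.
class_x = {"m1": {"A", "B", "C"}, "m2": {"A", "B"}, "m3": {"B", "C"}}
class_y = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"}}
class_z = {"m1": {"A", "B"}, "m2": {"A", "B"}, "m3": {"C"}}

for name, cls in [("X", class_x), ("Y", class_y), ("Z", class_z)]:
    print(name, lcom_groups(cls), sorted(sorted(g) for g in method_groups(cls)))
# X 1 [['m1', 'm2', 'm3']]
# Y 3 [['m1'], ['m2'], ['m3']]
# Z 2 [['m1', 'm2'], ['m3']]
```

The groups returned for Class Z are exactly the split boundary the diagram suggests: `{m1, m2}` stays with fields A and B, and `m3` leaves with field C.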
### Limitations
LCOM measures **structural** lack of cohesion only. It cannot determine if a grouping is **logically** cohesive. A `MathUtils` class with static methods sharing no state has high LCOM but may be a reasonable design choice. Always combine LCOM with cohesion type analysis.
## Coupling Metrics
### Afferent Coupling (Ca) — Incoming
- **What it measures:** The number of external modules that depend on THIS module
- **Mnemonic:** **A**fferent = **A**pproaching = incoming arrows
- **High Ca means:** Many things break if you change this module. It has high responsibility.
- **Example:** A `DateUtils` class used by 50 other classes has Ca = 50
### Efferent Coupling (Ce) — Outgoing
- **What it measures:** The number of external modules THIS module depends on
- **Mnemonic:** **E**fferent = **E**xiting = outgoing arrows
- **High Ce means:** This module is fragile. Changes in any dependency can break it.
- **Example:** A `ReportGenerator` that imports from 12 other packages has Ce = 12
### Coupling Interpretation Table
| Ca | Ce | Profile | Risk |
|:--:|:--:|---------|------|
| High | Low | Foundation/library | Stable but painful to change |
| Low | High | Leaf/application | Volatile but safe to change |
| High | High | Hub/bottleneck | Dangerous — fragile AND high-impact |
| Low | Low | Isolated | Low risk but check if it's actually used |
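To get Ca and Ce without a dedicated tool, you can count edges in a module dependency map. A minimal Python sketch, assuming you can export `module -> modules it imports` (for example from an import scanner). The module names are hypothetical, and the `threshold` separating "high" from "low" is an arbitrary assumption, since the table above gives no numeric cut-off.

```python
from collections import defaultdict


def coupling_counts(depends_on: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Return {module: (Ca, Ce)} from a map of module -> modules it depends on."""
    modules = set(depends_on) | {t for targets in depends_on.values() for t in targets}
    incoming: dict[str, set[str]] = defaultdict(set)
    for source, targets in depends_on.items():
        for target in targets:
            if target != source:  # self-references are not coupling
                incoming[target].add(source)
    return {
        m: (len(incoming[m]), len(depends_on.get(m, set()) - {m}))
        for m in modules
    }


def profile(ca: int, ce: int, threshold: int = 3) -> str:
    """Place (Ca, Ce) in the interpretation table; `threshold` is an arbitrary cut-off."""
    high_ca, high_ce = ca >= threshold, ce >= threshold
    if high_ca and high_ce:
        return "Hub/bottleneck"
    if high_ca:
        return "Foundation/library"
    if high_ce:
        return "Leaf/application"
    return "Isolated"


# Hypothetical dependency map (module -> modules it imports).
deps = {
    "orders": {"date_utils", "pricing", "customers"},
    "reports": {"date_utils", "orders", "pricing", "customers"},
    "billing": {"date_utils", "pricing"},
    "pricing": {"date_utils"},
    "customers": set(),
}
for module, (ca, ce) in sorted(coupling_counts(deps).items()):
    print(f"{module:<10} Ca={ca} Ce={ce} {profile(ca, ce)}")
```

Counting distinct modules rather than individual import statements keeps Ca and Ce aligned with the definitions above ("the number of external modules").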
## Derived Metrics
### Instability
```
I = Ce / (Ca + Ce)
```
| I Value | Meaning | Description |
|:-------:|---------|-------------|
| 0.0 | Maximally stable | Only dependents, no dependencies. Foundation code. |
| 0.5 | Balanced | Equal incoming and outgoing. |
| 1.0 | Maximally unstable | Only dependencies, no dependents. Leaf code. |
**Key insight:** Instability is not inherently bad. Leaf code SHOULD be unstable (I near 1) because it can change freely without affecting others. Foundation code SHOULD be stable (I near 0) because many things depend on it. The problem arises when stability doesn't match abstractness (see Distance).
### Abstractness
```
A = sum(abstract_elements) / sum(total_elements)
```
Where abstract elements = interfaces + abstract classes, total elements = all classes/modules.
| A Value | Meaning | Description |
|:-------:|---------|-------------|
| 0.0 | Fully concrete | No interfaces or abstract classes. All implementation. |
| 0.5 | Balanced | Half abstract, half concrete. |
| 1.0 | Fully abstract | All interfaces/abstract classes. No implementations. |
### Distance from Main Sequence
```
D = |A + I - 1|
```
| D Value | Meaning | Description |
|:-------:|---------|-------------|
| 0.0 | On the main sequence | Perfect balance of abstractness and instability. |
| 0.5 | Moderate deviation | Somewhat imbalanced. Worth investigating. |
| 1.0 | Maximum deviation | Severely imbalanced. In a danger zone. |
**Target:** D < 0.3 is generally healthy. D > 0.5 requires investigation.
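The three formulas translate directly into code. A minimal Python sketch; the function names are hypothetical, and the label "watch" for the band between 0.3 and 0.5 is an assumption, since the target above only names the two ends. Instability is treated as undefined for a module with no couplings at all rather than guessing a value.

```python
def instability(ca: int, ce: int) -> float:
    """I = Ce / (Ca + Ce); undefined for a module with no couplings at all."""
    if ca + ce == 0:
        raise ValueError("I is undefined when Ca + Ce == 0")
    return ce / (ca + ce)


def abstractness(abstract_elements: int, total_elements: int) -> float:
    """A = abstract elements (interfaces + abstract classes) / all elements."""
    if total_elements <= 0 or not 0 <= abstract_elements <= total_elements:
        raise ValueError("need 0 <= abstract_elements <= total_elements and total_elements > 0")
    return abstract_elements / total_elements


def distance(a: float, i: float) -> float:
    """D = |A + I - 1|: 0.0 on the main sequence, 1.0 in a corner."""
    return abs(a + i - 1)


def health(d: float) -> str:
    """D < 0.3 healthy, D > 0.5 investigate; 'watch' for the band between is assumed."""
    if d < 0.3:
        return "healthy"
    if d > 0.5:
        return "investigate"
    return "watch"


# A DateUtils-like module: used by 50 classes, depends on nothing, no abstractions.
d = distance(abstractness(0, 4), instability(50, 0))
print(d, health(d))  # 1.0 investigate
```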
## The Main Sequence Graph
```
Abstractness (A)
 1.0 |\                 ZONE OF USELESSNESS
     |  \               (high A, high I)
     |    \             Too abstract,
     |      \           nobody uses it
     |        \
     |          \
 0.5 |            \   <-- Main Sequence (ideal: A + I = 1)
     |              \
     | ZONE OF PAIN   \
     | (low A, low I)   \
     | Too concrete,      \
     | hard to change       \
 0.0 |                        \
     +----+----+----+----+----+-> Instability (I)
     0.0  0.2  0.4  0.6  0.8  1.0
```
### Zone of Pain (Bottom-Left Corner)
- **Metrics:** I near 0 (very stable) + A near 0 (very concrete)
- **What it means:** The module has many dependents (stable) but no abstractions (concrete). Changing it forces changes in all dependents. There's no interface to provide flexibility.
- **Common examples:** Utility libraries, core domain models, shared database schemas, configuration classes
- **Symptom:** "We're afraid to touch this because everything breaks"
- **Fix:** Introduce interfaces/abstractions so dependents couple to the interface, not the implementation. This increases A without changing I, moving toward the main sequence.
- **Caveat:** Some zone-of-pain modules are acceptable. The Java String class is concrete (A=0) and extremely stable (an enormous Ca, so I is near 0), but it rarely needs to change. Non-volatile concrete libraries can safely live in this zone.
### Zone of Uselessness (Top-Right Corner)
- **Metrics:** I near 1 (very unstable) + A near 1 (very abstract)
- **What it means:** The module is all interfaces with no concrete use, and nobody depends on it. It was probably designed speculatively or is an abandoned abstraction layer.
- **Common examples:** Over-engineered framework interfaces, abandoned plugin systems, "just in case" abstractions
- **Symptom:** "Nobody actually implements this interface"
- **Fix:** Either find concrete uses or delete it. Dead abstractions add cognitive load without value.
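Which corner a distant module sits in follows from which side of the line A + I = 1 it falls on: below the line (A + I < 1) is the concrete, stable side; above it is the abstract, unstable side. A minimal Python sketch, reusing the "D > 0.5 requires investigation" guideline as the cut-off; that cut-off is an assumption, because neither zone has a formal boundary.

```python
def zone(a: float, i: float, cutoff: float = 0.5) -> str:
    """Name the region of the A/I plane a module falls into.

    `cutoff` reuses the "D > 0.5 requires investigation" guideline; it is a
    heuristic, not a formal boundary of either zone.
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= i <= 1.0):
        raise ValueError("A and I must both lie in [0, 1]")
    if abs(a + i - 1) <= cutoff:
        return "near main sequence"
    # Below the line A + I = 1: too concrete for how many modules depend on it.
    # Above it: too abstract for how few modules depend on it.
    return "zone of pain" if a + i < 1 else "zone of uselessness"


print(zone(0.0, 0.0))  # zone of pain
print(zone(1.0, 1.0))  # zone of uselessness
print(zone(0.0, 1.0))  # near main sequence
```

The third case is concrete leaf code, which sits on the main sequence and needs no action. As the String caveat shows, a "zone of pain" result is a prompt to check how volatile the module is, not a verdict on its own.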
## Connascence Taxonomy
### Static Connascence (Source-Level, Weaker)
| Type | Abbreviation | Definition | Example | Refactoring Ease |
|------|:------------:|-----------|---------|:----------------:|
| **Name** | CoN | Components agree on the name of an entity | Method names, variable names | Trivial (IDE rename) |
| **Type** | CoT | Components agree on the type of an entity | Parameter types, return types | Easy (type refactoring) |
| **Meaning** | CoM / CoC | Components agree on the meaning of values | `int TRUE = 1; int FALSE = 0;` or status code conventions | Moderate (replace magic values with named constants) |
| **Position** | CoP | Components agree on the order of values | Parameter order in method calls | Moderate (use named parameters or builder pattern) |
| **Algorithm** | CoA | Components must use the same algorithm | Client and server both use SHA-256 for auth tokens | Hard (algorithm change requires coordinated update) |
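The refactorings in the last column fit in a few lines. A minimal Python sketch with hypothetical `create_user` functions and `Status` values: a bare `*` makes parameters keyword-only (Connascence of Position becomes Connascence of Name), and an `Enum` replaces magic numbers (Connascence of Meaning becomes Connascence of Name).

```python
from enum import Enum


class Status(Enum):
    # Named values replace magic numbers such as 1 = active, 2 = suspended.
    ACTIVE = "active"
    SUSPENDED = "suspended"


def create_user_fragile(name: str, email: str, is_admin: bool, status: int) -> dict:
    # Callers must know the argument order (Position) and what 1 or 2 means (Meaning).
    return {"name": name, "email": email, "is_admin": is_admin, "status": status}


def create_user(*, name: str, email: str, is_admin: bool = False,
                status: Status = Status.ACTIVE) -> dict:
    # The bare * makes every parameter keyword-only: callers depend on names, not order.
    return {"name": name, "email": email, "is_admin": is_admin, "status": status.value}


# Swapping name and email is silently accepted by the positional version...
oops = create_user_fragile("ada@example.com", "Ada", True, 1)
# ...while the keyword-only version is correct in any argument order.
ok = create_user(email="ada@example.com", name="Ada", is_admin=True)
print(oops["name"], ok["name"])  # ada@example.com Ada
```

Calling the keyword-only version positionally raises `TypeError`, so the remaining coupling is on names, which an IDE rename can update everywhere.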
### Dynamic Connascence (Runtime, Stronger)
| Type | Abbreviation | Definition | Example | Refactoring Ease |
|------|:------------:|-----------|---------|:----------------:|
| **Execution** | CoE | Order of execution matters | Must call `init()` before `send()` | Hard (requires API redesign) |
| **Timing** | CoT | Timing of execution matters | Race conditions between threads | Very hard (requires synchronization redesign) |
| **Values** | CoV | Multiple values must change together | Distributed transaction: all or nothing | Very hard (may require saga pattern) |
| **Identity** | CoI | Components must reference the same entity | Two services sharing a distributed queue | Very hard (requires shared state management) |
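Dynamic connascence usually cannot be removed, only contained. A minimal Python sketch of the "call `init()` before `send()`" case, using a hypothetical in-memory `Channel`: a context manager built with the standard-library `contextlib.contextmanager` moves the ordering rule from every call site into one helper. The rule still exists, but its degree drops because callers no longer share it.

```python
from contextlib import contextmanager
from typing import Iterator


class Channel:
    """Hypothetical channel whose methods have Connascence of Execution:
    open() must precede send(), and close() must follow."""

    def __init__(self) -> None:
        self.is_open = False
        self.log: list[str] = []

    def open(self) -> None:
        self.is_open = True
        self.log.append("open")

    def send(self, message: str) -> None:
        if not self.is_open:
            raise RuntimeError("send() called before open()")
        self.log.append(f"send:{message}")

    def close(self) -> None:
        self.is_open = False
        self.log.append("close")


@contextmanager
def opened(channel: Channel) -> Iterator[Channel]:
    """The only place that needs to know the open/close order."""
    channel.open()
    try:
        yield channel
    finally:
        channel.close()  # runs even if the caller's block raises


ch = Channel()
with opened(ch) as c:
    c.send("hello")
print(ch.log)  # ['open', 'send:hello', 'close']
```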
### Connascence Strength Diagram
```
WEAKER (prefer)                                            STRONGER (avoid across boundaries)
|                                                          |
Name -> Type -> Meaning -> Position -> Algorithm -> Execution -> Timing -> Values -> Identity
|<---------- Static (compile-time) ----------->|    |<--------- Dynamic (runtime) --------->|
|<----------- Easier to detect/fix ----------->|    |<------- Harder to detect/fix -------->|
```
### Connascence Properties
**Strength:** Weaker forms are easier to refactor. Always convert strong connascence to weaker forms when possible.
**Locality:** The same connascence form is more acceptable within a module boundary than across boundaries. Connascence of Meaning within one class is a code smell. Connascence of Meaning across microservices is an architectural crisis.
**Degree:** The number of components affected. Connascence of Values between 2 components is manageable. Between 20 components, it's a systemic problem.
### Three Guidelines for Improving Modularity via Connascence
1. **Minimize overall connascence** by breaking the system into encapsulated elements
2. **Minimize cross-boundary connascence** — coupling that crosses module boundaries should be as weak as possible
3. **Maximize within-boundary connascence** — strong coupling within a cohesive module is fine
### Weirich's Rules
- **Rule of Degree:** Convert strong forms of connascence into weaker forms
- **Rule of Locality:** As the distance between software elements increases, use weaker forms of connascence
## Unifying Coupling and Connascence
Structured Design coupling (afferent/efferent) and connascence are complementary views:
- **Coupling** tells you HOW MUCH coupling exists (Ca/Ce counts)
- **Connascence** tells you WHAT KIND of coupling exists (how they're coupled)
Both are needed for a complete picture. High Ca with CoN (name-based coupling) is far less concerning than moderate Ca with CoV (value-based coupling across a distributed system).
## Practical Tooling
| Language | Coupling Analysis | LCOM Analysis | Connascence |
|----------|------------------|---------------|-------------|
| Java | JDepend, SonarQube, ArchUnit | SonarQube LCOM4, Checkstyle | Manual / code review |
| .NET | NDepend, SonarQube | NDepend LCOM | Manual / code review |
| Python | import analysis, pydeps | Manual (dynamic typing limits) | Manual / code review |
| JavaScript/TypeScript | madge, dependency-cruiser | ESLint complexity rules | Manual / code review |
| Go | `go list` (package imports), `go mod graph` | Manual (composition over inheritance) | Manual / code review |
| General | Graphviz for dependency visualization | Structural analysis from imports | Always requires human judgment |
Right-size microservice boundaries using granularity disintegrators (forces to split: service scope, code volatility, scalability, fault tolerance, security,...
---
name: microservice-granularity-optimizer
description: >
  Right-size microservice boundaries using granularity disintegrators (forces to split: service scope, code volatility, scalability, fault tolerance, security, extensibility) and integrators (forces to combine: database transactions, workflow/choreography coupling, shared code, data relationships). Includes choreography vs orchestration selection and the saga pattern for distributed transactions. Use this skill whenever the user is splitting a monolith into microservices, deciding how fine-grained services should be, experiencing too many inter-service calls or latency from over-splitting, dealing with distributed transaction problems across microservices, choosing between choreography and orchestration for service communication, implementing the saga pattern, debugging a distributed monolith, or evaluating whether services should be merged or split further -- even if they don't use the exact phrase "microservice granularity."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/microservice-granularity-optimizer
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
depends-on:
- component-identifier
source-books:
  - id: fundamentals-of-software-architecture
    title: "Fundamentals of Software Architecture"
    authors: ["Mark Richards", "Neal Ford"]
    chapters: [17]
tags: [software-architecture, architecture, microservices, granularity, bounded-context, disintegrators, integrators, choreography, orchestration, saga, distributed-transactions]
execution:
  tier: 2
  mode: hybrid
  inputs:
    - type: none
      description: "System description with current or proposed service boundaries, inter-service communication patterns, and transaction requirements -- the skill guides the granularity optimization process"
  tools-required: [Read, Write]
  tools-optional: [Grep, Glob]
  mcps-required: []
  environment: "Any agent environment. If a codebase exists, can analyze current service structure."
---
# Microservice Granularity Optimizer
## When to Use
You need to determine the right size for microservice boundaries, evaluate whether existing services should be split or merged, or select communication patterns (choreography vs orchestration) for inter-service coordination. The most common mistake in microservices is making services too small -- as Martin Fowler noted, "microservice" is a label, not a description. Typical situations:
- Splitting a monolith -- "we want to decompose into microservices, but how small should each service be?"
- Over-splitting diagnosis -- "we split too fine and now every request requires 5+ inter-service calls"
- Distributed transaction pain -- "we need atomic operations across services but SAGA is killing us"
- Communication design -- "should our services use choreography or orchestration?"
- Granularity evaluation -- "we have 30 microservices for a system that could be 8 -- did we over-decompose?"
- Merge vs split decision -- "these two services always change together and share data -- should we merge them?"
Before starting, verify:
- Has the user confirmed microservices as the architecture style? If not, consider using `architecture-style-selector` first.
- Are components identified? If not, use `component-identifier` to establish initial service candidates.
- If the user has existing microservices and is troubleshooting granularity, proceed directly.
## Context & Input Gathering
### Input Sufficiency Check
This skill optimizes microservice granularity. You can proceed with partial information, but certain inputs directly determine the quality of the recommendation.
### Required Context (must have -- ask if missing)
- **Current or proposed service boundaries:** What services exist (or are planned) and what does each one do?
-> Check prompt for: service names, service responsibilities, API descriptions, domain descriptions
-> If missing, ask: "What are your current (or proposed) microservices and what does each one handle?"
- **Inter-service communication pain points:** Which services call each other frequently? Where is latency or complexity highest?
-> Check prompt for: mentions of "too many calls," latency, coupling, distributed transactions, data consistency
-> If missing, ask: "Which services communicate with each other most frequently, and are there pain points (latency, failed transactions, tight coupling)?"
### Important Context (strongly recommended -- ask if easy to obtain)
- **Transaction requirements:** Which operations must be atomic across service boundaries?
-> Check prompt for: ACID mentions, consistency requirements, "must be atomic" language, order/payment/inventory workflows
-> If missing, ask: "Are there business operations that need to be all-or-nothing across multiple services? For example, does placing an order need to atomically update inventory and charge payment?"
- **Scalability differences:** Do different parts of the system need to scale independently?
-> Check prompt for: traffic patterns, peak load descriptions, "X gets 100x more traffic than Y"
-> If missing and relevant, ask: "Do any parts of your system need to scale independently? For example, does search need to handle 100x more traffic than checkout?"
### Observable Context (gather from environment)
- **Existing service structure:** If a codebase exists, scan for service boundaries
-> Look for: Docker/Kubernetes configs, API gateway routes, service directories, inter-service client libraries
-> Reveals: actual granularity, communication patterns, shared dependencies
### Default Assumptions
- If communication pattern unknown -> assume choreography-first (aligns with microservices philosophy)
- If transaction requirements unknown -> assume most operations are eventually consistent, flag any that look like they need atomicity
- If team size unknown -> assume moderate (can handle 5-15 services)
### Sufficiency Threshold
```
SUFFICIENT: service boundaries (current or proposed) + communication pain points + transaction requirements are known
PROCEED WITH DEFAULTS: service boundaries are known, pain points and transactions unclear
MUST ASK: service boundaries are missing entirely
```
## Process
### Step 1: Map Current Service Boundaries
**ACTION:** Document each service's scope, responsibilities, and data ownership.
**WHY:** You cannot optimize granularity without understanding the current state. Each microservice should model a bounded context -- a domain or workflow that includes everything necessary to operate: classes, subcomponents, and database schemas. The bounded context concept from domain-driven design is the philosophical foundation of microservices. Services that share databases, classes, or schemas are not truly bounded and will suffer from coupling that defeats the purpose of microservices.
**For each service, document:**
1. Name and domain purpose
2. Key entities and data it owns
3. Which other services it calls (and why)
4. Which other services call it (and why)
5. Database(s) it uses -- shared or dedicated
**IF** services share a database -> flag as coupling risk (violates data isolation principle)
**IF** multiple services need the same entity class (e.g., Address) -> this is expected; microservices prefer duplication over coupling
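If the Step 1 inventory is kept as structured data, the shared-database check and the "which services call it" column can be derived mechanically. A minimal Python sketch; the `ServiceRecord` fields mirror the five items above, and all service, entity, and database names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """One row of the Step 1 inventory (field names are illustrative)."""
    name: str
    purpose: str
    owned_entities: set[str] = field(default_factory=set)
    calls: set[str] = field(default_factory=set)
    databases: set[str] = field(default_factory=set)


def shared_databases(services: list[ServiceRecord]) -> dict[str, set[str]]:
    """Return {database: service names} for every database used by 2+ services."""
    users: dict[str, set[str]] = defaultdict(set)
    for svc in services:
        for db in svc.databases:
            users[db].add(svc.name)
    return {db: names for db, names in users.items() if len(names) > 1}


def called_by(services: list[ServiceRecord]) -> dict[str, set[str]]:
    """Invert the 'calls' column to fill in item 4 (which services call this one)."""
    inbound: dict[str, set[str]] = {svc.name: set() for svc in services}
    for svc in services:
        for target in svc.calls:
            inbound.setdefault(target, set()).add(svc.name)
    return inbound


inventory = [
    ServiceRecord("orders", "Accept and track orders", {"Order"},
                  {"payments", "inventory"}, {"orders_db"}),
    ServiceRecord("payments", "Charge customers", {"Payment"}, set(), {"payments_db"}),
    ServiceRecord("inventory", "Track stock levels", {"StockItem"}, set(), {"orders_db"}),
]
# orders_db is used by two services, so it is flagged as a coupling risk.
print(shared_databases(inventory))
```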
### Step 2: Apply Granularity Disintegrators
**ACTION:** Evaluate each service against the six disintegrator forces. Each force that applies pushes toward splitting the service into smaller ones.
**WHY:** Disintegrators are the forces that justify making services smaller. Without a structured evaluation, architects split services based on gut feeling, which usually leads to either over-splitting (too many tiny services) or under-splitting (hidden monolith). Each disintegrator has a specific, testable reason for splitting -- if none apply, there is no justification for splitting the service further.
**The six disintegrators:**
1. **Service scope and function** -- Is the service doing too many unrelated things?
-> Test: Can you describe the service's purpose in one sentence without using "and"?
-> If the service handles "order management AND inventory tracking AND shipping," it may be doing too much.
-> Split when: the service contains multiple distinct business domains that could operate independently.
2. **Code volatility** -- Do different parts of the service change at very different rates?
-> Test: Over the last 6 months, did some components change weekly while others haven't changed in months?
-> Split when: a frequently-changing component is locked into a service with stable components, forcing unnecessary redeployments.
3. **Scalability** -- Do different parts need different scaling profiles?
-> Test: Does one component need 10 instances while another needs only 1?
-> Split when: scaling the service up also scales components that don't need it, wasting resources.
4. **Fault tolerance** -- Should a failure in one part NOT bring down another part?
-> Test: If the recommendation engine fails, should order processing still work?
-> Split when: a non-critical component's failure takes down a critical component.
5. **Security** -- Do different parts require different security levels or access controls?
-> Test: Does one component handle PCI/PII data while another handles public data?
-> Split when: the entire service must run at the highest security level because one component requires it.
6. **Extensibility** -- Is one part of the service likely to grow in directions that don't affect the rest?
-> Test: Are new features being added to one component while others remain stable?
-> Split when: extension points are concentrated in one area and coupling to the rest slows development.
**Score each service:** For each disintegrator, mark whether it applies (YES/NO) with specific evidence. A service with 3+ disintegrators applying is a strong candidate for splitting.
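The scoring rule is simple enough to encode, and doing so forces evidence to be recorded for every YES. A minimal Python sketch; the `Finding` structure and the example evidence are hypothetical, and the 3+ threshold is the heuristic stated above, not a hard rule.

```python
from dataclasses import dataclass

DISINTEGRATORS = (
    "service scope and function",
    "code volatility",
    "scalability",
    "fault tolerance",
    "security",
    "extensibility",
)


@dataclass
class Finding:
    applies: bool
    evidence: str = ""


def score_disintegrators(findings: dict[str, Finding]) -> tuple[int, bool]:
    """Return (number of disintegrators that apply, strong split candidate?)."""
    if set(findings) != set(DISINTEGRATORS):
        raise ValueError("record exactly one finding per disintegrator")
    for name, finding in findings.items():
        if finding.applies and not finding.evidence.strip():
            raise ValueError(f"{name!r} is marked YES without evidence")
    score = sum(f.applies for f in findings.values())
    return score, score >= 3


# Hypothetical findings for an "orders" service.
orders_findings = {
    "service scope and function": Finding(True, "handles ordering and invoicing"),
    "code volatility": Finding(True, "invoicing rules change weekly; ordering rarely"),
    "scalability": Finding(False),
    "fault tolerance": Finding(True, "invoice PDF failures block checkout"),
    "security": Finding(False),
    "extensibility": Finding(False),
}
print(score_disintegrators(orders_findings))  # (3, True)
```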
### Step 3: Apply Granularity Integrators
**ACTION:** Evaluate whether services that seem like they should be separate actually need to stay together (or be merged).
**WHY:** Integrators are the counter-forces to disintegrators -- they are reasons NOT to split (or reasons to merge services that were split too aggressively). The most common microservices mistake is ignoring integrators: splitting services that need ACID transactions, that always change together, or that share critical code. The result is a distributed monolith -- all the complexity of microservices with none of the benefits.
**The four integrators:**
1. **Database transactions** -- Do these services need ACID transactions between them?
-> Test: Must operations across these services be all-or-nothing? If payment fails, must the order be rolled back atomically?
-> Merge when: you find yourself building SAGA patterns for operations that were trivial database transactions before splitting.
-> **Critical rule:** "Don't do transactions in microservices -- fix granularity instead!" Transaction boundaries are one of the strongest indicators of incorrect granularity.
2. **Workflow and choreography** -- Do these services require extensive back-and-forth communication?
-> Test: Does completing one business operation require 4+ synchronous calls between services?
-> Merge when: the inter-service communication overhead exceeds the benefit of separation. If Service A cannot do anything useful without calling Service B, they may belong together.
3. **Shared code** -- Do these services share significant business logic (not just utilities)?
-> Test: Is the same business rule implemented in multiple services? When the rule changes, do multiple services need updating?
-> Merge when: shared code represents shared business logic that changes together. (Shared infrastructure code like logging is fine to duplicate -- it's shared domain logic that indicates coupling.)
4. **Data relationships** -- Do these services constantly need each other's data?
-> Test: Does Service A frequently query Service B just to get reference data it needs for every operation?
-> Merge when: the data dependency is so frequent that every request to A triggers a call to B, effectively creating a runtime monolith.
**For each pair of closely-related services:** Evaluate all four integrators. If 2+ integrators apply strongly, the services should likely be merged.
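The workflow integrator is the easiest one to measure from data. A minimal Python sketch, assuming you can export synchronous inter-service calls from tracing as `(operation, caller, callee)` rows; the row format and the service names are assumptions, and the 4-call default comes from the workflow test above. Pairs that dominate a flagged operation are the first merge candidates to evaluate against the other three integrators.

```python
from collections import Counter


def chatty_operations(trace: list[tuple[str, str, str]], limit: int = 4) -> dict[str, Counter]:
    """Flag business operations needing `limit` or more synchronous inter-service calls.

    `trace` rows are (operation, caller, callee) for synchronous calls only.
    Returns {operation: Counter of unordered service pairs} for flagged operations.
    """
    per_op: dict[str, Counter] = {}
    for operation, caller, callee in trace:
        if caller == callee:
            continue  # in-process calls are not inter-service traffic
        pair = tuple(sorted((caller, callee)))
        per_op.setdefault(operation, Counter())[pair] += 1
    return {op: pairs for op, pairs in per_op.items() if sum(pairs.values()) >= limit}


trace = [
    ("place order", "orders", "customers"),
    ("place order", "orders", "pricing"),
    ("place order", "pricing", "orders"),
    ("place order", "orders", "pricing"),
    ("place order", "orders", "inventory"),
    ("browse catalog", "catalog", "pricing"),
]
for op, pairs in chatty_operations(trace).items():
    print(op, pairs.most_common(1))  # place order [(('orders', 'pricing'), 3)]
```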
### Step 4: Resolve Conflicts and Decide
**ACTION:** When disintegrators say "split" but integrators say "keep together," make a judgment call based on the trade-offs.
**WHY:** Almost every granularity decision involves tension between splitting forces and combining forces. There is no formula -- this is where architectural judgment matters. The key insight is that integrators generally win over disintegrators when they apply, because the cost of fighting integrators (distributed transactions, excessive communication, duplicated business logic) is usually higher than the cost of a slightly-too-large service.
**Decision framework:**
- **Strong disintegrators + weak integrators** -> Split the service
- **Weak disintegrators + strong integrators** -> Keep together (or merge)
- **Strong disintegrators + strong integrators** -> This is the hard case. Consider:
  - Can the transaction boundary be redesigned to avoid the ACID need?
  - Can an event-driven approach replace synchronous coupling?
  - Would the saga pattern be acceptable for this specific workflow?
  - Is the operational cost of SAGA worth the deployment/scaling independence?
**IF** transaction integrator is the blocker -> strongly favor keeping services together. Distributed transactions are one of the largest sources of complexity in microservices.
**IF** choreography integrator is the blocker -> consider whether orchestration could reduce the coupling enough to justify the split.
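The decision framework can be written as a small function, mainly to make the thresholds explicit. A minimal Python sketch: "strong" reuses the 3+ disintegrator and 2+ integrator heuristics from Steps 2 and 3, `transactional` means the database-transaction integrator is among those that apply, and the "keep as-is" outcome for two weak scores is an assumption the framework above does not state.

```python
def granularity_decision(disintegrators: int, integrators: int,
                         transactional: bool = False) -> str:
    """Map Step 2 and Step 3 scores onto the Step 4 decision framework.

    `transactional` is True when the database-transaction integrator is one of
    the integrators that apply.
    """
    if not (0 <= disintegrators <= 6 and 0 <= integrators <= 4):
        raise ValueError("scores must be 0-6 (disintegrators) and 0-4 (integrators)")
    strong_split = disintegrators >= 3   # Step 2 heuristic
    strong_keep = integrators >= 2       # Step 3 heuristic
    if strong_split and strong_keep:
        if transactional:
            return "keep together (transaction integrator is the blocker)"
        return "hard case: weigh events, orchestration, or saga cost"
    if strong_split:
        return "split"
    if strong_keep:
        return "keep together (or merge)"
    return "keep as-is"  # neither force is strong (assumption, not in the framework)


print(granularity_decision(4, 1))                      # split
print(granularity_decision(4, 2, transactional=True))  # keep together (transaction integrator is the blocker)
```

The function only sorts cases into the framework's buckets; the hard case still needs the judgment questions listed above.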
### Step 5: Select Communication Pattern
**ACTION:** For each inter-service communication path, choose between choreography and orchestration.
**WHY:** Choreography (broker-style, no central coordinator) and orchestration (mediator-style, central coordinator) represent fundamentally different trade-offs. Choreography preserves the decoupled philosophy of microservices and is the natural default. Orchestration creates coupling but simplifies complex multi-step workflows. The choice directly affects how well the architecture handles errors, maintains decoupling, and manages workflow complexity.
**Choreography (default for microservices):**
- Each service calls other services as needed, no central coordinator
- Resembles the broker pattern in event-driven architecture
- Preserves bounded context philosophy -- each service is autonomous
- Trade-off: error handling and coordination are distributed across services
- Best for: simple workflows, high decoupling needs, services that can operate independently
- Risk: complex workflows become "front controller" patterns where one service becomes an accidental mediator
**Orchestration:**
- A dedicated mediator service coordinates the workflow across other services
- Creates coupling through the mediator, but centralizes coordination logic
- Trade-off: coupling through mediator vs. distributed complexity without one
- Best for: complex multi-step business processes (e.g., "place order" involving payment, inventory, shipping)
- Risk: mediator becomes a bottleneck or single point of failure
**Decision criteria:**
| Factor | Choreography | Orchestration |
|--------|:---:|:---:|
| Workflow complexity | Simple (2-3 services) | Complex (4+ services) |
| Error handling | Distributed, each service handles its own | Centralized in mediator |
| Decoupling | Maximum | Moderate (mediator creates coupling) |
| Visibility | Low (tracing across services needed) | High (mediator has full view) |
| Performance | Fewer bottlenecks | Mediator can bottleneck |
| Domain isomorphism | Natural fit for microservices | Better for structured workflows |
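The structural difference is easiest to see side by side. A minimal in-process Python sketch of a hypothetical "place order" workflow: `EventBus` stands in for a message broker, and the three service functions are placeholders rather than real services. Both versions produce the same result; what differs is where the sequence lives.

```python
from collections import defaultdict
from typing import Callable, Optional


# Hypothetical service operations used by both styles.
def charge_payment(order_id: str) -> str:
    return f"payment charged for {order_id}"


def reserve_inventory(order_id: str) -> str:
    return f"inventory reserved for {order_id}"


def schedule_shipping(order_id: str) -> str:
    return f"shipment scheduled for {order_id}"


class EventBus:
    """In-memory stand-in for a broker: publishers do not know who is listening."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[str], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, order_id: str) -> None:
        for handler in self._handlers[event]:
            handler(order_id)


def place_order_choreographed(order_id: str) -> list[str]:
    bus, log = EventBus(), []

    def react(event: str, operation: Callable[[str], str], next_event: Optional[str]) -> None:
        # Each service reacts to one event and announces its own outcome.
        def handler(oid: str) -> None:
            log.append(operation(oid))
            if next_event is not None:
                bus.publish(next_event, oid)
        bus.subscribe(event, handler)

    react("order.placed", charge_payment, "payment.charged")
    react("payment.charged", reserve_inventory, "inventory.reserved")
    react("inventory.reserved", schedule_shipping, None)
    bus.publish("order.placed", order_id)
    return log


def place_order_orchestrated(order_id: str) -> list[str]:
    # The mediator alone holds the sequence: one place to read the workflow
    # and to add error handling, and also a potential bottleneck.
    workflow = [charge_payment, reserve_inventory, schedule_shipping]
    return [step(order_id) for step in workflow]


print(place_order_choreographed("A1") == place_order_orchestrated("A1"))  # True
```

In the choreographed version the order of steps is visible only by reading every subscription, which is the "Visibility: Low" row of the table; in the orchestrated version it is a single list.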
### Step 6: Design Saga Pattern (if needed)
**ACTION:** For any remaining cross-service transactions that cannot be eliminated by adjusting granularity, design the saga (compensating transaction) pattern.
**WHY:** Even with optimal granularity, some distributed transactions are unavoidable -- for example, when two services genuinely need different architecture characteristics (one needs extreme scalability, the other needs strict security isolation) yet must participate in a business transaction. The saga pattern coordinates these transactions through a do/undo mechanism: each service operation has a compensating "undo" operation. A mediator service tracks the transaction state and triggers compensating actions if any step fails. This is significantly more complex than ACID transactions and should be used sparingly.
**For each saga:**
1. List the participating services and their operations
2. Define the "do" operation for each service
3. Define the "undo" (compensating) operation for each service
4. Choose: choreographed saga (each service triggers the next) or orchestrated saga (mediator coordinates)
5. Define the pending state management (how do you track which steps completed?)
6. Define error handling: what happens if the undo operation itself fails?
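The do/undo mechanism of an orchestrated saga can be sketched in a few dozen lines. A minimal, single-process Python illustration: each `SagaStep` pairs a hypothetical `do` with its compensating `undo`, completed steps are undone in reverse order when a later step fails, and a failed undo is logged for intervention (item 6). It assumes a failed `do` leaves nothing to compensate; a real mediator would also persist its pending state (item 5) and handle retries and timeouts, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    service: str
    do: Callable[[], None]
    undo: Callable[[], None]  # compensating action


def run_saga(steps: list[SagaStep]) -> tuple[bool, list[str]]:
    """Run each `do` in order; if one fails, run `undo` for completed steps in reverse.

    Returns (committed, log). The log stands in for pending-state tracking;
    a real mediator would persist it so recovery survives a crash.
    """
    completed: list[SagaStep] = []
    log: list[str] = []
    for step in steps:
        try:
            step.do()
        except Exception as exc:
            log.append(f"{step.service}: do failed ({exc})")
            for done in reversed(completed):
                try:
                    done.undo()
                    log.append(f"{done.service}: undone")
                except Exception as undo_exc:
                    # A failed compensation cannot be rolled back automatically.
                    log.append(f"{done.service}: UNDO FAILED ({undo_exc}), needs intervention")
            return False, log
        completed.append(step)
        log.append(f"{step.service}: done")
    return True, log


# Hypothetical "place order" saga in which shipping fails.
state = {"charged": False, "reserved": False}


def no_carrier() -> None:
    raise RuntimeError("no carrier available")


ok, log = run_saga([
    SagaStep("payment", lambda: state.update(charged=True), lambda: state.update(charged=False)),
    SagaStep("inventory", lambda: state.update(reserved=True), lambda: state.update(reserved=False)),
    SagaStep("shipping", no_carrier, lambda: None),
])
print(ok, state)  # False {'charged': False, 'reserved': False}
print(log)
```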
**Critical warnings:**
- A few transactions across services are sometimes necessary; if they are the dominant feature of the architecture, mistakes were made
- If you find yourself designing SAGAs for most of your workflows, your services are too granular -- go back to Steps 2-4 and merge services
- The undo operations are often significantly more complex than the do operations
- Asynchronous SAGAs with contingent requests create race conditions that are extremely difficult to debug
### Step 7: Validate and Score
**ACTION:** Validate the final granularity against microservices architecture characteristic ratings and check for anti-patterns.
**WHY:** Every architecture style has known strengths and weaknesses. Validating against the ratings ensures you are leveraging the architecture's strengths (scalability, elasticity, evolutionary, modularity) and not fighting its weaknesses (performance, cost, simplicity). Checking for anti-patterns catches the most common design mistakes.
**Microservices architecture ratings:**
| Characteristic | Rating | Notes |
|---------------|:------:|-------|
| Deployability | 4 | Small, independent deployment units |
| Elasticity | 5 | Fine-grained scaling per service |
| Evolutionary | 5 | High decoupling enables incremental change |
| Fault tolerance | 4 | Independent services fail independently |
| Modularity | 5 | Each service is a bounded context |
| Overall cost | 1 | Expensive: infrastructure, orchestration, monitoring |
| Performance | 2 | Network calls replace method calls |
| Reliability | 4 | Redundancy via service discovery |
| Scalability | 5 | Each service scales independently |
| Simplicity | 1 | Most complex distributed architecture |
| Testability | 4 | Small test scope per service |
**Anti-pattern checks:**
- **Distributed monolith:** Services share databases, deploy together, or can't function independently. You have all the complexity of microservices with none of the benefits.
- **Over-granular services:** Every API call requires 3+ inter-service hops. Latency is unacceptable. Most workflows need SAGAs. Services can't do anything useful alone.
- **Entity trap:** Services modeled after database entities (UserService, OrderService, ProductService) rather than business capabilities/workflows.
- **Shared database:** Multiple services read/write the same database tables, creating hidden coupling through schema changes.
- **Front controller:** One "choreographed" service has become an accidental orchestrator, handling coordination for most workflows while also maintaining its own domain logic.
- **Transaction spaghetti:** More than 30% of workflows require SAGA patterns -- indicates granularity is wrong.
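The two numeric checks in this list can be run against a workflow inventory. A minimal Python sketch; the `Workflow` fields and example data are hypothetical, and the 30% saga share and 3-hop limits come from the anti-pattern descriptions above.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    inter_service_hops: int
    needs_saga: bool


def granularity_flags(workflows: list[Workflow], saga_limit: float = 0.30) -> list[str]:
    """Apply the two numeric anti-pattern checks: saga share and hop count."""
    if not workflows:
        return []
    flags = []
    saga_share = sum(w.needs_saga for w in workflows) / len(workflows)
    if saga_share > saga_limit:
        flags.append(f"transaction spaghetti: {saga_share:.0%} of workflows need a saga")
    if all(w.inter_service_hops >= 3 for w in workflows):
        flags.append("over-granular: every workflow needs 3+ inter-service hops")
    return flags


workflows = [
    Workflow("place order", 4, True),
    Workflow("cancel order", 3, True),
    Workflow("view order", 3, False),
    Workflow("update profile", 5, False),
]
for flag in granularity_flags(workflows):
    print(flag)
```

The remaining checks (distributed monolith, entity trap, front controller) depend on how services are modeled and deployed, so they stay with human review.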
## Inputs
- Current or proposed service boundaries with responsibilities
- Inter-service communication patterns and pain points
- Transaction requirements (which operations must be atomic)
- Scalability and fault tolerance differences between services
- Existing codebase structure (if migrating from monolith)
## Outputs
### Granularity Optimization Report
```markdown
# Microservice Granularity Report: {System Name}
## Current State
**Services:** {count}
**Key pain points:** {latency, transactions, coupling, etc.}
## Disintegrator/Integrator Analysis
### Service: {ServiceName}
**Current scope:** {what it does}
| Disintegrator | Applies? | Evidence |
|--------------|:--------:|----------|
| Service scope/function | {Y/N} | {evidence} |
| Code volatility | {Y/N} | {evidence} |
| Scalability | {Y/N} | {evidence} |
| Fault tolerance | {Y/N} | {evidence} |
| Security | {Y/N} | {evidence} |
| Extensibility | {Y/N} | {evidence} |
| **Disintegrator score** | **{count}/6** | |
### Service Pair: {ServiceA} + {ServiceB}
| Integrator | Applies? | Evidence |
|-----------|:--------:|----------|
| Database transactions | {Y/N} | {evidence} |
| Workflow coupling | {Y/N} | {evidence} |
| Shared code | {Y/N} | {evidence} |
| Data relationships | {Y/N} | {evidence} |
| **Integrator score** | **{count}/4** | |
**Decision:** {split / merge / keep as-is}
**Reasoning:** {why}
## Recommended Service Boundaries
| # | Service | Domain | Owns Data | Scales to |
|---|---------|--------|-----------|:---------:|
| 1 | {name} | {domain} | {entities} | {instances} |
## Communication Design
| Workflow | Services involved | Pattern | Reasoning |
|----------|------------------|---------|-----------|
| {workflow} | {services} | Choreography / Orchestration | {why} |
## Saga Patterns (if any)
### Saga: {WorkflowName}
| Step | Service | Do operation | Undo operation |
|:----:|---------|-------------|----------------|
| 1 | {service} | {do} | {undo} |
| 2 | {service} | {do} | {undo} |
**Coordination:** {choreographed / orchestrated}
**Pending state:** {how tracked}
## Anti-Pattern Check
- [ ] No shared databases between services
- [ ] No distributed monolith (services deploy independently)
- [ ] Not over-granular (<30% of workflows need SAGA)
- [ ] No entity trap (services model workflows, not entities)
- [ ] No accidental front controllers
- [ ] Each service can function with limited degradation if others fail
## Characteristic Fit
| Characteristic | Rating | Meets needs? |
|---------------|:------:|:------------:|
| Deployability | 4 | {Yes/No} |
| Elasticity | 5 | {Yes/No} |
| ... | ... | ... |
```
## Key Principles
- **"Microservice" is a label, not a description** -- The originators chose the name to contrast with "gigantic services" (SOA), not to prescribe size. Many developers treat "micro" as a commandment and create services that are too fine-grained. The purpose of service boundaries is to capture a domain or workflow, and those boundaries might be large for some parts of the system.
- **Don't do transactions in microservices -- fix granularity instead** -- This is the single most important guideline. Architects who build microservices and then find themselves wiring transactions across service boundaries have almost certainly made their services too granular. Transaction boundaries are one of the strongest indicators of incorrect granularity. Merge the services that transact together.
- **Bounded context is the architectural unit** -- Each microservice should model a complete bounded context: its own classes, subcomponents, and database schemas. Nothing is shared with other services. Duplication (e.g., an Address class in multiple services) is preferred over coupling. This is the opposite of traditional enterprise thinking where reuse was paramount.
- **Data isolation is non-negotiable** -- Each service must own its data exclusively. No shared databases, no shared schemas, no shared tables. When services need each other's data, they communicate through well-defined APIs or events, never through direct database access. Shared databases are integration points that create hidden coupling.
- **Choreography is the natural default** -- Because microservices favors decoupling, choreography (no central coordinator) is the natural fit. Orchestration should only be introduced for complex multi-step workflows where distributed error handling would be too complex. Even then, the orchestrator should be a thin coordination service, not a place for business logic.
- **Integrators generally win over disintegrators** -- When splitting forces conflict with combining forces, the combining forces usually carry more weight. The cost of fighting integrators (SAGA complexity, excessive communication, duplicated business logic) almost always exceeds the cost of a slightly-too-large service.
- **Iterate on granularity** -- Architects rarely discover the correct granularity, data dependencies, and communication styles on their first pass. Expect to iterate. Start with slightly larger services and split only when a specific disintegrator provides clear justification.
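Once you have listed which operations must be atomic, you can apply the "fix granularity instead" rule mechanically. The following minimal sketch assumes you have those pairs. It uses union-find, a standard grouping technique chosen here for illustration rather than one the source prescribes, to group every service connected directly or transitively by a transaction requirement. Each resulting group is a merge candidate. `merge_transactional_groups` is a hypothetical helper, and it covers only the transaction integrator:

```python
def merge_transactional_groups(services, atomic_pairs):
    """Group services linked (directly or transitively) by ACID requirements.

    services: service names.
    atomic_pairs: (a, b) pairs whose operations must succeed or fail together.
    Returns sorted groups; each group is a candidate for a single service.
    """
    services = list(services)
    parent = {s: s for s in services}  # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in atomic_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for s in services:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


split = ["OrderCreation", "OrderValidation", "OrderPricing",
         "OrderFulfillment", "OrderNotification"]
print(merge_transactional_groups(split, [("OrderCreation", "OrderPricing")]))
# [['OrderCreation', 'OrderPricing'], ['OrderFulfillment'], ['OrderNotification'], ['OrderValidation']]
```

On the over-split order system from the first example, this sketch merges only OrderCreation and OrderPricing. Folding OrderValidation and OrderFulfillment into the same service comes from the workflow coupling integrator, which still needs separate analysis.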
## Examples
**Scenario: Over-split order processing system**
Trigger: "We split our Order service into OrderCreation, OrderValidation, OrderPricing, OrderFulfillment, and OrderNotification. Now placing a single order requires 5 synchronous calls in sequence, latency tripled, and we need distributed transactions between OrderCreation and OrderPricing. How do we fix this?"
Process: Applied disintegrator analysis to each sub-service. Only OrderNotification had independent scalability needs (disintegrator: scalability). All others shared the same code volatility, security level, and fault tolerance requirements. Applied integrator analysis: OrderCreation + OrderPricing had a strong database transaction integrator (pricing must be atomic with order creation). OrderCreation + OrderValidation + OrderFulfillment had a strong workflow coupling integrator (always called in sequence, never independently). Recommendation: merge OrderCreation, OrderValidation, OrderPricing, and OrderFulfillment back into a single OrderService. Keep OrderNotification separate (different scalability profile, can be async, tolerant of delays).
Output: **Reduced from 5 services to 2 (OrderService + OrderNotification).** Eliminated 4 synchronous inter-service calls per order. Eliminated the distributed transaction between creation and pricing (now a single ACID transaction). OrderNotification communicates via async events (choreography). Latency dropped by ~60%.
**Scenario: Monolith decomposition with mixed granularity needs**
Trigger: "We have a monolithic e-commerce application. We want to move to microservices. The system handles: product catalog, search, user accounts, shopping cart, order processing, payment, inventory management, shipping, reviews, and analytics."
Process: Applied disintegrators systematically. Product Catalog and Search have different scalability needs (search gets 100x more traffic) -- split. Payment has unique security requirements (PCI compliance) -- split. Analytics has different code volatility (changes daily vs monthly for core) -- split. Reviews and Ratings are extensible independently -- split. Applied integrators: Order Processing + Payment have a strong transaction integrator (payment must be atomic with order), but the security disintegrator for Payment is stronger -- keep separate, use orchestrated saga. Shopping Cart + Order Processing have a weak integrator (cart is pre-order, separate lifecycle) -- keep separate. Inventory + Shipping have a moderate data relationship integrator but different scalability needs; the data integrator was judged the stronger force -- merge.
Output: **9 services:** ProductCatalog, Search, UserAccounts, ShoppingCart, OrderProcessing, Payment, InventoryShipping (merged due to data integrator), ReviewsRatings, Analytics. One saga pattern for Order-Payment. Choreography for most communication; orchestration for the order placement workflow (OrderProcessing orchestrates Payment and InventoryShipping).
**Scenario: Distributed transaction redesign**
Trigger: "We're designing a new e-commerce platform with microservices. When a customer places an order, we need to: reserve inventory, charge payment, and create a shipment. If payment fails after inventory is reserved, we need to release it. How do we handle this?"
Process: First evaluated whether these truly need to be separate services (disintegrator vs. integrator analysis). Inventory and Shipping have different scalability needs (inventory checks happen at browse-time too, not just checkout). Payment has PCI security isolation requirements. These disintegrators justify keeping them separate despite the transaction integrator. Designed an orchestrated saga: OrderService acts as the saga mediator. Step 1: Reserve inventory (do: reserve, undo: release). Step 2: Charge payment (do: charge, undo: refund). Step 3: Create shipment (do: create, undo: cancel). If Step 2 fails, OrderService calls inventory.release(). Each service enters a "pending" state until the saga completes. Chose orchestration over choreography because the error handling for compensating transactions is too complex to distribute.
Output: **Orchestrated saga with 3 participants.** OrderService mediates. Each service implements do/undo operations. Pending state tracked in OrderService database. Explicit error handling: if undo itself fails, alert + manual resolution queue. Warning noted: if more than ~30% of workflows need sagas, the granularity should be reconsidered.
## References
- For detailed disintegrator/integrator decision matrices and trade-off tables, see [references/disintegrators-integrators.md](references/disintegrators-integrators.md)
- For identifying initial service candidates, use `component-identifier`
- For choosing between microservices and service-based architecture, use `architecture-style-selector`
- For documenting granularity decisions, use `architecture-decision-record-creator`
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
Install related skills from ClawHub:
- `clawhub install bookforge-component-identifier`
Or install the full book set from GitHub: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:meta/value-contributions/microservice-granularity-optimizer.json
{
"skill_name": "microservice-granularity-optimizer",
"version": 1,
"source_book": "fundamentals-of-software-architecture",
"source_chapters": [17],
"value_contributions": [
{
"id": "VC-1",
"category": "framework",
"name": "Disintegrator/Integrator Analysis",
"description": "Provides a structured 6-disintegrator + 4-integrator framework for evaluating whether services should be split or merged, replacing gut-feel decisions with evidence-based analysis.",
"without_skill_behavior": "Architects split services based on entity names or arbitrary size rules ('each service should be <500 LOC'). No systematic evaluation of splitting vs. combining forces.",
"with_skill_behavior": "Each service is evaluated against 6 disintegrators (scope, volatility, scalability, fault tolerance, security, extensibility) and each service pair against 4 integrators (transactions, workflow coupling, shared code, data relationships). Decisions are evidence-based with clear thresholds.",
"measurable_impact": "Prevents over-splitting (reduces 15+ service architectures to 5-8 right-sized services) and under-splitting (identifies monolithic services that should be decomposed)."
},
{
"id": "VC-2",
"category": "anti-pattern-prevention",
"name": "Transaction Boundary Detection",
"description": "Identifies distributed transaction anti-patterns early by applying the rule 'Don't do transactions in microservices -- fix granularity instead.' Services that need ACID transactions between them are flagged for merging.",
"without_skill_behavior": "Architects split services and then discover they need distributed transactions. They implement SAGA patterns for operations that should be simple ACID transactions, adding massive complexity.",
"with_skill_behavior": "Transaction requirements are evaluated as the strongest integrator before splitting. Services that transact together are kept together. SAGAs are used only when genuinely necessary (different security/scalability requirements force separation).",
"measurable_impact": "Eliminates unnecessary SAGA implementations. In testing, without-skill outputs proposed SAGAs for workflows that with-skill outputs handled as single-service ACID transactions."
},
{
"id": "VC-3",
"category": "decision-framework",
"name": "Choreography vs. Orchestration Selection",
"description": "Provides structured criteria for choosing between choreography (broker-style, decoupled) and orchestration (mediator-style, coordinated) for each inter-service workflow, based on workflow complexity, error handling needs, and decoupling priority.",
"without_skill_behavior": "Architects default to one pattern for all workflows (usually orchestration via API gateway), or mix patterns inconsistently without reasoning about trade-offs.",
"with_skill_behavior": "Each workflow is evaluated: simple 2-3 service interactions use choreography (preserves decoupling), complex 4+ service workflows with compensating transactions use orchestration (centralizes error handling). The front controller anti-pattern is detected and corrected.",
"measurable_impact": "Produces architectures with appropriate communication patterns per workflow rather than one-size-fits-all, reducing unnecessary coupling while managing complexity."
},
{
"id": "VC-4",
"category": "anti-pattern-prevention",
"name": "Distributed Monolith Detection",
"description": "Detects the distributed monolith anti-pattern: services that share databases, deploy together, or cannot function independently -- giving all the complexity of microservices with none of the benefits.",
"without_skill_behavior": "Architects create services that share databases or have tight runtime coupling, resulting in coordinated deployments and cascading failures. The architecture is a monolith with network hops added.",
"with_skill_behavior": "Data isolation is enforced as non-negotiable. Each service owns its data exclusively. Shared database dependencies are flagged. Services are validated for independent deployability and fault isolation.",
"measurable_impact": "Prevents the most common and costly microservices anti-pattern. Ensures the architecture actually delivers on the promised benefits of microservices."
},
{
"id": "VC-5",
"category": "framework",
"name": "Saga Pattern Design with Compensating Transactions",
"description": "When distributed transactions are genuinely unavoidable, provides a structured approach to designing saga patterns with do/undo operations, pending state management, and error handling for failed compensations.",
"without_skill_behavior": "SAGAs are designed ad-hoc, often missing compensating operations for edge cases, lacking pending state management, and having no plan for when undo operations themselves fail.",
"with_skill_behavior": "Each saga is designed with explicit do/undo operations per service, a coordination choice (choreographed vs orchestrated), pending state tracking, and error escalation for failed compensations. Includes the critical warning: if >30% of workflows need sagas, the granularity is wrong.",
"measurable_impact": "Produces complete, production-ready saga designs rather than hand-wavy 'use saga pattern' recommendations."
},
{
"id": "VC-6",
"category": "quality-validation",
"name": "Microservices Characteristic Ratings Validation",
"description": "Validates the final architecture against the known microservices characteristic ratings (Deploy=4, Elast=5, Evol=5, FaultTol=4, Mod=5, Cost=1, Perf=2, Rel=4, Scale=5, Simple=1, Test=4) to ensure the design leverages strengths and doesn't fight weaknesses.",
"without_skill_behavior": "No validation against architectural style ratings. Architectures may attempt to optimize for characteristics where microservices is structurally weak (e.g., performance, simplicity, cost).",
"with_skill_behavior": "Design is scored against all 11 characteristics. If the system needs high performance (rating: 2 stars) or simplicity (1 star), the architect is warned that microservices structurally cannot deliver this and should consider alternative styles.",
"measurable_impact": "Prevents selection of microservices for systems where its structural weaknesses conflict with requirements."
}
]
}
FILE:references/disintegrators-integrators.md
# Granularity Disintegrators and Integrators -- Deep Reference
> Source: Chapter 17, Fundamentals of Software Architecture (Richards & Ford)
> Read this file when you need detailed trade-off analysis for specific disintegrator/integrator combinations.
## Disintegrators (Forces Pushing Toward Smaller Services)
Disintegrators are reasons to break a service into smaller pieces. Each has a specific test and threshold. A service should only be split when at least one disintegrator clearly applies with concrete evidence.
### 1. Service Scope and Function
**What it tests:** Is the service responsible for too many unrelated business capabilities?
**How to evaluate:**
- Can you describe the service in one sentence without "and"?
- Does the service span multiple bounded contexts?
- Would different teams naturally own different parts of this service?
**Split threshold:** The service handles 3+ distinct business capabilities that could operate independently.
**Example:**
- BAD (too broad): "CustomerService handles registration, preferences, authentication, notification settings, and analytics tracking"
- GOOD (split): CustomerProfile (registration, preferences), AuthService (authentication), NotificationService (settings + delivery), AnalyticsService (tracking)
**Counter-check with integrators:** Before splitting, verify that the proposed sub-services don't have strong transaction or workflow integrators between them.
### 2. Code Volatility
**What it tests:** Do different parts of the service change at significantly different rates?
**How to evaluate:**
- Look at git history: which directories/packages change most frequently?
- Over 6 months, does one component change weekly while another hasn't changed?
- Do deployments of stable components happen only because an unrelated component changed?
**Split threshold:** One component changes 5x+ more frequently than another, causing unnecessary redeployments of stable code.
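You can measure the 5x ratio from version control history. The following is a minimal sketch. It assumes you have already parsed the history into one list of changed file paths per commit (for example from `git log --name-only --pretty=format:` output) and that components map to path prefixes. The paths and helper names are hypothetical:

```python
from collections import Counter

VOLATILITY_SPLIT_RATIO = 5  # split threshold: one component changes 5x+ more often


def change_counts(commits, components):
    """Count the commits that touched each component.

    commits: one list of changed file paths per commit.
    components: path prefixes that identify components (hypothetical layout).
    """
    counts = Counter({c: 0 for c in components})
    for files in commits:
        # A commit counts once per component, however many of its files changed.
        touched = {c for c in components for f in files if f.startswith(c)}
        counts.update(touched)
    return counts


def volatility_split(counts):
    """True if the busiest component changes >= 5x as often as the quietest.

    A component with zero changes is treated as one change to avoid dividing by zero.
    """
    most, least = max(counts.values()), min(counts.values())
    return most / max(least, 1) >= VOLATILITY_SPLIT_RATIO


history = [["reports/templates/invoice.tpl"]] * 10 + [["reports/engine/render.py"]] * 2
counts = change_counts(history, ["reports/engine/", "reports/templates/"])
print(dict(counts))              # {'reports/engine/': 2, 'reports/templates/': 10}
print(volatility_split(counts))  # True (10 / 2 = 5)
```

Counting commits rather than changed files keeps a single large refactor from inflating one component's volatility.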
**Example:**
- A ReportingService contains both report generation (stable, changes quarterly) and report template management (volatile, changes weekly for new template types). Split: ReportEngine (stable) + ReportTemplates (volatile).
**Counter-check:** If the volatile and stable components share significant business logic (shared code integrator), splitting may force you to maintain that logic in two places.
### 3. Scalability
**What it tests:** Do different parts of the service need different numbers of instances or different resource profiles?
**How to evaluate:**
- Does one component need 10 instances while another needs 1?
- Does one component need GPUs while another needs only CPU?
- Do traffic patterns differ (one spikes during events, another is steady)?
**Split threshold:** One component needs 3x+ more instances or fundamentally different resources than another component in the same service.
**Example:**
- A ProductService handles both product catalog browsing (high traffic, needs 20 instances) and product data import (low traffic, needs 1 instance, runs nightly). Split: ProductCatalog (scales horizontally) + ProductImport (single instance, batch).
**Counter-check:** If both components read/write the same database tables, splitting creates data coupling that may negate the scaling benefit.
### 4. Fault Tolerance
**What it tests:** Should a failure in one part of the service NOT affect another part?
**How to evaluate:**
- If Component A fails, must Component B keep working?
- Are there different availability SLAs for different parts?
- Does a non-critical feature bring down a critical one when it fails?
**Split threshold:** A non-critical component's failure has caused (or could cause) downtime in a critical component.
**Example:**
- An OrderService handles both order placement (critical, must be 99.99% available) and order recommendations (nice-to-have, can degrade). If the recommendation engine has a memory leak, it shouldn't crash order placement. Split: OrderProcessing (critical) + OrderRecommendations (degradable).
**Counter-check:** If order processing needs recommendation data to function (data relationship integrator), the split may not achieve true fault isolation.
### 5. Security
**What it tests:** Do different parts require different security levels, access controls, or compliance boundaries?
**How to evaluate:**
- Does one component handle PCI/PII/HIPAA data while another handles public data?
- Do different components need different network zones or encryption levels?
- Would a security audit scope be smaller if the component were isolated?
**Split threshold:** The entire service must run at the highest security level because one component requires it, forcing unnecessary security overhead on other components.
**Example:**
- A UserService handles both user profiles (moderate security) and payment methods (PCI DSS scope). Keeping them together puts the entire service in PCI scope. Split: UserProfile (standard security) + PaymentMethods (PCI-isolated).
### 6. Extensibility
**What it tests:** Is one part of the service likely to grow significantly while other parts remain stable?
**How to evaluate:**
- Are new features concentrated in one component?
- Is one component's API surface growing while another is stable?
- Would adding new functionality to one component benefit from independent deployment?
**Split threshold:** One component is receiving 5x+ more feature requests than others, and the shared deployment pipeline slows down delivery.
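The six split thresholds above can be combined into the disintegrator score used in the report template. The following rough sketch assumes the evidence has already been gathered as numbers and yes/no answers. `DisintegratorEvidence` and `disintegrator_score` are hypothetical names. The booleans simplify the fault tolerance and security tests, which still call for judgment, and the extensibility check omits the "shared pipeline slows delivery" condition:

```python
from dataclasses import dataclass


@dataclass
class DisintegratorEvidence:
    # Hypothetical inputs collected during the analysis.
    distinct_capabilities: int               # scope: independent business capabilities
    change_rate_ratio: float                 # volatility: busiest / quietest component
    instance_ratio: float                    # scalability: most / fewest instances needed
    different_resources: bool                # scalability: e.g. GPU vs CPU only
    noncritical_failure_hits_critical: bool  # fault tolerance
    mixed_security_levels: bool              # security: e.g. PCI data beside public data
    feature_request_ratio: float             # extensibility: busiest / quietest component


def disintegrator_score(e: DisintegratorEvidence) -> tuple[int, list[str]]:
    """Return (count out of 6, names of disintegrators that apply)."""
    checks = {
        "Service scope/function": e.distinct_capabilities >= 3,
        "Code volatility": e.change_rate_ratio >= 5,
        "Scalability": e.instance_ratio >= 3 or e.different_resources,
        "Fault tolerance": e.noncritical_failure_hits_critical,
        "Security": e.mixed_security_levels,
        "Extensibility": e.feature_request_ratio >= 5,  # simplified threshold
    }
    applied = [name for name, hit in checks.items() if hit]
    return len(applied), applied


user_service = DisintegratorEvidence(
    distinct_capabilities=2, change_rate_ratio=1.0, instance_ratio=1.0,
    different_resources=False, noncritical_failure_hits_critical=False,
    mixed_security_levels=True, feature_request_ratio=1.0,
)
print(disintegrator_score(user_service))  # (1, ['Security'])
```

A nonzero score only justifies a split if the integrators for the proposed sub-services are weaker, which is why each section above ends with a counter-check.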
---
## Integrators (Forces Pushing Toward Larger Services)
Integrators are reasons to keep services together (or merge services that were split). They represent costs that are incurred when services are separated. Integrators generally win over disintegrators when they apply strongly.
### 1. Database Transactions
**Strength: VERY HIGH** -- This is the most powerful integrator.
**What it tests:** Do operations across these services need ACID guarantees?
**How to evaluate:**
- Must Operation A and Operation B either both succeed or both fail?
- Would eventual consistency cause business-visible problems (double charges, phantom inventory, inconsistent records)?
- Are you building SAGA patterns for operations that were simple database transactions before splitting?
**Merge threshold:** If you need ACID transactions between two services, they almost certainly should be one service. "Don't do transactions in microservices -- fix granularity instead!"
**When to override:** Only when a strong disintegrator (usually security or extreme scalability difference) makes merging impractical AND the business can tolerate eventual consistency with compensating transactions.
**The saga alternative (last resort):**
If services genuinely cannot be merged despite transaction needs, implement the saga pattern:
| Aspect | Choreographed Saga | Orchestrated Saga |
|--------|-------------------|-------------------|
| Coordination | Each service triggers the next | Central mediator coordinates |
| Coupling | Lower | Higher (through mediator) |
| Error handling | Distributed across services | Centralized in mediator |
| Visibility | Low (requires distributed tracing) | High (mediator has full state) |
| Complexity | Grows quickly as the workflow gets more complex | Adds overhead that simple workflows don't need |
| Best for | 2-3 service sagas | 4+ service sagas |
**Saga implementation pattern:**
```
For each service operation in the saga:
1. Define the "do" operation (forward action)
2. Define the "undo" operation (compensating action)
3. Define the "pending" state (in-flight marker)
On success: all services commit, clear pending state
On failure at step N:
1. Record failure
2. Call undo on steps N-1, N-2, ..., 1 (reverse order)
3. Report failure to caller
4. If undo itself fails: alert + manual resolution queue
```
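The following is a minimal runnable sketch of the pattern above, written as an orchestrated saga in Python. `SagaStep`, `run_saga`, and `SagaFailed` are hypothetical names. The `pending` set stands in for each service's pending state, and `manual_queue` stands in for the manual resolution queue. In a real system, both would live in the mediator's database and alerting system:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    do: Callable[[], None]    # forward action
    undo: Callable[[], None]  # compensating action


class SagaFailed(Exception):
    """Raised to report a failed saga to the caller."""


def run_saga(steps, pending, manual_queue):
    """Run steps in order; on failure at step N, undo steps N-1..1 in reverse."""
    completed = []
    for step in steps:
        pending.add(step.name)  # in-flight marker
        try:
            step.do()
        except Exception as exc:
            pending.discard(step.name)  # step N never completed
            for done in reversed(completed):
                try:
                    done.undo()
                except Exception:
                    # Undo failed: leave it pending and escalate for manual resolution.
                    manual_queue.append(done.name)
                else:
                    pending.discard(done.name)
            raise SagaFailed(f"step {step.name!r} failed: {exc}") from exc
        completed.append(step)
    for step in completed:  # success: clear pending state
        pending.discard(step.name)


log = []


def charge_card():
    raise RuntimeError("card declined")


steps = [
    SagaStep("inventory", lambda: log.append("reserve"), lambda: log.append("release")),
    SagaStep("payment", charge_card, lambda: log.append("refund")),
    SagaStep("shipping", lambda: log.append("create"), lambda: log.append("cancel")),
]
pending, manual = set(), []
try:
    run_saga(steps, pending, manual)
except SagaFailed as err:
    print(err)  # step 'payment' failed: card declined
print(log)      # ['reserve', 'release']
```

Keeping a step in `pending` when its undo fails is a deliberate choice. The step is still in flight until someone resolves it manually.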
### 2. Workflow and Choreography Coupling
**Strength: HIGH**
**What it tests:** Do these services require extensive back-and-forth communication to complete business operations?
**How to evaluate:**
- Count the inter-service calls for common workflows. If a single business operation requires 4+ synchronous calls between two specific services, they are tightly coupled.
- Does Service A frequently wait for Service B's response before it can continue? (Synchronous dependency)
- If Service B is down, can Service A do anything useful?
**Merge threshold:** If Service A calls Service B on every request, and neither can function without the other, they are one service pretending to be two.
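The 4+ call test is easy to run against recorded traffic. The following minimal sketch assumes you can export the synchronous calls made during one business operation as (caller, callee) pairs, for example from distributed tracing. The service names and `tightly_coupled_pairs` are hypothetical. Calls in both directions count toward the same pair:

```python
from collections import Counter

SYNC_CALL_THRESHOLD = 4  # 4+ synchronous calls between two services


def tightly_coupled_pairs(trace):
    """trace: (caller, callee) synchronous calls recorded for ONE business operation.

    Returns service pairs with 4+ calls between them, both directions combined.
    """
    pair_counts = Counter(frozenset((a, b)) for a, b in trace if a != b)
    return sorted(
        tuple(sorted(pair)) for pair, n in pair_counts.items() if n >= SYNC_CALL_THRESHOLD
    )


place_order = [
    ("Order", "Pricing"), ("Pricing", "Order"), ("Order", "Pricing"),
    ("Order", "Pricing"), ("Order", "Inventory"),
]
print(tightly_coupled_pairs(place_order))  # [('Order', 'Pricing')]
```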
**The front controller anti-pattern:**
In choreography, one service often becomes the "first service called" for complex workflows. This service ends up coordinating other services while also maintaining its own domain logic, making it an accidental mediator. If a choreographed service is spending more than ~30% of its logic on coordination, consider either:
- Merging the coordinated services back together, OR
- Extracting the coordination into an explicit orchestrator service (removing it from the domain service)
### 3. Shared Code
**Strength: MODERATE**
**What it tests:** Do these services share significant business logic?
**How to evaluate:**
- Is the same business rule implemented in multiple services?
- When a business rule changes, how many services need updating?
- Is there a shared library containing business logic (not just utilities)?
**Important distinction:**
- **Shared infrastructure code** (logging, monitoring, circuit breakers) -> This is fine. Use the sidecar pattern. Don't merge services because they share operational concerns.
- **Shared business logic** (pricing rules, validation logic, domain calculations) -> This indicates the services may belong together. The shared business logic represents a single bounded context that was split across services.
**Merge threshold:** If changing a business rule requires coordinated updates to 3+ services, the rule likely belongs in a single service that owns that business logic.
### 4. Data Relationships
**Strength: MODERATE**
**What it tests:** Do these services constantly need each other's data?
**How to evaluate:**
- Does Service A frequently query Service B just to get reference data?
- Could Service A cache Service B's data, or does it always need the latest?
- What percentage of Service A's requests require data from Service B?
**Merge threshold:** If >50% of Service A's operations require real-time data from Service B, they have a data relationship that suggests they should be one service.
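The >50% threshold is a simple ratio over Service A's operations. The following sketch assumes you have already tagged each operation with the services whose real-time (not cached) data it needs. The names are hypothetical:

```python
MERGE_THRESHOLD = 0.5  # >50% of Service A's operations need Service B's real-time data


def real_time_dependency_share(operations, service_b):
    """operations: for each of Service A's operations, the set of services whose
    real-time data it needs. Returns the fraction of operations needing service_b."""
    if not operations:
        return 0.0
    return sum(service_b in deps for deps in operations) / len(operations)


shipping_ops = [{"Inventory"}, {"Inventory", "Customer"}, set(), {"Inventory"}]
share = real_time_dependency_share(shipping_ops, "Inventory")
print(share, share > MERGE_THRESHOLD)  # 0.75 True
```

Here, 3 of 4 operations need live inventory data, so the pair crosses the threshold. The next question is whether replication can serve those reads instead of a merge.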
**Alternatives to merging:**
- **Data replication:** Service B publishes events when its data changes; Service A maintains a read-only copy. Works for reference data that doesn't change frequently.
- **Shared data service:** Extract the shared data into its own service that both consume. Works when the data is a clear domain concept (e.g., "configuration" used by many services).
---
## Disintegrator vs. Integrator Conflict Resolution
When both forces apply to the same service, use this priority framework:
| Conflict | Typical resolution |
|----------|-------------------|
| Scalability disintegrator + Transaction integrator | Keep together. Scale the combined service. Transactions trump scalability unless the scalability difference is extreme (100x+). |
| Security disintegrator + Transaction integrator | Split. Security isolation is hard to compromise. Use saga pattern for the transaction. Accept the complexity cost. |
| Code volatility disintegrator + Shared code integrator | Keep together. Deploy more frequently. The cost of maintaining shared business logic across services usually exceeds the cost of redeploying stable code. |
| Fault tolerance disintegrator + Workflow coupling integrator | Split, but use async communication. The faulting service can degrade gracefully while the critical service continues. Use circuit breakers. |
| Scope disintegrator + Data relationship integrator | Evaluate: can the data relationship be handled by replication or events? If yes, split. If the data must be real-time consistent, keep together. |
## Choreography vs. Orchestration Decision Matrix
| Decision factor | Favors Choreography | Favors Orchestration |
|----------------|:---:|:---:|
| Number of services in workflow | 2-3 | 4+ |
| Error handling complexity | Simple (retry/fail) | Complex (compensating transactions) |
| Workflow visibility needs | Low | High (audit trail, monitoring) |
| Decoupling priority | High | Moderate |
| Domain/architecture isomorphism | Microservices (natural fit) | Structured workflows (better fit) |
| Performance sensitivity | Higher (no mediator bottleneck) | Lower (mediator adds latency) |
| Team autonomy | Teams work independently | Teams coordinate through mediator contract |
**Default recommendation:** Start with choreography. Introduce orchestration only when choreography complexity becomes unmanageable (typically when a single workflow spans 4+ services with compensating transactions).
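One way to make the matrix repeatable is to encode its default rule and tally a few of the remaining rows. This is a sketch under stated assumptions, not a rule from the source. `WorkflowProfile` and `choose_coordination` are hypothetical, only a subset of rows is modeled, and ties are broken toward choreography to mirror the default recommendation:

```python
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    # Hypothetical inputs mirroring rows of the matrix above.
    service_count: int
    compensating_transactions: bool   # complex error handling
    needs_audit_trail: bool           # high workflow visibility
    decoupling_is_top_priority: bool
    performance_sensitive: bool


def choose_coordination(w: WorkflowProfile) -> str:
    # Default recommendation's trigger: 4+ services with compensating transactions.
    if w.service_count >= 4 and w.compensating_transactions:
        return "orchestration"
    # Otherwise tally the remaining rows; ties keep the choreography default.
    for_orchestration = sum([w.compensating_transactions, w.needs_audit_trail])
    for_choreography = sum([w.decoupling_is_top_priority, w.performance_sensitive])
    return "orchestration" if for_orchestration > for_choreography else "choreography"


print(choose_coordination(WorkflowProfile(2, False, False, True, True)))  # choreography
print(choose_coordination(WorkflowProfile(5, True, False, True, True)))   # orchestration
```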
## The Sidecar Pattern and Service Mesh
**Problem:** Microservices duplicates infrastructure concerns (logging, monitoring, circuit breakers) across services. Each team implementing these independently leads to inconsistency.
**Solution:** The sidecar pattern extracts operational concerns into a separate component that deploys alongside each service. The sidecar handles logging, monitoring, circuit breakers, and other cross-cutting concerns. A shared infrastructure team maintains the sidecar; when they upgrade the monitoring tool, every service gets the upgrade automatically.
**Service mesh:** When all sidecars connect to a shared service plane, the result is a service mesh -- a consistent operational interface across all microservices. The service mesh provides:
- Unified logging and monitoring
- Consistent circuit breaker behavior
- Service discovery
- mTLS between services
- Traffic management and routing
**Key principle:** The sidecar pattern is the correct way to handle operational reuse in microservices. Domain logic is duplicated (each service has its own Address class). Operational logic is shared (via sidecar). This is the opposite of SOA, which tried to share everything.
---
name: event-driven-topology-selector
description: Choose between broker and mediator event-driven topologies based on workflow control needs, error handling requirements, and performance trade-offs. Use this skill whenever the user is designing an event-driven system, choosing between choreography and orchestration, deciding how events should flow between processors, debating broker vs mediator, building async workflows, evaluating event-driven error handling strategies, or comparing request-based vs event-based communication models — even if they don't use the terms "broker" or "mediator."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/event-driven-topology-selector
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
depends-on: []
source-books:
- id: fundamentals-of-software-architecture
title: "Fundamentals of Software Architecture"
authors: ["Mark Richards", "Neal Ford"]
chapters: [14]
tags: [software-architecture, architecture, event-driven, broker, mediator, choreography, orchestration, async, messaging, error-handling]
execution:
tier: 1
mode: full
inputs:
- type: none
description: "System description and event processing requirements — the skill guides topology selection"
tools-required: [Read, Write]
tools-optional: [Grep, Glob]
mcps-required: []
environment: "Any agent environment."
---
# Event-Driven Topology Selector
## When to Use
You are designing or evaluating an event-driven architecture and need to choose between broker topology (decentralized event chains) and mediator topology (centralized event orchestration). Typical situations:
- Building a new event-driven system and need to decide how events flow
- Evaluating whether existing event workflows need central coordination
- Debugging error handling problems in an async system — events are being lost or workflows get stuck
- Comparing choreography vs orchestration for inter-service communication
- Deciding whether a use case is better served by request-based or event-based processing
- System has a mix of simple and complex workflows — need to choose the right topology for each
Before starting, verify:
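- Has the team already decided on event-driven architecture? This skill selects the TOPOLOGY within event-driven architecture. Step 1 only sanity-checks whether an event-based model fits the use case; it does not replace the decision to adopt event-driven architecture in the first place.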
- Does the system have async processing needs? If everything is synchronous request-reply, event-driven may not be the right style — consider request-based model first.
## Context & Input Gathering
### Input Sufficiency Check
This skill depends on understanding the WORKFLOW characteristics, not just the system description. The same system may need different topologies for different workflows.
### Required Context (must have — ask if missing)
- **System description and use cases:** What does the system do? What events need to be processed?
  - Check prompt for: system purpose, event types, processing steps, workflow descriptions
  - If missing, ask: "What does your system do, and what events or workflows need to be processed asynchronously?"
- **Workflow dependencies:** Are processing steps independent or do they depend on each other?
  - Check prompt for: step ordering, conditional logic, rollback needs, parallel vs sequential
  - If missing, ask: "When an event occurs, do the processing steps depend on each other (step B needs step A's result), or can they all happen independently in parallel?"
- **Error handling requirements:** What happens when a processing step fails?
  - Check prompt for: rollback, compensation, retry, notification, data consistency needs
  - If missing, ask: "When a processing step fails (e.g., payment declined), do you need to (a) roll back previous steps, (b) retry automatically, (c) just log and continue, or (d) halt everything until resolved?"
  - **WHY this is critical:** Error handling is the single biggest differentiator between broker and mediator. Broker topology has no built-in error handling: if a processor fails, no component notices, and the business process stalls unless you build custom recovery.
### Observable Context (gather from environment)
- **Existing messaging infrastructure:** What message brokers or event systems are in place?
  - Look for: RabbitMQ, Kafka, ActiveMQ, AWS SQS/SNS configs, event bus implementations
  - Reveals: whether infrastructure already favors one topology
- **Current event patterns:** Are there existing event handlers or processors?
  - Look for: event handler classes, message consumers, saga implementations
  - Reveals: current topology direction and complexity level
### Default Assumptions
- If error handling requirements unknown, assume they ARE important (safer to recommend mediator and simplify than to recommend broker and discover you need coordination later)
- If workflow complexity unknown, assume moderate complexity (some dependencies between steps)
- If performance requirements unspecified, assume standard (not sub-millisecond)
### Sufficiency Threshold
```
SUFFICIENT: system description + workflow dependencies + error handling needs are known
MUST ASK: error handling requirements are unknown (this drives the entire topology decision)
PROCEED WITH DEFAULTS: workflow dependencies partially known but error handling is clear
```
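The threshold above can be read as a small decision function. This is a minimal sketch in Python; `GatheredContext`, its fields, and the `"partial"` marker are hypothetical names for illustration, not part of the skill's format. A missing system description is also treated as MUST ASK, following the Required Context list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatheredContext:
    # Hypothetical record of what the user has stated so far; None = not stated.
    system_description: Optional[str] = None
    workflow_dependencies: Optional[str] = None  # "dependent", "independent", or "partial"
    error_handling: Optional[str] = None         # e.g. "rollback", "retry", "log", "halt"


def sufficiency(ctx: GatheredContext) -> str:
    """Return the threshold verdict: SUFFICIENT, MUST ASK, or PROCEED WITH DEFAULTS."""
    if ctx.system_description is None or ctx.error_handling is None:
        # Error handling drives the topology decision, so it is never defaulted.
        return "MUST ASK"
    if ctx.workflow_dependencies in ("dependent", "independent"):
        return "SUFFICIENT"
    # Dependencies partial or unknown: assume moderate complexity and proceed.
    return "PROCEED WITH DEFAULTS"


print(sufficiency(GatheredContext("order fulfillment", "partial", "rollback")))
```

Keeping error handling as a hard stop mirrors the MUST ASK rule; everything else degrades gracefully to the Default Assumptions.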
## Process
### Step 1: Determine If Event-Based Model Is Appropriate
**ACTION:** Evaluate whether the use case is better served by a request-based or event-based processing model.
**WHY:** Not everything should be event-driven. Request-based models are better when processing is data-driven, deterministic, and needs a direct response. Event-based models are better when processing is reactive, requires high responsiveness, and the system must adapt to situations as they arise. Choosing the wrong model wastes the entire topology analysis.
| Dimension | Request-Based | Event-Based |
|-----------|:---:|:---:|
| Communication style | Synchronous | Asynchronous |
| Data access | Request-reply (ask for data) | Fire-and-forget (react to events) |
| Determinism | High — same request gives same path | Lower — event chains are dynamic |
| Responsiveness | Moderate (bound by total processing time) | High (immediate acknowledgment) |
| Typical use case | "Get me the order history" | "A bid was placed, react to it" |
| Workflow control | Easy (caller controls the flow) | Hard (no single controller in broker) |
| Error handling | Straightforward (caller gets error) | Complex (no caller waiting) |
**IF** the use case is purely data-retrieval with synchronous needs, recommend request-based model. Stop here.
**ELSE** proceed to Step 2.
### Step 2: Map the Workflow Characteristics
**ACTION:** For each identified workflow/use case, map its characteristics across the 7 comparison dimensions.
**WHY:** Broker and mediator topologies have opposite strengths. Mapping the workflow against these dimensions prevents gut-feel decisions and reveals which trade-offs matter most for THIS specific system.
For each workflow, evaluate:
| Dimension | Favors Broker | Favors Mediator |
|-----------|:---:|:---:|
| **Workflow control** | No coordination needed — events flow freely | Steps must execute in specific order with conditions |
| **Error handling** | Errors are tolerable or self-healing | Failures require rollback, compensation, or retry coordination |
| **Recoverability** | System can recover organically | Must be able to recover to a known state |
| **Restart capability** | No need to restart a failed workflow | Must restart workflows from point of failure |
| **Scalability need** | Maximum throughput is critical | Moderate throughput is acceptable |
| **Performance need** | Sub-millisecond or very high performance | Standard latency is acceptable |
| **Fault tolerance** | Failures must stay isolated; no single component may halt every workflow | A coordinator outage that pauses the domain's workflows is acceptable |
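One way to keep the mapping honest is to record a leaning for every dimension and refuse to proceed until all seven are filled in. A minimal Python sketch; the dimension keys and the `map_workflow` helper are hypothetical. The tally is a checklist, not a vote: Step 3 still weighs error handling more heavily than a raw count would.

```python
from collections import Counter

# The seven dimensions from the table above, in table order.
DIMENSIONS = (
    "workflow_control", "error_handling", "recoverability",
    "restart_capability", "scalability", "performance", "fault_tolerance",
)


def map_workflow(leanings: dict) -> Counter:
    """Count how many dimensions favor "broker" vs "mediator" for one workflow.

    Raises ValueError if any dimension is unmapped, so no trade-off is skipped.
    """
    missing = [d for d in DIMENSIONS if d not in leanings]
    if missing:
        raise ValueError(f"unmapped dimensions: {missing}")
    unknown = {v for v in leanings.values() if v not in ("broker", "mediator")}
    if unknown:
        raise ValueError(f"unknown leanings: {unknown}")
    return Counter(leanings[d] for d in DIMENSIONS)


# Illustrative reading of the order-fulfillment example: every dimension
# leans mediator except fault tolerance.
order = {d: "mediator" for d in DIMENSIONS}
order["fault_tolerance"] = "broker"
print(map_workflow(order))
```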
### Step 3: Select the Topology
**ACTION:** Based on the dimension mapping, recommend broker, mediator, or hybrid topology.
**WHY:** The choice is fundamentally a trade-off between workflow control and error handling capability (mediator) versus high performance and scalability (broker). Neither is inherently better — it depends entirely on which dimensions the system prioritizes.
**Decision logic:**
**IF** workflow steps are independent AND error handling is not critical AND performance/scalability are top priorities:
- **Recommend BROKER topology**
- Processors are self-contained, events chain through channels
- No central coordinator — maximum decoupling and performance
- Each processor advertises what it did; other processors react
**IF** workflow steps have dependencies AND error handling/recoverability are important AND workflow must be coordinated:
- **Recommend MEDIATOR topology**
- Central mediator orchestrates the processing steps
- Mediator knows the workflow, manages state, handles errors
- Processing events are commands (things to do) not events (things that happened)
**IF** system has BOTH types of workflows:
- **Recommend HYBRID topology**
- Use mediator for complex workflows requiring coordination
- Use broker for simple, independent event chains
- Route through a simple mediator that classifies events and delegates
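The decision logic can be sketched per workflow and then rolled up across workflows. This Python sketch is a simplification under stated assumptions: three booleans stand in for the full dimension mapping, and any workflow that does not meet every broker condition falls back to mediator, following the skill's default of leaning toward mediator when in doubt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workflow:
    name: str
    steps_dependent: bool          # does any step need an earlier step's result?
    error_handling_critical: bool  # do failures need rollback, compensation, or coordinated retry?
    perf_scale_priority: bool      # are throughput and latency the top priorities?


def topology_for(wf: Workflow) -> str:
    """Apply the broker rule; every other case falls back to mediator (assumption)."""
    if (not wf.steps_dependent and not wf.error_handling_critical
            and wf.perf_scale_priority):
        return "broker"
    return "mediator"


def select_topology(workflows: List[Workflow]) -> str:
    """Roll per-workflow choices up into broker, mediator, or hybrid."""
    if not workflows:
        raise ValueError("map at least one workflow first")
    choices = {topology_for(wf) for wf in workflows}
    return "hybrid" if len(choices) > 1 else choices.pop()


trades = Workflow("trade-execution", steps_dependent=False,
                  error_handling_critical=False, perf_scale_priority=True)
compliance = Workflow("compliance-reporting", steps_dependent=True,
                      error_handling_critical=True, perf_scale_priority=False)
print(select_topology([trades, compliance]))  # hybrid
```

The two workflows mirror the trading-platform example later in this skill: each path gets its own topology, and the mix yields a hybrid recommendation.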
### Step 4: Determine Mediator Complexity Level (If Mediator Selected)
**ACTION:** If mediator topology was selected, determine the appropriate mediator implementation complexity.
**WHY:** Mediators range from simple source-code routers to full BPM engines. Over-engineering the mediator wastes months; under-engineering it creates a bottleneck that can't handle the workflow complexity. Matching mediator complexity to workflow complexity is critical.
| Mediator Type | Use When | Implementation |
|---------------|----------|----------------|
| **Simple mediator** | Linear workflows, basic error handling, routing logic | Source code (e.g., Apache Camel, Spring Integration, custom code) |
| **Hardcoded mediator** | Complex conditional workflows, multiple dynamic paths, structured error handling | BPEL engine (e.g., Apache ODE, Oracle BPEL Process Manager) |
| **Complex mediator (BPM)** | Long-running transactions, human intervention points, complex state machines | BPM engine (e.g., jBPM, Camunda) |
**Classify each event type:** Determine if it's simple, hard, or complex. Route through the simple mediator first — it classifies and delegates to the appropriate mediator type. This delegation model handles mixed-complexity events efficiently.
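A rough heuristic for picking a row of the table above, sketched in Python. The trait names and the cutoff of two conditional paths are illustrative assumptions, not thresholds from the source; tune them to your own workflows.

```python
def mediator_level(human_steps: bool, long_running: bool, conditional_paths: int) -> str:
    """Map workflow traits to a mediator complexity level (heuristic sketch).

    conditional_paths: number of dynamic branches in the workflow. The cutoff
    of 2 below is a hypothetical value chosen for illustration.
    """
    if human_steps or long_running:
        return "complex"    # BPM engine, e.g. jBPM or Camunda
    if conditional_paths > 2:
        return "hardcoded"  # BPEL engine, e.g. Apache ODE
    return "simple"         # source-code mediator, e.g. Apache Camel


# A trade that must pause for a senior trader's approval needs BPM.
print(mediator_level(human_steps=True, long_running=True, conditional_paths=1))  # complex
```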
### Step 5: Address Error Handling and Data Loss Prevention
**ACTION:** Design the error handling strategy based on the selected topology.
**WHY:** Asynchronous event-driven architectures have THREE points where data loss can occur in the async communication chain. Protecting only one point still leaves the system vulnerable at the other two. Most architects only think about the message queue and forget about the send and acknowledgment links.
**The three data loss points:**
1. **Message send (producer to queue):** Event is created but never reaches the queue
   - **Mitigation:** Synchronous send with broker acknowledgment. Use persistent message queues. The producer waits for confirmation that the message was persisted before proceeding.
2. **Message processing (queue to consumer):** Event is dequeued but consumer crashes before processing
   - **Mitigation:** Client acknowledge mode (not auto-acknowledge). The message stays on the queue until the consumer explicitly acknowledges successful processing. If the consumer crashes, the message is redelivered.
3. **Post-processing (consumer to database):** Event is processed but the database write fails
   - **Mitigation:** Use the last participant support pattern — the database commit and the message acknowledgment happen in the same transaction scope. If the DB fails, the message is not acknowledged and will be redelivered.
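The three mitigations can be sketched end to end with an in-memory stand-in for the broker and SQLite for the database. Everything about `InMemoryBroker` is hypothetical and is not any real client API. Real last participant support makes the commit and the acknowledgment atomic; this sketch only approximates it by acknowledging strictly after the commit, and it uses `INSERT OR IGNORE` (an addition, not part of the pattern) so that a redelivery after a crash between commit and ack does not duplicate the row.

```python
import sqlite3
from collections import deque


class InMemoryBroker:
    """Hypothetical stand-in for a persistent queue; not a real client API."""

    def __init__(self):
        self._ready = deque()  # messages waiting for delivery
        self._unacked = {}     # delivered but not yet acknowledged (client-ack mode)

    def send(self, msg_id, body):
        # Link 1: synchronous send. Returning True plays the role of the broker's
        # "message persisted" confirmation that the producer waits for.
        self._ready.append((msg_id, body))
        return True

    def receive(self):
        # Link 2: client-acknowledge mode. Delivery moves the message to the
        # unacked set instead of deleting it.
        msg_id, body = self._ready.popleft()
        self._unacked[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        del self._unacked[msg_id]

    def requeue_unacked(self):
        # What a broker does when a consumer dies or rejects: redeliver.
        self._ready.extend(self._unacked.items())
        self._unacked.clear()


def consume_one(broker, db, fail_db=False):
    """Link 3: commit the DB write first, acknowledge only after it succeeds."""
    msg_id, body = broker.receive()
    try:
        with db:  # sqlite3 commits on success and rolls back on an exception
            # INSERT OR IGNORE keeps redelivery idempotent (an addition, see lead-in).
            db.execute("INSERT OR IGNORE INTO orders (id, body) VALUES (?, ?)",
                       (msg_id, body))
            if fail_db:
                raise sqlite3.OperationalError("simulated write failure")
    except sqlite3.Error:
        broker.requeue_unacked()  # no ack, so the message will be redelivered
        return False
    broker.ack(msg_id)
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, body TEXT)")
broker = InMemoryBroker()
assert broker.send("order-1", "2 widgets")   # producer waits for the confirmation
print(consume_one(broker, db, fail_db=True))  # False: rolled back, not acked
print(consume_one(broker, db))                # True: redelivered and stored once
```

A failed write leaves the message unacknowledged, so the next delivery sees it again and the row is stored exactly once.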
**For broker topology error handling:**
- Implement the **workflow event pattern**: a dedicated error-handling event processor monitors for failures and can trigger compensating actions
- Use **dead letter queues** for events that fail repeatedly — prevents infinite retry loops and allows manual inspection
**For mediator topology error handling:**
- The mediator itself manages error state — it knows which step failed and can stop the workflow
- Mediator persists workflow state, enabling restart from point of failure
- Compensating transactions can be orchestrated by the mediator (e.g., reverse payment if shipping fails)
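A mediator that persists state after every step can either restart from the point of failure or run compensating transactions; choosing between them is a business decision left to the caller here. A minimal Python sketch with hypothetical step and compensation names; a plain dict of JSON strings stands in for the mediator's persistent store.

```python
import json


class OrderMediator:
    """Hypothetical mediator that persists workflow state after every step."""

    STEPS = ("reserve_inventory", "apply_payment", "fulfill_order", "ship_order")
    COMPENSATIONS = {"reserve_inventory": "release_inventory",
                     "apply_payment": "refund_payment"}

    def __init__(self, processors, store):
        self.processors = processors  # name -> callable(order_id) -> bool
        self.store = store            # order_id -> JSON-encoded workflow state

    def _load(self, order_id):
        raw = self.store.get(order_id)
        return json.loads(raw) if raw else {"done": [], "status": "new"}

    def _save(self, order_id, state):
        self.store[order_id] = json.dumps(state)

    def run(self, order_id):
        """Start or restart: steps already recorded as done are skipped."""
        state = self._load(order_id)
        for step in self.STEPS:
            if step in state["done"]:
                continue
            if not self.processors[step](order_id):
                state["status"] = "failed:" + step  # stop the workflow here
                self._save(order_id, state)
                return state["status"]
            state["done"].append(step)
            self._save(order_id, state)
        state["status"] = "completed"
        self._save(order_id, state)
        return state["status"]

    def compensate(self, order_id):
        """Undo completed steps in reverse order (refund before release)."""
        state = self._load(order_id)
        for step in reversed(state["done"]):
            undo = self.COMPENSATIONS.get(step)
            if undo:
                self.processors[undo](order_id)
        state["status"] = "compensated"
        self._save(order_id, state)
        return state["status"]


attempts = {"ship_order": 0}


def step(name):
    def processor(order_id):
        if name == "ship_order":
            attempts[name] += 1
            return attempts[name] > 1  # shipping fails on the first try only
        return True
    return processor


names = list(OrderMediator.STEPS) + list(OrderMediator.COMPENSATIONS.values())
mediator = OrderMediator({n: step(n) for n in names}, store={})
print(mediator.run("order-1"))  # failed:ship_order
print(mediator.run("order-1"))  # completed
```

The second `run` resumes at `ship_order` because the persisted state already lists the first three steps as done. Calling `compensate` instead would undo those steps, as in "reverse payment if shipping fails."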
### Step 6: Produce the Topology Recommendation
**ACTION:** Compile the complete topology recommendation with rationale.
**WHY:** The recommendation must be specific enough to implement. A vague "use mediator" without explaining the error handling strategy, data loss prevention, and mediator complexity level leaves the team to figure out the hard parts on their own.
## Inputs
- System description and event processing use cases
- Workflow dependencies and ordering requirements
- Error handling and data consistency requirements
- Performance and scalability targets (if known)
## Outputs
### Event-Driven Topology Recommendation
```markdown
# Event-Driven Topology Recommendation: {System Name}
## Request-Based vs Event-Based Assessment
**Model selected:** {Request-based / Event-based / Mixed}
**Rationale:** {why this model fits}
## Workflow Analysis
| Workflow | Steps | Dependencies | Error Handling Need | Topology |
|----------|-------|:---:|:---:|:---:|
| {workflow 1} | {step list} | Independent / Dependent | Low / Medium / High | Broker / Mediator |
| {workflow 2} | ... | ... | ... | ... |
## Topology Decision
### Selected: {Broker / Mediator / Hybrid}
**Primary driver:** {the dimension that tipped the decision}
### 7-Dimension Trade-off Assessment
| Dimension | This System's Need | Broker | Mediator | Fit |
|-----------|-------------------|:---:|:---:|:---:|
| Workflow control | {need level} | Low | High | {which fits} |
| Error handling | {need level} | Low | High | {which fits} |
| Recoverability | {need level} | Low | High | {which fits} |
| Restart capability | {need level} | Low | High | {which fits} |
| Scalability | {need level} | High | Moderate | {which fits} |
| Performance | {need level} | High | Moderate | {which fits} |
| Fault tolerance | {need level} | High | Low | {which fits} |
## Mediator Complexity (if applicable)
**Level:** {Simple / Hardcoded / Complex (BPM)}
**Implementation suggestion:** {specific technology recommendation}
**Rationale:** {why this complexity level}
## Error Handling Strategy
**Data loss prevention:**
- Message send: {mitigation}
- Message processing: {mitigation}
- Post-processing: {mitigation}
**Error recovery pattern:** {workflow event pattern / dead letter queue / mediator-managed / combination}
## Architecture Characteristics Impact
- Performance: {stars}/5
- Scalability: {stars}/5
- Fault tolerance: {stars}/5
- Evolutionary: {stars}/5
- Testability: {stars}/5
```
## Key Principles
- **The choice is workflow control vs performance** — Broker topology maximizes performance, scalability, and decoupling. Mediator topology maximizes workflow control, error handling, and recoverability. Neither is inherently better. The decision hinges on which of these your system values more.
- **Events vs commands reveal the topology** — In broker topology, processing events describe what HAPPENED (order-created, payment-applied). In mediator topology, processing events are COMMANDS telling processors what to DO (place-order, apply-payment). If your events are naturally commands with expected outcomes, you need a mediator.
- **Error handling is the deal-breaker** — If a processing step can fail and the failure requires coordinated recovery (rollback, compensation, retry), broker topology cannot handle this without significant custom infrastructure. The mediator exists precisely for this scenario. When in doubt about error handling needs, lean toward mediator.
- **Protect all three links in the async chain** — Data loss can occur at message send, message processing, and post-processing. Most architects only protect the message queue itself (persistence) but forget about the send confirmation, the consumer acknowledgment, and the database commit. All three links must be addressed.
- **Hybrid is often the right answer** — Real systems rarely have uniformly simple or uniformly complex workflows. A simple mediator that classifies incoming events and delegates simple ones to broker-style processing while routing complex ones through a full mediator gives the best of both worlds.
- **Match mediator complexity to workflow complexity** — Using a BPM engine for simple routing wastes months of effort. Using source-code routing for complex workflows with human intervention points creates unmaintainable spaghetti. Classify your events (simple/hard/complex) and pick the mediator type accordingly.
## Examples
**Scenario: Order fulfillment with payment rollback**
Trigger: "We're building an order fulfillment system. When a customer places an order, we need to validate inventory, charge payment, send confirmation email, update warehouse, and notify shipping. If payment fails, we need to rollback the inventory reservation."
Process: Mapped workflow — steps have dependencies (payment must succeed before fulfillment). Error handling is critical (payment failure requires inventory rollback). This is a coordinated workflow with compensation requirements. Evaluated 7 dimensions: workflow control = high need, error handling = high need, recoverability = high need. Performance and scalability are standard. All three critical dimensions favor mediator.
Output: **Mediator topology.** Simple mediator implementation (source code, e.g., custom orchestrator or Apache Camel). 5-step workflow: (1) create order, (2) process order (email + payment + inventory in parallel), (3) fulfill order, (4) ship order, (5) notify customer. Mediator waits for acknowledgment from parallel step 2 processors before proceeding. If payment fails at step 2, mediator triggers inventory rollback and halts workflow. Data loss prevention: persistent queues with synchronous send, client-acknowledge mode, last-participant-support for DB writes.
**Scenario: Social media fan-out with independent processors**
Trigger: "Users post content that needs to: update feeds, notify followers, run content moderation, update search index, and generate analytics. These are all independent."
Process: Mapped workflow — all steps are independent (no ordering, no dependencies). Error handling is low priority (if search indexing fails, it can retry independently without affecting other steps). Evaluated 7 dimensions: workflow control = not needed, error handling = low (each processor handles its own), scalability = high (viral posts need fan-out), performance = high (real-time feed updates). All critical dimensions favor broker.
Output: **Broker topology.** Post-created initiating event fans out to 5 independent event processors. Each processor publishes its own processing event (feed-updated, followers-notified, etc.) for extensibility. No mediator needed — processors are self-contained. Dead letter queues for each processor to catch persistent failures. Per-processor scaling based on load.
**Scenario: Mixed workloads — trading platform with compliance**
Trigger: "Trade events need sub-millisecond processing. We also have compliance reporting that aggregates trades daily with complex rules."
Process: Identified two distinct workflows. Trade execution: independent, performance-critical, fault-tolerant — classic broker. Compliance reporting: complex rules, conditional paths, must complete all steps, needs audit trail — classic mediator. Recommended hybrid topology.
Output: **Hybrid topology.** Trade execution path uses broker topology for maximum performance — trade-executed events fan out to position tracking, risk calculation, and P&L processors independently. Compliance reporting path uses mediator topology — daily compliance mediator orchestrates trade aggregation, rule evaluation, exception flagging, and report generation in sequence. Simple event router at entry point classifies events by type and delegates to the appropriate topology. Trade path uses Kafka for high-throughput; compliance path uses RabbitMQ with a lightweight orchestrator.
## References
- For the detailed broker vs mediator comparison table with full trade-off analysis, see [references/broker-vs-mediator-comparison.md](references/broker-vs-mediator-comparison.md)
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/broker-vs-mediator-comparison.md
# Broker vs Mediator Topology — Detailed Comparison
Read this reference when you need the full trade-off details for broker and mediator topologies, including architecture characteristics ratings, implementation guidance, and anti-pattern warnings.
## Topology Overview
### Broker Topology
**Architecture components:** Initiating event, event broker (message broker), event processors, processing events.
**How it works:**
1. An initiating event enters the system (e.g., PlaceOrder)
2. The event is sent to an event channel in the event broker
3. A single event processor picks up the initiating event and processes it
4. That processor advertises what it did by publishing a processing event
5. Other processors react to the processing event, do their work, and publish their own processing events
6. This chain continues until no processor is interested in the latest processing event
**Key metaphor:** A relay race. Each runner (processor) takes the baton, runs their leg, and hands off. Once they hand off, they're done. No coach (mediator) tells them what to do.
**Event broker implementation:** Usually federated (multiple domain-based clustered instances). Uses topics (publish-subscribe) for the fire-and-forget broadcasting model. Technologies: RabbitMQ, ActiveMQ, HornetQ, Kafka.
**Best practice:** Every event processor should advertise what it did, even if no other processor currently cares. This provides architectural extensibility — new processors can tap into existing event streams without modifying the producers.
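The relay-race behavior can be sketched with an in-memory publish-subscribe stand-in. This is hypothetical illustration code, and it dispatches synchronously for readability, whereas a real broker delivers asynchronously through channels. Note that the inventory processor advertises `inventory-updated` even though nothing subscribes to it yet.

```python
from collections import defaultdict


class EventBroker:
    """Hypothetical in-memory topic broker: publish-subscribe, fire-and-forget."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of processors
        self.log = []                         # every published topic, in order

    def subscribe(self, topic, processor):
        self.subscribers[topic].append(processor)

    def publish(self, topic, payload):
        self.log.append(topic)
        # The publisher neither knows nor cares who is listening.
        for processor in list(self.subscribers[topic]):
            processor(self, payload)


# Each processor does its work, then advertises what happened.
def order_placement(broker, order):
    broker.publish("order-created", order)


def payment(broker, order):
    broker.publish("payment-applied", order)


def inventory(broker, order):
    broker.publish("inventory-updated", order)  # advertised with no listeners yet


broker = EventBroker()
broker.subscribe("place-order", order_placement)
broker.subscribe("order-created", payment)
broker.subscribe("order-created", inventory)
broker.publish("place-order", {"order_id": 1})
print(broker.log)
# ['place-order', 'order-created', 'payment-applied', 'inventory-updated']
```

Adding a processor later is one `subscribe` call, for example a hypothetical analytics processor on `inventory-updated`; no existing processor changes. The chain ends on its own once an event has no subscribers.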
### Mediator Topology
**Architecture components:** Initiating event, event queue, event mediator, event channels, event processors.
**How it works:**
1. An initiating event enters the system and is placed on an event queue
2. The event mediator picks up the event from the queue
3. The mediator knows the workflow steps required to process this event
4. The mediator generates processing events (commands) sent to dedicated event channels (point-to-point queues)
5. Event processors listen on their dedicated channels, process the command, and respond back to the mediator
6. The mediator waits for acknowledgment before proceeding to the next step
7. Steps within a workflow group execute concurrently; steps across groups execute serially
**Key metaphor:** An orchestra conductor. The conductor (mediator) knows the score (workflow), directs each section (processors) when to play, and coordinates the overall performance.
**Processing event semantics:** In the mediator topology, processing events are COMMANDS — things that NEED to happen (place-order, send-email, apply-payment). In broker topology, they are EVENTS — things that HAVE happened (order-created, email-sent, payment-applied). This semantic difference is fundamental.
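The step-group rule (concurrent within a group, serial across groups) can be sketched with Python's `concurrent.futures`. The command names follow the order-fulfillment example in this skill and are hypothetical; each processor is a plain callable standing in for a dedicated point-to-point channel and its reply to the mediator.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workflow for a PlaceOrder event. Each inner tuple is a step
# group: commands in a group run concurrently, groups run serially.
PLACE_ORDER_WORKFLOW = [
    ("create-order",),
    ("send-email", "apply-payment", "adjust-inventory"),
    ("fulfill-order",),
    ("ship-order",),
    ("notify-customer",),
]


def run_mediator(workflow, processors, event):
    """processors maps a command to a callable returning True on success.

    Returns (completed commands, failed commands of the group that stopped).
    """
    completed = []
    with ThreadPoolExecutor() as pool:
        for group in workflow:
            # Dispatch every command in the group, then wait for every reply.
            futures = {cmd: pool.submit(processors[cmd], event) for cmd in group}
            failed = [cmd for cmd, fut in futures.items() if not fut.result()]
            if failed:
                return completed, failed  # the next group never starts
            completed.extend(group)
    return completed, []


def processor(command):
    def handle(event):
        return command != "apply-payment" or event["card_ok"]
    return handle


procs = {cmd: processor(cmd) for group in PLACE_ORDER_WORKFLOW for cmd in group}
print(run_mediator(PLACE_ORDER_WORKFLOW, procs, {"card_ok": False}))
# (['create-order'], ['apply-payment'])
```

When payment fails, `send-email` and `adjust-inventory` from the same group have already run, so a real mediator would follow up with compensating commands for them.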
## Trade-Off Comparison Table
### Broker Topology
| Advantage | Detail |
|-----------|--------|
| Highly decoupled event processors | Processors don't know about each other. Adding/removing processors requires no changes to others. |
| High scalability | Each processor scales independently. No bottleneck from a central coordinator. |
| High responsiveness | Events are processed as they arrive. No waiting for a coordinator to schedule. |
| High performance | No intermediary adds latency. Direct event-channel-to-processor communication. |
| High fault tolerance | One processor failing doesn't affect others. Events for a failed processor wait in the broker until it recovers. |
| Disadvantage | Detail |
|-------------|--------|
| Workflow control | No component knows the overall workflow state. No one knows when a business transaction is "complete." |
| Error handling | If a processor crashes mid-processing, no component is aware. The business process gets stuck with no automatic recovery. Other processors continue as if everything is fine. |
| Recoverability | Because no component owns the workflow state, there's no way to recover to a known good state. |
| Restart capabilities | Cannot restart from point of failure. The initiating event has already been consumed and processed. Re-submitting it would duplicate work. |
| Data inconsistency | With no coordination, processors can get out of sync. Inventory may be decremented even when payment failed. |
### Mediator Topology
| Advantage | Detail |
|-----------|--------|
| Workflow control | The mediator knows the complete workflow. It knows which steps are done, which are pending, and what comes next. |
| Error handling | The mediator receives error responses from processors. It can stop the workflow, trigger compensation, or retry. |
| Recoverability | The mediator persists workflow state. If the mediator crashes, it can recover and resume from its persisted state. |
| Restart capabilities | Workflows can restart from the point of failure. The mediator knows exactly where the workflow stopped. |
| Better data consistency | The mediator coordinates steps, ensuring downstream steps don't execute until upstream steps succeed. |
| Disadvantage | Detail |
|-------------|--------|
| More coupling of event processors | Processors are coupled to the mediator's command structure. Adding a new step means changing the mediator. |
| Lower scalability | The mediator is a potential bottleneck. Although you can have multiple mediators per domain, each mediator instance is a single point of coordination. |
| Lower performance | The mediator adds an intermediary hop. Processing events go: processor -> mediator -> next processor, rather than processor -> broker -> next processor directly. |
| Lower fault tolerance | If the mediator goes down, all workflows it manages are halted. This is a single point of failure per domain. |
| Modeling complex workflows | Declaratively modeling dynamic, conditional processing in a mediator (especially BPEL) is difficult. Many workflows end up as hybrid (mediator for general flow, broker for dynamic parts). |
## 7-Dimension Direct Comparison
| Dimension | Broker | Mediator | Notes |
|-----------|:---:|:---:|-------|
| Workflow control | Low | High | The defining trade-off. Broker has NO workflow awareness. |
| Error handling | Low | High | Broker: errors are silent. Mediator: errors are caught and managed. |
| Recoverability | Low | High | Broker: no state to recover. Mediator: persists workflow state. |
| Restart | Low | High | Broker: can't restart. Mediator: restarts from point of failure. |
| Scalability | High | Moderate | Broker: no bottleneck. Mediator: coordinator can bottleneck. |
| Performance | High | Moderate | Broker: direct communication. Mediator: extra hop through coordinator. |
| Fault tolerance | High | Low | Broker: isolated failures. Mediator: coordinator is SPOF. |
## Architecture Characteristics Ratings (Event-Driven Overall)
| Characteristic | Rating | Notes |
|---------------|:---:|-------|
| Partitioning type | Technical | Events and processors organized by processing type |
| Number of quanta | 1 to many | Depends on topology and processor isolation |
| Deployability | 2/5 | Complex deployment due to event contracts and processor dependencies |
| Elasticity | 3/5 | Processors can scale independently (better in broker) |
| Evolutionary | 5/5 | Excellent — new processors added without changing existing ones |
| Fault tolerance | 5/5 | Processor isolation prevents cascading failures (better in broker) |
| Modularity | 4/5 | Good separation of processing concerns |
| Overall cost | 3/5 | Moderate — messaging infrastructure adds cost |
| Performance | 5/5 | Async processing eliminates blocking (better in broker) |
| Reliability | 3/5 | Moderate — async complexity can reduce reliability |
| Scalability | 5/5 | Excellent processor-level scaling (better in broker) |
| Simplicity | 1/5 | Low — async event flows are inherently complex to reason about |
| Testability | 2/5 | Low — testing async event chains is difficult |
## Mediator Complexity Levels — Detailed Guide
### Simple Mediator (Source Code)
**When to use:**
- Events require simple error handling and orchestration
- Workflows are mostly linear with basic conditional branching
- No long-running transactions or human intervention points
**Implementation options:** Apache Camel, Mule ESB, Spring Integration, custom source code (Java, C#, etc.)
**How it works:** Message flows and routes are written in programming code. The mediator intercepts events, classifies them (simple/hard/complex), and either processes them directly or delegates to more capable mediators.
**Advantages:** Easy to write and maintain. Fast to develop. Good for 80% of events.
### Hardcoded Mediator (BPEL)
**When to use:**
- Complex conditional processing with multiple dynamic paths
- Structured error handling with redirection and multicasting
- Workflows that are well-defined but have many branches
**Implementation options:** Apache ODE, Oracle BPEL Process Manager
**How it works:** Uses Business Process Execution Language (BPEL), an XML-like structure describing processing steps, error handling, redirection, and multicasting. Usually created via graphical interface tools.
**Advantages:** Declarative workflow definition. Good tooling for visualization. Handles complex branching well.
**Limitations:** Does NOT handle long-running transactions with human intervention. BPEL is powerful but relatively complex to learn.
### Complex Mediator (BPM)
**When to use:**
- Long-running transactions requiring human intervention
- Workflows that pause and wait for external actions (approvals, manual reviews)
- Complex state machines with many possible states
**Implementation options:** jBPM, Camunda, Activiti
**How it works:** Uses Business Process Management engines that support human task management, timers, complex state transitions, and long-running process instances.
**Example:** A stock trade where the mediator must pause processing, notify a senior trader for manual approval (because the trade exceeds a threshold), and wait for the approval before continuing.
**Advantages:** Handles human-in-the-loop workflows. Persistent process state. Resume from any point.
**Limitations:** Heavy infrastructure. Overkill for simple event routing.
### Mediator Delegation Model
Given that real systems have a MIX of simple, hard, and complex events, the recommended approach is:
1. ALL events enter through a **Simple Event Mediator** (source code)
2. The simple mediator classifies each event as simple, hard, or complex
3. Simple events are processed directly by the simple mediator
4. Hard events are forwarded to the **BPEL mediator**
5. Complex events are forwarded to the **BPM mediator**
This delegation model ensures each event type is processed by the mediator with the appropriate capability level, without over-engineering simple cases or under-engineering complex ones.
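The delegation model is essentially a classify-then-route step at a single entry point. A minimal Python sketch; the `classification` table, the event types, and the `bpel`/`bpm` adapter callables are hypothetical placeholders for real engine integrations.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def make_simple_mediator(classification: Dict[str, str], bpel: Handler, bpm: Handler) -> Handler:
    """Single entry point: classify each event, handle simple ones, delegate the rest."""

    def handle(event: dict) -> str:
        level = classification[event["type"]]  # KeyError flags an unclassified type
        if level == "simple":
            return f"simple-mediator:{event['type']}"  # processed in source code here
        if level == "hard":
            return bpel(event)  # forward to the BPEL mediator
        if level == "complex":
            return bpm(event)   # forward to the BPM mediator
        raise ValueError(f"unknown complexity level: {level}")

    return handle


mediator = make_simple_mediator(
    {"place-order": "simple", "insurance-claim": "hard", "large-trade": "complex"},
    bpel=lambda e: f"bpel:{e['type']}",
    bpm=lambda e: f"bpm:{e['type']}",
)
print(mediator({"type": "large-trade"}))  # bpm:large-trade
```

Failing loudly on an unclassified event type is a design choice of this sketch: it forces every new event type to be placed in a tier deliberately rather than defaulting silently.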
## Error Handling Patterns
### Workflow Event Pattern (Broker Topology)
Since broker topology has no central coordinator for error handling, the **workflow event pattern** provides a mechanism:
1. A dedicated **workflow processor** monitors the event flow
2. When it detects a failure (processor doesn't emit expected processing event within timeout), it generates a corrective workflow event
3. Other processors react to the corrective event (e.g., reverse inventory, issue refund)
**Limitation:** This effectively rebuilds mediator-like coordination on top of broker topology. If you need extensive use of this pattern, you probably need a mediator.
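The timeout-based detection described above can be sketched as a small processor that tracks a deadline per correlation ID. Python sketch; the topic names, the correlation-ID scheme, and passing `now` explicitly (to keep it deterministic) are assumptions for illustration.

```python
class WorkflowProcessor:
    """Hypothetical workflow processor: expects `expected` to follow `trigger`
    for the same correlation ID within `timeout_s` seconds."""

    def __init__(self, trigger, expected, timeout_s, corrective):
        self.trigger = trigger
        self.expected = expected
        self.timeout_s = timeout_s
        self.corrective = corrective
        self.deadlines = {}  # correlation id -> time by which `expected` must arrive

    def on_event(self, topic, corr_id, now):
        if topic == self.trigger:
            self.deadlines[corr_id] = now + self.timeout_s
        elif topic == self.expected:
            self.deadlines.pop(corr_id, None)  # workflow progressed normally

    def check(self, now):
        """Emit one corrective event per overdue workflow, then forget it."""
        overdue = [cid for cid, due in self.deadlines.items() if now > due]
        for cid in overdue:
            del self.deadlines[cid]
        return [(self.corrective, cid) for cid in overdue]


watcher = WorkflowProcessor("inventory-updated", "payment-applied", 30.0, "reverse-inventory")
watcher.on_event("inventory-updated", "order-1", now=0.0)
watcher.on_event("inventory-updated", "order-2", now=0.0)
watcher.on_event("payment-applied", "order-1", now=5.0)
print(watcher.check(now=31.0))  # [('reverse-inventory', 'order-2')]
```

The processor is itself keeping per-workflow state, which is exactly the mediator-like coordination the limitation above warns about.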
### Dead Letter Queues
For events that fail processing repeatedly:
1. After N retry attempts, move the failed event to a **dead letter queue**
2. Dead letter queues are monitored for manual inspection and resolution
3. Prevents infinite retry loops that waste resources
4. Provides an audit trail of failed events
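A retry-then-dead-letter loop, sketched in Python over an in-memory queue. The `max_attempts` value and the handler are hypothetical; real brokers usually provide dead-lettering as configuration rather than application code.

```python
from collections import deque


def process_with_dlq(messages, handler, max_attempts=3):
    """Run `handler` on each message, retrying failures; after `max_attempts`
    failures a message moves to the dead letter list with its last error."""
    queue = deque((msg, 0) for msg in messages)  # (message, failures so far)
    dead_letters = []
    while queue:
        msg, failures = queue.popleft()
        try:
            handler(msg)
        except Exception as exc:
            failures += 1
            if failures >= max_attempts:
                dead_letters.append((msg, repr(exc)))  # kept for manual inspection
            else:
                queue.append((msg, failures))          # retry later
    return dead_letters


def handler(msg):
    if msg == "malformed":
        raise ValueError("cannot parse")


print(process_with_dlq(["ok-1", "malformed", "ok-2"], handler))
# [('malformed', "ValueError('cannot parse')")]
```

Storing the last error alongside the message is what makes the dead letter queue useful as an audit trail.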
### Data Loss Prevention — The Three-Link Chain
Every asynchronous message exchange has three potential failure points:
```
[Producer] ---(Link 1)---> [Message Queue] ---(Link 2)---> [Consumer] ---(Link 3)---> [Database]
```
| Link | Failure Mode | Prevention |
|------|-------------|------------|
| **Link 1: Send** | Producer sends message but it doesn't reach the queue (network failure, broker down) | Use **synchronous send** with broker acknowledgment. Producer blocks until broker confirms message persistence. |
| **Link 2: Processing** | Message is dequeued but consumer crashes before completing processing | Use **client acknowledge mode** (not auto-acknowledge). Message stays on queue until consumer explicitly acknowledges. On crash, message is redelivered to another consumer instance. |
| **Link 3: Post-processing** | Consumer processes message but database write fails | Use **last participant support** pattern. Database commit and message acknowledgment are in the same transaction scope. If DB fails, message is not acknowledged and will be redelivered. |
**Critical insight:** Most architects protect only the queue itself (persistent message queues). Persistence alone still leaves data loss at Link 1 (fire-and-forget sends), Link 2 (auto-acknowledge consumers), and Link 3 (process-then-ack without a DB transaction). ALL THREE links must be protected for reliable event processing.
## Request-Based vs Event-Based Model Comparison
Use this table when deciding whether a use case should be event-driven at all, before selecting broker vs mediator.
| Dimension | Request-Based | Event-Based |
|-----------|:---:|:---:|
| **Communication** | Synchronous request-reply | Asynchronous fire-and-forget |
| **Coupling** | Higher (caller knows the callee) | Lower (publisher doesn't know subscribers) |
| **Data pattern** | Pull: ask for data | Push: react to occurrences |
| **Determinism** | High: same request, same processing path | Lower: event chains are dynamic |
| **Responsiveness** | Bound by total processing time | Immediate acknowledgment |
| **Scalability** | Limited by synchronous chain | High: processors scale independently |
| **Error handling** | Simple: caller gets error response | Complex: no caller waiting for response |
| **Workflow visibility** | Clear: follow the request path | Opaque: follow event chains across processors |
**Use request-based when:** Retrieving data, synchronous operations, well-defined request-response patterns, when the caller needs an immediate answer.
**Use event-based when:** Reacting to situations, high responsiveness needed, fire-and-forget operations, multiple independent processors need to react to the same occurrence.
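The coupling difference in the table can be seen in a small sketch. `EventBus`, `get_stock_level`, and the `order_placed` event are hypothetical. To keep it short, delivery here is synchronous and in-process; a real broker would deliver asynchronously, which is where the responsiveness and error-handling rows of the table come into play.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """In-process pub/sub: the publisher never learns who, if anyone, subscribed."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Nothing is returned to the publisher; each subscriber reacts independently.
        for handler in self._subscribers[event_type]:
            handler(payload)


# Request-based: the caller names the callee and waits for the answer.
def get_stock_level(sku: str) -> int:
    return {"ABC": 7}.get(sku, 0)


stock = get_stock_level("ABC")

# Event-based: two processors react to the same occurrence without the publisher knowing them.
bus = EventBus()
reactions = []
bus.subscribe("order_placed", lambda e: reactions.append(("inventory", e["sku"])))
bus.subscribe("order_placed", lambda e: reactions.append(("notification", e["order_id"])))
bus.publish("order_placed", {"order_id": 42, "sku": "ABC"})
```

Adding a third processor means one more `subscribe` call; the publishing code does not change, which is the lower coupling the table describes.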
---
name: distributed-feasibility-checker
description: Evaluate whether a system should adopt distributed architecture by systematically checking against the 8 Fallacies of Distributed Computing and assessing team/operational readiness. Use this skill whenever the user is considering microservices, debating monolith vs distributed, hearing "let's use microservices," evaluating operational readiness for distribution, or experiencing growing pains with a monolith — even if they don't mention "distributed computing fallacies."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/distributed-feasibility-checker
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
depends-on: []
source-books:
  - id: fundamentals-of-software-architecture
    title: "Fundamentals of Software Architecture"
    authors: ["Mark Richards", "Neal Ford"]
    chapters: [9, 17, 18]
tags: [software-architecture, architecture, distributed-systems, microservices, monolith, fallacies, feasibility]
execution:
  tier: 1
  mode: full
  inputs:
    - type: none
      description: "System description and team context — the skill guides the evaluation"
  tools-required: [Read, Write]
  tools-optional: [Grep, Glob]
  mcps-required: []
  environment: "Any agent environment. If a codebase exists, can scan for distribution indicators."
---
# Distributed Architecture Feasibility Checker
## When to Use
Someone is proposing or considering distributed architecture (microservices, service-based, event-driven) and you need to evaluate whether it's actually justified and feasible. Typical situations:
- "Let's move to microservices" — but has anyone checked if the team is ready?
- Growing pains with a monolith — but is distribution the right solution?
- CTO or tech lead pushing for distribution based on industry hype
- Prerequisite sanity check before running `architecture-style-selector`
- After quantum analysis: you've identified multiple quanta and now need to check whether the team can actually operate a distributed system
Before starting, verify:
- Is there a genuine architectural problem to solve? (If the monolith is working fine, distribution adds cost without benefit)
- Has quantum analysis been done? If not, run `architecture-quantum-analyzer` first; a system with a single architecture quantum may not need distribution at all
## Context & Input Gathering
### Input Sufficiency Check
This skill critically depends on TEAM context, not just technical requirements. A system that technically needs distribution may still fail if the team can't operate it.
### Required Context (must have — ask if missing)
- **System description:** What does the system do? What's the current architecture?
→ Check prompt for: system purpose, current state (monolith/distributed/greenfield)
→ If missing, ask: "What does your system do, and what's your current architecture? (monolith, some services, greenfield?)"
- **Team size and distributed experience:** How many developers? Have they operated distributed systems before?
→ Check prompt for: team size, experience mentions, technology familiarity
→ If missing, ask: "How many developers do you have, and has your team operated distributed systems (microservices, message queues, service mesh) before?"
→ **WHY this is critical:** Team experience is the #1 predictor of distributed architecture success. A team that's never run microservices will struggle regardless of technical merit.
- **Motivation for considering distribution:** Why are they thinking about this?
→ Check prompt for: scaling issues, deployment pain, team autonomy needs, hype
→ If missing, ask: "What specific problem is driving you toward distributed architecture? (a) scaling bottleneck, (b) deployment takes too long, (c) teams stepping on each other, (d) someone said we should, (e) other?"
### Observable Context (gather from environment)
- **Current infrastructure:** What deployment and monitoring tools exist?
→ Look for: docker-compose, k8s manifests, CI/CD configs, monitoring configs
→ Reveals: operational maturity level
- **Service communication:** Are there already distributed calls?
→ Look for: HTTP client imports, message queue configs, gRPC definitions
→ Reveals: whether distribution has already started
### Default Assumptions
- If team experience unknown → assume NO distributed experience (safer to overestimate the challenge)
- If monitoring tools unknown → assume basic logging only (no distributed tracing)
- If motivation unclear → probe before proceeding — distribution without clear motivation is the biggest risk
### Sufficiency Threshold
```
SUFFICIENT: system description + team size + team experience + motivation are known
MUST ASK: team experience is unknown (this is NEVER safe to default)
```
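The sufficiency rule above can be expressed as a small check. This is a sketch only; `FeasibilityContext` and `questions_to_ask` are hypothetical names, and the question wording is copied from the Required Context list. The point it encodes is that team experience has no default: an unknown value always produces a question.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeasibilityContext:
    system_description: Optional[str] = None
    team_size: Optional[int] = None
    has_distributed_experience: Optional[bool] = None  # None means "unknown"
    motivation: Optional[str] = None


def questions_to_ask(ctx: FeasibilityContext) -> List[str]:
    """Empty list means SUFFICIENT; otherwise ask these before evaluating."""
    questions = []
    if ctx.system_description is None:
        questions.append("What does your system do, and what's your current architecture?")
    if ctx.team_size is None or ctx.has_distributed_experience is None:
        # Team experience is never defaulted, so an unknown value always triggers a question.
        questions.append("How many developers do you have, and has your team "
                         "operated distributed systems before?")
    if ctx.motivation is None:
        questions.append("What specific problem is driving you toward distributed architecture?")
    return questions


ctx = FeasibilityContext("Invoicing SaaS, single monolith", team_size=5,
                         motivation="deployments take two hours")
pending = questions_to_ask(ctx)  # one question: team experience is unknown
```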
## Process
### Step 1: Understand the Motivation
**ACTION:** Clarify WHY distribution is being considered. Categorize the motivation.
**WHY:** The most dangerous path to distributed architecture is "because everyone else is doing it." Valid motivations have specific, measurable problems. Invalid motivations are based on hype, resume-driven development, or "Netflix does it." Categorizing the motivation early prevents wasted analysis.
| Motivation | Validity | Next step |
|-----------|:---:|----------|
| Specific scaling bottleneck in one part | Valid | Quantify the bottleneck |
| Deployment takes too long (all-or-nothing) | Valid | Check if modular monolith solves it first |
| Teams blocking each other on shared code | Valid | Check if code ownership solves it first |
| "Everyone uses microservices now" | **Invalid** | Push back — this isn't a problem statement |
| "Our CTO read an article" | **Invalid** | Ask what specific problem they're trying to solve |
| Technology exploration / learning | Partially valid | Be honest about the cost of learning in production |
### Step 2: Evaluate Against the 8 Fallacies
**ACTION:** Systematically evaluate the project against each of the 8 Fallacies of Distributed Computing. For each, assess: does the team understand this risk? Do they have mitigations?
**WHY:** The 8 fallacies are assumptions that developers make about distributed systems that are FALSE. Every distributed system must contend with all 8. Teams that haven't thought about them will be surprised — and surprises in distributed systems cause outages. Using these as a checklist transforms abstract knowledge into a concrete readiness assessment.
For each fallacy, evaluate:
| # | Fallacy | The false assumption | Reality check question |
|---|---------|---------------------|----------------------|
| 1 | **The Network Is Reliable** | Network calls always succeed | Do you have timeouts and circuit breakers? What happens when Service B is unreachable? |
| 2 | **Latency Is Zero** | Remote calls are as fast as local | What's your average and 95th-percentile latency? How many chained service calls per request? |
| 3 | **Bandwidth Is Infinite** | Send as much data as you want | Are you sending entire objects when you only need a few fields? (Stamp coupling) |
| 4 | **The Network Is Secure** | Internal network is safe | Does distribution multiply your attack surface? How many new network endpoints? |
| 5 | **The Topology Never Changes** | Network layout is fixed | What happens when ops upgrades routers on the weekend? Do your services use hardcoded IPs? |
| 6 | **There Is Only One Administrator** | One team controls everything | How many teams manage infrastructure? Who coordinates deployments? |
| 7 | **Transport Cost Is Zero** | Network calls are free | What's the actual infrastructure cost of service mesh, load balancers, API gateways? |
| 8 | **The Network Is Homogeneous** | All network equipment is the same | Do you run multi-cloud? Different hardware vendors? |
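Fallacy #1 asks whether the team has timeouts and circuit breakers. A minimal circuit breaker sketch follows, assuming consecutive-failure counting and a single trial call after a cool-down (one common variant, not the only design). `CircuitBreaker` is a hypothetical class, and the wrapped call should still carry its own timeout, for example the `timeout` argument of `urllib.request.urlopen`. An injectable clock keeps the demo deterministic.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    """Fails fast after `failure_threshold` consecutive failures, then allows a
    trial call once `reset_after` seconds have passed (half-open)."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], object]):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open: failing fast")
            # Half-open: let this call through as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])


def unreachable():
    raise ConnectionError("service B unreachable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(unreachable)
    except CircuitOpenError:
        outcomes.append("fast-fail")
    except ConnectionError:
        outcomes.append("error")

now[0] = 11.0                                # reset window has elapsed
outcomes.append(breaker.call(lambda: "ok"))  # trial call succeeds, circuit closes
```

Once open, the breaker stops callers from piling up behind an unreachable Service B, which is what prevents one failure from cascading.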
### Step 3: Assess Operational Readiness
**ACTION:** Evaluate whether the team has the operational maturity to run distributed systems.
**WHY:** Distribution doesn't just change how you code — it fundamentally changes how you operate. Distributed logging, distributed tracing, distributed transactions, independent deployments, service discovery, contract versioning — these are operational capabilities that don't exist in monolith-land. A team without these capabilities will build a distributed system they can't debug, can't deploy safely, and can't monitor.
| Capability | Question | Ready if... | Not ready if... |
|-----------|---------|-------------|-----------------|
| **Distributed logging** | Can you correlate logs across services? | Have ELK/Datadog with correlation IDs | Console.log to stdout per service |
| **Distributed tracing** | Can you trace a request across service boundaries? | Have Jaeger/Zipkin/Datadog APM | No tracing infrastructure |
| **CI/CD per service** | Can you deploy one service without deploying all? | Per-service pipelines with independent versioning | Single pipeline deploying everything |
| **Service discovery** | How do services find each other? | Service mesh, DNS-based, or registry | Hardcoded URLs in config |
| **Contract management** | How do you handle API changes between services? | Versioned APIs, consumer-driven contract tests | No versioning strategy |
| **Monitoring & alerting** | Can you detect when one service is degrading? | Per-service health checks, SLO dashboards | Aggregate-only monitoring |
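The "correlation IDs" in the distributed logging row can be illustrated with the standard `logging` module. This sketch assumes two services share one output stream for demonstration and that the id is passed to downstream calls by the caller (in practice typically in a request header); `build_logger` and `handle_request` are hypothetical helpers.

```python
import contextvars
import io
import logging
import uuid
from typing import Optional

# Correlation id of the request currently being handled in this context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Stamps the current correlation id onto every record this handler emits."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def build_logger(service: str, stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationIdFilter())
    handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
    logger = logging.getLogger(service)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output to this handler only
    return logger


def handle_request(logger: logging.Logger, message: str,
                   incoming_id: Optional[str] = None) -> str:
    # Reuse the caller's id if one was passed along, otherwise start a new one.
    cid = incoming_id or uuid.uuid4().hex
    correlation_id.set(cid)
    logger.info(message)
    return cid  # forward this with every downstream call


buf = io.StringIO()
orders = build_logger("orders", buf)
payments = build_logger("payments", buf)
cid = handle_request(orders, "order received")
handle_request(payments, "payment authorized", incoming_id=cid)
```

Both log lines start with the same id, so a log aggregator can stitch one request's path back together across services.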
### Step 4: Check for Simpler Alternatives
**ACTION:** Before recommending distribution, verify that simpler solutions don't solve the problem.
**WHY:** Distribution is the most expensive solution to almost any problem. A modular monolith with good code boundaries solves many of the same problems (team autonomy, code organization, independent development) without the operational overhead. The book explicitly states that monolith advantages are REAL — simpler deployment, simpler testing, simpler debugging, lower cost. Distribution should be the LAST option, not the first.
| Problem | Simpler alternative | When it's NOT enough |
|---------|-------------------|---------------------|
| Deployment takes too long | Modular monolith with independent module builds | Different modules need different deployment frequencies |
| Teams stepping on each other | Code ownership + branch-by-abstraction | Teams need different technology stacks |
| One part needs to scale | Separate the hot path only (strangler fig) | 3+ parts need independent scaling |
| "It's too complex" | Better module boundaries, cleaner interfaces | Genuine bounded contexts with different data models |
### Step 5: Produce the Feasibility Assessment
**ACTION:** Compile a structured go/no-go assessment with specific recommendations.
**WHY:** The value of this skill is the structured, honest assessment — not a blanket "yes" or "no" to microservices. Some teams are ready. Some aren't. Some should start with a single service extraction, not full distribution. The assessment should be specific enough to act on.
## Inputs
- System description and current architecture
- Team size, experience, and operational capabilities
- Motivation for considering distribution
## Outputs
### Distributed Architecture Feasibility Assessment
```markdown
# Feasibility Assessment: {System Name}
## Motivation Analysis
**Stated motivation:** {what the team says}
**Validated motivation:** {Valid / Invalid / Partially valid}
**Underlying problem:** {the real problem, which may differ from stated motivation}
## 8 Fallacies Evaluation
| # | Fallacy | Team awareness | Mitigations in place | Risk level |
|---|---------|:---:|:---:|:---:|
| 1 | Network Is Reliable | Yes/No | {specific mitigations or "none"} | Low/Med/High |
| 2 | Latency Is Zero | Yes/No | {mitigations} | Low/Med/High |
| 3 | Bandwidth Is Infinite | Yes/No | {mitigations} | Low/Med/High |
| 4 | Network Is Secure | Yes/No | {mitigations} | Low/Med/High |
| 5 | Topology Never Changes | Yes/No | {mitigations} | Low/Med/High |
| 6 | Only One Administrator | Yes/No | {mitigations} | Low/Med/High |
| 7 | Transport Cost Is Zero | Yes/No | {mitigations} | Low/Med/High |
| 8 | Network Is Homogeneous | Yes/No | {mitigations} | Low/Med/High |
**Fallacy readiness score:** {X}/8 mitigated
## Operational Readiness
| Capability | Status | Gap |
|-----------|:---:|-----|
| Distributed logging | Ready/Not ready | {what's missing} |
| Distributed tracing | Ready/Not ready | {what's missing} |
| CI/CD per service | Ready/Not ready | {what's missing} |
| Service discovery | Ready/Not ready | {what's missing} |
| Contract management | Ready/Not ready | {what's missing} |
| Monitoring & alerting | Ready/Not ready | {what's missing} |
**Operational readiness score:** {X}/6 capabilities in place
## Simpler Alternatives Considered
| Alternative | Solves the problem? | Why/why not |
|------------|:---:|-------------|
| Modular monolith | Yes/No/Partially | {reasoning} |
| Single service extraction | Yes/No/Partially | {reasoning} |
| Better code boundaries | Yes/No/Partially | {reasoning} |
## Recommendation
**{Go / No-Go / Conditional Go}**
- {Primary reasoning}
- {If conditional: what must be done first}
## If Proceeding: Readiness Roadmap
1. {First capability to build before distributing}
2. {Second capability}
3. {Suggested first service to extract}
```
## Key Principles
- **Distribution is a trade-off, not an upgrade** — Distributed architecture gains scalability and team autonomy but pays with operational complexity, debugging difficulty, and infrastructure cost. It's not inherently better than monolith — it's different, with different trade-offs. The 8 fallacies are the price of admission.
- **Team readiness trumps technical need** — A technically justified distributed architecture operated by an unprepared team produces worse outcomes than a monolith. Team experience with distributed operations, monitoring, and debugging is the #1 success predictor.
- **Check for simpler solutions first** — A modular monolith with clean boundaries solves many of the problems people assume require microservices, at a fraction of the operational cost. Distribution should be the LAST option, not the first.
- **Monolith is not a dirty word** — The book explicitly defends monolith advantages: simpler deployment, simpler testing, simpler debugging, lower operational cost. Many successful systems run as monoliths. Don't recommend distribution to be "modern."
- **The distributed monolith is the worst outcome** — Adopting microservices but keeping all the coupling of a monolith gives you the operational overhead of distribution with none of the benefits. This is the most common result of premature distribution.
- **Latency is the deal-breaker** — Fallacy #2 is the primary factor in whether distribution is feasible. If your request chains 10 service calls at 100ms each, you've added 1 second of latency. Know your numbers before committing.
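The latency arithmetic behind the last principle can be checked directly. The first function reproduces the 10 × 100 ms figure. The second adds a related point: assuming the calls' latencies are independent, the chance that at least one call in the chain lands in its own slow tail grows quickly with chain length. Both function names are hypothetical.

```python
def chained_latency_ms(per_call_ms: float, calls: int) -> float:
    """Added latency when `calls` remote calls run one after another."""
    return per_call_ms * calls


def chance_some_call_is_slow(calls: int, percentile: float = 0.95) -> float:
    """Probability that at least one of `calls` independent calls is slower than
    its own `percentile` latency, so the whole request ends up in the slow tail."""
    return 1 - percentile ** calls


print(chained_latency_ms(100, 10))             # 1000
print(round(chance_some_call_is_slow(10), 2))  # 0.4
```

With ten chained calls, roughly 40% of requests hit at least one call slower than its 95th percentile, which is why chained synchronous calls hurt tail latency far more than averages suggest.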
## Examples
**Scenario: Startup wanting microservices**
Trigger: "We're 5 developers building a SaaS. Should we start with microservices?"
Process: Asked about motivation — "our CTO says it's best practice." Invalid motivation — no specific problem. Evaluated: team has no distributed experience, no monitoring beyond basic logging, single CI pipeline. Checked simpler alternatives: modular monolith solves all current needs. Fallacy check: 0/8 mitigated. Operational readiness: 0/6.
Output: **No-Go.** Recommended modular monolith with clean domain boundaries. Distribution adds operational cost the 5-person team can't absorb. Revisit when: team hits 15+ developers, or specific parts need independent scaling proven by data.
**Scenario: Growing monolith with real pain**
Trigger: "We have 40 developers, deployments take 2 hours, and the payment module keeps bringing down the whole site during Black Friday."
Process: Valid motivation — specific scaling bottleneck + deployment pain. Team has 3 years of monolith experience, basic CI/CD, Datadog for monitoring but no distributed tracing. Fallacy evaluation: aware of #1 and #2, unaware of #3-#8. Operational readiness: 2/6 (monitoring, basic CI/CD). Simpler alternatives: modular monolith partially solves deployment but not the Black Friday scaling.
Output: **Conditional Go.** Extract payment module as first service (strangler fig pattern). Before extracting: implement distributed tracing, per-service CI/CD pipeline, and circuit breakers. Don't attempt full microservices — start with 2-3 services maximum.
**Scenario: Already distributed but struggling**
Trigger: "We went microservices 6 months ago and everything is on fire. Can't trace bugs, deployments are a nightmare, and our latency tripled."
Process: Diagnosed against fallacies: Fallacy #2 (latency) — chaining 8 synchronous calls, 95th percentile at 2 seconds. Fallacy #1 — no circuit breakers, cascading failures. Operational readiness: 1/6 (only basic logging). The team fell into the distributed monolith anti-pattern — services share a database and deploy in lockstep.
Output: Feasibility assessment showing the team wasn't ready. Recommended: consolidate back to 3-4 larger services (from 12), implement distributed tracing and circuit breakers, establish per-service databases before attempting fine-grained services again.
## References
- For the full 8 Fallacies with detailed mitigations, see [references/eight-fallacies.md](references/eight-fallacies.md)
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
---
name: development-checklist-generator
description: Create effective development checklists (code completion, unit/functional testing, software release) that teams will actually follow. Use this skill whenever the user needs to create a checklist for code review, testing, deployment, or release processes, wants to improve team quality by catching recurring mistakes, has a team that ignores existing checklists because they're too long, needs to define "definition of done" for development tasks, wants to reduce production incidents caused by human error, or asks about checklist best practices for software teams — even if they don't explicitly say "checklist."
version: 1.0.0
homepage: https://github.com/bookforge-ai/bookforge-skills/tree/main/books/fundamentals-of-software-architecture/skills/development-checklist-generator
metadata: {"openclaw":{"emoji":"📚","homepage":"https://github.com/bookforge-ai/bookforge-skills"}}
status: draft
source-books:
  - id: fundamentals-of-software-architecture
    title: "Fundamentals of Software Architecture"
    authors: ["Mark Richards", "Neal Ford"]
    chapters: [22]
tags: [software-architecture, architecture, checklists, quality, process, team-effectiveness, deployment]
depends-on: []
execution:
  tier: 2
  mode: full
  inputs:
    - type: codebase
      description: "Optionally, a codebase to analyze for common issues that should be on checklists"
  tools-required: [Read, Write]
  tools-optional: [Grep, Bash]
  mcps-required: []
  environment: "Any agent environment. Codebase access optional but improves checklist specificity."
---
# Development Checklist Generator
## When to Use
You need to create or improve development checklists that catch common mistakes without becoming burdensome process overhead. Typical triggers:
- Recurring bugs or production incidents caused by the same types of mistakes
- A team that has no formal "definition of done" for code completion
- Deployment failures caused by forgotten steps (wrong config, missing migration, stale cache)
- An existing checklist that nobody follows because it's too long or procedural
- The user wants to reduce reliance on manual quality checks
Before starting, verify:
- What type of checklist is needed (code completion, testing, release)?
- What problems is the team currently experiencing?
## Context
### Required Context (must have before proceeding)
- **Checklist type:** What kind of checklist is needed?
-> Check prompt for: "code review," "testing," "deployment," "release," "definition of done"
-> If still missing, ask: "What type of checklist do you need — code completion, testing, or software release?"
- **Current pain points:** What problems is the team experiencing?
-> Check prompt for: bugs, incidents, deployment failures, missed steps, recurring issues
-> If still missing, ask: "What specific problems or recurring mistakes are you trying to prevent?"
### Observable Context (gather from environment)
- **Existing processes:** Does the team have any existing checklists or quality processes?
-> Check prompt for: "existing checklist," "current process," "QA process," "CI/CD pipeline"
-> If unavailable: assume no existing checklists
- **Team size and culture:** How large is the team and how process-tolerant are they?
-> Check prompt for: team size, complaints about process, "too many meetings/processes"
-> If unavailable: assume moderate process tolerance
- **Technology stack:** What technologies does the team use?
-> Check prompt for: languages, frameworks, databases, deployment tools
-> If unavailable: create technology-agnostic checklist items
- **Codebase patterns:** If codebase is available, scan for common issues
-> Look for: hardcoded values, missing error handling, inconsistent logging, configuration patterns
-> If unavailable: use generic best-practice items
### Default Assumptions
- If checklist type unclear -> create all three types (code completion, testing, release) starting with what addresses the pain points
- If team culture unknown -> err on the side of shorter checklists (5-10 items)
- If no existing process -> start fresh with minimal viable checklist
### Sufficiency Threshold
```
SUFFICIENT when ALL of these are true:
- Checklist type is known
- At least 2-3 specific pain points or recurring issues are described
- Technology context is available (from prompt or codebase)
PROCEED WITH DEFAULTS when:
- Checklist type is known
- General pain points can be inferred
- Technology-agnostic items will be useful
MUST ASK when:
- Neither the checklist type nor the problems are described
- The request is too vague to produce anything actionable
```
## Process
### Step 1: Distinguish Checklists from Procedures
**ACTION:** Verify that what the user needs is actually a checklist, not a procedure. If the user's current "checklist" is procedural, explain the difference and restructure it.
**WHY:** A checklist is a set of independent verification items that can be checked in any order. A procedure is a sequence of dependent steps that must be done in order. Procedures should NOT be in a checklist because items can't be verified until prior items complete. A "checklist" for creating a database table that says "1. Fill out request form, 2. Submit form, 3. Verify table created" is a procedure — the table can't be verified if the form hasn't been submitted. Conflating the two creates unusable checklists that people skip entirely.
**The test:** Can each item be independently verified regardless of order?
- YES -> it belongs on a checklist
- NO -> it's a procedural step and should be a workflow, not a checklist
**Additionally:** Simple, well-known processes that are executed frequently without error do NOT need a checklist. Checklists are for error-prone or infrequently-performed processes where items are commonly missed or skipped.
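The "verify in any order" test can be made explicit by recording what each item depends on. In this sketch the `depends_on` field is a hypothetical annotation the team fills in; the code simply applies the test, using the database-table example from Step 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    text: str
    # Items that must be finished before this one can be verified (hypothetical annotation).
    depends_on: List[str] = field(default_factory=list)


def procedural_items(items: List[Item]) -> List[str]:
    """Items that fail the 'verify in any order' test and belong in a procedure or runbook."""
    return [item.text for item in items if item.depends_on]


items = [
    Item("Fill out request form"),
    Item("Submit form", depends_on=["Fill out request form"]),
    Item("Verify table created", depends_on=["Submit form"]),
]
flagged = procedural_items(items)  # two of the three items are procedural steps
```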
### Step 2: Apply Checklist Design Principles
**ACTION:** Design the checklist following the core principles that determine whether teams will actually use it. For detailed templates, see [references/checklist-templates.md](references/checklist-templates.md).
**WHY:** Architects have found through experience that checklists make development teams more effective — but only when designed correctly. The law of diminishing returns applies: the more checklists an architect creates, the less likely developers will use them. Checklist adoption depends on brevity, relevance, and the perception that the items actually prevent real problems.
**Principles:**
1. **Keep it small** — Developers will NOT follow large checklists. The more items, the more likely developers rubber-stamp everything. Target 5-10 items maximum per checklist. If you need more items, split into multiple purpose-specific checklists.
2. **Automate what you can** — Any item that can be checked automatically should NOT be on a human checklist. If your linter can catch formatting issues, don't put "check formatting" on the checklist. Reserve the checklist for things that REQUIRE human judgment.
3. **State the obvious** — Don't worry about stating the obvious in a checklist. The obvious items are the ones most commonly skipped or missed. If "remove hardcoded API keys" feels too obvious for the checklist, remember that many credential leaks happen precisely because someone assumed the check was too obvious to need.
4. **No procedural flows** — Every item must be independently verifiable. If item B depends on item A, you have a procedure, not a checklist.
5. **Focus on error-prone areas** — Checklists are for things that go wrong, not things that always go right. If the team never forgets database migrations, don't put it on the checklist. If they regularly forget to update configuration files, that's a checklist item.
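To illustrate "Automate what you can": once a check can run in CI, it leaves the human checklist. The sketch below is a deliberately simple, hypothetical heuristic for hardcoded credentials (a name containing key, secret, token, or password assigned a quoted literal of 8+ characters). It is far from a complete secret scanner and will miss many cases; a dedicated scanning tool is the realistic choice.

```python
import re
from typing import List, Tuple

# Heuristic pattern (an assumption, not a complete rule set): a name containing
# key/secret/token/password assigned a quoted string literal of 8+ characters.
SECRET_ASSIGNMENT = re.compile(
    r"""(?i)\b\w*(api_?key|secret|token|password)\w*\s*[:=]\s*["'][^"']{8,}["']"""
)


def find_hardcoded_secrets(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that look like hardcoded credentials."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if SECRET_ASSIGNMENT.search(line)
    ]


sample = (
    'DB_HOST = "localhost"\n'
    'API_KEY = "hardcoded-demo-value-123"\n'
    'password = os.environ["DB_PASSWORD"]\n'
)
hits = find_hardcoded_secrets(sample)  # flags only line 2
```

Reading the password from the environment passes; the literal API key is flagged, so a failing CI step replaces the human check.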
### Step 3: Generate the Appropriate Checklist Type
**ACTION:** Create the checklist based on the identified type, pain points, and technology context.
**WHY:** Each checklist type serves a different purpose in the development lifecycle. Code completion checklists define "done." Testing checklists ensure coverage of commonly missed test scenarios. Release checklists prevent deployment disasters. Creating the wrong type for the problem doesn't help.
**Code Completion Checklist (Definition of Done):**
Items to consider including:
- Coding and formatting standards not caught by automated tools
- Frequently overlooked items (absorbed exceptions, missing null validation)
- Project-specific standards (naming conventions, logging patterns)
- Special team instructions or procedures
- Security considerations (no hardcoded secrets, input validation)
**Unit and Functional Testing Checklist:**
Items to consider including:
- Edge cases specific to the domain (empty collections, boundary values, null inputs)
- Error handling paths (what happens when the dependency fails?)
- Performance-sensitive paths (are they tested under load?)
- Integration points (are external service failures simulated?)
- Data integrity scenarios (concurrent writes, transaction boundaries)
**Software Release Checklist:**
Items to consider including:
- Configuration verification (correct config for target environment)
- Database migration status (have migrations been run?)
- Cache invalidation (is stale data being served?)
- Feature flag states (are new features properly toggled?)
- Rollback plan (is there a verified rollback procedure?)
- Monitoring verification (are alerts and dashboards updated?)
### Step 4: Plan for Checklist Compliance (Hawthorne Effect)
**ACTION:** Include a compliance strategy for ensuring the team actually uses the checklist.
**WHY:** One of the biggest challenges with checklists is getting developers to actually use them rather than rubber-stamping all items as complete. Developers who are rushed will mark all items as "done" without actually performing the checks. The Hawthorne Effect provides a solution: people who know they are being observed tend to do the right thing. By letting the team know that checklists will be occasionally spot-checked for correctness, compliance increases significantly — even if the spot-checks rarely happen.
**Compliance strategies:**
1. **Communicate the why** — Explain to the team WHY each checklist item matters. Have team members read "The Checklist Manifesto" by Atul Gawande. Make sure each person understands the reasoning behind each item.
2. **Collaborate on creation** — Have developers help create the checklist items. People follow rules they helped create. Items imposed from above get resisted.
3. **Apply the Hawthorne Effect** — Let the team know that checklists will be verified periodically. The architect or tech lead occasionally spot-checks completed checklists for correctness. The knowledge that spot-checks happen (even rarely) dramatically improves honest completion.
4. **Iterate based on feedback** — Remove items that are always done correctly (they don't need a checklist). Add items when new recurring problems emerge. Keep the checklist alive and relevant.
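The spot-check half of the Hawthorne strategy only needs an unpredictable sample. A minimal sketch, assuming completed checklists are identified by PR ids and using the "1 PR per week" rate from the example below; `pick_spot_checks` is a hypothetical helper, and the optional seed exists only to make the demo reproducible.

```python
import random
from typing import List, Optional, Sequence


def pick_spot_checks(completed: Sequence[str], per_week: int = 1,
                     seed: Optional[int] = None) -> List[str]:
    """Choose which completed checklists the tech lead re-verifies this week.
    Because any checklist might be picked, every one has to be filled in honestly."""
    rng = random.Random(seed)
    return rng.sample(list(completed), k=min(per_week, len(completed)))


prs = ["PR-101", "PR-102", "PR-103"]
picks = pick_spot_checks(prs, per_week=1, seed=7)
```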
### Step 5: Format and Deliver the Checklist
**ACTION:** Produce the checklist in a format that integrates with the team's workflow (Markdown for PRs, JIRA template, Confluence page, etc.).
## Inputs
- Checklist type needed (code completion, testing, release, or all)
- Recurring problems or pain points
- Technology stack
- Optionally: existing checklists to improve, codebase to analyze, team size and process tolerance
## Outputs
### Development Checklist
```markdown
# {Checklist Type} Checklist
> **Purpose:** {what this checklist prevents}
> **When to use:** {at what point in the workflow}
> **Target:** {5-10 items, independent verification}
## Items
- [ ] **{Item name}** — {why this matters}
- [ ] **{Item name}** — {why this matters}
- [ ] **{Item name}** — {why this matters}
...
## Compliance
- Spot-checked by: {role}
- Frequency: {how often spot-checks occur}
## Last Updated: {date}
## Items Removed (no longer needed): {list items graduated out}
## Items Added (new recurring issues): {list new items with date added}
```
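If checklists are generated rather than written by hand, the 5-10 item target can be enforced at render time. This sketch renders only the header and Items section of the template, and uses a colon instead of a dash between item name and reason; `ChecklistItem` and `render_items` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    name: str
    why: str


def render_items(kind: str, purpose: str, items: List[ChecklistItem]) -> str:
    """Render the header and Items section as Markdown, refusing lists outside 5-10 items."""
    if not 5 <= len(items) <= 10:
        raise ValueError(f"{len(items)} items: trim or split into 5-10 item checklists")
    lines = [f"# {kind} Checklist", f"> **Purpose:** {purpose}", "## Items"]
    lines += [f"- [ ] **{item.name}**: {item.why}" for item in items]
    return "\n".join(lines)


items = [ChecklistItem(f"Item {n}", "prevents a recurring bug") for n in range(1, 6)]
md = render_items("Code Completion", "Catch recurring review misses", items)
```

Refusing to render an oversized list pushes the author to split it into purpose-specific checklists, as Step 2 recommends.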
## Key Principles
- **Small checklists get used, large ones get ignored** — WHY: The law of diminishing returns applies. Each additional checklist item reduces the probability that any item gets genuine attention. A 5-item checklist can get real attention on every item; a 50-item checklist invites rubber-stamping of most of them. If you need 50 items, you need 5 checklists of 10 items each, used at different stages.
- **Automate everything automatable** — WHY: Human attention is the scarcest resource. Spending it on checks that a linter, static analyzer, or CI pipeline could perform is wasteful. Reserve the checklist for judgment calls: "Does this error handling cover the failure modes that matter for this service?" A linter can't answer that.
- **Checklists are not procedures** — WHY: When procedural steps are placed on a checklist, the checklist becomes unusable because items can't be verified until prior items complete. This creates a false sense of security — the team "completed the checklist" but actually just followed a procedure. Procedures belong in runbooks; checklists belong in quality gates.
- **State the obvious because it gets skipped** — WHY: "Did you remove hardcoded credentials from the codebase?" feels too obvious to put on a checklist, yet many credential leaks start with someone who assumed it was too obvious to check. Obvious items are often missed precisely BECAUSE everyone assumes someone else checked them.
- **The Hawthorne Effect is your compliance tool** — WHY: Knowing that checklists will be spot-checked changes behavior even when spot-checks rarely happen. This isn't about distrust — it's about human nature. People do better work when they know someone will look at it. Security cameras don't need to record to be effective; they just need to be visible.
- **Involve the team in checklist creation** — WHY: People follow rules they helped create and resist rules imposed on them. When developers contribute checklist items based on their own experience of what goes wrong, the checklist becomes a shared tool rather than an imposed burden.
## Examples
**Scenario: Creating a code completion checklist for common bugs**
Trigger: "Our team keeps shipping bugs that could be caught with basic checks — missing null validation, hardcoded config values, forgotten log statements."
Process: Identified the three recurring issues. Created a focused 7-item code completion checklist targeting exactly these patterns, plus related items the team likely hasn't considered. Verified that each item is independently checkable and not automatable (if the language has a null-safety feature, the null-check item shouldn't be on the checklist). Included "why this matters" for each item, because developers who understand the reasoning comply more honestly. Recommended that the tech lead spot-check 1 randomly chosen PR per week so the Hawthorne Effect reinforces compliance.
Output: 7-item code completion checklist: (1) No hardcoded configuration values, (2) Null/empty checks on external inputs, (3) Error handling produces actionable log messages, (4) No absorbed exceptions (catch blocks that swallow errors silently), (5) New configuration keys added to all environment configs, (6) Sensitive data excluded from log output, (7) New dependencies justified in PR description.
**Scenario: Creating a release checklist after production incidents**
Trigger: "We've had 3 production incidents caused by deployment mistakes — wrong config file, missing database migration, stale cache."
Process: Each incident maps directly to a checklist item. Created a 6-item release checklist targeting the exact failure modes. Verified that the config check can't be automated (if it can, it should be a CI step, not a checklist item). For the cache item, checked whether cache invalidation can be part of the deployment script. Recommended adding the checklist as a required sign-off step in the deployment pipeline.
Output: 6-item release checklist: (1) Configuration file matches target environment, (2) Database migrations have been run and verified, (3) Cache invalidation performed or scheduled, (4) Feature flags set to correct state for this release, (5) Rollback procedure documented and tested, (6) Monitoring dashboards and alerts verified for new components.
**Scenario: Fixing an existing 50-item checklist that nobody follows**
Trigger: "I want to create a testing checklist but my team complains they already have too many processes. They ignore our existing 50-item QA checklist."
Process: The problem isn't the team — it's the 50-item checklist. Applied the design principles: first, separated procedural items (which belong in a workflow document, not a checklist) from genuine verification items. Found 20 items were procedural. Of the remaining 30, identified 12 that could be automated (linting, formatting, basic test coverage). That left 18 genuine checklist items — still too many. Grouped them into 3 focused checklists of 6 items each: code completion, testing, and release. Each checklist is used at a different stage, so no developer sees more than 6 items at a time. Recommended sunsetting the 50-item checklist and introducing the 3 focused checklists with team input on final items.
Output: Three focused checklists (6 items each) replacing the original 50-item checklist, with a migration plan and team workshop agenda for collaborative refinement.
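Deciding whether an item is procedural or automatable is a judgment call, but the bookkeeping that follows is mechanical. The Python sketch below assumes each item has already been tagged by hand; `Item` and `triage` are hypothetical names, not part of any tool. With synthetic data it reproduces the scenario's arithmetic: 50 items minus 20 procedural and 12 automatable leaves 18, split 6/6/6 across three stages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One line from an existing checklist, tagged by a human reviewer (hypothetical model)."""
    text: str
    procedural: bool = False   # a step in a sequence: move it to a runbook
    automatable: bool = False  # a linter or CI job can verify it: automate it
    stage: str = "code"        # where a genuine verification item belongs


def triage(items, max_per_checklist=10):
    """Split items into runbook steps, automation candidates, and per-stage checklists."""
    runbook = [i.text for i in items if i.procedural]
    automate = [i.text for i in items if i.automatable and not i.procedural]
    checklists = defaultdict(list)
    for i in items:
        if not i.procedural and not i.automatable:
            checklists[i.stage].append(i.text)
    # Any stage still over the limit needs a further split or more pruning.
    oversized = [stage for stage, texts in checklists.items() if len(texts) > max_per_checklist]
    return runbook, automate, dict(checklists), oversized


# Synthetic data mirroring the scenario: 20 procedural, 12 automatable, 18 genuine items.
items = (
    [Item(f"procedure step {n}", procedural=True) for n in range(20)]
    + [Item(f"lint rule {n}", automatable=True) for n in range(12)]
    + [Item(f"{stage} check {n}", stage=stage)
       for stage in ("code", "testing", "release") for n in range(6)]
)
runbook, automate, checklists, oversized = triage(items)
print(len(runbook), len(automate), {s: len(t) for s, t in checklists.items()}, oversized)
# 20 12 {'code': 6, 'testing': 6, 'release': 6} []
```

With real data, a non-empty `oversized` list signals that a stage checklist still needs another split.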
## References
- For ready-to-use checklist templates by type, with customization instructions, see [references/checklist-templates.md](references/checklist-templates.md)
## License
This skill is licensed under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Source: [BookForge](https://github.com/bookforge-ai/bookforge-skills) — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
## Related BookForge Skills
This skill is standalone. Browse more BookForge skills: [bookforge-skills](https://github.com/bookforge-ai/bookforge-skills)
FILE:references/checklist-templates.md
# Checklist Templates Reference
Ready-to-use templates for each checklist type, with customization instructions.
## Template 1: Developer Code Completion Checklist
Use this checklist when a developer declares code "done." It serves as the team's definition of done for code completion.
### Base Template
```markdown
# Code Completion Checklist
> Complete this checklist before marking your PR/MR as ready for review.
> Every unchecked item needs a comment explaining why it doesn't apply.
- [ ] **No hardcoded configuration values** — All environment-specific values (URLs, ports, keys, connection strings) are externalized to config files or environment variables. WHY: Hardcoded values cause deployment failures when promoting to staging/production.
- [ ] **Input validation on external boundaries** — All inputs from users, APIs, queues, and file uploads are validated. Null checks, type checks, and range checks are present. WHY: Many security vulnerabilities and runtime crashes originate from unvalidated external input.
- [ ] **Error handling produces actionable information** — Catch blocks log the context needed to diagnose the issue (what was attempted, what input caused the failure, what the system state was). WHY: "NullPointerException at line 42" wastes hours of debugging time. "Failed to process order #12345 for customer ABC because payment service returned null" saves hours.
- [ ] **No absorbed exceptions** — No empty catch blocks. No catch-and-ignore patterns. Every exception is either handled, logged, or re-thrown. WHY: Absorbed exceptions create silent failures that are nearly impossible to diagnose. The system appears to work but data is inconsistent or incomplete.
- [ ] **Sensitive data excluded from logs** — Passwords, tokens, API keys, PII, and credit card numbers are NOT written to log files. WHY: Log files are often less protected than databases. Logging sensitive data creates an unmonitored data leak.
- [ ] **New configuration keys present in all environment configs** — If you added a new config key, it exists in dev, staging, and production config files (even if the value differs). WHY: Missing config keys cause startup failures in environments that weren't tested.
- [ ] **Code changes are covered by tests** — New logic has unit tests. Changed logic has updated tests. Edge cases identified during development are tested. WHY: Untested code changes are a major source of regression bugs.
```
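To make the two error-handling items in this template concrete, here is a minimal Python sketch contrasting an absorbed exception with an actionable one. `PaymentServiceError`, `charge_absorbed`, `charge_actionable`, `failing_charge`, and the injected `charge` callable are hypothetical stand-ins for a real payment client. The log message follows the "Failed to process order #12345 for customer ABC" shape described above and records identifiers only, never card data.

```python
import logging

logger = logging.getLogger("orders")


class PaymentServiceError(Exception):
    """Hypothetical error raised by a payment-service client."""


def charge_absorbed(order_id, customer_id, charge):
    # Anti-pattern: the empty handler swallows the failure, so the caller
    # receives None and carries on as if the charge succeeded.
    try:
        return charge(order_id)
    except Exception:
        pass


def charge_actionable(order_id, customer_id, charge):
    # Preferred: record what was attempted and for whom, then re-raise so the
    # caller can decide how to recover. Only identifiers are logged, no card data.
    try:
        return charge(order_id)
    except PaymentServiceError:
        logger.exception(
            "Failed to process order #%s for customer %s: payment service call failed",
            order_id,
            customer_id,
        )
        raise


def failing_charge(order_id):
    """Stub that simulates the payment service returning nothing useful."""
    raise PaymentServiceError("payment service returned null")


print(charge_absorbed(12345, "ABC", failing_charge))  # None: the failure is invisible
```

A reviewer ticking "No absorbed exceptions" is looking for the first shape and asking for the second.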
### Customization Instructions
1. **Remove items your tooling handles** — If your linter catches formatting issues, don't add a formatting item. If your CI pipeline validates configuration, remove the config item.
2. **Add team-specific items** — What recurring bugs does YOUR team ship? Add items that target those specific patterns.
3. **Add project-specific items** — Does your project have unique requirements (HIPAA compliance, PCI, accessibility)? Add verification items for those.
4. **Keep it to 10 items or fewer** — If you exceed 10, split it into "Code Quality" and "Security" checklists used at different stages.
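Step 1 suggests dropping the config-key item once CI can verify it. A minimal Python sketch of such a check follows. It assumes each environment's configuration has already been loaded into a flat dictionary (how you load it depends on your config format); `missing_config_keys` is a hypothetical helper, and the keys and URLs are placeholders.

```python
def missing_config_keys(env_configs):
    """Map each environment to the keys defined elsewhere but absent from it.

    env_configs: {"dev": {...}, "staging": {...}, ...} with already-parsed values.
    Environments with no missing keys are omitted from the result.
    """
    all_keys = set()
    for config in env_configs.values():
        all_keys.update(config)  # iterating a dict yields its keys
    report = {}
    for env, config in env_configs.items():
        missing = all_keys - set(config)
        if missing:
            report[env] = missing
    return report


configs = {
    "dev": {"DB_URL": "postgres://dev-db", "REDIS_URL": "redis://dev-cache"},
    "staging": {"DB_URL": "postgres://staging-db"},
    "production": {"DB_URL": "postgres://prod-db", "REDIS_URL": "redis://prod-cache"},
}
print(missing_config_keys(configs))  # {'staging': {'REDIS_URL'}}
```

In a CI job, exiting non-zero when the result is non-empty (for example `sys.exit(1 if report else 0)`) makes the build fail instead of the deployment.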
## Template 2: Unit and Functional Testing Checklist
Use this checklist when reviewing test coverage for a feature or component.
### Base Template
```markdown
# Testing Checklist
> Complete this checklist before marking tests as complete.
> Focus on scenarios that are commonly missed, not scenarios your test framework handles automatically.
- [ ] **Happy path tested** — The primary use case works as expected with valid inputs. WHY: Baseline verification that the feature does what it's supposed to do.
- [ ] **Empty/null inputs handled** — Tests cover null values, empty strings, empty collections, zero values, and missing optional parameters. WHY: Boundary conditions are a frequent source of production bugs because developers tend to test with typical data, not edge cases.
- [ ] **Error paths tested** — Tests verify behavior when dependencies fail: database timeouts, external API errors, network failures, invalid responses. WHY: Error-handling code is often the least-tested code in a system, yet it runs exactly when something has already gone wrong in production.
- [ ] **Concurrent access scenarios considered** — If the code modifies shared state, tests cover concurrent writes, race conditions, and transaction isolation. WHY: Concurrency bugs are among the hardest to reproduce and can be among the most damaging in production.
- [ ] **Boundary values tested** — Tests cover: maximum allowed values, minimum allowed values, values just above/below limits, and values at pagination boundaries. WHY: Boundary errors (off-by-one, overflow, truncation) are systematic — they affect every record at the boundary, not just occasional ones.
- [ ] **Integration points tested with failure simulation** — External service calls are tested with: slow responses, error responses, malformed responses, and complete unavailability. WHY: External services WILL fail. The question is whether your system degrades gracefully or crashes.
- [ ] **Test data is realistic** — Tests use data that resembles production: international characters, long strings, special characters in names, realistic date ranges. WHY: Code tested only with "test123" and "John Doe" passes its tests, then fails in production on "O'Brien", "Müller", or "Yamamoto Takeshi".
```
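The null, boundary, and realistic-data items translate directly into test cases. The sketch below uses Python's standard `unittest`; `validate_quantity`, `format_display_name`, and the limit `MAX_QUANTITY = 100` are hypothetical functions and values invented for illustration, so substitute your own code under test. Run it with `python -m unittest <file>`.

```python
import unittest

MAX_QUANTITY = 100  # hypothetical business limit


def validate_quantity(quantity):
    """Hypothetical function under test: accept integers from 1 to MAX_QUANTITY."""
    if quantity is None:
        raise ValueError("quantity is required")
    if not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError(f"quantity must be between 1 and {MAX_QUANTITY}")
    return quantity


def format_display_name(first, last):
    """Hypothetical function under test: must keep apostrophes and non-ASCII letters intact."""
    return f"{last.strip()}, {first.strip()}"


class ChecklistDrivenTests(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(validate_quantity(5), 5)

    def test_null_input_rejected(self):
        with self.assertRaises(ValueError):
            validate_quantity(None)

    def test_boundaries(self):
        # Values exactly at the limits are accepted...
        self.assertEqual(validate_quantity(1), 1)
        self.assertEqual(validate_quantity(MAX_QUANTITY), MAX_QUANTITY)
        # ...and values just outside them are rejected.
        for value in (0, MAX_QUANTITY + 1):
            with self.subTest(value=value):
                with self.assertRaises(ValueError):
                    validate_quantity(value)

    def test_realistic_names(self):
        for first, last in [("Aoife", "O'Brien"), ("Jürgen", "Müller"), ("Takeshi", "Yamamoto")]:
            with self.subTest(last=last):
                self.assertEqual(format_display_name(first, last), f"{last}, {first}")
```

`subTest` reports each failing boundary value or name separately instead of stopping at the first one.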
### Customization Instructions
1. **Add domain-specific edge cases** — Financial: rounding errors, currency conversion. Healthcare: HIPAA data handling. E-commerce: inventory race conditions.
2. **Remove items your test framework covers** — If you use property-based testing that auto-generates edge cases, you may not need the boundary values item.
3. **Add performance thresholds** — If performance matters, add: "Response time under expected load stays below {threshold}."
## Template 3: Software Release Checklist
Use this checklist before every production deployment.
### Base Template
```markdown
# Software Release Checklist
> Complete this checklist before initiating production deployment.
> Every unchecked item requires sign-off from the release lead explaining why it's acceptable.
- [ ] **Configuration verified for target environment** — Config files for the target environment have been reviewed. New configuration keys are present. Values are appropriate for production (not dev/staging values). WHY: Configuration mismatches are a leading cause of deployment failures that pass all automated tests.
- [ ] **Database migrations verified** — All pending migrations have been run in staging successfully. Migration is idempotent (running twice doesn't cause errors). Rollback migration exists and has been tested. WHY: Failed migrations can corrupt production data and are among the hardest deployment failures to recover from.
- [ ] **Cache state addressed** — Caches that hold data affected by this release have been identified. Invalidation strategy is defined (manual flush, TTL expiry, versioned keys). WHY: Stale cache data causes users to see old behavior after deployment, creating confusion and bug reports for "bugs" that are actually cache issues.
- [ ] **Feature flags configured correctly** — New features behind feature flags are set to the correct state (enabled/disabled) for this release. Kill switches are verified. WHY: Features accidentally enabled in production before they're ready cause user-facing incidents.
- [ ] **Rollback plan documented** — A specific rollback procedure exists for this release. The rollback has been tested in staging. The team knows who is authorized to trigger rollback and under what conditions. WHY: When a production deployment fails at 2 AM, there's no time to figure out rollback procedures. They must be pre-defined and pre-tested.
- [ ] **Monitoring and alerting updated** — Dashboards include metrics for new components. Alert thresholds are set for new services. On-call team has been notified of the release. WHY: Deploying new components without monitoring means failures are discovered by users, not by engineers.
```
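The migration item asks for idempotency: running a migration twice must not fail. One common way to get there is to guard each schema change with a check of the current schema. This Python sketch uses the standard-library `sqlite3` module with a hypothetical `orders` table and `status` column; real migration tools and other databases have their own mechanisms, and the sketch does not cover the rollback migration.

```python
import sqlite3


def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated directly, so only pass trusted identifiers.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def migrate_add_order_status(conn):
    """Hypothetical migration: add orders.status. Safe to run more than once."""
    if "status" not in column_names(conn, "orders"):
        conn.execute("ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'new'")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY)")
migrate_add_order_status(conn)
migrate_add_order_status(conn)  # second run is a no-op instead of a "duplicate column" error
print(column_names(conn, "orders"))  # ['id', 'status']
```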
### Customization Instructions
1. **Add security items if relevant** — "Security scan passed" or "Penetration test results reviewed."
2. **Add compliance items** — SOC2, HIPAA, PCI requirements for releases.
3. **Add communication items** — "Release notes published" or "Customer success team notified."
4. **Adapt to deployment model** — Blue/green deployments may need different items than rolling deployments.
## Checklist Design Rules Summary
| Rule | Good Example | Bad Example |
|------|-------------|-------------|
| **Independent items** | "No hardcoded config values" (can check anytime) | "Submit config form, then verify table" (dependent) |
| **Not automatable** | "Error messages provide diagnostic context" (needs judgment) | "Code passes linting" (automate this) |
| **Specific** | "Null checks on all external API response fields" | "Good error handling" |
| **States the obvious** | "No credentials in code" | (omitted because "everyone knows") |
| **Small** | 7 items per checklist | 50 items in one mega-checklist |
| **Has WHY** | "WHY: Silent failures are impossible to diagnose" | (no explanation, item seems arbitrary) |
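Only two rows of this table can be checked mechanically: size and the presence of a WHY. The Python sketch below lints a Markdown checklist for those two rules; independence, specificity, and automatability still need a human. `lint_checklist` is a hypothetical helper, and its 5-to-10 item range and `WHY:` marker are taken from the templates in this reference.

```python
import re

# Matches Markdown task-list items such as "- [ ] **Item** ... WHY: ...".
ITEM_PATTERN = re.compile(r"^\s*- \[[ xX]\] (?P<body>.+)$")


def lint_checklist(markdown, min_items=5, max_items=10):
    """Return (item_count, warnings) for the "Small" and "Has WHY" rules."""
    items = [m.group("body") for m in map(ITEM_PATTERN.match, markdown.splitlines()) if m]
    warnings = []
    if len(items) > max_items:
        warnings.append(f"{len(items)} items: split into stage-specific checklists")
    elif len(items) < min_items:
        warnings.append(f"{len(items)} items: below the {min_items}-item target")
    for body in items:
        if "WHY:" not in body:
            warnings.append(f"no WHY: {body[:50]}")
    return len(items), warnings


sample = """# Code Completion Checklist
- [ ] **No hardcoded configuration values** WHY: hardcoded values break promotion to production.
- [ ] **No absorbed exceptions**
"""
count, warnings = lint_checklist(sample)
print(count)  # 2
for warning in warnings:
    print(warning)
```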
## Compliance Strategy: The Hawthorne Effect
### What It Is
The Hawthorne Effect: when people know they are being observed or monitored, their behavior changes and they generally do the right thing.
### How to Apply It
1. **Announce spot-checks** — Tell the team that completed checklists will be randomly verified for accuracy. The announcement matters more than the frequency.
2. **Actually spot-check occasionally** — The tech lead or architect reviews 1-2 completed checklists per week, verifying that checked items were actually done.
3. **Discuss findings constructively** — When a spot-check reveals rubber-stamping, treat it as a coaching moment, not a punishment. "I noticed the config check was marked done but the staging config is missing the new REDIS_URL key. Let's talk about how to make this check easier."
4. **Use visible monitoring** — Website monitoring software, build dashboards, code quality dashboards — all serve as visible "cameras" that remind the team their work is observed.
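Step 2 calls for reviewing 1-2 completed checklists per week. Picking them at random keeps every checklist a possible candidate without reviewing them all. A minimal Python sketch, assuming completed checklists are identified by PR numbers; `pick_spot_checks` and the PR identifiers are hypothetical.

```python
import random


def pick_spot_checks(completed, per_week=2, rng=None):
    """Randomly choose up to `per_week` completed checklists to verify this week."""
    rng = rng or random.Random()
    k = min(per_week, len(completed))
    # sample() draws without replacement, so no checklist is picked twice.
    return rng.sample(list(completed), k)


this_week = ["PR-101", "PR-102", "PR-103", "PR-104", "PR-105", "PR-106"]
print(pick_spot_checks(this_week))  # two randomly chosen PRs
```

Passing a seeded `random.Random(seed)` as `rng` makes the selection reproducible if you want an auditable record of which checklists were chosen.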
### What NOT to Do
- Don't create a surveillance culture — the goal is awareness, not fear
- Don't punish people for honest mistakes — the checklist exists to catch mistakes
- Don't spot-check every checklist — that defeats the purpose and creates resentment
- Don't keep the spot-check process secret; make it transparent and collaborative
## When NOT to Use Checklists
Not everything needs a checklist. Avoid checklists for:
1. **Simple, well-known processes** — If the team does it correctly every time without thinking, a checklist adds friction without value.
2. **Purely procedural workflows** — Step-by-step procedures belong in runbooks, not checklists.
3. **Fully automatable checks** — If a CI pipeline can verify it, automate it instead of putting it on a human checklist.
4. **Things that change frequently** — Items that change every sprint create maintenance burden. If the checklist needs weekly updates, the items are too specific.