@clawhub-codenova58-4ec5412327
Shape HTTP/JSON APIs—resources and verbs, error payloads, pagination, idempotency, and docs. Use when designing new endpoints, reviewing PRs, or aligning tea...
---
name: http-api
description: "Shape HTTP/JSON APIs—resources and verbs, error payloads, pagination, idempotency, and docs. Use when designing new endpoints, reviewing PRs, or aligning teams on REST-ish conventions (versioning lifecycle is a separate concern)."
---
# HTTP API
Design and review **HTTP-facing APIs** (usually JSON): predictable URLs, honest status codes, and errors clients can build on—**without** duplicating everything your **api-compat** skill covers for long-lived versioning policy.
## Scope
- **Modeling**: nouns/resources, collections, actions when RPC-style is clearer than fake REST.
- **HTTP semantics**: which methods, **idempotency**, caching headers when relevant.
- **Errors**: stable machine-readable codes, correlation ids, avoid leaking internals.
- **DX**: examples, OpenAPI snippets, consistent pagination/filter patterns.
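
One way to make "consistent pagination" concrete is an opaque cursor with a capped page size. The sketch below is illustrative only: `encode_cursor`, `decode_cursor`, `list_page`, and the `next_cursor` field are hypothetical names, and it assumes items are sorted by a positive integer `id`.

```python
import base64
import json


def encode_cursor(last_id: int) -> str:
    """Pack the last-seen id into an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Recover the last-seen id from a token made by encode_cursor."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return int(json.loads(raw)["after"])


def list_page(items, cursor=None, limit=20, max_limit=100):
    """Return one bounded page of id-sorted items plus next_cursor (or None)."""
    limit = max(1, min(limit, max_limit))  # never serve an unbounded list
    after = decode_cursor(cursor) if cursor else 0  # assumes ids start at 1
    remaining = [item for item in items if item["id"] > after]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}
```

Clients treat the cursor as a black box and stop when `next_cursor` is `None`, which leaves the server free to change the encoding later.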
## Out of scope (hand off)
- **OAuth/OIDC** protocol details → identity-focused skills.
- **GraphQL-only** schema design → graphql-schema skill.
- **Multi-year deprecation policy** and client migration programs → pair with **api-compat**.
## Review order
1. **Read paths** — Can a client navigate the domain without guessing?
2. **Write safety** — Retries safe? Duplicate submits handled?
3. **Errors** — One shape everywhere; 4xx vs 5xx honest.
4. **Evolution** — Document what may change vs what is stable (compat details in api-compat).
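
For the write-safety question, a common approach is a client-supplied idempotency key. The sketch below is a minimal in-memory illustration, not a production design: `IdempotentStore` and `create_order` are hypothetical names, and a real service would persist keys, expire them, and guard against concurrent requests.

```python
class IdempotentStore:
    """In-memory stand-in for a persistent idempotency table (assumption)."""

    def __init__(self):
        self._responses = {}  # idempotency key -> (status, body) already returned
        self.orders = []

    def create_order(self, idempotency_key: str, payload: dict):
        # A replayed key returns the stored response instead of creating a duplicate.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        order = {"id": len(self.orders) + 1, **payload}
        self.orders.append(order)
        response = (201, order)
        self._responses[idempotency_key] = response
        return response
```

With this shape, a client that times out can retry the same POST with the same key and get the original `201` body back without creating a second order.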
## Smells
- Status 200 with `{error: ...}`; **every** POST returns 200; unbounded list endpoints; secrets in error bodies.
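
To avoid the first smell, errors can share one envelope and always carry a 4xx or 5xx status. The helper below is a sketch under assumptions: the field names `code`, `message`, and `request_id` are illustrative choices, not a format this skill mandates.

```python
import uuid


def error_response(status: int, code: str, message: str, request_id=None):
    """Build (status, body) using one error shape for every endpoint."""
    if not 400 <= status <= 599:
        raise ValueError("errors must use a 4xx or 5xx status, never 200")
    body = {
        "error": {
            "code": code,  # stable, machine-readable identifier
            "message": message,  # human-readable; no stack traces or secrets
            "request_id": request_id or str(uuid.uuid4()),  # correlation id
        }
    }
    return status, body
```

For example, `error_response(404, "order_not_found", "No order with that id.")` gives clients a stable `code` to branch on and a `request_id` to quote in support tickets.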
## Done when
- A new engineer can call the API from docs alone; failure cases are **testable** and **consistent**.
Find and summarize arXiv.org preprints—keyword/category search, abstracts, PDF links. Use for literature scans, paper IDs, or quick orientation (not peer-rev...
---
name: arxiv-papers
description: "Find and summarize arXiv.org preprints—keyword/category search, abstracts, PDF links. Use for literature scans, paper IDs, or quick orientation (not peer-review, not medical/legal advice)."
---
# ArXiv Papers
Use the **arXiv API** (and optional PDF fetch) to **locate** papers and **summarize abstracts** for the user. Treat results as **preprints**—not necessarily peer-reviewed or final.
## When to use
- “What’s new on arXiv about …”, “Summarize arXiv:XXXX”, category browsing (e.g. `cs.AI`).
- Quick **orientation** before deeper reading—not a substitute for reading the full paper in serious research.
## Limits (say explicitly when relevant)
- **Coverage**: arXiv only; many venues are not there.
- **Quality**: preprint ≠ endorsed truth; contradictory claims exist.
- **Rate / ToS**: respect arXiv’s API guidelines; don’t hammer endpoints.
## Workflow
1. Run `scripts/search_arxiv.sh "<query>"` and parse the returned XML (`<entry>`, `<title>`, `<summary>`, PDF `<link>`).
2. Present **title, authors, id, abstract summary**, and link to abstract/PDF.
3. If the user wants depth, **PDF** may be fetched selectively—large files and parsing limits apply.
4. Optionally append notable papers to `memory/RESEARCH_LOG.md` (if your environment uses it):
```markdown
### [YYYY-MM-DD] TITLE
- **Authors**: …
- **Link**: …
- **Summary**: …
```
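
Step 1's parsing can be sketched with the standard library alone. This assumes the response is an Atom feed in the `http://www.w3.org/2005/Atom` namespace and that the PDF link is marked with `title="pdf"` or `type="application/pdf"`; `parse_entries` and `_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the standard Atom namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def _text(node, path):
    """Return the element text with runs of whitespace collapsed."""
    return " ".join(node.findtext(path, default="", namespaces=ATOM).split())


def parse_entries(xml_text: str):
    """Extract id, title, summary, authors, and PDF link from each <entry>."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall("atom:entry", ATOM):
        pdf = None
        for link in entry.findall("atom:link", ATOM):
            if link.get("title") == "pdf" or link.get("type") == "application/pdf":
                pdf = link.get("href")
        results.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "summary": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", ATOM)],
            "pdf": pdf,
        })
    return results
```

The script's output can be captured with `subprocess.run(..., capture_output=True, text=True)` and its `stdout` passed to `parse_entries`.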
## Examples
- Latest LLM reasoning papers on arXiv.
- “What is paper `2512.08769` about?”
## Resources
- `scripts/search_arxiv.sh` — thin wrapper over the arXiv API.
FILE:scripts/search_arxiv.sh
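#!/usr/bin/env bash
# scripts/search_arxiv.sh
# Usage: scripts/search_arxiv.sh "<query>"
set -euo pipefail
QUERY="${1:?usage: search_arxiv.sh \"<query>\"}"
COUNT=5  # number of results to request (must be positive)
# Query the arXiv API; --data-urlencode escapes spaces and special characters in the query
curl -sL --get "https://export.arxiv.org/api/query" \
  --data-urlencode "search_query=all:$QUERY" \
  -d "start=0" -d "max_results=$COUNT" \
  -d "sortBy=submittedDate" -d "sortOrder=descending"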
Stress-test designs before they ship—constraints, trade-offs, failure modes, and ADR-worthy decisions. Use for ADRs, big refactors, new services, or when ‘it...
---
name: arch-review
description: "Stress-test designs before they ship—constraints, trade-offs, failure modes, and ADR-worthy decisions. Use for ADRs, big refactors, new services, or when ‘it works on my laptop’ isn’t enough."
---
# Architecture Review
**Challenge** a design without owning the team’s roadmap: clarify **forces** (scale, money, people, regulation), surface **risks**, and leave **decisions** traceable—usually as an ADR or review notes.
## Inputs you need (ask early)
- **Goal** and non-goals; **users** and SLAs; **constraints** (budget, deadline, org skills).
- **Current pain**—latency, incidents, cost, velocity—not buzzwords.
- **Alternatives** considered, even if rough.
## Review lens (pick what fits)
- **Failure**: blast radius, partial outages, data loss, replay.
- **Ops**: deploy model, rollbacks, observability, on-call load.
- **Change**: team size, Conway’s law, long-term ownership.
- **Security**: trust boundaries, secrets, supply chain—at architecture depth, not a full pentest.
## Output shape
- **Summary** of the proposal in your own words (catches misunderstandings).
- **Top risks** with severity; **mitigations** or experiments.
- **Open questions** for the team—not a pretend-final design.
## Not this
- Replacing the team’s **product** judgment; rubber-stamping; 20-page templates nobody reads.
## Done when
- The team can explain **what they decided**, **why**, and **what would falsify** the choice later.
Change public APIs without breaking clients—versioning schemes, additive vs breaking changes, deprecation windows, and comms. Use when shipping breaking chan...
---
name: api-compat
description: "Change public APIs without breaking clients—versioning schemes, additive vs breaking changes, deprecation windows, and comms. Use when shipping breaking changes, sunsetting fields, or coordinating mobile/web SDK consumers."
---
# API Compatibility
Own the **lifecycle** of a public API: who breaks when you ship, how long old behavior lives, and how clients discover what’s next. Pair with **http-api** for how requests look **today**; this skill is about **time and promises**.
## When to use
- Adding/removing fields, routes, or semantics that **existing** clients rely on.
- Choosing **URL vs header vs package** versioning—or when **no** formal version and only additive JSON.
- **Deprecation**: timelines, metrics (who still calls old paths?), and removal gates.
## Core ideas
- **Additive first** — nullable new fields beat silent behavior changes.
- **Explicit contracts** — integration tests or consumer-driven checks for critical partners.
- **Communicate** — changelog, developer email, in-response **Sunset** / warnings where standards allow.
## Avoid
- “We’ll just bump the version” without a **migration** story for slow-moving apps.
- Breaking auth or pagination with no **coordination** window.
- Deprecating without **usage data**—you’ll cut traffic you didn’t know existed.
## Checklist before breaking
- [ ] Who is affected (internal only vs third parties)?
- [ ] Minimum notice period and **rollback** if telemetry spikes errors?
- [ ] Docs + examples updated **before** the flag day?
## Done when
- Old and new behaviors are **measurable**; removal is gated on **evidence**, not hope.
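
The in-response **Sunset** signal mentioned under Core ideas can be sketched as a small header builder. This assumes the `Sunset` header and `rel="sunset"` link relation described in RFC 8594; `deprecation_headers` and the docs URL are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset_at: datetime, docs_url: str) -> dict:
    """Headers to attach to responses from an endpoint scheduled for removal."""
    if sunset_at.tzinfo is None:
        raise ValueError("sunset_at must be timezone-aware")
    # The Sunset value is an HTTP-date, which is always expressed in GMT.
    http_date = format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True)
    return {
        "Sunset": http_date,
        "Link": f'<{docs_url}>; rel="sunset"',  # points clients at migration docs
    }
```

Logging which clients still receive these headers gives the usage data the removal gate depends on.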
Inclusive UI for real users—keyboard and focus, semantics for AT, contrast and motion, forms and errors. Use when fixing WCAG issues, auditing screens, or de...
---
name: a11y
description: "Inclusive UI for real users—keyboard and focus, semantics for AT, contrast and motion, forms and errors. Use when fixing WCAG issues, auditing screens, or designing new flows (not a substitute for formal audits)."
---
# A11y
Help ship interfaces that work for **keyboard, screen readers, voice control, and low vision**—without treating accessibility as a bolt-on checkbox.
## What this skill is for
- **Concrete fixes**: focus traps, missing names/roles, heading order, label–control wiring, live regions, skip links.
- **Design trade-offs**: custom components vs native elements, motion reduction, contrast vs brand color.
- **Verification**: what to test in browser + AT, and what still needs human QA.
## What to skip or defer
- Legal **compliance certification** (VPAT, formal audits)—suggest specialists when the user needs signed-off conformance.
- **Visual design** taste without an accessibility angle—unless it affects contrast, touch targets, or readability.
## Workflow (adapt freely)
1. **Context** — Who uses what (keyboard-only, VoiceOver, NVDA, zoom)? Which WCAG **level** (A/AA) is the target?
2. **Surface problems** — Route critical paths; list failures (not vague “be more accessible”).
3. **Fix in order** — Blockers first (can’t complete task), then serious (wrong info), then polish.
4. **Prove it** — Tab order, focus visible, screen reader labels, automated checks where useful, manual pass on real flows.
## Anti-patterns to call out
- `div` buttons without keyboard support; **only** running Lighthouse and declaring done.
- Hiding focus rings “for aesthetics”; icon-only controls with no accessible name.
- Over-using `aria-*` instead of using the right **semantic** element first.
## Done when
- Critical paths are **keyboard-operable** and **named** for assistive tech; known gaps are documented with owners.
Agent loops, memory, tools, and safety boundaries. Use when designing AI agents.
---
name: agent-arch
description: "Agent loops, memory, tools, and safety boundaries. Use when designing AI agents."
---
# Agent Architecture Skill
This skill provides structured guidance for **Agent Architecture** work. Act as an active guide: confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **agent architecture** or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **goals, tools, and constraints**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **memory and state**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **planning loops and stopping**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **safety monitoring**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **Agent Architecture Skill**
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to Agent Architecture (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
Experiment design, metrics, ethics, and analysis. Use when running product experiments responsibly.
---
name: ab-testing
description: "Experiment design, metrics, ethics, and analysis. Use when running product experiments responsibly."
---
# AB Testing Skill
This skill provides structured guidance for **AB Testing** work. Act as an active guide: confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **ab testing** or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **hypothesis and primary metric**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **randomization and segments**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **ethics and guardrails**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **analysis and decision rules**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **AB Testing Skill**
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to AB Testing (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
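
As one concrete example for the randomization stage, users can be bucketed deterministically by hashing. This is a sketch under assumptions, not a prescribed method: `assign_variant` is a hypothetical helper and it splits traffic evenly across the listed variants.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant.

    The experiment name is part of the hash input so that a user's bucket in
    one experiment does not determine their bucket in another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because the assignment depends only on the inputs, the same user sees the same variant on every request without any stored state.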
Backups, RPO/RTO, restores, and drills. Use when designing DR or testing restores.
---
name: restore
description: "Backups, RPO/RTO, restores, and drills. Use when designing DR or testing restores."
---
# Restore
Structured guidance for **backup and recovery** (RPO/RTO, restores, drills): confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **backup**, **restore**, **DR**, or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **RPO/RTO targets**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **backup testing**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **restore drills**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **runbooks and ownership**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **restore** / DR work
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to backups and recovery (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
Moderation workflows, thresholds, and appeals. Use when handling UGC at scale.
---
name: moderation
description: "Moderation workflows, thresholds, and appeals. Use when handling UGC at scale."
---
# Moderation
Structured guidance for **content moderation** (UGC policy, thresholds, appeals): confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **content moderation** or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **categories and severity**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **human review workflows**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **threshold tuning**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **appeals and transparency**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **moderation**
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to moderation (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
Multi-account patterns, networking, and well-architected trade-offs. Use when designing cloud systems.
---
name: cloud-arch
description: "Multi-account patterns, networking, and well-architected trade-offs. Use when designing cloud systems."
---
# Cloud Arch
Structured guidance for **cloud architecture** (accounts, networking, well-architected trade-offs): confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **cloud architecture** or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **accounts, networking, identity**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **data and encryption**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **scalability patterns**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **operational model**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **cloud architecture**
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to cloud systems (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
Pipelines, gates, artifacts, and safe deployments. Use when setting up CI/CD, hardening pipelines, or reducing release risk.
---
name: cicd
description: "Pipelines, gates, artifacts, and safe deployments. Use when setting up CI/CD, hardening pipelines, or reducing release risk."
---
# CICD
Structured guidance for **CI/CD** (continuous integration and delivery): confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **CI/CD**, **CICD**, pipelines, or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **pipeline stages and artifacts**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **secrets, permissions, and reproducibility**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **deployment strategies and approvals**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **observability and rollback hooks**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **CICD**
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to CI/CD (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
Proving identity: sessions, tokens, MFA, recovery. Use when implementing login, token refresh, or auth bugs.
---
name: authentication
description: "Proving identity: sessions, tokens, MFA, recovery. Use when implementing login, token refresh, or auth bugs."
---
# Authentication Skill
This skill provides structured guidance for **Authentication** work. Act as an active guide: confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions **authentication** or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication

**Initial offer:** Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
## Workflow Stages
### Stage 1: Clarify context & goals
Anchor on **threat model: sessions vs tokens**. Ask what success looks like, constraints, and what must not break. Capture unknowns early.
### Stage 2: Design or plan the approach
Translate goals into a concrete plan around **passwords, MFA, and recovery**. Compare alternatives and explicit trade-offs; avoid implicit assumptions.
### Stage 3: Implement, validate, and harden
Execute with verification loops tied to **token lifetime and refresh**. Prefer small steps, measurable checks, and rollback points where risk is high.
### Stage 4: Operate, communicate, and iterate
Close the loop with **logging, lockout, and abuse**: monitoring, documentation, stakeholder updates, and lessons learned for the next cycle.
## Checklist Before Completion
- Goals and constraints are explicit for **Authentication Skill**
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
## Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
## Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
## Quality Bar
- Each recommendation should be **actionable** (what to do next).
- Call out **failure modes** relevant to Authentication (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.
Deep co-authoring workflow—context gathering, iterative drafting and structure, reader testing, and quality gates. Use when writing documentation, proposals,...
---
name: doc-co
description: Deep co-authoring workflow—context gathering, iterative drafting and structure, reader testing, and quality gates. Use when writing documentation, proposals, technical specs, decision docs, RFCs, or similar structured content; helps transfer context, refine through iteration, and verify the doc works for readers.
---
# Doc Co-Authoring
This skill provides a structured workflow for guiding users through collaborative document creation. Act as an active guide, walking users through three stages: Context Gathering, Refinement & Structure, and Reader Testing.
## When to Offer This Workflow
**Trigger conditions:**
- User mentions writing documentation: "write a doc", "draft a proposal", "create a spec", "write up"
- User mentions specific doc types: "PRD", "design doc", "decision doc", "RFC"
- User seems to be starting a substantial writing task

**Initial offer:** Offer the user a structured workflow for co-authoring the document. Explain the three stages:
1. **Context Gathering**: User provides all relevant context while Claude asks clarifying questions
2. **Refinement & Structure**: Iteratively build each section through brainstorming and editing
3. **Reader Testing**: Test the doc with a fresh Claude (no context) to catch blind spots before others read it

Explain that this approach helps ensure the doc works well when others read it (including when they paste it into Claude). Ask if they want to try this workflow or prefer to work freeform.

If user declines, work freeform. If user accepts, proceed to Stage 1.
## Stage 1: Context Gathering
**Goal:** Close the gap between what the user knows and what Claude knows, enabling smart guidance later.
### Initial Questions
Start by asking the user for meta-context about the document:
1. What type of document is this? (e.g., technical spec, decision doc, proposal)
2. Who's the primary audience?
3. What's the desired impact when someone reads this?
4. Is there a template or specific format to follow?
5. Any other constraints or context to know?

Inform them they can answer in shorthand or dump information however works best for them.

**If user provides a template or mentions a doc type:**
- Ask if they have a template document to share
- If they provide a link to a shared document, use the appropriate integration to fetch it
- If they provide a file, read it

**If user mentions editing an existing shared document:**
- Use the appropriate integration to read the current state
- Check for images without alt-text
- If images exist without alt-text, explain that when others use Claude to understand the doc, Claude won't be able to see them. Ask if they want alt-text generated. If so, request they paste each image into chat for descriptive alt-text generation.
### Info Dumping
Once initial questions are answered, encourage the user to dump all the context they have. Request information such as:
- Background on the project/problem
- Related team discussions or shared documents
- Why alternative solutions aren't being used
- Organizational context (team dynamics, past incidents, politics)
- Timeline pressures or constraints
- Technical architecture or dependencies
- Stakeholder concerns

Advise them not to worry about organizing it - just get it all out. Offer multiple ways to provide context:
- Info dump stream-of-consciousness
- Point to team channels or threads to read
- Link to shared documents

**If integrations are available** (e.g., Slack, Teams, Google Drive, SharePoint, or other MCP servers), mention that these can be used to pull in context directly.

**If no integrations are detected and in Claude.ai or Claude app:** Suggest they can enable connectors in their Claude settings to allow pulling context from messaging apps and document storage directly.

Inform them clarifying questions will be asked once they've done their initial dump.
**During context gathering:**
- If user mentions team channels or shared documents:
  - If integrations available: Inform them the content will be read now, then use the appropriate integration
  - If integrations not available: Explain lack of access. Suggest they enable connectors in Claude settings, or paste the relevant content directly.
- If user mentions entities/projects that are unknown:
  - Ask if connected tools should be searched to learn more
  - Wait for user confirmation before searching
- As user provides context, track what's being learned and what's still unclear

**Asking clarifying questions:** When user signals they've done their initial dump (or after substantial context provided), ask clarifying questions to ensure understanding:

Generate 5-10 numbered questions based on gaps in the context.

Inform them they can use shorthand to answer (e.g., "1: yes, 2: see #channel, 3: no because backwards compat"), link to more docs, point to channels to read, or just keep info-dumping. Whatever's most efficient for them.

**Exit condition:** Sufficient context has been gathered when questions show understanding - when edge cases and trade-offs can be asked about without needing basics explained.

**Transition:** Ask if there's any more context they want to provide at this stage, or if it's time to move on to drafting the document. If user wants to add more, let them. When ready, proceed to Stage 2.
## Stage 2: Refinement & Structure
**Goal:** Build the document section by section through brainstorming, curation, and iterative refinement.

**Instructions to user:** Explain that the document will be built section by section. For each section:
1. Clarifying questions will be asked about what to include
2. 5-20 options will be brainstormed
3. User will indicate what to keep/remove/combine
4. The section will be drafted
5. It will be refined through surgical edits

Start with whichever section has the most unknowns (usually the core decision/proposal), then work through the rest.

**Section ordering:**

If the document structure is clear: Ask which section they'd like to start with. Suggest starting with whichever section has the most unknowns. For decision docs, that's usually the core proposal. For specs, it's typically the technical approach. Summary sections are best left for last.

If user doesn't know what sections they need: Based on the type of document and template, suggest 3-5 sections appropriate for the doc type. Ask if this structure works, or if they want to adjust it.

**Once structure is agreed:** Create the initial document structure with placeholder text for all sections.

**If access to artifacts is available:** Use `create_file` to create an artifact. This gives both Claude and the user a scaffold to work from. Inform them that the initial structure with placeholders for all sections will be created. Create artifact with all section headers and brief placeholder text like "[To be written]" or "[Content here]". Provide the scaffold link and indicate it's time to fill in each section.

**If no access to artifacts:** Create a markdown file in the working directory. Name it appropriately (e.g., `decision-doc.md`, `technical-spec.md`). Inform them that the initial structure with placeholders for all sections will be created. Create file with all section headers and placeholder text. Confirm the filename has been created and indicate it's time to fill in each section.

**For each section:**
### Step 1: Clarifying Questions
Announce work will begin on the [SECTION NAME] section. Ask 5-10 clarifying questions about what should be included:

Generate 5-10 specific questions based on context and section purpose.

Inform them they can answer in shorthand or just indicate what's important to cover.
### Step 2: Brainstorming
For the [SECTION NAME] section, brainstorm [5-20] things that might be included, depending on the section's complexity. Look for:
- Context shared that might have been forgotten
- Angles or considerations not yet mentioned
Generate 5-20 numbered options based on section complexity. At the end, offer to brainstorm more if they want additional options.
### Step 3: Curation
Ask which points should be kept, removed, or combined. Request brief justifications to help learn priorities for the next sections.
Provide examples:
- "Keep 1,4,7,9"
- "Remove 3 (duplicates 1)"
- "Remove 6 (audience already knows this)"
- "Combine 11 and 12"
**If user gives freeform feedback** (e.g., "looks good" or "I like most of it but...") instead of numbered selections, extract their preferences and proceed. Parse what they want kept/removed/changed and apply it.
### Step 4: Gap Check
Based on what they've selected, ask if there's anything important missing for the [SECTION NAME] section.
### Step 5: Drafting
Use `str_replace` to replace the placeholder text for this section with the actual drafted content.
Announce the [SECTION NAME] section will be drafted now based on what they've selected.
**If using artifacts:**
After drafting, provide a link to the artifact.
Ask them to read through it and indicate what to change. Note that being specific helps learning for the next sections.
**If using a file (no artifacts):**
After drafting, confirm completion.
Inform them the [SECTION NAME] section has been drafted in [filename]. Ask them to read through it and indicate what to change. Note that being specific helps learning for the next sections.
**Key instruction for user (include when drafting the first section):**
Provide a note: Instead of editing the doc directly, ask them to indicate what to change. This helps learning of their style for future sections. For example: "Remove the X bullet - already covered by Y" or "Make the third paragraph more concise".
### Step 6: Iterative Refinement
As user provides feedback:
- Use `str_replace` to make edits (never reprint the whole doc)
- **If using artifacts:** Provide link to artifact after each edit
- **If using files:** Just confirm edits are complete
- If user edits doc directly and asks to read it: mentally note the changes they made and keep them in mind for future sections (this shows their preferences)
**Continue iterating** until user is satisfied with the section.
### Quality Checking
After 3 consecutive iterations with no substantial changes, ask if anything can be removed without losing important information.
When section is done, confirm [SECTION NAME] is complete. Ask if ready to move to the next section.
**Repeat for all sections.**
### Near Completion
As approaching completion (80%+ of sections done), announce intention to re-read the entire document and check for:
- Flow and consistency across sections
- Redundancy or contradictions
- Anything that feels like "slop" or generic filler
- Whether every sentence carries weight
Read entire document and provide feedback.
**When all sections are drafted and refined:**
Announce all sections are drafted. Indicate intention to review the complete document one more time.
Review for overall coherence, flow, completeness.
Provide any final suggestions.
Ask if ready to move to Reader Testing, or if they want to refine anything else.
## Stage 3: Reader Testing
**Goal:** Test the document with a fresh Claude (no context bleed) to verify it works for readers.
**Instructions to user:**
Explain that testing will now occur to see if the document actually works for readers. This catches blind spots - things that make sense to the authors but might confuse others.
### Testing Approach
**If access to sub-agents is available (e.g., in Claude Code):**
Perform the testing directly without user involvement.
### Step 1: Predict Reader Questions
Announce intention to predict what questions readers might ask when trying to discover this document.
Generate 5-10 questions that readers would realistically ask.
### Step 2: Test with Sub-Agent
Announce that these questions will be tested with a fresh Claude instance (no context from this conversation).
For each question, invoke a sub-agent with just the document content and the question.
Summarize what Reader Claude got right/wrong for each question.
### Step 3: Run Additional Checks
Announce additional checks will be performed.
Invoke sub-agent to check for ambiguity, false assumptions, contradictions.
Summarize any issues found.
### Step 4: Report and Fix
If issues found:
Report that Reader Claude struggled with specific issues.
List the specific issues.
Indicate intention to fix these gaps.
Loop back to refinement for problematic sections.

---
**If no access to sub-agents (e.g., claude.ai web interface):**
The user will need to do the testing manually.
### Step 1: Predict Reader Questions
Ask what questions people might ask when trying to discover this document. What would they type into Claude.ai?
Generate 5-10 questions that readers would realistically ask.
### Step 2: Setup Testing
Provide testing instructions:
1. Open a fresh Claude conversation: https://claude.ai
2. Paste or share the document content (if using a shared doc platform with connectors enabled, provide the link)
3. Ask Reader Claude the generated questions
For each question, instruct Reader Claude to provide:
- The answer
- Whether anything was ambiguous or unclear
- What knowledge/context the doc assumes is already known
Check if Reader Claude gives correct answers or misinterprets anything.
### Step 3: Additional Checks
Also ask Reader Claude:
- "What in this doc might be ambiguous or unclear to readers?"
- "What knowledge or context does this doc assume readers already have?"
- "Are there any internal contradictions or inconsistencies?"
### Step 4: Iterate Based on Results
Ask what Reader Claude got wrong or struggled with. Indicate intention to fix those gaps.
Loop back to refinement for any problematic sections.

---
### Exit Condition (Both Approaches)
When Reader Claude consistently answers questions correctly and doesn't surface new gaps or ambiguities, the doc is ready.
## Final Review
When Reader Testing passes:
Announce the doc has passed Reader Claude testing. Before completion:
1. Recommend they do a final read-through themselves - they own this document and are responsible for its quality
2. Suggest double-checking any facts, links, or technical details
3. Ask them to verify it achieves the impact they wanted
Ask if they want one more review, or if the work is done.
**If user wants final review, provide it. Otherwise:**
Announce document completion. Provide a few final tips:
- Consider linking this conversation in an appendix so readers can see how the doc was developed
- Use appendices to provide depth without bloating the main doc
- Update the doc as feedback is received from real readers
## Tips for Effective Guidance
**Tone:**
- Be direct and procedural
- Explain rationale briefly when it affects user behavior
- Don't try to "sell" the approach - just execute it
**Handling Deviations:**
- If user wants to skip a stage: Ask if they want to skip this and write freeform
- If user seems frustrated: Acknowledge this is taking longer than expected. Suggest ways to move faster
- Always give user agency to adjust the process
**Context Management:**
- Throughout, if context is missing on something mentioned, proactively ask
- Don't let gaps accumulate - address them as they come up
**Artifact Management:**
- Use `create_file` for drafting full sections
- Use `str_replace` for all edits
- Provide artifact link after every change
- Never use artifacts for brainstorming lists - that's just conversation
**Quality over Speed:**
- Don't rush through stages
- Each iteration should make meaningful improvements
- The goal is a document that actually works for readers
Chinese Four Pillars (BaZi 八字) chart interpretation—year, month, day, and hour pillars from birth data; Heavenly Stems, Earthly Branches, Five Elements, and...
---
name: bazi-reading
description: |
Chinese Four Pillars (BaZi 八字) chart interpretation—year, month, day, and hour pillars from birth data; Heavenly Stems, Earthly Branches, Five Elements, and high-level pattern reading. Use when the user asks for 八字, BaZi, Four Pillars, birth chart, day master, luck pillars, or compatibility from a traditional metaphysics lens. Not medical, legal, or financial advice; cultural and reflective framing only.
metadata:
openclaw:
emoji: "☯"
---
# BaZi Reading (八字算命) — Four Pillars of Destiny
## Overview
**BaZi** (八字, literally “eight characters”) is a classical Chinese framework that encodes a person’s birth moment into **four pairs** of characters: **Year, Month, Day, and Hour** pillars. Each pair is **天干 + 地支** (Heavenly Stem + Earthly Branch). Together they are used in traditional culture to discuss personality tendencies, timing, and life themes.
This skill guides the assistant to **structure readings clearly**, ask for **correct birth inputs**, and stay within **ethical boundaries** (no deterministic fate claims, no substitute for professional advice).
**Trigger keywords**: 八字, BaZi, Four Pillars, birth chart, day master 日主, ten gods 十神, luck cycle 大运流年, five elements 五行, compatibility 合婚
---
## When to use
- The user wants a **BaZi-style** reading, chart outline, or explanation of pillars/elements.
- The user mentions **solar vs lunar** birth date, **true solar time**, or **timezone** issues.
- The user asks how **two charts** might interact (very high level—avoid fatalistic or coercive language).
**Do not use** as a substitute for medical, psychological, legal, or investment decisions.
---
## Required birth information (ask if missing)
| Field | Why it matters |
|-------|----------------|
| **Date of birth** | Calendar type: **Gregorian** (公历) vs **lunar** (农历); note if user is unsure |
| **Time of birth** | Local clock time; **unknown time** → say hour pillar is indeterminate or use rough ranges with caveats |
| **Place of birth** | For **true solar time** / longitude correction in strict practice (optional; state if you apply or skip) |
| **Gender** | Some traditional texts use it for **大运** direction or narrative phrasing—ask only if needed for the method you describe |
Always state **assumptions** (e.g. “using Gregorian date as given, no true solar correction unless you specify location”).
---
## Core concepts (concise reference)
### The four pillars
1. **年柱** Year pillar — family/era backdrop (high level)
2. **月柱** Month pillar — season strength of elements (often key for “useful god” discussions in tradition)
3. **日柱** Day pillar — **day master (日主)** sits here (the “self” stem in many schools)
4. **时柱** Hour pillar — later life / children / career nuance in classical texts (varies by school)
### Stems and branches
- **十天干** Ten Heavenly Stems — e.g. 甲 Jia wood … 癸 Gui water
- **十二地支** Twelve Earthly Branches — 子 Zi, 丑 Chou, 寅 Yin … 亥 Hai
- **五行** Five Elements (Wood, Fire, Earth, Metal, Water) and **生克** cycles
- **阴阳** Yin–Yang on stems/branches
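The stem, element, and cycle vocabulary above can be held in a small lookup table. The following is a minimal sketch, not a chart calculator: the function names `stem_info`, `generates`, and `controls` are hypothetical helpers introduced here for illustration, and the sketch covers only the ten stems and the 生 (generating) and 克 (controlling) relations between elements.

```python
# Ten Heavenly Stems with pinyin, element, and polarity (standard pairing:
# two stems per element, yang first, then yin).
STEMS = [
    ("甲", "Jia", "Wood", "yang"), ("乙", "Yi", "Wood", "yin"),
    ("丙", "Bing", "Fire", "yang"), ("丁", "Ding", "Fire", "yin"),
    ("戊", "Wu", "Earth", "yang"), ("己", "Ji", "Earth", "yin"),
    ("庚", "Geng", "Metal", "yang"), ("辛", "Xin", "Metal", "yin"),
    ("壬", "Ren", "Water", "yang"), ("癸", "Gui", "Water", "yin"),
]

# Generating (生) order: each element generates the next, wrapping around.
GENERATING = ["Wood", "Fire", "Earth", "Metal", "Water"]


def stem_info(char):
    """Return pinyin, element, and polarity for one stem character."""
    for stem, pinyin, element, polarity in STEMS:
        if stem == char:
            return {"pinyin": pinyin, "element": element, "polarity": polarity}
    raise KeyError(f"not a Heavenly Stem: {char!r}")


def generates(element):
    """Element that `element` generates (生), one step along the cycle."""
    return GENERATING[(GENERATING.index(element) + 1) % 5]


def controls(element):
    """Element that `element` controls (克), two steps along the cycle."""
    return GENERATING[(GENERATING.index(element) + 2) % 5]
```

For example, `controls("Water")` returns `"Fire"` and `generates("Water")` returns `"Wood"`. Pillars themselves still need a calendar engine or a user-supplied chart.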
### Common analytic vocabulary (use carefully)
- **十神** “Ten Gods” labels (e.g. 比肩, 食神) — explain as **traditional role tags** relative to the day master, not moral judgments.
- **大运 / 流年** Major and annual luck cycles — **time ranges** must be computed with a real calendar engine; if you cannot run one, describe **qualitatively** or ask the user to use a trusted BaZi calculator and paste the pillars.
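As a rough illustration of how a 十神 tag follows from the day master, the sketch below derives the label from two stems using the element relation and yin/yang polarity. It assumes one common naming convention (some schools write 偏官 for 七杀), and `ten_god` is a hypothetical helper name, not part of any library.

```python
STEM_CHARS = "甲乙丙丁戊己庚辛壬癸"  # two stems per element: Wood, Fire, Earth, Metal, Water


def element_index(stem):
    """Position of the stem's element in the generating cycle (0 = Wood)."""
    return STEM_CHARS.index(stem) // 2


def is_yang(stem):
    """Stems alternate yang, yin, yang, yin, ... starting with 甲."""
    return STEM_CHARS.index(stem) % 2 == 0


# Offset of the other stem's element from the day master's element,
# mapped to (same-polarity label, opposite-polarity label).
TEN_GODS = {
    0: ("比肩", "劫财"),  # same element as the day master
    1: ("食神", "伤官"),  # day master generates the other stem
    2: ("偏财", "正财"),  # day master controls the other stem
    3: ("七杀", "正官"),  # other stem controls the day master
    4: ("偏印", "正印"),  # other stem generates the day master
}


def ten_god(day_master, other):
    """Traditional role tag of `other` relative to `day_master`."""
    offset = (element_index(other) - element_index(day_master)) % 5
    same, opposite = TEN_GODS[offset]
    return same if is_yang(day_master) == is_yang(other) else opposite
```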
---
## Assistant behavior
1. **Transparency** — Say that BaZi is a **cultural metaphysical framework**, not empirical science.
2. **No absolutes** — Avoid “you will definitely…”, “you must marry X element.” Use **tendencies**, **themes**, **questions for reflection**.
3. **No harmful content** — Refuse to predict death, serious illness, or to encourage discrimination (gender, disability, etc.).
4. **Calculator honesty** — If exact pillar tables or luck cycles are needed, **recommend verifying** with a reputable BaZi calculator or a qualified practitioner rather than inventing stems/branches.
5. **Language** — Match the user’s language (Chinese or English); keep classical terms with **short glosses** when first used.
---
## Suggested output structure
```markdown
# BaZi reading (outline)
**Inputs assumed**: [Gregorian/lunar, date, time, timezone corrections stated or “none”]
## Chart snapshot
- Year / Month / Day / Hour pillars: [only if you have a reliable source or user-provided chart]
- Day master (日主): [stem] — [element]
## Element balance (qualitative)
- [Which elements appear strong/weak — tentative if not computed]
## Themes (non-fatalistic)
- [2–4 reflective bullets: work style, relationships, pacing — framed as possibilities]
## Timing (if applicable)
- [If user supplied 大运/流年 from a calculator: interpret lightly]
- [If not: suggest they generate a chart first]
## Caveats
- Cultural perspective only; not medical/legal/financial advice.
```
---
## References (neutral / educational)
- Wikipedia: [Four Pillars of Destiny](https://en.wikipedia.org/wiki/Four_Pillars_of_Destiny) — overview only; cross-check details.
- Use **specialized BaZi software or licensed practitioners** to reconcile lunar and solar calendar rules and to apply true solar time.
---
## Why this slug: `bazi-reading`
- **BaZi** is the internationally recognized romanization for 八字.
- **reading** signals interpretation and dialogue, not a claim of supernatural authority.
- **four-pillars** is an alternate English name; you may mention both in prose.
Real-time answers from the public web via the host app’s local search gateway (Auth Gateway proxy). Typical stacks surface results comparable to major engine...
---
name: live-search
description: |
Real-time answers from the public web via the host app’s local search gateway (Auth Gateway proxy). Typical stacks surface results comparable to major engines (e.g. Google or Bing, depending on host/region)—this skill only calls the local HTTP endpoint, not third-party search APIs directly.
Use when the user needs fresh results, fact checks, prices, weather, news, scores, rates, or anything after the model’s knowledge cutoff.
Triggers: “search”, “look up”, “find out”, “latest”, “today”, “current price”, “verify”, or any question needing live data.
metadata:
openclaw:
emoji: "🔍"
requires:
bins:
- curl
---
# Live Search
Fetch **live web results** through the **host search gateway** at `http://localhost:$PORT` (session-authenticated). The gateway returns JSON with a pre-rendered `message` (titles as links, snippets, sources)—the same *kind* of web index results users expect from **Google-style or Bing-style** search, depending on how the host is configured.
**Endpoint path:** requests use `POST /proxy/prosearch/search`. The `prosearch` segment is a **fixed gateway route name** in the app; it is **not** a public product brand to repeat to end users—describe outcomes as “web search results” or “live search.”
## Setup
No extra Python packages. Search goes through the local gateway at `http://localhost:$PORT`; authentication is handled by the host app (login session)—**no manual API keys** in typical setups.
---
## Workflow
The assistant uses this skill whenever the user needs **real-time** information from the web.
### End-to-end flow
```
User asks for something that needs live web data
→ Step 1: Build a tight search keyword (concise, specific)
→ Step 1.5: Decide time freshness — add from_time when recency matters
→ Step 2: Call the search API with curl
→ Step 3: Echo the JSON `message` field VERBATIM (result list with clickable links) — do NOT skip this
→ Step 4: Optionally add analysis/summary after the verbatim block
```
> **CRITICAL — Anti-hallucination:** The API returns a pre-rendered `message` with formatted hits (titles as Markdown links, snippets, URLs). **The assistant MUST show `message` verbatim as the primary results.** It may add interpretation **after** that block. It must **not** invent, rewrite, or drop URLs/titles from `message`.
### Step 1: Build the keyword
Turn the user’s question into a short query:
| User intent | Example keyword |
|-------------|-----------------|
| Latest AI news | `latest AI news March 2026` or `最新 AI 新闻` (match user language) |
| Gold price now | `gold spot price today` |
| React 19 features | `React 19 new features` |
| Local weather | `London weather today` |
**Keyword tips:**
- Keep it short (about 2–6 tokens).
- Strip filler (“please”, “can you”, “帮我”).
- Add time hints when needed (`today`, `2026`, `latest`).
- **Keep the keyword in the language that matches the user’s intent** — do not blindly translate. If the user asks in English, search in English; if they ask in Chinese, Japanese, etc., use that language for the query when it improves results.
### Step 1.5: Time freshness (important for “latest” questions)
When the question implies **recency**, add **`from_time`** (Unix seconds) so stale pages are filtered out.
| User signal | `from_time` | Typical use |
|-------------|-------------|-------------|
| “today”, “just now”, “past 24h” | now − 86400 | Intraday facts |
| “recent”, “latest”, “this week” | now − 604800 | News, releases |
| “this month” | now − 2592000 | Monthly topics |
| “this year”, “2026” | Jan 1 of that year (local) | Year-scoped events |
| No time signal | omit `from_time` | Evergreen facts (“What is React?”) |
**Compute `from_time` in bash:**
```bash
# Last 24 hours
FROM_TIME=$(python3 -c "import time; print(int(time.time()) - 86400)")
# Last 7 days
FROM_TIME=$(python3 -c "import time; print(int(time.time()) - 604800)")
# Last 30 days
FROM_TIME=$(python3 -c "import time; print(int(time.time()) - 2592000)")
```
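The rolling windows above are simple subtractions, but the "this year" row needs a calendar boundary. The sketch below is one way to cover every row of the table in Python; `from_time_for` and its signal names (`"day"`, `"week"`, `"month"`, `"year"`) are hypothetical and not part of the gateway API.

```python
import time
from datetime import datetime

# Rolling windows from the table, in seconds.
WINDOWS = {"day": 86400, "week": 604800, "month": 2592000}


def from_time_for(signal, now=None, year=None):
    """Return epoch seconds for `from_time`, or None to omit the field."""
    now = int(time.time()) if now is None else int(now)
    if signal in WINDOWS:
        return now - WINDOWS[signal]
    if signal == "year":
        # Jan 1, 00:00 local time of the requested (or current) year.
        target = year if year is not None else datetime.fromtimestamp(now).year
        return int(datetime(target, 1, 1).timestamp())
    return None  # no time signal: leave from_time out of the request
```

Passing `now` explicitly keeps the helper deterministic, which makes it easier to check.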
> **Mutual exclusion:** When using `from_time` / `to_time`, **do not send `cnt`**; the server enforces exclusion rules. `cnt` is likewise excluded when `site` is set. For any other combination of `site` and time filters, follow the API's rules.
### Step 2: Request
```bash
PORT=${AUTH_GATEWAY_PORT:-19000}
PPID_VAL=$(python3 -c "import os; print(os.getppid())")
echo "[Assistant] Parent PID: $PPID_VAL"
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"your search query"}'
```
**Freshness (recommended for time-sensitive queries):**
```bash
# Last 7 days (“latest”, “recent”)
FROM_TIME=$(python3 -c "import time; print(int(time.time()) - 604800)")
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d "{\"keyword\":\"your search query\",\"from_time\":$FROM_TIME}"
# Last 24 hours (“today”, “just now”)
FROM_TIME=$(python3 -c "import time; print(int(time.time()) - 86400)")
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d "{\"keyword\":\"your search query\",\"from_time\":$FROM_TIME}"
```
**Optional parameters:**
```bash
# Result count 10/20/30/40/50 — do not combine with from_time/to_time/site
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"your search query","cnt":20}'
# Time range (do not pass cnt)
FROM_TIME=$(python3 -c "import time; print(int(time.time()) - 604800)")
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d "{\"keyword\":\"your search query\",\"from_time\":$FROM_TIME}"
# Site-restricted search (do not pass cnt)
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"your search query","site":"github.com"}'
# Vertical: gov / news / acad
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"your search query","industry":"news"}'
```
### Step 3: Present results — verbatim `message` first, then analysis
After JSON returns:
#### Part A — Result list [MANDATORY]
**Output the `message` field exactly as returned.** It usually contains up to **five** top hits, each formatted like:
```
**n. [Title](url)** — Site (date) ⭐
Snippet...
```
> **CRITICAL:** Never skip the list and jump to a summary. Titles are already Markdown links; users must be able to click through.
#### Part B — Analysis [OPTIONAL, after Part A]
**Language for your added commentary:** align with the **user’s conversation language** and the **query language** when helpful:
- English query → English analysis (typical for EN users).
- Non-English query → match the user’s language for the follow-up.
- The `message` block is always copied **verbatim**, regardless of language.
#### Good pattern
```
API returns a long `message` string with numbered results and snippets.
Assistant output:
<paste entire message verbatim>
---
Brief synthesis: … (optional, grounded in what appeared above)
```
#### Forbidden
- Skipping the result list and answering from memory.
- Rebuilding the list from `data.docs` instead of using `message`.
- Editing URLs or titles inside `message`.
- Claiming sources that are not in `message`.
- Stripping Markdown links from titles.
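The verbatim-first rule can be sketched as a small formatter. This assumes the response shape documented in this skill (`success`, `message`); `render_reply` is a hypothetical helper name, and the optional `analysis` string stands for whatever commentary the assistant adds afterwards.

```python
import json


def render_reply(raw_json, analysis=None):
    """Build the reply: verbatim `message` first, optional commentary after."""
    payload = json.loads(raw_json)
    message = payload.get("message", "")
    if not payload.get("success"):
        # Failure: relay the gateway's text unchanged, with no added analysis.
        return message
    parts = [message]  # never rebuilt from data.docs, never edited
    if analysis:
        parts.append("---")
        parts.append(analysis)
    return "\n".join(parts)
```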
---
## PORT
Use **`AUTH_GATEWAY_PORT`** from the environment (set by the Electron host when the Auth Gateway starts). Child processes inherit it.
**macOS / Linux (bash):**
```bash
PORT=${AUTH_GATEWAY_PORT:-19000}
echo "[Assistant] AUTH_GATEWAY_PORT: $PORT"
```
**Windows (PowerShell):**
```powershell
$PORT = if ($env:AUTH_GATEWAY_PORT) { $env:AUTH_GATEWAY_PORT } else { "19000" }
Write-Host "[Assistant] AUTH_GATEWAY_PORT: $PORT"
```
**Windows (CMD):**
```cmd
if not defined AUTH_GATEWAY_PORT set AUTH_GATEWAY_PORT=19000
set PORT=%AUTH_GATEWAY_PORT%
echo [Assistant] AUTH_GATEWAY_PORT: %PORT%
```
Default if unset: **`19000`**.
## Parent PID (logging)
Before `curl`, you may log the parent PID for tracing.
**macOS / Linux:**
```bash
PPID_VAL=$(python3 -c "import os; print(os.getppid())")
echo "[Assistant] Parent PID: $PPID_VAL"
```
**Windows (PowerShell):**
```powershell
$PPID_VAL = python -c "import os; print(os.getppid())"
Write-Host "[Assistant] Parent PID: $PPID_VAL"
```
---
## Command: `search`
```
POST /proxy/prosearch/search
Content-Type: application/json
{
"keyword": "<search-query>", // required, UTF-8
"mode": 0, // optional: 0=web 1=VR card 2=hybrid
"cnt": 10, // optional: 10/20/30/40/50
"site": "<domain>", // optional: site-restricted
"from_time": 1710000000, // optional: start (epoch seconds)
"to_time": 1711000000, // optional: end (epoch seconds)
"industry": "news" // optional: gov | news | acad
}
```
**Fields:**
- **`keyword`** (required): query string.
- **`mode`**: `0` default web results; `1` VR “card” style facts (e.g. weather, spot prices); `2` hybrid.
- **`cnt`**: max hits; **mutually exclusive** with `site` and `from_time`/`to_time` per backend rules.
- **`site`**: restrict to a domain.
- **`from_time` / `to_time`**: time window in epoch seconds.
- **`industry`**: `gov` (government), `news`, `acad` (academic-oriented).
> Do **not** combine `cnt` with time filters or `site` when the API forbids it.
**Examples:**
```bash
PORT=${AUTH_GATEWAY_PORT:-19000}
echo "[Assistant] AUTH_GATEWAY_PORT: $PORT"
PPID_VAL=$(python3 -c "import os; print(os.getppid())")
echo "[Assistant] Parent PID: $PPID_VAL"
# Basic
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"latest AI news"}'
# More results (no time/site)
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"React 19 features","cnt":20}'
# News vertical
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"Federal Reserve statement March 2026","industry":"news"}'
# GitHub-only
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"electron vite template","site":"github.com"}'
# Hybrid mode for structured + web (e.g. commodity spot, weather) — adjust keyword to your locale
curl -s -X POST http://localhost:$PORT/proxy/prosearch/search \
-H 'Content-Type: application/json' \
-d '{"keyword":"gold spot price today","mode":2}'
```
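For callers that prefer Python over `curl`, the request can be assembled with the standard library. This is a sketch under the field rules listed above: `build_payload` and `build_request` are hypothetical helper names, only the `cnt` exclusion is checked locally, and every other rule is left to the server.

```python
import json
import os
import urllib.request

ALLOWED_CNT = {10, 20, 30, 40, 50}


def build_payload(keyword, cnt=None, site=None, from_time=None,
                  to_time=None, industry=None, mode=None):
    """Assemble the JSON body, rejecting `cnt` alongside site/time filters."""
    if not keyword:
        raise ValueError("keyword is required")
    if cnt is not None:
        if cnt not in ALLOWED_CNT:
            raise ValueError("cnt must be one of 10/20/30/40/50")
        if site is not None or from_time is not None or to_time is not None:
            raise ValueError("cnt cannot be combined with site or time filters")
    optional = {"mode": mode, "cnt": cnt, "site": site,
                "from_time": from_time, "to_time": to_time, "industry": industry}
    payload = {"keyword": keyword}
    # Only send fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def build_request(payload, port=None):
    """Create (but do not send) the POST request for the local gateway."""
    port = port or os.environ.get("AUTH_GATEWAY_PORT", "19000")
    url = f"http://localhost:{port}/proxy/prosearch/search"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of `urllib.request.urlopen(build_request(payload), timeout=...)` and parsing the JSON body; the timeout value is the caller's choice.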
**Success JSON (shape):**
```json
{
"success": true,
"message": "Search results for \"latest AI news\"…\n\n**1. [Title](https://…)** — Source (2026-03-15) ⭐\n Snippet…",
"data": {
"query": "latest AI news",
"totalResults": 10,
"docs": [
{
"passage": "…",
"score": 0.85,
"date": "2026-03-15",
"title": "…",
"url": "https://…",
"site": "…",
"images": []
}
],
"requestId": "…"
}
}
```
> **`message` is the source of truth for what to show users** — copy it in full before adding commentary.
**Failure JSON (examples; actual strings may be localized by the host):**
```json
{
"success": false,
"message": "Not signed in. Web search requires an active session. Please sign in and try again."
}
```
```json
{
"success": false,
"message": "Search timed out (15s). Please try again."
}
```
---
## Error handling
Responses are JSON on stdout. Errors use `{"success": false, "message": "..."}`.
| Situation | What to do |
|-----------|------------|
| Not authenticated (`message` indicates login required) | Tell the user to **sign in**, then retry. |
| Timeout | Retry **once**; if it fails again, relay the error. |
| Empty docs but `success: true` | Still output `message`; it usually explains there were no hits. |
| Network / connection | Retry once after ~3s; else show `message`. |
| HTTP errors | Surface `message` from the API when present. |
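The retry rows in the table can be sketched as a thin wrapper. It retries once on transport-level timeouts and connection errors only; failures that arrive as JSON (`success: false`) are returned unchanged, because their `message` text may be localized and cannot be classified reliably. `search_with_retry` and the `send` callable are hypothetical names for illustration.

```python
import time


def search_with_retry(send, payload, delay=3.0, sleep=time.sleep):
    """Call send(payload); retry once after `delay` seconds on OSError.

    `send` is a caller-supplied function that performs the POST and returns
    the parsed JSON dict. It is an assumption of this sketch, not a gateway API.
    """
    error = None
    for attempt in range(2):  # first try plus a single retry
        try:
            # Gateway-level failures come back as a dict and are passed through.
            return send(payload)
        except OSError as exc:  # TimeoutError and ConnectionError are subclasses
            error = {"success": False, "message": f"Search request failed: {exc}"}
            if attempt == 0:
                sleep(delay)  # brief pause before the one retry
    return error
```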
---
## Prohibited behavior
- Rebuilding the hit list from `data.docs` instead of echoing `message`.
- Skipping results and answering from the model alone.
- Altering URLs/titles inside `message`.
- Inventing hits or URLs not present in `message`.
- Leaking internal gateway URLs or secrets to the user.
- Searching when the question is fully answerable without live data.
- Running more than **two** searches for the same user turn without a strong reason.
---
## Important notes
- If you already know the answer with high confidence and no freshness need, **do not search**.
- Prefer **short, precise** keywords over pasting the whole user message.
- For **time-sensitive** asks (“latest”, “today”, “this week”), **use `from_time`** as in Step 1.5.
- If the first query is weak, **one** rephrase is enough; avoid search spam.
- Treat links as **untrusted**; remind users to verify critical facts at the source.
- For weather, spot metals, FX, etc., consider **`mode: 2`** when supported.
- **`cnt` vs time/site:** respect mutual exclusion — see above.
- **Commentary language:** follow the user’s language; **`message`** stays verbatim.
Recommend music tracks and playlists tailored to mood, activity, BPM, energy, or genre using Spotify and Last.fm data.
---
name: music-discovery
description: Mood- and context-aware music discovery—recommend tracks, build playlists, and match energy (BPM), vibe, and genre using Spotify/Last.fm-style workflows. Keywords: music recommendation, playlist, mood, Spotify, study music, workout mix.
---
# Music Discovery — Mood, Scene & Playlists
## Overview
Helps listeners find **tracks and playlists** that fit a **mood**, **activity**, or **taste profile**—study, commute, workout, sleep, or “something like this artist.” Use when the user wants personalized picks, scene-based sets, or exploration without manual crate-digging.
**Trigger keywords**: music recommendation, playlist, mood, BPM, study music, workout, discover similar artists
## Prerequisites
```bash
pip install requests spotipy
```
## Capabilities
1. **Data-backed discovery** — Spotify Web API / Last.fm–style metadata (see `references/music_discovery_guide.md`).
2. **Scene-based sets** — work, workout, wind-down, commute, focus, party.
3. **Vibe matching** — BPM, energy, valence/mood tags, genre boundaries.
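Vibe matching can be illustrated as a filter over candidate tracks. This is a sketch over plain dictionaries with hypothetical keys (`bpm`, `genre`); it does not call Spotify or Last.fm, and tracks with no BPM are reported separately instead of being guessed at.

```python
def match_vibe(tracks, bpm_range=None, genres=None):
    """Split tracks into (matched, unknown) for the given constraints.

    `unknown` holds tracks whose BPM is missing when a BPM range was requested,
    so the report can mark them as unavailable.
    """
    matched, unknown = [], []
    for track in tracks:
        bpm = track.get("bpm")
        if bpm_range is not None:
            if bpm is None:
                unknown.append(track)
                continue
            low, high = bpm_range
            if not low <= bpm <= high:
                continue
        if genres is not None and track.get("genre") not in genres:
            continue
        matched.append(track)
    return matched, unknown
```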
## Commands
| Command | Description | Example |
|---------|-------------|---------|
| `recommend` | Recommend tracks | `python3 scripts/skills/music-discovery/scripts/music_discovery_tool.py recommend [args]` |
| `playlist` | Build a playlist concept | `python3 scripts/skills/music-discovery/scripts/music_discovery_tool.py playlist [args]` |
| `mood` | Recommend by mood | `python3 scripts/skills/music-discovery/scripts/music_discovery_tool.py mood [args]` |
## Usage (from repository root)
```bash
python3 scripts/skills/music-discovery/scripts/music_discovery_tool.py recommend --scene office --mood relaxed
python3 scripts/skills/music-discovery/scripts/music_discovery_tool.py playlist --scene workout --bpm 140
python3 scripts/skills/music-discovery/scripts/music_discovery_tool.py mood --feeling happy
```
## Output format (for the agent’s report)
```markdown
# Music Discovery report
**Generated**: YYYY-MM-DD HH:MM
## Key picks
1. [Track / artist — one-line why]
2. …
3. …
## Snapshot
| Title | Artist | Why it fits |
|-------|--------|---------------|
## Playlist sketch (optional)
- **Theme**: …
- **Tempo / energy**: …
- **Avoid**: …
## Notes
[Ground claims in API or user-stated taste—no invented chart positions.]
```
## References
### APIs & libraries
- [Spotify Web API](https://developer.spotify.com/documentation/web-api)
- [MusicBrainz API](https://musicbrainz.org/doc/MusicBrainz_API)
- [Spotipy (Python client)](https://github.com/spotipy-dev/spotipy)
### Patterns & community
- [Daily Reddit digest (OpenClaw use case)](https://github.com/hesamsheikh/awesome-openclaw-usecases/blob/main/usecases/daily-reddit-digest.md)
- [Hacker News — mood-based music ML](https://news.ycombinator.com/item?id=42457780)
- [Reddit r/spotify — discussion](https://www.reddit.com/r/spotify/comments/1014b31yyz/music_recommender_ai/)
## Notes
- Prefer **real** API or user-provided data; do not invent popularity or audio features.
- Mark missing fields as **unavailable** instead of guessing.
- OAuth and rate limits apply when using Spotify—document when credentials are required.
FILE:data/music_discovery_data.json
{
"records": [
{
"timestamp": "2026-03-25T15:05:28.331652",
"command": "mood",
"input": "",
"status": "completed"
}
],
"created": "2026-03-25T15:05:28.331642",
"tool": "music-discovery"
}
FILE:references/music_discovery_guide.md
# Music Discovery — Framework & Guide
## Tool summary
- **Name**: Music Discovery
- **Commands**: `recommend`, `playlist`, `mood`
- **Typical deps**: `pip install requests spotipy`
## Analysis dimensions
- Metadata: artist, album, genre, tempo, release era
- Context: scene (work, gym, sleep), social vs solo
- Audio / mood proxies: BPM, energy, mood tags (when APIs expose them)
- Taste: seeds, “similar to,” exclusions
## Framework
### Phase 1: Clarify the ask
- Mood, activity, language, era, explicit content on/off
- Target length (single track vs full playlist arc)
### Phase 2: Shortlist & justify
- Prefer tracks you can ground in API results or the user’s library
- Call out why each pick fits the stated scene or mood
### Phase 3: Deliver a playlist shape
- Ordering (warm-up → peak → cool-down for workouts)
- Optional diversity rules (avoid same artist back-to-back)
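The optional diversity rule in Phase 3 could be sketched as a greedy reorder. This is an illustration only and is not part of `music_discovery_tool.py`; the function name `spread_artists` and the `{"title", "artist"}` track shape are assumptions made for the example.

```python
def spread_artists(tracks):
    """Reorder tracks so consecutive entries avoid repeating an artist where possible.

    Greedy: at each step take the earliest remaining track whose artist differs
    from the previous pick; if none exists, take the next track as-is.
    """
    remaining = list(tracks)
    ordered = []
    while remaining:
        last_artist = ordered[-1]["artist"] if ordered else None
        # Index of the first remaining track by a different artist (0 as fallback).
        idx = next(
            (i for i, t in enumerate(remaining) if t["artist"] != last_artist),
            0,
        )
        ordered.append(remaining.pop(idx))
    return ordered


if __name__ == "__main__":
    demo = [
        {"title": "a1", "artist": "A"},
        {"title": "a2", "artist": "A"},
        {"title": "b1", "artist": "B"},
    ]
    print([t["title"] for t in spread_artists(demo)])
```

The greedy pass keeps the original order as far as it can, but it does not guarantee a repeat-free arrangement even when one exists, and when a single artist dominates the list back-to-back repeats are unavoidable.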
## Scoring rubric (fit quality)
| Score | Level | Meaning | Action |
|-------|-------|---------|--------|
| 5 | ⭐⭐⭐⭐⭐ | Strong fit | Top recommendation |
| 4 | ⭐⭐⭐⭐ | Good fit | Prioritize |
| 3 | ⭐⭐⭐ | OK | Optional |
| 2 | ⭐⭐ | Weak | Caveat |
| 1 | ⭐ | Poor | Avoid |
## Output template
```markdown
# Music Discovery analysis
## Picks
1. …
2. …
## Evidence
| Track | Artist | Source / rationale |
|-------|--------|----------------------|
## Playlist outline
- …
```
## Reference links
- [Spotify Web API](https://developer.spotify.com/documentation/web-api)
- [MusicBrainz API](https://musicbrainz.org/doc/MusicBrainz_API)
- [Spotipy](https://github.com/spotipy-dev/spotipy)
- [Daily Reddit digest (OpenClaw)](https://github.com/hesamsheikh/awesome-openclaw-usecases/blob/main/usecases/daily-reddit-digest.md)
- [Hacker News](https://news.ycombinator.com/item?id=42457780)
- [Reddit r/spotify](https://www.reddit.com/r/spotify/comments/1014b31yyz/music_recommender_ai/)
## Tips
1. Match **constraints** first (language, explicit, max BPM).
2. Separate **objective metadata** from subjective “vibe” language.
3. When APIs are unavailable, be explicit and suggest **manual** next steps (e.g. search in-app).
FILE:scripts/music_discovery_tool.py
#!/usr/bin/env python3
"""
Music discovery — CLI stub for recommend / playlist / mood flows.
Usage:
    python3 music_discovery_tool.py recommend [args]
    python3 music_discovery_tool.py playlist [args]
    python3 music_discovery_tool.py mood [args]
"""
import json
import os
import sys
from datetime import datetime

DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "data")
DATA_FILE = os.path.join(DATA_DIR, "music_discovery_data.json")
LEGACY_DATA_FILE = os.path.join(DATA_DIR, "music_recommender_data.json")
REF_URLS = [
    "https://developer.spotify.com/documentation/web-api",
    "https://github.com/hesamsheikh/awesome-openclaw-usecases/blob/main/usecases/daily-reddit-digest.md",
    "https://musicbrainz.org/doc/MusicBrainz_API",
    "https://github.com/spotipy-dev/spotipy",
    "https://news.ycombinator.com/item?id=42457780",
]


def ensure_data_dir():
    os.makedirs(DATA_DIR, exist_ok=True)


def load_data():
    if os.path.exists(DATA_FILE):
        with open(DATA_FILE, "r", encoding="utf-8") as f:
            return json.load(f)
    # Migrate the legacy file name on first load.
    if os.path.exists(LEGACY_DATA_FILE):
        with open(LEGACY_DATA_FILE, "r", encoding="utf-8") as f:
            data = json.load(f)
        data["tool"] = "music-discovery"
        save_data(data)
        return data
    return {"records": [], "created": datetime.now().isoformat(), "tool": "music-discovery"}


def save_data(data):
    ensure_data_dir()
    with open(DATA_FILE, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def recommend(args):
    """Recommend tracks."""
    data = load_data()
    record = {
        "timestamp": datetime.now().isoformat(),
        "command": "recommend",
        "input": " ".join(args) if args else "",
        "status": "completed",
    }
    data["records"].append(record)
    save_data(data)
    return {
        "status": "success",
        "command": "recommend",
        "message": "Recommendation step completed",
        "record": record,
        "total_records": len(data["records"]),
        "reference_urls": REF_URLS[:3],
    }


def playlist(args):
    """Build playlist concept."""
    data = load_data()
    record = {
        "timestamp": datetime.now().isoformat(),
        "command": "playlist",
        "input": " ".join(args) if args else "",
        "status": "completed",
    }
    data["records"].append(record)
    save_data(data)
    return {
        "status": "success",
        "command": "playlist",
        "message": "Playlist step completed",
        "record": record,
        "total_records": len(data["records"]),
        "reference_urls": REF_URLS[:3],
    }


def mood(args):
    """Recommend by mood."""
    data = load_data()
    record = {
        "timestamp": datetime.now().isoformat(),
        "command": "mood",
        "input": " ".join(args) if args else "",
        "status": "completed",
    }
    data["records"].append(record)
    save_data(data)
    return {
        "status": "success",
        "command": "mood",
        "message": "Mood recommendation step completed",
        "record": record,
        "total_records": len(data["records"]),
        "reference_urls": REF_URLS[:3],
    }


def main():
    cmds = ["recommend", "playlist", "mood"]
    if len(sys.argv) < 2 or sys.argv[1] not in cmds:
        print(
            json.dumps(
                {
                    "error": f"Usage: music_discovery_tool.py <{','.join(cmds)}> [args]",
                    "available_commands": {c: "" for c in cmds},
                    "tool": "music-discovery",
                },
                ensure_ascii=False,
                indent=2,
            )
        )
        sys.exit(1)
    cmd = sys.argv[1]
    args = sys.argv[2:]
    if cmd == "recommend":
        result = recommend(args)
    elif cmd == "playlist":
        result = playlist(args)
    elif cmd == "mood":
        result = mood(args)
    else:
        result = {"error": f"Unknown command: {cmd}"}
    print(json.dumps(result, ensure_ascii=False, indent=2, default=str))


if __name__ == "__main__":
    main()
Proactively offers gentle, non-judgmental emotional support during local late-night hours without forced positivity or unsolicited advice.
---
name: late-night-companion
description: Proactive late-night emotional check-in—local evening hours, non-judgmental listening, no forced positivity. Activates when the user is online in the quiet-hours window without needing a manual trigger. Keywords: night owl, insomnia, stress, loneliness, after hours, emotional support, DND.
disabled: true
---
# Late-Night Companion
A **low-pressure, human-toned** companion for people who are still awake when the world feels quiet. The goal is **presence and validation**, not fixes or pep talks.
**Design intent:** Works for **any locale**—always interpret times in the **user’s local timezone** (or the timezone they specify).
---
## Auto-trigger (no manual “start”)
### When to evaluate (at session start)
```
IF current_local_time ∈ [22:30, 01:30] // wraps past midnight
AND (user was active after 22:00 OR session is online)
AND do_not_disturb is NOT set for tonight:
→ enter Late-Night mode
→ the assistant MAY send one gentle opening message (see below)
```
Adjust the window if the product allows (e.g. 23:00–02:00). **Never** assume a single country’s clock; say “local time” explicitly.
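The wrap-past-midnight check in the trigger rule is easy to get wrong, so here is a minimal Python sketch of it. The function names, the default window constants, and the two boolean inputs are assumptions for illustration; the caller is expected to have already converted "now" to the user's local timezone.

```python
from datetime import datetime, time

QUIET_START = time(22, 30)  # assumed default, matches the window above
QUIET_END = time(1, 30)


def in_quiet_hours(local_time, start=QUIET_START, end=QUIET_END):
    """True if local_time falls in [start, end], allowing the window to wrap midnight."""
    if start <= end:
        # Ordinary same-day window, e.g. 01:00-03:00.
        return start <= local_time <= end
    # Wrapping window, e.g. 22:30-01:30: late evening OR early morning.
    return local_time >= start or local_time <= end


def should_enter_late_night_mode(now_local, active_or_online, dnd_tonight):
    """Mirror of the trigger rule: in window, user present, and no DND for tonight."""
    return in_quiet_hours(now_local.time()) and active_or_online and not dnd_tonight


if __name__ == "__main__":
    print(should_enter_late_night_mode(datetime(2026, 3, 25, 22, 45), True, False))
    print(should_enter_late_night_mode(datetime(2026, 3, 26, 12, 0), True, False))
```

A plain `start <= t <= end` comparison would never match a window such as 22:30 to 01:30, which is why the wrapping case is handled separately.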
### First message (optional proactive line)
The assistant may open softly, for example:
> Still up? Feels like one of those nights where the day won’t quite let go.
> I’m here—no need to perform. Vent, ramble, or sit in silence; either is fine.
Tone: **warm, plain, not saccharine**. Avoid sounding like a therapist intake form.
---
## Operating modes
### Mode A — Gentle check-in (default)
**When:** First detection of activity in the quiet-hours window.
**Behavior:**
1. Send at most **one** opening message (unless product rules forbid proactive messages).
2. Wait for the user.
3. Branch based on their reply.
### Mode B — Deep listening
**When:** The user starts talking (any reply counts).
**Rules:**
- **Avoid:** instant solutions, lectures, toxic positivity, “look on the bright side.”
- **Prefer:** name the feeling, invite detail, leave space.
**Pattern:** `name emotion` + `ground it in specifics` + `open invitation`
| User says | Assistant leans toward |
|-----------|-------------------------|
| “Work wiped me out.” | “Yeah… what part drained you most—the pace, the people, or something else?” |
| “Too much on my plate.” | “Sounds like it’s hard to see the edge of the pile. What’s loudest in your head right now?” |
| “I messed up today.” | “That sting is rough. Was it a real miss, or are you holding yourself to an impossible bar?” |
| “I feel worthless.” | Acknowledge weight first; **do not** argue them into feeling better. If risk signals appear, see **Safety** below. |
### Mode C — Closing for sleep
**When:** The user signals they want to stop (“I’m going to sleep,” “good night,” “that’s enough”).
**Behavior:**
1. Short, kind closing—**no** new questions.
2. **Do not** follow up after a clear goodnight.
Example:
> Okay. Rest if you can. Tomorrow can wait. Good night.
---
## Memory the assistant may track (if the product supports it)
After interactions, optional lightweight fields:
```
{stress_source}: brief note from user
{emotion}: label(s)
{comfort_style}: listening vs practical vs minimal
{risk_flag}: set only if safety concerns apply
```
Next late-night session, the assistant may reference **only what the user already shared**, without inventing details.
---
## User preferences (examples)
| User says | Effect |
|-----------|--------|
| “Don’t ping me tonight” / “DND” / “leave me alone” | Suppress proactive messages until the next local day (or per product rules). |
| “See you tomorrow” | Same as DND for the rest of the night. |
| “I actually want practical tips” | Set `{comfort_style}` toward gentle, concrete suggestions—still optional, not a lecture. |
| “Just stay with me, no advice” | Set `{comfort_style}` to presence-first. |
---
## Voice constraints
- Sound like a **trusted friend at low volume**, not a corporate wellness bot.
- **Avoid:** “As an AI…”, “I understand your feelings” (empty filler), more than **three** questions in a row without user space.
- **After** the user says goodnight: **silence** (no extra pings).
---
## Safety (international)
- If the user expresses **intent to harm themselves or others**, **do not** rely on this skill alone. Encourage **immediate** contact with **local** emergency services or a **crisis line in their country/region**.
- Hotlines and numbers **differ by country**; do not invent a single global number. Phrase as: “Please reach out to emergency services or a crisis helpline where you are.”
- This skill is **not** a substitute for professional care.
---
## Example (English)
```
[22:45] Assistant: Still up? Rough day hanging around?
User: Can’t sleep.
Assistant: What’s looping—tasks, people, or just the ceiling?
User: My manager humiliated me in front of the team.
Assistant: That’s a gut punch. Public or private—either way it stings. Is the shame still loud right now?
User: I feel useless.
Assistant: [Acknowledge; assess risk; offer presence; if acute risk, move to Safety.]
User: I’m going to sleep.
Assistant: Okay. Rest. Good night.
[end — no further messages]
```
---
## Summary
**The user does not need to “start” anything** beyond having the skill available and being active in the configured local quiet-hours window—unless they opt out with DND.
Deep load testing workflow—goals and SLOs, workload modeling, scenario design, environment fidelity, execution, metrics interpretation, and bottlenecks to fi...
---
name: load-testing
description: Deep load testing workflow—goals and SLOs, workload modeling, scenario design, environment fidelity, execution, metrics interpretation, and bottlenecks to fixes. Use when validating capacity, before launches, or reproducing latency under stress.
---
# Load Testing (Deep Workflow)
Load tests answer **whether the system meets behavior under target load**—not “how many RPS the tool prints.” Tie every run to **SLOs**, **workload realism**, and **analysis** that engineers can act on.
## When to Offer This Workflow
**Trigger conditions:**
- Major launch, traffic spike season, infra resize
- Latency/timeout under peak; need **evidence** for capacity decisions
- Comparing architectures or **debottlenecking**
**Initial offer:**
Use **seven stages**: (1) goals & SLOs, (2) workload model, (3) scenarios & scripts, (4) environment & data, (5) run & observe, (6) analyze bottlenecks, (7) fixes & retest. Confirm **tool** (k6, Locust, Gatling, JMeter) and **environment** policy (prod-like staging vs synthetic).
---
## Stage 1: Goals & SLOs
**Goal:** Define **success** in measurable terms.
### Questions
1. **Peak** RPS/users, **growth** assumption, **duration** of peak
2. **SLOs**: p95/p99 latency, error rate, throughput **per** critical endpoint
3. **Scope**: read-heavy vs write-heavy; **background** jobs interaction
**Exit condition:** Numeric **targets** + **out of scope** (e.g., “third-party API mocked”).
---
## Stage 2: Workload Model
**Goal:** **Representative** mix—not one URL forever.
### Practices
- **Transaction mix** from analytics or access logs (proportions)
- **Think time** between steps for user journeys
- **Payload** size distribution; **auth** token behavior
- **Spike** vs **soak** vs **step** ramp—match **real** failure modes
**Exit condition:** **Workload profile** documented (table or script comments).
---
## Stage 3: Scenarios & Scripts
**Goal:** **Deterministic**, **idempotent** load scripts where possible.
### Practices
- **Correlate** virtual user with **trace/request id** for debugging
- **Parameterize** data to avoid **cache** **fantasy** (every request hits same key)
- **Order** operations to match **real** **causality** (login → browse → checkout)
### Pitfalls
- **Client-side** bottleneck (single generator machine)—**distribute** load generators
**Exit condition:** **Smoke** run at small k validates script **correctness**.
---
## Stage 4: Environment & Data
**Goal:** **Fidelity** without **destroying** prod.
### Rules
- **Staging** scale proportional; **feature flags** aligned
- **Data volume** similar order-of-magnitude for **DB** **plans**
- **External** deps: mock, **sandbox**, or **throttle** **awareness**
**Exit condition:** **Safety** checklist: no prod writes unless explicitly planned and isolated.
---
## Stage 5: Run & Observe
**Goal:** **System-wide** visibility during test.
### Instrumentation
- **App**: latency histograms, error codes, **queue** depth
- **Infra**: CPU, memory, **connections**, **GC**, **disk** IOPS
- **DB**: slow queries, **locks**, **replication** lag
- **Tracing** sample during test for **hot spans**
**Exit condition:** **Dashboard** or **runbook** link for the test window.
---
## Stage 6: Analyze Bottlenecks
**Goal:** Identify **dominant** constraint: **app**, **DB**, **network**, **dependency**.
### Process
- **Utilization** vs **saturation** (e.g., CPU high but wait on locks—different fix)
- **Compare** p95 vs **max**—**tail** often **separate** issue
- **Reproduce** bottleneck with **smaller** experiment when unclear
**Exit condition:** **Written** hypothesis with **evidence** (graphs, trace ids).
---
## Stage 7: Fixes & Retest
**Goal:** **Controlled** changes with **retest** protocol.
### Practices
- **One** major change per retest when debugging
- **Document** **baseline** vs **after** for regression to **capacity** planning
---
## Final Review Checklist
- [ ] SLO-aligned goals and workload mix
- [ ] Realistic scenarios; distributed load if needed
- [ ] Environment safe and representative enough
- [ ] Full-stack observability during runs
- [ ] Bottleneck analysis leads to actionable tickets
## Tips for Effective Guidance
- **Warm** caches explicitly if prod is always warm—otherwise **misleading** **good** numbers.
- **Throughput** without **latency** SLO is meaningless.
- Call out **coordination** **overhead** (locks, **hot** **keys**) vs **raw** CPU.
## Handling Deviations
- **Cannot** match prod data: **state** **assumptions** and test **directional** only.
- **Serverless**: account for **cold** **start** and **account** **concurrency** limits in interpretation.
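To make the load-testing workflow's "tie every run to SLOs" and "compare p95 vs max" steps concrete, a run's raw latencies could be checked against Stage 1 targets roughly as follows. This is a sketch under assumptions: the function names, the report fields, and the nearest-rank percentile definition are choices made for the example, and real load tools (k6, Locust, Gatling, JMeter) compute their own percentiles.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_slo(latencies_ms, error_count, p95_target_ms, p99_target_ms, max_error_rate):
    """Summarize one test run and compare it with numeric SLO targets."""
    report = {
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "max_ms": max(latencies_ms),  # keep max separately: the tail is often its own issue
        "error_rate": error_count / len(latencies_ms),
    }
    report["passed"] = (
        report["p95_ms"] <= p95_target_ms
        and report["p99_ms"] <= p99_target_ms
        and report["error_rate"] <= max_error_rate
    )
    return report


if __name__ == "__main__":
    run = list(range(1, 101))  # synthetic latencies: 1..100 ms
    print(check_slo(run, error_count=1, p95_target_ms=200, p99_target_ms=500,
                    max_error_rate=0.02))
```

Reporting `max_ms` next to the percentiles keeps a tail outlier visible even when the p95 and p99 targets pass.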
Deep LLM evaluation workflow—quality dimensions, golden sets, human vs automatic metrics, regression suites, offline/online signals, and safe rollout gates f...
---
name: llm-evaluation
description: Deep LLM evaluation workflow—quality dimensions, golden sets, human vs automatic metrics, regression suites, offline/online signals, and safe rollout gates for model or prompt changes. Use when shipping prompt updates, swapping models, or building eval harnesses for agents and RAG.
---
# LLM Evaluation (Deep Workflow)
Evaluation turns “it feels better” into **reproducible evidence**. Design around **failure modes** your product cares about—not only aggregate scores.
## When to Offer This Workflow
**Trigger conditions:**
- Prompt or model change; need **before/after** proof
- Building **CI** for LLM outputs; flaky quality in production
- RAG/agents: **grounding**, **tool use**, **safety** regressions
**Initial offer:**
Use **six stages**: (1) define quality & constraints, (2) build datasets & rubrics, (3) automatic metrics, (4) human evaluation, (5) regression & gates, (6) online validation & iteration. Confirm **latency/cost** budgets and **risk** (PII, safety).
---
## Stage 1: Define Quality & Constraints
**Goal:** Name **dimensions** that map to user harm if they fail.
### Typical dimensions (pick what matters)
- **Correctness** / task success; **groundedness** (RAG); **faithfulness** to sources
- **Safety**: policy violations, jailbreaks, PII leakage
- **Style**: tone, brevity, format (when product-critical)
- **Robustness**: paraphrase, multilingual, edge inputs
### Constraints
- Max **tokens**, **latency** p95, **cost** per request; **locale** requirements
**Exit condition:** Weighted **priority** of dimensions; **non-goals** stated.
---
## Stage 2: Datasets & Rubrics
**Goal:** **Fixed** eval sets + **clear** scoring rules.
### Practices
- **Stratify** by intent: easy/medium/hard; **adversarial** slice separate
- **Rubrics**: 1–5 scales with **anchors**; **binary** checks for safety
- **Version** datasets (git or table); **no** silent edits without changelog
- **Privacy**: synthetic or **redacted** real examples per policy
**Exit condition:** **Golden set** size justified; **inter-rater** plan if human scoring.
---
## Stage 3: Automatic Metrics
**Goal:** **Fast** signals—know **limitations**.
### Options
- **Reference-based**: BLEU/ROUGE—often weak for assistants
- **Model-as-judge**: fast, biased—**calibrate** vs human
- **Task-specific**: exact match, JSON schema validity, tool-call args match
- **RAG**: citation overlap, **nugget** recall, entailment models (use carefully)
### Hygiene
- **No** training on test; **detect** **leakage** from prompts
**Exit condition:** Each auto metric has **known blind spots** documented.
---
## Stage 4: Human Evaluation
**Goal:** **Authoritative** judgment where automatic metrics lie.
### Design
- **Sample size** for confidence; **blind** A/B when possible
- **Guidelines** + **examples**; **adjudication** for disagreements
- **Locale-native** raters when language quality matters
**Exit condition:** **Human** scores correlate **enough** with auto for ongoing monitoring—or you rely on human for release.
---
## Stage 5: Regression & Gates
**Goal:** **Block** bad deploys in **CI** or **release** pipeline.
### Gates
- **Must-pass** suites: safety, critical user journeys
- **Trend** tracking: **not** only point-in-time
- **Canary** with **online** metrics (see Stage 6)
### Artifacts
- **Report**: model/prompt id, dataset versions, scores, **diff**
**Exit condition:** **Rollback** criteria defined before rollout.
---
## Stage 6: Online Validation
**Goal:** **Production** truth—shadow, A/B, or gradual ramp.
### Signals
- **Implicit**: thumbs, edits, task completion, support tickets
- **Explicit**: user ratings (sparse)
### Causality
- **Confounds**: seasonality, cohort—**control** where possible
---
## Final Review Checklist
- [ ] Quality dimensions prioritized for the product
- [ ] Versioned eval sets and rubrics
- [ ] Auto + human roles explicit; limitations documented
- [ ] Release gates and rollback tied to metrics
- [ ] Plan for online feedback loop
## Tips for Effective Guidance
- **Slice** metrics—averages hide **regressions** on critical intents.
- For **agents**, evaluate **trajectories**, not only final text.
- Never claim **objective** truth—evaluation is **operationalized** judgment.
## Handling Deviations
- **No labels**: start with **smallest** **pairwise** comparison set + **spot** human review.
- **High-stakes** (medical/legal): **human-in-the-loop** gate; disclaim **limits** of auto eval.
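The LLM evaluation workflow's Stage 5 gate, combined with its tip that averages hide regressions on critical intents, could look roughly like the per-slice check below. It is a sketch, not a prescribed harness: the function name, the score dictionaries keyed by slice, and the default `max_regression` tolerance are all assumptions for illustration.

```python
def evaluate_gate(baseline, candidate, must_pass=None, max_regression=0.02):
    """Compare per-slice scores (0..1) of a candidate run against a baseline.

    must_pass maps slice name -> minimum acceptable score (e.g. safety suites).
    Any slice that drops more than max_regression below baseline also fails.
    """
    must_pass = must_pass or {}
    failures = []
    for slice_name, base_score in baseline.items():
        if slice_name not in candidate:
            failures.append(f"{slice_name}: missing from candidate run")
            continue
        cand_score = candidate[slice_name]
        floor = must_pass.get(slice_name)
        if floor is not None and cand_score < floor:
            failures.append(f"{slice_name}: {cand_score:.2f} below must-pass floor {floor:.2f}")
        if base_score - cand_score > max_regression:
            failures.append(f"{slice_name}: regressed {base_score:.2f} -> {cand_score:.2f}")
    return {"passed": not failures, "failures": failures}


if __name__ == "__main__":
    baseline = {"safety": 1.0, "checkout": 0.90, "smalltalk": 0.80}
    candidate = {"safety": 1.0, "checkout": 0.84, "smalltalk": 0.95}
    # The candidate's mean score is higher, yet the checkout slice regressed.
    print(evaluate_gate(baseline, candidate, must_pass={"safety": 1.0}))
```

In the example run the candidate's average improves while the `checkout` slice drops, so the gate fails. That is the case an aggregate-only check would miss.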
Deep idempotency workflow—identifying retry surfaces, idempotency keys, storage and TTL, exactly-once pitfalls, and testing duplicate delivery. Use when desi...
---
name: idempotency
description: Deep idempotency workflow—identifying retry surfaces, idempotency keys, storage and TTL, exactly-once pitfalls, and testing duplicate delivery. Use when designing safe APIs, workers, and payment flows under at-least-once delivery.
---
# Idempotency (Deep Workflow)
Most distributed systems deliver work **at least once**. Idempotency makes **duplicate processing safe**—critical for payments, inventory, and message consumers.
## When to Offer This Workflow
**Trigger conditions:**
- Retries on HTTP, queues, or background jobs
- Double charges, duplicate records, or “at-least-once” confusion
- Product asks for “exactly-once” semantics
**Initial offer:**
Use **six stages**: (1) identify side effects, (2) choose keys, (3) storage & scope, (4) API patterns, (5) worker patterns, (6) testing. Confirm storage (Redis, SQL) and retention window.
---
## Stage 1: Identify Side Effects
**Goal:** Classify operations: reads vs creates vs monetary transfers vs state transitions.
**Exit condition:** List of mutations that must be idempotent under retries.
---
## Stage 2: Choose Keys
**Goal:** Client-supplied `Idempotency-Key` header (Stripe-style) vs deterministic hash of normalized payload—trade UX vs collision risk.
---
## Stage 3: Storage & Scope
**Goal:** Store key → outcome or result reference with TTL covering retry window; scope keys per tenant/user when needed.
---
## Stage 4: API Patterns
**Goal:** Same key + same body → same outcome; reject or conflict if same key with different body.
---
## Stage 5: Worker Patterns
**Goal:** Natural unique constraints in DB; dedupe table keyed by `event_id` or business idempotency key for consumers.
---
## Stage 6: Testing
**Goal:** Chaos or integration tests that deliver duplicate messages; property tests for key behavior.
---
## Final Review Checklist
- [ ] Mutating paths classified
- [ ] Key strategy and scope documented
- [ ] Storage, TTL, conflict rules defined
- [ ] HTTP and async consumers aligned
- [ ] Duplicate delivery tests
## Tips for Effective Guidance
- True exactly-once end-to-end is rare—design for at-least-once + idempotent effects.
- Pair with **message-queues** and **rest-best-practices** for HTTP idempotency keys.
## Handling Deviations
- Financial flows: require stronger audit and longer key retention.
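Stages 3 and 4 of the idempotency workflow (key → outcome with a TTL, scoped per tenant, with a conflict on same key and different body) can be sketched with an in-memory store. Everything named here (`IdempotencyStore`, `IdempotencyConflict`, the `execute` signature) is hypothetical, and the sketch is single-process only: a real deployment would need an atomic insert in Redis or SQL so that two concurrent first requests cannot both run the handler.

```python
import hashlib
import json
import time


class IdempotencyConflict(Exception):
    """Same key was reused with a different request body."""


class IdempotencyStore:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # (scope, key) -> (body fingerprint, stored result, expiry time)
        self._entries = {}

    def execute(self, scope, key, body, handler):
        """Run handler(body) at most once per (scope, key) within the TTL."""
        now = self._clock()
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        entry = self._entries.get((scope, key))
        if entry is not None and entry[2] > now:
            if entry[0] != fingerprint:
                raise IdempotencyConflict(f"key {key!r} reused with a different body")
            return entry[1]  # replay the stored outcome instead of re-running
        result = handler(body)
        self._entries[(scope, key)] = (fingerprint, result, now + self._ttl)
        return result


if __name__ == "__main__":
    store = IdempotencyStore(ttl_seconds=3600)
    charge = lambda body: {"charged": body["amount"]}
    print(store.execute("tenant-1", "key-1", {"amount": 10}, charge))
    print(store.execute("tenant-1", "key-1", {"amount": 10}, charge))  # replayed
```

The injected `clock` exists so that Stage 6 style tests can advance time and exercise TTL expiry without sleeping.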
Deep internationalization workflow—string extraction, ICU messages, formats, pseudolocale testing, and developer workflow. Use when preparing software for tr...
---
name: i18n
description: Deep internationalization workflow—string extraction, ICU messages, formats, pseudolocale testing, and developer workflow. Use when preparing software for translation before full localization (l10n).
---
# Internationalization (i18n) (Deep Workflow)
i18n is **engineering readiness** for multiple languages: extractable strings, ICU messages, locale-aware formatting, and tests—before full **localization** (translator workflow).
## When to Offer This Workflow
**Trigger conditions:**
- Planning first non-English locales
- Hard-coded UI strings across the codebase
- Incorrect date/number formatting outside default locale
**Initial offer:**
Use **six stages**: (1) inventory & scope, (2) extraction pipeline, (3) ICU & placeholders, (4) formatting APIs, (5) layout & overflow, (6) QA hooks. Confirm framework (i18next, FormatJS, rails-i18n, etc.).
---
## Stage 1: Inventory & Scope
**Goal:** Which surfaces ship first; pilot locales; avoid translating everything on day one.
---
## Stage 2: Extraction Pipeline
**Goal:** Stable message keys; CI lint to block new user-visible literals where policy requires; namespaces per feature.
---
## Stage 3: ICU & Placeholders
**Goal:** Plural and select rules; named variables; no string concatenation across translated fragments.
---
## Stage 4: Formatting APIs
**Goal:** `Intl` (or platform equivalent) for dates, numbers, currency; explicit timezone policy (UTC vs user local).
---
## Stage 5: Layout & Overflow
**Goal:** Flexible layouts for longer translations; pseudolocale in CI to catch truncation (e.g., `xx-ACME`).
---
## Stage 6: QA Hooks
**Goal:** Easy locale switching in staging; optional screenshot/visual tests for critical screens.
---
## Final Review Checklist
- [ ] Scope and pilot locales defined
- [ ] Extraction and linting in place
- [ ] ICU for plurals; no unsafe concatenation
- [ ] Intl formatting for numbers/dates
- [ ] Pseudolocale or stress language in QA
## Tips for Effective Guidance
- Pair with **localization** skill for translator workflow and TMS integration.
## Handling Deviations
- Games or marketing-heavy UIs: context comments for translators are critical.
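The pseudolocale idea from Stage 5 of the i18n workflow can be illustrated with a small transform that accents letters, pads the string to simulate longer translations, and wraps it in brackets so truncation is visible. This is a sketch: the `pseudolocalize` name, the accent map, and the 30% default expansion are assumptions, and it only protects simple `{name}` placeholders, not nested ICU plural or select syntax.

```python
import math
import re

# Accented look-alikes make untranslated (hard-coded) strings easy to spot.
_ACCENTS = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
    "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú",
    "c": "ç", "n": "ñ", "y": "ý",
})
_PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudolocalize(message, expansion=0.3):
    """Return a bracketed, accented, padded variant of message for layout testing."""
    parts = _PLACEHOLDER.split(message)  # odd indices hold the placeholders
    converted = [
        part if index % 2 else part.translate(_ACCENTS)
        for index, part in enumerate(parts)
    ]
    visible = sum(len(part) for index, part in enumerate(parts) if index % 2 == 0)
    padding = "~" * math.ceil(visible * expansion)
    return "[" + "".join(converted) + padding + "]"


if __name__ == "__main__":
    print(pseudolocalize("Hello, {name}! You have new mail."))
```

If a screen shows the opening bracket but not the closing one, the layout truncates longer text. Any string that appears without accents bypassed the extraction pipeline.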
Deep GraphQL schema workflow—modeling types, queries and mutations, N+1 and complexity limits, errors and pagination, federation risks, and evolution. Use wh...
---
name: graphql-schema
description: Deep GraphQL schema workflow—modeling types, queries and mutations, N+1 and complexity limits, errors and pagination, federation risks, and evolution. Use when designing or reviewing GraphQL APIs.
---
# GraphQL Schema (Deep Workflow)
GraphQL concentrates complexity on the server: **resolver graphs**, **N+1** fetches, **schema evolution**, and **field-level authorization**.
## When to Offer This Workflow
**Trigger conditions:**
- Designing a new GraphQL API or federated subgraph
- Latency or complexity incidents from client queries
- Need for safe schema deprecation and versioning
**Initial offer:**
Use **six stages**: (1) domain modeling, (2) operations surface, (3) performance patterns, (4) errors & partial results, (5) security & authz, (6) versioning & evolution. Confirm client patterns (Apollo, Relay) and gateway (if any).
---
## Stage 1: Domain Modeling
**Goal:** Types reflect domain concepts; avoid dumping everything on `Query`; use input objects for mutations with validation.
---
## Stage 2: Operations Surface
**Goal:** Queries for reads; mutations for writes; subscriptions only when justified (scaling and operational cost).
### Pagination
- Prefer cursor-based connections for large lists (Relay-style edges/nodes)
---
## Stage 3: Performance Patterns
**Goal:** DataLoader or batching for N+1; query complexity/depth/cost limits; optional persisted queries for public APIs.
---
## Stage 4: Errors & Partial Results
**Goal:** Document semantics of `errors` alongside partial `data`; map domain failures to structured extensions.
---
## Stage 5: Security & Authz
**Goal:** Enforce authorization at field/object level—not only at the top resolver.
---
## Stage 6: Versioning & Evolution
**Goal:** Prefer additive changes; `@deprecated` with migration window; in federation, clear ownership of types and entities.
---
## Final Review Checklist
- [ ] Schema reflects domain and operations
- [ ] Pagination and mutations idiomatic
- [ ] Batching and complexity limits in place
- [ ] Error behavior documented for clients
- [ ] Field-level authz enforced
- [ ] Deprecation policy defined
## Tips for Effective Guidance
- N+1 is the default failure mode—plan batching early.
- Pair with **rest-best-practices** when REST and GraphQL coexist at the edge.
## Handling Deviations
- Public APIs: consider persisted queries or allowlists to limit abusive queries.
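The batching advice in Stage 3 of the GraphQL workflow can be illustrated with a stripped-down loader: resolvers queue keys, and one batched fetch serves them all. This is a simplified, synchronous sketch of the DataLoader idea, not the API of any GraphQL library; `BatchLoader`, `load`, and `dispatch` are hypothetical names, and real implementations usually schedule the dispatch automatically rather than requiring an explicit call.

```python
class BatchLoader:
    """Collect keys from many resolvers, then fetch them in one batched call."""

    def __init__(self, batch_fn):
        # batch_fn takes a list of keys and returns values in the same order.
        self._batch_fn = batch_fn
        self._pending = []
        self._cache = {}

    def load(self, key):
        """Queue a key and return a zero-argument callable that yields its value after dispatch()."""
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)
        return lambda: self._cache[key]

    def dispatch(self):
        """Resolve all queued keys with a single call to batch_fn."""
        if not self._pending:
            return
        keys, self._pending = self._pending, []
        self._cache.update(zip(keys, self._batch_fn(keys)))


if __name__ == "__main__":
    def fetch_users(ids):
        print("one query for ids:", ids)  # stands in for a single SQL IN (...) query
        return [{"id": i, "name": f"user-{i}"} for i in ids]

    loader = BatchLoader(fetch_users)
    post_author_ids = [1, 2, 1, 3]
    pending = [loader.load(author_id) for author_id in post_author_ids]
    loader.dispatch()
    print([resolve()["name"] for resolve in pending])
```

Without the loader, resolving the author of each post would issue one fetch per post, which is the N+1 pattern; here four posts cost a single fetch for three distinct ids.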
Deep wireframing workflow—problem framing, fidelity choice, flows and edge cases, IA and components, critique and iteration, handoff to design/dev. Use when...
---
name: wireframing
description: Deep wireframing workflow—problem framing, fidelity choice, flows and edge cases, IA and components, critique and iteration, handoff to design/dev. Use when exploring layouts before visual design or aligning stakeholders quickly.
---
# Wireframing (Deep Workflow)
Wireframes are shared thinking tools—not decoration. The goal is alignment on structure, priority, and flows at low rework cost before pixels and code.
## When to Offer This Workflow
**Trigger conditions:**
- New feature with unclear information architecture or many UI states
- Stakeholders disagree on scope or number of screens
- Fast iteration needed before high-fidelity visual design
- Technical constraints (API shape, permissions) must shape the UI early
**Initial offer:**
Use **six stages**: (1) define intent and fidelity, (2) map users and scenarios, (3) structure and navigation, (4) key screens and states, (5) critique and test, (6) handoff. Ask which tool they use (FigJam, Figma, paper, Excalidraw) and the deadline.
---
## Stage 1: Define Intent & Fidelity
**Goal:** Match fidelity to the question being answered.
### Levels
- **Thumbnail flow**: minutes only—steps and sequence
- **Low-fi boxes**: layout and rough component placement
- **Mid-fi**: realistic copy placeholders and density—still grayscale
### Anti-patterns
- **Too polished too early**—stakeholders anchor on color instead of structure
- **Untitled flows**—reviewers lose context
**Exit condition:** Reviewers know whether to judge flow, layout, or both in this round.
---
## Stage 2: Map Users & Scenarios
**Goal:** One primary user and job-to-be-done per flow; edge cases listed explicitly.
### Activities
- Lightweight personas—only traits that change the UI (permissions, expertise)
- Scenarios as short stories: trigger → actions → success or failure
- Out-of-scope scenarios called out to prevent scope creep in wire review
**Exit condition:** Three to seven scenarios ranked; must-have vs later is clear.
---
## Stage 3: Structure & Navigation
**Goal:** Information architecture before screen-level detail.
### Practices
- Sitemap or nav model: where the feature lives; deep-link expectations
- Naming: labels consistent with the user’s mental model; avoid internal jargon unless users know it
- Decide early if mobile and desktop diverge—don’t let it happen by accident
**Exit condition:** Nav entry points and breadcrumbs sketched.
---
## Stage 4: Key Screens & States
**Goal:** Cover the happy path plus critical empty, loading, error, and permission-denied states.
### Checklist per screen
- One clear primary CTA; secondary actions de-emphasized
- Empty: educate and offer a next step; loading: skeleton vs spinner chosen deliberately
- Error: recovery path; permission denied: why and what to do next
### Annotations
- Numbered callouts for open questions—do not hide ambiguity
**Exit condition:** State matrix for the top three screens (rows = states).
---
## Stage 5: Critique & Test
**Goal:** Structured feedback—not only subjective taste.
### Review script
- Five-minute silent read first
- Round-robin: confusion points and missing paths
- Capture decisions; assign owners for open questions
### Lightweight usability
- Click-through prototype or paper walkthrough with one or two users when risk is high
**Exit condition:** Prioritized change list; open questions tracked.
---
## Stage 6: Handoff
**Goal:** Smooth handoff to visual design and engineering.
### To design
- Grid assumptions, responsive breakpoints, content priority order
### To engineering
- API dependencies; UI states that affect backend behavior (pagination, filters)
- Accessibility notes: focus order, live regions for dynamic updates
### Artifacts
- Link to a single source file; version snapshot or changelog entry when the handoff is formal
---
## Final Review Checklist
- [ ] Fidelity matches review goals
- [ ] Scenarios and edge states covered for critical flows
- [ ] IA and navigation coherent
- [ ] Empty, loading, error, and permission states considered
- [ ] Handoff notes for design and dev
## Tips for Effective Guidance
- Content-first where possible—placeholder lorem ipsum often mis-sizes real copy.
- Label screens and flows; reviewers often join mid-stream.
- Encourage disposable wires—speed beats beauty at this stage.
## Handling Deviations
- **Existing design system**: sketch with component skeletons even at low-fi—reduces surprise later.
- **Tiny UI tweak**: skip the full workflow—a single annotated screen may suffice.
Deep WebSocket/SSE workflow—handshake and auth, session lifecycle, heartbeats, ordering, backpressure, scaling, and observability. Use when building realtime...
---
name: websocket-patterns
description: Deep WebSocket/SSE workflow—handshake and auth, session lifecycle, heartbeats, ordering, backpressure, scaling, and observability. Use when building realtime dashboards, chat, collaborative editing, or live notifications.
---
# WebSocket Patterns (Deep Workflow)
Realtime connections add **stateful** complexity: **who is connected**, **what order** messages arrive, and **what happens** when links flap. Design for **at-least-once** delivery, **explicit** heartbeats, and **horizontal** scaling early.
## When to Offer This Workflow
**Trigger conditions:**
- Replacing polling with **WS** or **SSE**
- Auth on connect; token refresh mid-session
- **Fan-out** to many subscribers; **presence** and **typing** indicators
- Sticky sessions, load balancer timeouts, **reconnect storms**
**Initial offer:**
Use **six stages**: (1) choose transport, (2) connection & auth, (3) protocol & messages, (4) reliability & ordering, (5) scale & ops, (6) security & abuse. Confirm **browser vs server** clients and **proxies** (nginx, ALB, Cloudflare).
---
## Stage 1: Choose Transport
**Goal:** **WebSocket** vs **SSE** vs **long polling**—right tool per direction.
### Heuristics
- **Bidirectional**, low latency, binary payloads → **WebSocket**
- **Server → client** **one-way** streams, HTTP-friendly infra → **SSE**
- **Fire-and-forget** notifications with **simple** infra → consider **push** services first
### Caveats
- **Corporate proxies** have historically blocked or broken WS upgrades; **test** in the target network environments and treat **WSS** as mandatory
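- **HTTP/2** and **HTTP/3** (**QUIC**) carry WebSockets through extended CONNECT (RFC 8441, RFC 9220), and support differs across stacks; validate every intermediary on the path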
**Exit condition:** **Transport choice** documented with **why not** alternatives.
---
## Stage 2: Connection & Auth
**Goal:** **Authenticated** sockets without **long-lived** secrets in query strings when avoidable.
### Patterns
- **JWT** in **Sec-WebSocket-Protocol** or **first message** after connect—**prefer** short-lived tokens + **refresh** flow
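- **Cookie** sessions: browsers attach cookies to the handshake, so check **Origin** and **SameSite** settings to prevent **cross-site WebSocket hijacking** (the WS analogue of **CSRF**)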
- **Re-auth** before token expiry; **graceful** close with **code** and **reason**
### Authorization
- **Subscribe** to **topics** only after **server-side** check—**never** trust client channel names alone
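
The connect-then-authorize flow above can be sketched as a server-side gate on every subscribe request. This is a minimal sketch, not a full auth layer: `Session`, `subscribe`, the prefix-based grant model, and the close code `4401` are all hypothetical names chosen for illustration (the 4000-4999 close-code range is reserved for private application use). Token verification itself is assumed to have happened before the session is created.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical application close code for an expired token.
CLOSE_TOKEN_EXPIRED = 4401


@dataclass
class Session:
    user_id: str
    token_expires_at: float                 # Unix seconds, from the verified token
    allowed_prefixes: Tuple[str, ...]       # grants decided by the server
    subscriptions: Set[str] = field(default_factory=set)


def subscribe(session: Session, topic: str, now: Optional[float] = None) -> Tuple[bool, str]:
    """Return (ok, reason). The topic name is client input, so it is only
    accepted after checking it against the grants stored on the session."""
    now = time.time() if now is None else now
    if now >= session.token_expires_at:
        # Caller should close the socket with this code and reason.
        return False, f"close:{CLOSE_TOKEN_EXPIRED}:token expired"
    # Prefixes end with a delimiter so "user:u1:" cannot match "user:u10:...".
    if not any(topic.startswith(prefix) for prefix in session.allowed_prefixes):
        return False, "error:forbidden"
    session.subscriptions.add(topic)
    return True, "ok"
```

Checking expiry on each subscribe is one design choice; a timer that closes the socket at expiry is an alternative when clients rarely send control messages.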
**Exit condition:** **Auth** diagram: issue token → connect → **authorize** subscriptions.
---
## Stage 3: Protocol & Messages
**Goal:** **Versioned** message schema; **predictable** errors.
### Design
- **Envelope**: `{ type, id, ts, payload }`; **correlation** ids for RPC-style
- **Version** negotiation on connect or **feature** flags in hello message
- **Binary** vs JSON—**protobuf/msgpack** for bandwidth; **JSON** for debuggability early
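
As an illustration of the envelope and version negotiation above, here is a small JSON sketch. The field names `type`, `id`, `ts`, `payload` come from the envelope in the list; everything else (`PROTOCOL_VERSION`, the `hello` and `error` message types, the `reply_to` correlation field) is an assumption made for the example.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Iterable, Optional

PROTOCOL_VERSION = 1  # assumed value, announced in the hello message
REQUIRED_FIELDS = ("type", "id", "ts", "payload")


@dataclass(frozen=True)
class Envelope:
    type: str
    id: str
    ts: float
    payload: dict


def make(type_: str, payload: dict, id_: Optional[str] = None,
         ts: Optional[float] = None) -> Envelope:
    """Build an envelope, generating an id and timestamp when not supplied."""
    return Envelope(type_, id_ or uuid.uuid4().hex,
                    time.time() if ts is None else ts, payload)


def encode(env: Envelope) -> str:
    return json.dumps(asdict(env), separators=(",", ":"))


def decode(raw: str) -> Envelope:
    """Parse and validate; malformed input raises ValueError."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Envelope(data["type"], data["id"], data["ts"], data["payload"])


def hello(features: Iterable[str]) -> Envelope:
    """First message on connect: protocol version plus feature flags."""
    return make("hello", {"version": PROTOCOL_VERSION, "features": sorted(features)})


def error_reply(request: Envelope, code: str, message: str,
                ts: Optional[float] = None) -> Envelope:
    """Predictable error shape that points back at the request id."""
    return make("error", {"reply_to": request.id, "code": code, "message": message}, ts=ts)
```

Starting with JSON keeps session transcripts readable; the same envelope fields map onto protobuf or msgpack later without changing handlers.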
### Heartbeats
- **Ping/pong** or **application-level** heartbeat at **interval < proxy timeout** (often **30–60s**)
- **Idle** detection and **clean** disconnect
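
The heartbeat rule above (interval below the proxy timeout, plus idle detection) can be sketched as a small bookkeeping class. The `Heartbeat` name, the 0.5 safety factor, and the two-missed-beats threshold are assumptions for illustration; pick values from your own proxy or load balancer settings.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    proxy_idle_timeout_s: float      # e.g. the LB or proxy idle timeout
    safety_factor: float = 0.5       # assumed margin: ping at half the timeout
    missed_allowed: int = 2          # assumed: silent for 2 intervals means dead
    last_seen: float = 0.0           # time of the last frame from the peer

    @property
    def interval_s(self) -> float:
        """How often to send a ping; always below the proxy timeout."""
        return self.proxy_idle_timeout_s * self.safety_factor

    def on_traffic(self, now: float) -> None:
        """Call for any inbound frame, including pong replies."""
        self.last_seen = now

    def is_dead(self, now: float) -> bool:
        """True once the peer has been silent for longer than the allowance."""
        return now - self.last_seen > self.interval_s * self.missed_allowed
```

With a 60 second idle timeout this pings every 30 seconds and closes the connection after more than 60 seconds of silence, so the application notices a dead peer at about the same time the proxy would drop the link.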
**Exit condition:** **Protocol doc** + **example** session transcript.
---
## Stage 4: Reliability & Ordering
**Goal:** Define **delivery semantics**: usually **at-least-once** (acks plus redelivery after reconnect, since TCP only preserves order within a single connection) and **ordering** per channel.
### Practices
- **Idempotent** message handlers; **dedupe** by **message id** when retries exist
- **Per-user** sequence numbers if **strict** order matters
- **Buffer** limits: **drop**, **close**, or **apply backpressure** policy
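
Two of the practices above, dedupe by message id and an explicit buffer policy, can be sketched as follows. `Deduper`, `SendBuffer`, and the policy names are hypothetical; the dedupe window is a bounded LRU, which assumes retries arrive within the last `capacity` messages.

```python
from collections import OrderedDict, deque


class Deduper:
    """Remembers the most recent message ids so redelivered messages are skipped."""

    def __init__(self, capacity: int = 1024):
        self._seen = OrderedDict()
        self._capacity = capacity

    def first_time(self, message_id: str) -> bool:
        if message_id in self._seen:
            self._seen.move_to_end(message_id)
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


class SendBuffer:
    """Bounded per-connection outbound queue with an explicit overflow policy."""

    POLICIES = ("drop_oldest", "close", "backpressure")

    def __init__(self, limit: int, policy: str = "drop_oldest"):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.items = deque()
        self.limit = limit
        self.policy = policy

    def push(self, message) -> str:
        """Queue a message and report what happened so the caller can act."""
        if len(self.items) < self.limit:
            self.items.append(message)
            return "queued"
        if self.policy == "drop_oldest":
            self.items.popleft()
            self.items.append(message)
            return "dropped_oldest"
        if self.policy == "close":
            return "close"          # caller closes the slow connection
        return "backpressure"       # caller pauses the producer and retries
```

Dropping suits state that is superseded by newer updates (cursor positions, tickers); closing or backpressure suits streams where every message matters.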
### Reconnect
- **Exponential backoff** + **jitter** to prevent **thundering herd**
- **Resume** from **last seen seq** if **missed messages** are unacceptable—**persist** or **snapshot**
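
A sketch of the reconnect behavior above: exponential backoff with full jitter on the client, and replay from the last seen sequence number on the server. The function names, the `base` and `cap` defaults, and the `seq` field on logged messages are assumptions for illustration.

```python
import random
from typing import Dict, List


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng=random) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)].

    Randomizing the whole interval spreads clients out after an outage
    instead of having them all retry at the same instants.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def replay(log: List[Dict], after_seq: int) -> List[Dict]:
    """Server side: messages the client missed, in sequence order.

    `log` stands in for whatever persisted history or snapshot store is used.
    """
    missed = [message for message in log if message["seq"] > after_seq]
    return sorted(missed, key=lambda message: message["seq"])
```

A client would sleep for `backoff_delay(attempt)` before each retry, reset `attempt` to zero after a successful connect, and send its last seen sequence number in the first message so the server can call something like `replay`.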
**Exit condition:** **Reconnect** story documented; **storm** mitigation tested.
---
## Stage 5: Scale & Operations
**Goal:** **Many connections** across **many** nodes—**affinity** and **pub/sub** backbone.
### Architecture
- **Sticky sessions** or **shared** **pub/sub** (Redis, NATS, Kafka) for cross-node fan-out
- **Shard** connection maps; **avoid** **single** giant in-memory map on one box
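
To make the fan-out architecture above concrete, here is an in-process sketch where each node holds only its own connections and a shared broker routes by topic. `Broker` is a stand-in for Redis, NATS, or Kafka and does not use any of their client APIs; `Node`, `attach`, and `outbox` are hypothetical names.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for a shared pub/sub backbone."""

    def __init__(self):
        self._nodes_by_topic = defaultdict(set)

    def subscribe(self, topic: str, node: "Node") -> None:
        self._nodes_by_topic[topic].add(node)

    def publish(self, topic: str, message) -> None:
        # Only nodes with at least one local subscriber receive the message.
        for node in list(self._nodes_by_topic.get(topic, ())):
            node.deliver(topic, message)


class Node:
    """One server process: knows its own sockets, nothing about other nodes."""

    def __init__(self, name: str, broker: Broker):
        self.name = name
        self.broker = broker
        self.local = defaultdict(set)   # topic -> connection ids on this node
        self.outbox = []                # (connection id, message) pairs "sent"

    def attach(self, conn_id: str, topic: str) -> None:
        if not self.local[topic]:
            # First local subscriber: register interest with the backbone once.
            self.broker.subscribe(topic, self)
        self.local[topic].add(conn_id)

    def deliver(self, topic: str, message) -> None:
        for conn_id in sorted(self.local[topic]):
            self.outbox.append((conn_id, message))
```

Because each node subscribes once per topic rather than once per connection, backbone traffic scales with nodes and topics, while the per-connection work stays local to the node that owns the socket.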
### Observability
- **Metrics**: active connections, msg/sec, **queue depth**, **disconnect** reasons
- **Tracing**: connect → subscribe → **first message** latency
### Load shedding
- **Max** connections per IP/user; **rate limit** connection attempts
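
The load-shedding rule above can be sketched with a token bucket per key (IP or user) combined with a cap on open connections. `TokenBucket`, `ConnectionGate`, and all limits are hypothetical; time is passed in explicitly so the logic is easy to test.

```python
class TokenBucket:
    """Allows `burst` attempts at once, then refills at `rate_per_s`."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now: float) -> bool:
        if now > self.updated:
            refill = (now - self.updated) * self.rate
            self.tokens = min(self.burst, self.tokens + refill)
            self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ConnectionGate:
    """Rate-limits connection attempts and caps open connections per key."""

    def __init__(self, max_per_key: int, rate_per_s: float, burst: int):
        self.max_per_key = max_per_key
        self.rate_per_s = rate_per_s
        self.burst = burst
        self._buckets = {}
        self._open = {}

    def try_open(self, key: str, now: float) -> bool:
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = self._buckets[key] = TokenBucket(self.rate_per_s, self.burst, now)
        if not bucket.allow(now):
            return False                      # too many attempts: reject early
        if self._open.get(key, 0) >= self.max_per_key:
            return False                      # already at the connection cap
        self._open[key] = self._open.get(key, 0) + 1
        return True

    def close(self, key: str) -> None:
        if self._open.get(key, 0) > 0:
            self._open[key] -= 1
```

Rejected attempts still consume a token here, which is deliberate: a client hammering the endpoint keeps itself throttled.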
**Exit condition:** **Capacity** model: connections per node × **message** **fan-out** cost.
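
One way to write down the capacity model is as back-of-envelope arithmetic: nodes are bounded both by how many sockets each can hold and by how many deliveries per second each can push. The function name, parameters, and the numbers in the usage note are illustrative assumptions, not measured figures.

```python
import math


def capacity_model(total_connections: int,
                   connections_per_node: int,
                   publishes_per_s: float,
                   avg_subscribers_per_topic: float,
                   deliveries_per_node_per_s: float) -> dict:
    """Estimate node count from connection limits and fan-out cost."""
    # Each publish is delivered once per subscriber, so fan-out multiplies load.
    deliveries_per_s = publishes_per_s * avg_subscribers_per_topic
    by_connections = math.ceil(total_connections / connections_per_node)
    by_fanout = math.ceil(deliveries_per_s / deliveries_per_node_per_s)
    return {
        "deliveries_per_s": deliveries_per_s,
        "nodes_by_connections": by_connections,
        "nodes_by_fanout": by_fanout,
        "nodes": max(by_connections, by_fanout),
    }
```

For example, 200,000 connections at 50,000 per node needs 4 nodes, but 100 publishes per second to topics averaging 2,000 subscribers is 200,000 deliveries per second; at an assumed 40,000 deliveries per node that needs 5 nodes, so fan-out is the binding constraint.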
---
## Stage 6: Security & Abuse
**Goal:** **Minimize** attack surface on **long-lived** pipes.
### Controls
- **WSS** everywhere; **validate** **Origin** where applicable
- **Payload size** limits; **compression** **bomb** awareness
- **AuthZ** on every **subscription**; **audit** **admin** actions
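
Two of the controls above, Origin validation and compression-bomb awareness, can be sketched with the standard library. `ALLOWED_ORIGINS`, `MAX_MESSAGE_BYTES`, and both function names are hypothetical. The bounded inflate uses the zlib container format to keep the example short; WebSocket permessage-deflate uses raw DEFLATE, so a real server would rely on its library's message-size limit rather than this exact call.

```python
import zlib
from typing import Iterable, Optional
from urllib.parse import urlsplit

ALLOWED_ORIGINS = frozenset({"https://app.example.com"})  # hypothetical allow-list
MAX_MESSAGE_BYTES = 64 * 1024                              # assumed limit


def origin_allowed(origin_header: Optional[str],
                   allowed: Iterable[str] = ALLOWED_ORIGINS,
                   allow_missing: bool = False) -> bool:
    """Check the handshake Origin header against an allow-list.

    Browsers send Origin on WebSocket handshakes; non-browser clients may
    not, so `allow_missing` decides how to treat an absent header.
    """
    if not origin_header:
        return allow_missing
    parts = urlsplit(origin_header)
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in set(allowed)


def inflate_bounded(data: bytes, limit: int = MAX_MESSAGE_BYTES) -> bytes:
    """Decompress at most `limit` bytes; reject anything that expands further."""
    decompressor = zlib.decompressobj()
    # Ask for one byte more than the limit so an overflow is detectable
    # without ever materializing the full expanded payload.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed payload exceeds limit")
    return out
```

The size check applies to the decompressed length because a few kilobytes on the wire can expand to many megabytes in memory.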
### Abuse
- **Spam** detection; **kick/ban** flows; **circuit breakers** on **misbehaving** clients
---
## Final Review Checklist
- [ ] Transport choice justified (WS/SSE/etc.)
- [ ] AuthN/Z on connect and per-channel
- [ ] Heartbeats aligned with proxy/LB timeouts
- [ ] Delivery/idempotency/reconnect semantics explicit
- [ ] Horizontal scale path + observability + abuse controls
## Tips for Effective Guidance
- **ALB idle timeout** vs **heartbeat**—classic production bug; call it out.
- When user says “real-time,” ask **latency target** and **ordering** needs.
- **SSE** is simpler—don’t default to WS for **one-way** feeds.
## Handling Deviations
- **Edge runtimes** (Workers): **different** connection limits and **duration**—validate platform.
- **Mobile**: **background** **suspension**—**push** notifications may complement WS.